Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/164#issuecomment-38025512
If I understand the purpose correctly, this PR is for reading small text
files, yet most of the code handles the corner case where a file's size
exceeds 2GB. You mentioned Mahout hit this problem. What was the use case
there? If someone needs to concatenate several 2GB byte buffers to create a
single Text record, they are very likely not doing the right thing, IMHO.
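For context on why the 2GB boundary is a hard limit and not just a performance concern: JVM arrays are indexed by `int`, so a single `byte[]` (and hence a single in-memory `Text` record backed by one) cannot hold more than `Integer.MAX_VALUE` bytes. A minimal sketch of the arithmetic, with the 3GB file size chosen purely as a hypothetical example:

```java
// Illustration of the JVM array-size ceiling that makes files over
// ~2GB a corner case: array lengths are ints, so one byte[] tops out
// at Integer.MAX_VALUE (2^31 - 1) elements.
public class ArrayLimit {
    public static void main(String[] args) {
        long fileSize = 3L * 1024 * 1024 * 1024;       // hypothetical 3GB file
        long maxArray = Integer.MAX_VALUE;             // 2147483647 bytes
        // true: this file cannot be materialized as a single byte[]
        System.out.println(fileSize > maxArray);
    }
}
```

So any code path that tries to build one contiguous record from such a file must stitch together multiple buffers, which is the complexity the comment is questioning.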