[GitHub] spark pull request: Generic Binary File Support in Spark

mateiz Wed, 30 Jul 2014 16:20:08 -0700

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1658#issuecomment-50693743
  
    Do you mind opening a JIRA issue at
    https://issues.apache.org/jira/browse/SPARK to track this?

    Also, I wonder if we should make the API just return an RDD of
    InputStreams. That way users can read directly from a stream and don't
    need to load the whole file into memory as a byte array. The only awkward
    thing is that calling cache() on an RDD of InputStreams wouldn't work, but
    hopefully this is obvious (and will be documented). If that doesn't sound
    good, we could instead return objects that let you open a stream
    repeatedly (some kind of BinaryFile object with a stream method).
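
    To make the second option concrete, here is a minimal sketch of what such
    a handle might look like. It is only an illustration, not the API from the
    pull request: the class name BinaryFile comes from the comment above, but
    the stream() and byteCount() methods and the use of java.nio on a local
    path are assumptions, and a real version would presumably go through the
    Hadoop FileSystem API instead. The point is that the object stores only
    the path, so it can be serialized and kept around, and each call to
    stream() opens a fresh InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical re-openable file handle (illustrative only, not Spark's API).
 * It holds just the path, so the handle itself is small and serializable,
 * unlike an already-open InputStream.
 */
final class BinaryFile implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String path;

    BinaryFile(String path) {
        this.path = path;
    }

    String path() {
        return path;
    }

    /** Opens a new stream on every call; the caller is responsible for closing it. */
    InputStream stream() throws IOException {
        // Assumption: a local file path. A cluster version would use a distributed file system.
        return Files.newInputStream(Paths.get(path));
    }

    /** Example consumer: counts bytes through a fixed-size buffer, never loading the whole file. */
    long byteCount() throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = stream()) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

    Because every call opens its own stream, the same handle can be read more
    than once, which is what a raw InputStream would not allow.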



