[GitHub] spark pull request: [CORE] [SPARK-6593] Provide a HadoopRDD varian...

srowen Mon, 06 Apr 2015 01:00:07 -0700

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/5368#issuecomment-89965168
  
Hm, so you can't get roughly the same behavior as here with an `InputFormat`? 
I suppose not 100%. At best it could check for problems opening the underlying 
streams ahead of time and not return bad splits, which could cover this 
specific case and maybe anything else of the form "block / file is corrupted". 
Corrupt data could still turn up later, while a split is being read. The nice 
thing is that this would address other data sources, not just Hadoop files. I 
wonder if that's worth the tradeoff, but I haven't looked into it closely 
enough to know whether even that much is possible.
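A rough sketch of the "don't return bad splits" idea, written in plain Python over local files rather than against the real Hadoop `InputFormat` API. `Split`, `default_probe` and `filter_openable_splits` are hypothetical names, not part of Hadoop or Spark. The probe only opens each split's file and reads its first byte, so it catches open-time failures (missing or unreadable files, a split starting past the end of the file), while corruption deeper inside a split would still surface later, at read time.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an input split: a byte range of one file."""
    path: str
    start: int
    length: int


def default_probe(split: Split) -> None:
    """Open the split's file and read its first byte; raise OSError on failure."""
    with open(split.path, "rb") as f:
        f.seek(split.start)
        if split.length > 0 and not f.read(1):
            raise OSError(f"split {split} starts past end of file")


def filter_openable_splits(
    splits: List[Split],
    probe: Callable[[Split], None] = default_probe,
) -> Tuple[List[Split], List[Tuple[Split, str]]]:
    """Return (good_splits, skipped), pairing each skipped split with its error.

    Only failures raised by `probe` are caught here; data that is corrupt
    further inside a split is NOT detected and will fail when actually read.
    """
    good: List[Split] = []
    skipped: List[Tuple[Split, str]] = []
    for split in splits:
        try:
            probe(split)
        except OSError as e:  # FileNotFoundError, PermissionError, etc.
            skipped.append((split, str(e)))
        else:
            good.append(split)
    return good, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ok = os.path.join(d, "part-00000")
        with open(ok, "wb") as f:
            f.write(b"hello\n")
        splits = [
            Split(ok, 0, 6),                          # readable
            Split(os.path.join(d, "missing"), 0, 10),  # file does not exist
            Split(ok, 100, 5),                        # starts past EOF
        ]
        good, skipped = filter_openable_splits(splits)
        print("good:", good)
        for split, err in skipped:
            print("skipped:", split.path, "->", err)
```

Passing a different `probe` (for example, one that also tries to decompress a file's header) would widen what gets filtered up front, at the cost of extra I/O before any task runs.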
