Github user tigerquoll commented on the pull request:
https://github.com/apache/spark/pull/5250#issuecomment-87822606
To sum up the debate so far, there don't appear to be any major concerns
about leaving the system in an inconsistent state. The main concern seems to
be the swallowing of the exception, which lets no sign of it propagate back up
to the main flow of execution, with the additional risk that doing so could
lead to non-deterministic results.
By swallowing non-deterministic exceptions, we are explicitly converting
non-deterministic application crashes into potentially non-deterministic data;
there is no getting around this fact.
I'd like to achieve an outcome here, and I'm not wedded to how that outcome
is achieved. As a counter-proposal, how about a new HadoopRDD derivative that
yields Try[(K, V)] records? We would propagate the exception cleanly back to
the calling code and let it deal with the failure explicitly. It is slightly
more cumbersome to use, but you'd only reach for it when you expect corrupted
data, and it allows easy manipulation of data that may contain potential
exceptions.
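
Roughly, the calling code could look like the sketch below. This is a minimal
illustration rather than any existing Spark API: the Try-based read is
simulated here by wrapping record parsing of a plain text file in Try, and the
path and tab-separated record layout are invented for the example; the actual
HadoopRDD derivative would surface read-time exceptions to the caller in the
same shape.

```scala
import scala.util.{Success, Try}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TryRecordExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("try-record-example"))

    // Stand-in for the proposed Try-based HadoopRDD: each record is parsed
    // inside a Try, so a corrupt line becomes a Failure instead of throwing
    // inside the task (or being silently swallowed).
    val raw: RDD[String] = sc.textFile("hdfs:///data/possibly-corrupt")
    val records: RDD[Try[(String, Long)]] = raw.map { line =>
      Try {
        val Array(k, v) = line.split("\t", 2)
        (k, v.toLong) // a NumberFormatException on corrupt values is captured
      }
    }

    // The caller decides what to do with failures: keep the good records...
    val good: RDD[(String, Long)] = records.collect { case Success(kv) => kv }

    // ...and count or inspect the bad ones explicitly, rather than having the
    // exception disappear deep inside the read path.
    val badCount = records.filter(_.isFailure).count()
    println(s"good=${good.count()} corrupt=$badCount")

    sc.stop()
  }
}
```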