Github user tigerquoll commented on the pull request:
https://github.com/apache/spark/pull/5250#issuecomment-87822606
To sum up the debate so far, there don't appear to be any major concerns
about leaving the system in an inconsistent state. The main concern seems to
be the swallowing of the exception, which lets no sign of it propagate back up
to the main flow of execution, with the additional risk that doing so could
lead to non-deterministic results.
By swallowing non-deterministic exceptions, we are explicitly converting
non-deterministic application crashes into potentially non-deterministic data;
there is no getting around this fact.
I'd like to achieve an outcome here, and I'm not wedded to how that outcome
is achieved. As a counter-proposal, how about a new HadoopRDD derivative that
yields Try[(K, V)] records? We would propagate the exception cleanly back to
the calling code and let it deal with the failure explicitly. It is slightly
more cumbersome to use, but you'd only reach for it when you expect corrupted
data, and it allows easy manipulation of data that may contain potential
exceptions.
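
Roughly, the calling code could look like the sketch below. This is a minimal
illustration rather than any existing Spark API: the Try-based read is
simulated here by wrapping record parsing of a plain text file in Try, and the
path and tab-separated record layout are invented for the example; the actual
HadoopRDD derivative would surface read-time exceptions to the caller in the
same shape.

```scala
import scala.util.{Success, Try}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TryRecordExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("try-record-example"))

    // Stand-in for the proposed Try-based HadoopRDD: each record is parsed
    // inside a Try, so a corrupt line becomes a Failure instead of throwing
    // inside the task (or being silently swallowed).
    val raw: RDD[String] = sc.textFile("hdfs:///data/possibly-corrupt")
    val records: RDD[Try[(String, Long)]] = raw.map { line =>
      Try {
        val Array(k, v) = line.split("\t", 2)
        (k, v.toLong) // a NumberFormatException on corrupt values is captured
      }
    }

    // The caller decides what to do with failures: keep the good records...
    val good: RDD[(String, Long)] = records.collect { case Success(kv) => kv }

    // ...and count or inspect the bad ones explicitly, rather than having the
    // exception disappear deep inside the read path.
    val badCount = records.filter(_.isFailure).count()
    println(s"good=${good.count()} corrupt=$badCount")

    sc.stop()
  }
}
```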