GitHub user ScrapCodes commented on the issue:
https://github.com/apache/spark/pull/15258
As we suspected, a _SUCCESS file does not appear when files are copied into
HDFS, so its presence cannot be trusted as a way to tell that an input directory
is not the partial output of a failed job. The fact that Spark can process the
truncated output of a failed job seems like a broader problem, and it might be
okay not to address it as part of this JIRA?
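
To illustrate the ambiguity, here is a minimal sketch that uses a local temporary directory as a stand-in for HDFS. The helper `has_success_marker`, the `demo` function, and the directory and file names are all hypothetical and are not part of Spark or of this PR. It only shows that a marker check cannot distinguish data copied in by hand from the output of a failed job.

```python
import tempfile
from pathlib import Path

SUCCESS_MARKER = "_SUCCESS"


def has_success_marker(directory: Path) -> bool:
    """Return True if the directory contains a _SUCCESS marker file."""
    return (directory / SUCCESS_MARKER).is_file()


def demo() -> dict:
    """Build three sample directories and report the marker check for each."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Output of a job that finished: part file plus marker.
        job_output = root / "job_output"
        job_output.mkdir()
        (job_output / "part-00000").write_text("a\nb\n")
        (job_output / SUCCESS_MARKER).touch()

        # Complete data copied in by hand: no marker is ever written.
        copied_in = root / "copied_in"
        copied_in.mkdir()
        (copied_in / "data.txt").write_text("a\nb\n")

        # Truncated output of a failed job: also no marker.
        failed_job = root / "failed_job"
        failed_job.mkdir()
        (failed_job / "part-00000").write_text("a\n")

        return {
            d.name: has_success_marker(d)
            for d in (job_output, copied_in, failed_job)
        }


if __name__ == "__main__":
    for name, ok in demo().items():
        print(f"{name}: marker present = {ok}")
```

Both `copied_in` and `failed_job` fail the check, so rejecting directories without the marker would also reject valid input that was copied in.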