tgravescs commented on pull request #29895: URL: https://github.com/apache/spark/pull/29895#issuecomment-700866136
I'm fine with changing the default. I was trying to figure out the cases in which a user would actually hit this. The MapReduce paradigm and Spark both rely on task output being deterministic. If it isn't, users already have other problems with retries, and the output carries no guarantees. I thought Spark used deterministic output path naming, but I had only just started double-checking that I was remembering correctly. If both of those hold, I think that leaves only the _SUCCESS file issue, which I can see being a problem for people who don't check for it. Are there cases I'm missing here? For example, do cloud providers or other tools change the output paths? @steveloughran, did you see this in a particular situation?
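To make the _SUCCESS point concrete: Hadoop's FileOutputCommitter writes a zero-byte `_SUCCESS` marker into the output directory on job commit (when `mapreduce.fileoutputcommitter.marksuccessfuljobs` is enabled), and a downstream consumer that never checks for it can read a partially written directory. The sketch below is illustrative only. It assumes a local filesystem reachable through `pathlib` rather than HDFS or an object store, and the helper names `output_is_committed` and `list_committed_parts` are hypothetical.

```python
from pathlib import Path

# Marker file written by the Hadoop output committer on successful job commit.
SUCCESS_MARKER = "_SUCCESS"


def output_is_committed(output_dir):
    """Return True only if the _SUCCESS marker exists in output_dir.

    Hypothetical helper; assumes output_dir is a local path.
    """
    return (Path(output_dir) / SUCCESS_MARKER).is_file()


def list_committed_parts(output_dir):
    """List data files, refusing to read a directory without a _SUCCESS marker."""
    if not output_is_committed(output_dir):
        raise RuntimeError(
            f"{output_dir} has no {SUCCESS_MARKER} marker; the job may be incomplete"
        )
    # Skip names starting with "_" or "." (e.g. _SUCCESS, .crc checksum files),
    # mirroring the hidden-file convention Hadoop input formats use.
    return sorted(
        p.name
        for p in Path(output_dir).iterdir()
        if p.is_file() and not p.name.startswith(("_", "."))
    )
```

A consumer would call `list_committed_parts(path)` before reading and treat the `RuntimeError` as "job not finished or failed". Note that the marker only helps if readers actually check it, which is exactly the gap raised above.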
