GitHub user zheh12 opened a pull request: https://github.com/apache/spark/pull/21286
[SPARK-24194] HadoopFsRelation cannot overwrite a path that is also b…

## What changes were proposed in this pull request?

When multiple tasks append to a `HadoopFsRelation` at the same time, errors occur. There are two kinds:

1. One task succeeds, but its data is wrong: more data than expected appears.
2. The other tasks fail with `java.io.FileNotFoundException: Failed to get file status skip_dir/_temporary/0`

The root cause is that multiple jobs use the same `_temporary` directory.

The core idea of this PR is therefore to create a separate temporary directory, named with the jobId, for each job in the `output` folder, so that the conflicts are avoided.

## How was this patch tested?

I tested it manually, but I don't know how to write a unit test for this situation. Please help me.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zheh12/spark SPARK-24238

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21286.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21286

----
commit b676a36af110b0b7d7dfc47ab292d09c441f6a0f
Author: yangz <zheh12@...>
Date: 2018-05-10T01:46:49Z

    [SPARK-24194] HadoopFsRelation cannot overwrite a path that is also being read from

----
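
The conflict described in this pull request, and the effect of giving each job its own temporary directory, can be illustrated with a small local-filesystem simulation. This is only a sketch under stated assumptions: it does not use Spark's or Hadoop's commit protocol, the function names are hypothetical, and the per-job directory name `_temporary-<jobId>` is an illustrative choice, since the notification does not say how the PR names the directory.

```python
import os
import shutil
import tempfile
import uuid


def job_paths(output, job_id=None):
    """Return (staging_dir, cleanup_root) for a job.

    job_id=None mimics the shared `_temporary/0` layout from the error
    message. Otherwise a hypothetical per-job layout is used; the name
    `_temporary-<jobId>` is an assumption made for this sketch.
    """
    if job_id is None:
        root = os.path.join(output, "_temporary")
        return os.path.join(root, "0"), root
    root = os.path.join(output, "_temporary-" + job_id)
    return root, root


def write_task_file(staging, name, content):
    """Simulate a task writing one file into its job's staging directory."""
    os.makedirs(staging, exist_ok=True)
    with open(os.path.join(staging, name), "w") as f:
        f.write(content)


def commit_job(output, staging, cleanup_root):
    """Move every staged file into the output directory, then clean up."""
    moved = []
    for name in sorted(os.listdir(staging)):
        shutil.move(os.path.join(staging, name), os.path.join(output, name))
        moved.append(name)
    shutil.rmtree(cleanup_root)
    return moved


def run_two_jobs(per_job_dirs):
    """Interleave two jobs that append to the same output directory."""
    output = tempfile.mkdtemp()
    try:
        id_a = uuid.uuid4().hex if per_job_dirs else None
        id_b = uuid.uuid4().hex if per_job_dirs else None
        stage_a, root_a = job_paths(output, id_a)
        stage_b, root_b = job_paths(output, id_b)

        # Both jobs write their task output before either one commits.
        write_task_file(stage_a, "part-a", "a")
        write_task_file(stage_b, "part-b", "b")

        committed_by_a = commit_job(output, stage_a, root_a)
        try:
            committed_by_b = commit_job(output, stage_b, root_b)
        except FileNotFoundError:
            # The first commit already deleted the shared staging tree.
            committed_by_b = None
        return committed_by_a, committed_by_b
    finally:
        shutil.rmtree(output, ignore_errors=True)
```

With the shared layout, `run_two_jobs(per_job_dirs=False)` returns `(['part-a', 'part-b'], None)`: the first job commits the other job's file as well (more data than expected), and the second job fails because its staging directory is gone. With per-job directories, `run_two_jobs(per_job_dirs=True)` returns `(['part-a'], ['part-b'])`.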