gengliangwang commented on issue #26671: Revert "[SPARK-26081][SPARK-29999]"
URL: https://github.com/apache/spark/pull/26671#issuecomment-558831120
 
 
   @HeartSaVioR I have updated the PR description from
   ```
   We found a bug on SPARK-26081 and SPARK-29999 was proposed to fix it, but we 
decided to revert both as it's too costly to apply SPARK-29999 for SPARK-26081; 
SPARK-26081 may be resubmitted if there's viable approach for dealing with bug.
   ```
   to
   ```
   For Spark file sources, in case of an empty job, we leave the first 
partition to save metadata for file formats such as Parquet.
   After the changes in SPARK-26081, CSV/JSON/TEXT won't be able to output an 
empty file for an empty job. This optimization causes a problem in 
`ManifestFileCommitProtocol`: the API `newTaskTempFile` is called without 
actual file creation. Then `fs.getFileStatus` throws FileNotFoundException 
since the file is not created.
   
   SPARK-29999 fixes the problem. But it is too costly to check file existence 
on each task commit. We should simply restore the behavior before SPARK-26081.
   ```
   
   so that the context is clearer to developers.
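
To make the failure mode in the new description concrete, here is a minimal, stdlib-only Python sketch. It is not Spark's code: `ManifestStyleCommitter`, `run_empty_task`, and the `check_exists` flag are hypothetical names that only imitate the shape of the problem (a path is reserved by something like `newTaskTempFile`, the writer never creates the file, and the commit step then stats the missing path).

```python
import os
import tempfile


class ManifestStyleCommitter:
    """Toy stand-in for a manifest-based commit protocol (hypothetical, not Spark's class)."""

    def __init__(self, check_exists: bool = False):
        # check_exists imitates the kind of guard described for SPARK-29999 (assumption).
        self.check_exists = check_exists
        self.added_files = []

    def new_task_temp_file(self, directory: str, name: str) -> str:
        # Only reserves and records a path; the writer is expected to create the file later.
        path = os.path.join(directory, name)
        self.added_files.append(path)
        return path

    def commit_task(self):
        statuses = []
        for path in self.added_files:
            if self.check_exists and not os.path.exists(path):
                continue  # one extra file-system call per file on every task commit
            # Raises FileNotFoundError if the writer never created the file.
            statuses.append((path, os.stat(path).st_size))
        return statuses


def run_empty_task(committer: ManifestStyleCommitter, write_empty_file: bool):
    """Simulate a task with no rows, optionally still writing an empty output file."""
    with tempfile.TemporaryDirectory() as directory:
        path = committer.new_task_temp_file(directory, "part-00000.csv")
        if write_empty_file:
            open(path, "w").close()  # an empty file is still created for the empty job
        return committer.commit_task()
```

In this sketch, `run_empty_task(ManifestStyleCommitter(), write_empty_file=False)` raises `FileNotFoundError`, and passing `check_exists=True` avoids the error at the price of an existence check per file. With `write_empty_file=True` the commit succeeds without any extra check, which is the behaviour the revert restores.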

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

