Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23052
First of all, sometimes we do need to write "empty" files, so that we can
infer the schema of a Parquet directory. An empty Parquet file is not truly
empty, as it still has a header/footer.
https://github.com/apache/spark/pull/20525 guarantees we always write out at
least one empty file.
One important invariant: when we write an empty dataframe out to a file and
read it back, it should still be an empty dataframe. I'd suggest we skip
empty files in text-based data sources, and later send a follow-up PR that
stops writing empty text files, as a performance improvement.
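To illustrate the invariant above, here is a minimal sketch (the path and column names are made up for illustration) of the round trip: an empty dataframe written to Parquet and read back should still be empty, and should keep its schema, since Parquet stores the schema in the file footer even when there are no rows.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// An empty dataframe with an explicit schema (illustrative column names).
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
val empty = spark.createDataFrame(
  spark.sparkContext.emptyRDD[Row], schema)

// Even with zero rows, Parquet writes a file whose footer carries the schema,
// so reading the directory back recovers both emptiness and the schema.
empty.write.mode("overwrite").parquet("/tmp/empty-parquet")
val back = spark.read.parquet("/tmp/empty-parquet")
assert(back.count() == 0)
assert(back.schema == schema)
```

This is exactly what breaks for text-based sources like CSV or JSON: an empty text file carries no schema, so skipping such files on read (rather than failing or inferring a wrong schema) keeps the round trip consistent.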