Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23052
First of all, sometimes we do need to write "empty" files, so that we can
infer the schema of a Parquet directory. An empty Parquet file is not truly
empty, as it still has a header/footer.
https://github.com/apache/spark/pull/20525 guarantees we always write out at
least one empty file.
One important invariant: when we write an empty dataframe out to a file and
read it back, it should still be an empty dataframe. I'd suggest we skip
empty files in text-based data sources, and later send a follow-up PR that
stops writing empty text files, as a performance improvement.
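To illustrate the invariant above, here is a minimal sketch (the path and column names are made up for illustration) of the round trip: an empty dataframe written to Parquet and read back should still be empty, and should keep its schema, since Parquet stores the schema in the file footer even when there are no rows.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// An empty dataframe with an explicit schema (illustrative column names).
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
val empty = spark.createDataFrame(
  spark.sparkContext.emptyRDD[Row], schema)

// Even with zero rows, Parquet writes a file whose footer carries the schema,
// so reading the directory back recovers both emptiness and the schema.
empty.write.mode("overwrite").parquet("/tmp/empty-parquet")
val back = spark.read.parquet("/tmp/empty-parquet")
assert(back.count() == 0)
assert(back.schema == schema)
```

This is exactly what breaks for text-based sources like CSV or JSON: an empty text file carries no schema, so skipping such files on read (rather than failing or inferring a wrong schema) keeps the round trip consistent.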