Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20525#discussion_r167158260
--- Diff: docs/sql-programming-guide.md ---
@@ -1930,6 +1930,9 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
- Literal values used in SQL operations are converted to DECIMAL with
the exact precision and scale needed by them.
- The configuration `spark.sql.decimalOperations.allowPrecisionLoss`
has been introduced. It defaults to `true`, which means the new behavior
described here; if set to `false`, Spark uses the previous rules, i.e. it doesn't
adjust the needed scale to represent the values and it returns NULL if an exact
representation of the value is not possible.
+ - Since Spark 2.3, writing an empty dataframe (a dataframe with 0
partitions) in parquet or orc format creates a format-specific, metadata-only
file. In prior versions the metadata-only file was not created. As a result, a
subsequent attempt to read from this directory fails with an AnalysisException
while inferring the schema of the file. For example:
df.write.format("parquet").save("outDir")
--- End diff ---
`Since Spark 2.3, writing an empty dataframe to a directory launches at
least one write task, even if the dataframe physically has no partitions. This
introduces a small behavior change: for self-describing file formats like
Parquet and ORC, Spark creates a metadata-only file in the target directory
when writing a 0-partition dataframe, so that schema inference can still work if
users read that directory later. The new behavior is more reasonable and more
consistent with respect to writing an empty dataframe.`
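
A minimal sketch of the round trip described above, assuming a local SparkSession and a hypothetical output path `outDir` (the schema and path are chosen purely for illustration):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("empty-df-write").getOrCreate()

// A schema with no rows: createDataFrame over an empty RDD yields a
// DataFrame with 0 partitions.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// In Spark 2.3+, this still launches a write task and produces a
// metadata-only Parquet file under "outDir" even though no rows exist.
emptyDf.write.format("parquet").save("outDir")

// Reading the directory back succeeds in 2.3+ because the schema can be
// inferred from the metadata-only file; in prior versions the directory was
// empty and this step failed with an AnalysisException during schema inference.
val readBack = spark.read.format("parquet").load("outDir")
readBack.printSchema()
```

The same write/read pair behaves differently across versions, which is exactly the behavior change the proposed wording documents.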
---