Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20525#discussion_r167162376
--- Diff: docs/sql-programming-guide.md ---
@@ -1930,6 +1930,8 @@ working with timestamps in `pandas_udf`s to get the best performance, see
- Literal values used in SQL operations are converted to DECIMAL with
the exact precision and scale needed by them.
- The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been
introduced. It defaults to `true`, which means the new behavior described
here; if set to `false`, Spark uses the previous rules, i.e. it does not
adjust the needed scale to represent the values, and it returns NULL if an
exact representation of the value is not possible.
+ - Since Spark 2.3, writing an empty dataframe to a directory launches at
least one write task, even if the dataframe physically has no partitions.
This introduces a small behavior change: for self-describing file formats
like Parquet and ORC, Spark creates a metadata-only file in the target
directory when writing an empty dataframe, so that schema inference can
still work if users read that directory later. The new behavior is more
reasonable and more consistent with respect to writing empty dataframes.
--- End diff --
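To make the decimal change quoted above concrete, here is a minimal sketch,
assuming a `spark-shell` session (Spark 2.3+) where `spark` is the active
`SparkSession`; the query, values, and inline results are illustrative:

```scala
// New behavior (allowPrecisionLoss = true, the default): the product below
// would get type DECIMAL(38, 36) under the old rules, leaving only 2 digits
// before the decimal point, so the scale is reduced until the value fits.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "true")
spark.sql("SELECT CAST(12 AS DECIMAL(38, 18)) * CAST(12 AS DECIMAL(38, 18)) AS p").show()
// |144.000000|   <- result type DECIMAL(38, 6); some scale was given up

// Previous rules (allowPrecisionLoss = false): the scale is not adjusted,
// the exact value 144 does not fit in DECIMAL(38, 36), so the result is NULL.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
spark.sql("SELECT CAST(12 AS DECIMAL(38, 18)) * CAST(12 AS DECIMAL(38, 18)) AS p").show()
// |null|
```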
`Spark creates a metadata-only file in the target directory when writing an
empty dataframe` -> `0-partition dataframe`. We only get a behavior change if
the dataframe has no partitions.
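For the empty-dataframe case itself, a minimal sketch along the same lines;
the output path is illustrative:

```scala
import spark.implicits._

// An empty local relation; it plans to an RDD with 0 partitions, which is
// exactly the case where the at-least-one-write-task rule applies.
val df = Seq.empty[(Int, String)].toDF("id", "name")

df.write.mode("overwrite").parquet("/tmp/empty_df")

// Since 2.3 at least one task runs, so the directory contains a
// metadata-only Parquet file and schema inference still works on read:
spark.read.parquet("/tmp/empty_df").printSchema()
// root
//  |-- id: integer (nullable = true)
//  |-- name: string (nullable = true)
```

A dataframe that merely filters all rows out still has partitions and already
produced output files before 2.3, which is why only the 0-partition case
changes; before this change the directory above stayed empty and the read
failed with "Unable to infer schema for Parquet".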