Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20525#discussion_r167162376
--- Diff: docs/sql-programming-guide.md ---
@@ -1930,6 +1930,8 @@ working with timestamps in `pandas_udf`s to get the best performance, see
- Literal values used in SQL operations are converted to DECIMAL with
the exact precision and scale needed by them.
- The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been
introduced. It defaults to `true`, which means the new behavior described
here; if set to `false`, Spark uses the previous rules, i.e. it does not
adjust the needed scale to represent the values, and it returns NULL if an
exact representation of the value is not possible.
+ - Since Spark 2.3, writing an empty dataframe to a directory launches at
least one write task, even if the dataframe physically has no partitions.
This introduces a small behavior change: for self-describing file formats
like Parquet and ORC, Spark creates a metadata-only file in the target
directory when writing an empty dataframe, so that schema inference can
still work if users read that directory later. The new behavior is more
reasonable and more consistent with respect to writing empty dataframes.
--- End diff --
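To make the decimal change quoted above concrete, here is a minimal sketch,
assuming a `spark-shell` session (Spark 2.3+) where `spark` is the active
`SparkSession`; the query, values, and inline results are illustrative:

```scala
// New behavior (allowPrecisionLoss = true, the default): the product below
// would get type DECIMAL(38, 36) under the old rules, leaving only 2 digits
// before the decimal point, so the scale is reduced until the value fits.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "true")
spark.sql("SELECT CAST(12 AS DECIMAL(38, 18)) * CAST(12 AS DECIMAL(38, 18)) AS p").show()
// |144.000000|   <- result type DECIMAL(38, 6); some scale was given up

// Previous rules (allowPrecisionLoss = false): the scale is not adjusted,
// the exact value 144 does not fit in DECIMAL(38, 36), so the result is NULL.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
spark.sql("SELECT CAST(12 AS DECIMAL(38, 18)) * CAST(12 AS DECIMAL(38, 18)) AS p").show()
// |null|
```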
`Spark creates a metadata-only file in the target directory when writing an
empty dataframe` -> `0-partition dataframe`. We only get a behavior change if
the dataframe has no partitions.
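For the empty-dataframe case itself, a minimal sketch along the same lines;
the output path is illustrative:

```scala
import spark.implicits._

// An empty local relation; it plans to an RDD with 0 partitions, which is
// exactly the case where the at-least-one-write-task rule applies.
val df = Seq.empty[(Int, String)].toDF("id", "name")

df.write.mode("overwrite").parquet("/tmp/empty_df")

// Since 2.3 at least one task runs, so the directory contains a
// metadata-only Parquet file and schema inference still works on read:
spark.read.parquet("/tmp/empty_df").printSchema()
// root
//  |-- id: integer (nullable = true)
//  |-- name: string (nullable = true)
```

A dataframe that merely filters all rows out still has partitions and already
produced output files before 2.3, which is why only the 0-partition case
changes; before this change the directory above stayed empty and the read
failed with "Unable to infer schema for Parquet".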