[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...

dilipbiswal Fri, 09 Feb 2018 00:01:05 -0800

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20525#discussion_r167160086
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1930,6 +1930,9 @@ working with timestamps in `pandas_udf`s to get the best performance, see
         - Literal values used in SQL operations are converted to DECIMAL with the exact precision and scale needed by them.
         - The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been introduced. It defaults to `true`, which means the new behavior described here; if set to `false`, Spark uses previous rules, ie. it doesn't adjust the needed scale to represent the values and it returns NULL if an exact representation of the value is not possible.
     
    + - Since Spark 2.3, writing an empty dataframe (a dataframe with 0 partitions) in parquet or orc format, creates a format specific metadata only file. In prior versions the metadata only file was not created. As a result, subsequent attempt to read from this directory fails with AnalysisException while inferring schema of the file. For example : df.write.format("parquet").save("outDir")
    --- End diff --
    
    "launches at least one write task"
    Actually, isn't it exactly one write task? I am okay with what you have; I just wanted to check to make sure.


