[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

HyukjinKwon Mon, 24 Sep 2018 09:09:41 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22453#discussion_r219895824
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1002,6 +1002,21 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
         </p>
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.sql.parquet.writeLegacyFormat</code></td>
    +  <td>false</td>
    +  <td>
    +    This configuration indicates whether we should use legacy Parquet format adopted by Spark 1.4
    +    and prior versions or the standard format defined in parquet-format specification to write
    +    Parquet files. This is not only related to compatibility with old Spark ones, but also other
    +    systems like Hive, Impala, Presto, etc. This is especially important for decimals. If this
    +    configuration is not enabled, decimals will be written in int-based format in Spark 1.5 and
    +    above, other systems that only support legacy decimal format (fixed length byte array) will not
    +    be able to read what Spark has written. Note other systems may have added support for the
    +    standard format in more recent versions, which will make this configuration unnecessary. Please
    --- End diff --
    
    This is a separate issue: we call the option "legacy", but on the decimal side the format it writes (fixed-length byte arrays) isn't actually legacy as far as Parquet is concerned.
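To make the decimal point concrete, here is a rough Python sketch of how the physical Parquet type chosen for a decimal column depends on the flag. It follows our reading of Spark's schema conversion: with the flag off, INT32 up to precision 9, INT64 up to precision 18, and a fixed-length byte array beyond that; with the flag on, fixed-length byte arrays throughout. The function names are hypothetical and this is not a Spark API. Since the parquet-format spec allows DECIMAL on INT32, INT64 and FIXED_LEN_BYTE_ARRAY alike, both outputs are standard Parquet.

```python
# Hypothetical helpers illustrating spark.sql.parquet.writeLegacyFormat for decimals.
# This mirrors our understanding of Spark's behavior; it is not Spark code.

MAX_INT_DIGITS = 9    # largest precision that always fits in a signed 32-bit int
MAX_LONG_DIGITS = 18  # largest precision that always fits in a signed 64-bit int


def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte width n whose signed range (up to 2**(8n-1)) covers 10**precision."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def decimal_physical_type(precision: int, write_legacy_format: bool) -> str:
    """Return the Parquet physical type assumed to be used for DECIMAL(precision, _)."""
    if not 1 <= precision <= 38:
        raise ValueError("Spark decimals have precision between 1 and 38")
    if not write_legacy_format:
        # Standard (non-legacy) mode: int-based encodings for small precisions.
        if precision <= MAX_INT_DIGITS:
            return "INT32"
        if precision <= MAX_LONG_DIGITS:
            return "INT64"
    # Legacy mode, or precision too large for INT64: fixed-length byte array.
    return f"FIXED_LEN_BYTE_ARRAY({min_bytes_for_precision(precision)})"


for p in (5, 9, 18, 38):
    print(p, decimal_physical_type(p, False), decimal_physical_type(p, True))
```

For precision 38 both modes produce the same fixed-length byte array, so the flag only changes how decimals with precision up to 18 are written.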


