Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22453#discussion_r220409331
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1002,6 +1002,21 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
         </p>
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.sql.parquet.writeLegacyFormat</code></td>
    +  <td>false</td>
    +  <td>
    +    This configuration indicates whether we should use legacy Parquet format adopted by Spark 1.4
    +    and prior versions or the standard format defined in parquet-format specification to write
    +    Parquet files. This is not only related to compatibility with old Spark ones, but also other
    +    systems like Hive, Impala, Presto, etc. This is especially important for decimals. If this
    +    configuration is not enabled, decimals will be written in int-based format in Spark 1.5 and
    +    above, other systems that only support legacy decimal format (fixed length byte array) will not
    +    be able to read what Spark has written. Note other systems may have added support for the
    +    standard format in more recent versions, which will make this configuration unnecessary. Please
    --- End diff --
    
    Let's make it short and get rid of everything orthogonal to the issue itself (I think the issue is specific to decimals). For instance, we could say:
    
    If `true`, Parquet files are written in the way Spark 1.4 and earlier did; for instance, decimal values are written in Apache Parquet's fixed-length byte array format, which other systems such as Apache Hive and Apache Impala use. If `false`, the newer format in Parquet is used; for instance, decimals are written in an int-based format. If Parquet output is intended for use with systems that do not support this newer format, set this to `true`.
    
    Please feel free to change the wording to whatever you think reads best.
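    
    For reference, a minimal, self-contained sketch (not part of the PR) of toggling this setting before writing decimals. The object name and the `/tmp/legacy_decimals` output path are placeholders for illustration:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.DecimalType
    
    object LegacyParquetSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("writeLegacyFormat sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._
    
        // Write decimals as fixed-length byte arrays (the pre-Spark-1.5 layout)
        // so that readers that only understand the legacy layout can consume them.
        spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
    
        // Cast string values to decimal so the Parquet file contains DECIMAL columns.
        val df = Seq("123.45", "678.90").toDF("raw")
          .select(col("raw").cast(DecimalType(10, 2)).as("amount"))
    
        // Placeholder output path for this sketch.
        df.write.mode("overwrite").parquet("/tmp/legacy_decimals")
    
        spark.stop()
      }
    }
    ```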

