Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22453#discussion_r219719166
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
         </p>
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.sql.parquet.writeLegacyFormat</code></td>
    --- End diff --
    
    @srowen, actually, this configuration is specifically related to compatibility with other systems such as Impala (not only old versions of Spark), which expect decimals to be written in a fixed-length byte array format (nowadays Spark writes them in an int-based format). If this configuration is not enabled, those systems are unable to read what Spark wrote.
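    
    For instance, here is a minimal sketch of enabling the legacy format before a write so that Impala can read the result (the app name and output path are hypothetical):
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession.builder()
      .appName("legacy-parquet-write")  // hypothetical app name
      .getOrCreate()
    
    // With writeLegacyFormat enabled, decimals are written as
    // FIXED_LEN_BYTE_ARRAY; with it disabled (the default),
    // small-precision decimals are written int-based (INT32/INT64),
    // which systems like Impala and old Spark versions cannot read.
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")
    
    spark.range(10)
      .selectExpr("CAST(id AS DECIMAL(10, 2)) AS amount")
      .write
      .parquet("/tmp/legacy_parquet_output")  // hypothetical output path
    ```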
    
    Given https://stackoverflow.com/questions/44279870/why-cant-impala-read-parquet-files-after-spark-sqls-write and JIRAs like [SPARK-20297](https://issues.apache.org/jira/browse/SPARK-20297), I think this configuration is quite important. I even expected more documentation about this specific configuration in the first place.
    
    Personally, I have been thinking it would be better to keep this configuration after 3.0 as well, for better compatibility.


