Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20081
> `spark.sql.parquet.writeLegacyFormat` - if you don't use this
> configuration, Hive external tables won't be able to access the Parquet data.
Well, that's really an undocumented feature... Can you submit a PR to
update the description of `SQLConf.PARQUET_WRITE_LEGACY_FORMAT` and add a test?
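
For context, here is a minimal sketch (untested; the object name, output path, and sample data are hypothetical) of what enabling that flag looks like when the written Parquet files need to be readable by older Hive readers:

```scala
import org.apache.spark.sql.SparkSession

object LegacyParquetWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("legacy-parquet-write")
      .master("local[*]") // illustrative only
      // Write decimal/array/map columns in the older Parquet layout
      // (e.g. decimals as fixed-length byte arrays) that some Hive readers expect.
      .config("spark.sql.parquet.writeLegacyFormat", "true")
      .getOrCreate()

    import spark.implicits._

    // Hypothetical data and output path, purely for illustration.
    val df = Seq((1, BigDecimal("10.5")), (2, BigDecimal("20.25"))).toDF("id", "amount")
    df.write.mode("overwrite").parquet("/tmp/legacy_parquet_example")

    spark.stop()
  }
}
```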
> `repartition` and `coalesce` are the most common way in industry to control
> the number of files under a directory when writing partitioned data.
Yeah, I know, but that's not accurate. It assumes each task outputs exactly one
file, which is not true if `spark.sql.files.maxRecordsPerFile` is set to a
small number (see the sketch below). Anyway, this is not a Hive feature; we
should probably document it in the SQL Programming Guide.
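
A minimal sketch (untested; the object name and output path are hypothetical) of that point: even after `repartition(1)`, a small `spark.sql.files.maxRecordsPerFile` makes the single task roll over to multiple output files.

```scala
import org.apache.spark.sql.SparkSession

object MaxRecordsPerFileExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("max-records-per-file")
      .master("local[*]") // illustrative only
      .getOrCreate()

    // Cap every output file at 100 records.
    spark.conf.set("spark.sql.files.maxRecordsPerFile", 100L)

    // 1000 rows squeezed into a single task...
    val df = spark.range(1000).repartition(1)

    // ...but the directory still ends up with roughly 10 files because of the cap,
    // so "one task == one file" does not hold in general.
    df.write.mode("overwrite").parquet("/tmp/max_records_per_file_example")

    spark.stop()
  }
}
```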