Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20081
  
    > spark.sql.parquet.writeLegacyFormat - if you don't use this 
configuration, Hive external tables won't be able to read the Parquet data.
    
    Well, that's really an undocumented feature... Can you submit a PR to 
update the description of `SQLConf.PARQUET_WRITE_LEGACY_FORMAT` and add a test?
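
    A minimal, hedged sketch of the write path being discussed (the path and data below are hypothetical, not from this PR): with the legacy flag set, Spark writes decimals and nested types in the layout that Hive's built-in Parquet reader expects.

    ```scala
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("write-legacy-parquet")
      .enableHiveSupport()
      .getOrCreate()

    // Use the legacy (Spark 1.4 and earlier) Parquet layout so that Hive's
    // Parquet SerDe can read decimal columns written by Spark.
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

    import spark.implicits._
    val df = Seq((1, BigDecimal("12.34")), (2, BigDecimal("56.78")))
      .toDF("id", "amount")

    // Hypothetical output location backing a Hive external table.
    df.write.mode("overwrite").parquet("/tmp/legacy_parquet_example")
    ```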
    
    > repartition and coalesce are the most common way in industry to control 
the number of files under a directory when writing partitioned data.
    
    Yeah, I know, but that's not accurate: it assumes each task outputs exactly 
one file, which is not true if `spark.sql.files.maxRecordsPerFile` is set to a 
small number. Anyway, this is not a Hive feature; we should probably document 
it in the `SQL Programming Guide`.
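
    To make the "one task, one file" caveat concrete, here is a small hedged sketch (paths and numbers are made up): with `spark.sql.files.maxRecordsPerFile` set, each of the 4 write tasks below splits its output into files of at most 1000 records, so the directory ends up with roughly 100 files rather than 4.

    ```scala
    // Cap each output file at 1000 records (value chosen only for illustration).
    spark.conf.set("spark.sql.files.maxRecordsPerFile", "1000")

    spark.range(0L, 100000L)
      .repartition(4)        // 4 write tasks...
      .write
      .mode("overwrite")
      .parquet("/tmp/max_records_per_file_example")
    // ...but about 100000 / 1000 = 100 output files, not 4.
    ```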

