Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20081
> `spark.sql.parquet.writeLegacyFormat` - if you don't use this
> configuration, Hive external tables won't be able to access the Parquet data.
Well, that's really an undocumented feature... Can you submit a PR to
update the description of `SQLConf.PARQUET_WRITE_LEGACY_FORMAT` and add a test?
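
For context, here is a minimal sketch (untested; the object name, output path, and sample data are hypothetical) of what enabling that flag looks like when the written Parquet files need to be readable by older Hive readers:

```scala
import org.apache.spark.sql.SparkSession

object LegacyParquetWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("legacy-parquet-write")
      .master("local[*]") // illustrative only
      // Write decimal/array/map columns in the older Parquet layout
      // (e.g. decimals as fixed-length byte arrays) that some Hive readers expect.
      .config("spark.sql.parquet.writeLegacyFormat", "true")
      .getOrCreate()

    import spark.implicits._

    // Hypothetical data and output path, purely for illustration.
    val df = Seq((1, BigDecimal("10.5")), (2, BigDecimal("20.25"))).toDF("id", "amount")
    df.write.mode("overwrite").parquet("/tmp/legacy_parquet_example")

    spark.stop()
  }
}
```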
> `repartition` and `coalesce` are the most common way in industry to control
> the number of files under a directory when writing partitioned data.
Yeah, I know, but that's not accurate. It assumes each task outputs exactly one
file, which is not true if `spark.sql.files.maxRecordsPerFile` is set to a
small number (see the sketch below). Anyway, this is not a Hive feature; we
should probably document it in the SQL Programming Guide.
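
A minimal sketch (untested; the object name and output path are hypothetical) of that point: even after `repartition(1)`, a small `spark.sql.files.maxRecordsPerFile` makes the single task roll over to multiple output files.

```scala
import org.apache.spark.sql.SparkSession

object MaxRecordsPerFileExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("max-records-per-file")
      .master("local[*]") // illustrative only
      .getOrCreate()

    // Cap every output file at 100 records.
    spark.conf.set("spark.sql.files.maxRecordsPerFile", 100L)

    // 1000 rows squeezed into a single task...
    val df = spark.range(1000).repartition(1)

    // ...but the directory still ends up with roughly 10 files because of the cap,
    // so "one task == one file" does not hold in general.
    df.write.mode("overwrite").parquet("/tmp/max_records_per_file_example")

    spark.stop()
  }
}
```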