Github user chetkhatri commented on the issue:
https://github.com/apache/spark/pull/20081
@cloud-fan Thanks for the PR.
4. spark.sql.parquet.writeLegacyFormat - if you don't set this
configuration, a Hive external table won't be able to read the Parquet data.
5. repartition and coalesce are the most common way in industry to control
the number of files written under each directory when partitioning data.
For example, if the data volume is very large, every partition ends up with
many small files, which can hurt downstream query performance because of the
extra file, network, and disk I/O.
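To make points 4 and 5 concrete, here is a minimal PySpark sketch, not taken from the PR itself. The helper names (`apply_conf`, `estimate_num_partitions`, `write_partitioned`) and the 128 MiB target file size are assumptions chosen for illustration; only `spark.sql.parquet.writeLegacyFormat`, `repartition`, `partitionBy`, and `parquet` are real Spark names.

```python
import math

# Point 4: write Parquet in the legacy (Spark 1.4 and earlier) layout,
# which is the layout Hive readers expect for types such as decimals.
LEGACY_PARQUET_CONF = {"spark.sql.parquet.writeLegacyFormat": "true"}


def apply_conf(builder, conf=LEGACY_PARQUET_CONF):
    """Hypothetical helper: apply each key/value to a SparkSession builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def estimate_num_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Hypothetical helper: partitions needed for files of about target_file_bytes.

    The 128 MiB default is an assumption, not a Spark setting.
    """
    return max(1, math.ceil(total_bytes / target_file_bytes))


def write_partitioned(df, path, partition_col, num_partitions):
    """Point 5: limit the number of small files per partition directory."""
    # Hash-partition by the same column used in partitionBy, so all rows for
    # one value are handled by a single task and each directory gets one file
    # instead of one file per upstream task.
    (df.repartition(num_partitions, partition_col)
       .write.mode("overwrite")
       .partitionBy(partition_col)
       .parquet(path))
```

A session could then be built with something like `apply_conf(SparkSession.builder.enableHiveSupport()).getOrCreate()`. Where the data only needs fewer partitions and no regrouping by column, `df.coalesce(n)` avoids the full shuffle that `repartition` triggers.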
Otherwise I am good with your approach.