Github user fjh100456 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19218#discussion_r158457806
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
@@ -42,8 +43,15 @@ private[parquet] class ParquetOptions(
    * Acceptable values are defined in [[shortParquetCompressionCodecNames]].
    */
   val compressionCodecClassName: String = {
-    val codecName = parameters.getOrElse("compression",
-      sqlConf.parquetCompressionCodec).toLowerCase(Locale.ROOT)
+    // `compression`, `parquet.compression`(i.e., ParquetOutputFormat.COMPRESSION), and
+    // `spark.sql.parquet.compression.codec`
+    // are in order of precedence from highest to lowest.
+    val parquetCompressionConf = parameters.get(ParquetOutputFormat.COMPRESSION)
+    val codecName = parameters
+      .get("compression")
+      .orElse(parquetCompressionConf)
--- End diff --
Yes, it's new. I guess `ParquetOptions` was not used when writing Hive tables
before, because it was not visible to Hive. I changed it to `public`.
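
For illustration, here is a minimal sketch (not part of the diff above) of how this
precedence plays out on the write path; the local SparkSession setup and the output
path are just assumptions for the example:

// Minimal sketch, assuming a local SparkSession and a throwaway output path.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-compression-demo")
  .master("local[*]")
  .getOrCreate()

// Session-wide default: lowest precedence.
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

// `compression` has the highest precedence, then `parquet.compression`
// (i.e., ParquetOutputFormat.COMPRESSION), then the SQL conf,
// so this write ends up compressed with gzip.
spark.range(10).write
  .option("parquet.compression", "uncompressed")
  .option("compression", "gzip")
  .parquet("/tmp/parquet_compression_demo")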
---