c21 commented on code in PR #37263:
URL: https://github.com/apache/spark/pull/37263#discussion_r928229102
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -922,6 +922,22 @@ object SQLConf {
.checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4",
"brotli", "zstd"))
.createWithDefault("snappy")
+  val PARQUET_COMPRESSION_ZSTD_LEVEL = buildConf("spark.sql.parquet.zstd.level")
+    .doc("Sets the zstd compression level when writing Parquet files and the compression " +
+      "codec is `zstd`. The valid range is 1~22. Generally, a higher compression level " +
+      "achieves a higher compression ratio, but writing takes longer.")
+    .version("3.4.0")
+    .intConf
+    .createWithDefault(3)
Review Comment:
Curious how we arrived at level 3 as the default. Is there any data or benchmark backing
the choice?
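For context on the tradeoff the doc string describes (higher level, smaller output, slower write), the direction of the effect can be illustrated with the JDK's built-in `java.util.zip.Deflater`, whose 1~9 level knob is analogous to zstd's 1~22. This is a rough analogy only, not Spark, Parquet, or zstd code, and the exact sizes depend on the input:

```scala
import java.util.zip.Deflater

// Compress `data` at the given DEFLATE level (1 = fastest, 9 = best ratio)
// and return the compressed size in bytes.
def compressedSize(data: Array[Byte], level: Int): Int = {
  val deflater = new Deflater(level)
  deflater.setInput(data)
  deflater.finish()
  val buf = new Array[Byte](4096)
  var total = 0
  while (!deflater.finished()) {
    total += deflater.deflate(buf)
  }
  deflater.end()
  total
}

// Repetitive input compresses well, which makes the level difference visible.
val sample = ("spark parquet zstd level tradeoff " * 500).getBytes("UTF-8")
val fastSize = compressedSize(sample, 1) // lowest level: fastest, larger output
val bestSize = compressedSize(sample, 9) // highest level: slowest, smallest output
```

On typical compressible data, `bestSize` is at most `fastSize`; a real benchmark for picking the zstd default would measure both size and write time on representative Parquet workloads.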
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -922,6 +922,22 @@ object SQLConf {
.checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4",
"brotli", "zstd"))
.createWithDefault("snappy")
+  val PARQUET_COMPRESSION_ZSTD_LEVEL = buildConf("spark.sql.parquet.zstd.level")
+    .doc("Sets the zstd compression level when writing Parquet files and the compression " +
+      "codec is `zstd`. The valid range is 1~22. Generally, a higher compression level " +
+      "achieves a higher compression ratio, but writing takes longer.")
+    .version("3.4.0")
+    .intConf
+    .createWithDefault(3)
+
+  val PARQUET_COMPRESSION_ZSTD_WORKERS = buildConf("spark.sql.parquet.zstd.workers")
+    .doc("Sets the number of zstd worker threads spawned to compress in parallel when " +
+      "writing Parquet files and the compression codec is `zstd`. More workers improve " +
+      "speed but increase memory usage. When it is 0, compression runs in " +
+      "single-threaded mode.")
+    .version("3.4.0")
+    .intConf
+    .createWithDefault(0)
Review Comment:
Curious, is this the default value used by Parquet?
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -922,6 +922,22 @@ object SQLConf {
.checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4",
"brotli", "zstd"))
.createWithDefault("snappy")
+  val PARQUET_COMPRESSION_ZSTD_LEVEL = buildConf("spark.sql.parquet.zstd.level")
Review Comment:
what about the existing config `spark.io.compression.zstd.level`? Can't we
just reuse the existing config instead of introducing new ones for each file
format?
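One way to reconcile the reviewer's suggestion with a format-specific knob is a fallback chain: the Parquet-specific key wins if set, otherwise the existing `spark.io.compression.zstd.level` applies, otherwise the default of 3 proposed in this diff. A minimal sketch of that resolution order, with a plain `Map` standing in for `SQLConf` (the fallback behavior itself is hypothetical, not what the PR implements):

```scala
// Hypothetical resolution order: format-specific key first, then the generic
// spark.io.compression.zstd.level, then the diff's proposed default of 3.
def effectiveZstdLevel(conf: Map[String, String]): Int =
  conf.get("spark.sql.parquet.zstd.level")
    .orElse(conf.get("spark.io.compression.zstd.level"))
    .map(_.toInt)
    .getOrElse(3)
```

Spark's `ConfigBuilder` supports this pattern natively via `fallbackConf`, which is how several existing SQL configs defer to a core config.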
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]