pan3793 commented on code in PR #52925:
URL: https://github.com/apache/spark/pull/52925#discussion_r2502041707
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4063,6 +4063,19 @@ object SQLConf {
.checkValues(Set("none", "zstd", "lz4"))
.createWithDefault("none")
+  val ARROW_EXECUTION_ZSTD_COMPRESSION_LEVEL =
+    buildConf("spark.sql.execution.arrow.zstd.compressionLevel")
+      .doc("Compression level for Zstandard (zstd) codec when compressing Arrow IPC data. " +
+        "This config is only used when spark.sql.execution.arrow.compressionCodec is set to " +
+        "'zstd'. Valid values are integers from 1 (fastest, lowest compression) to 22 " +
+        "(slowest, highest compression). The default value 3 provides a good balance between " +
+        "compression speed and compression ratio.")
+      .version("4.1.0")
+      .intConf
+      .checkValue(level => level >= 1 && level <= 22,
+        "Zstd compression level must be between 1 and 22")
+      .createWithDefault(3)
Review Comment:
I know zstd uses 3 as its default level. For `spark.io.compression.zstd.level`, Spark chooses 1 as the default; in our internal testing, 1 was clearly the best choice for the immediate data-exchange case (i.e., shuffle).

I'm not worried about either 1 or 3 as the default value, but it would be great if you could provide some testing reports as a reference.
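
For anyone wanting to reproduce such a benchmark, the config under discussion would be exercised like this (a sketch assuming the config names from the diff above; the level values are just candidates to compare):

```sql
-- Enable zstd compression for Arrow IPC data, then vary the level
-- (e.g. 1 vs. 3) while measuring throughput and compressed size.
SET spark.sql.execution.arrow.compressionCodec=zstd;
SET spark.sql.execution.arrow.zstd.compressionLevel=1;
```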
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]