pan3793 commented on code in PR #52925:
URL: https://github.com/apache/spark/pull/52925#discussion_r2502041707
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -4063,6 +4063,19 @@ object SQLConf {
.checkValues(Set("none", "zstd", "lz4"))
.createWithDefault("none")
+  val ARROW_EXECUTION_ZSTD_COMPRESSION_LEVEL =
+    buildConf("spark.sql.execution.arrow.zstd.compressionLevel")
+      .doc("Compression level for Zstandard (zstd) codec when compressing Arrow IPC data. " +
+        "This config is only used when spark.sql.execution.arrow.compressionCodec is set to " +
+        "'zstd'. Valid values are integers from 1 (fastest, lowest compression) to 22 " +
+        "(slowest, highest compression). The default value 3 provides a good balance between " +
+        "compression speed and compression ratio.")
+      .version("4.1.0")
+      .intConf
+      .checkValue(level => level >= 1 && level <= 22,
+        "Zstd compression level must be between 1 and 22")
+      .createWithDefault(3)
Review Comment:
I know zstd uses 3 as its default level. For `spark.io.compression.zstd.level`, Spark chooses 1 as the default; in our internal testing, 1 was clearly the best choice for the immediate data-exchange case (i.e., shuffle).

I'm not worried about either 1 or 3 as the default value, but it would be great if you could provide some testing reports as a reference.
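
For anyone wanting to reproduce such a benchmark, the config under discussion would be exercised like this (a sketch assuming the config names from the diff above; the level values are just candidates to compare):

```sql
-- Enable zstd compression for Arrow IPC data, then vary the level
-- (e.g. 1 vs. 3) while measuring throughput and compressed size.
SET spark.sql.execution.arrow.compressionCodec=zstd;
SET spark.sql.execution.arrow.zstd.compressionLevel=1;
```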
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]