Re: [PR] [SPARK-46256][CORE] Parallel Compression Support for ZSTD [spark]

via GitHub Mon, 04 Dec 2023 21:04:38 -0800


pan3793 commented on code in PR #44172:
URL: https://github.com/apache/spark/pull/44172#discussion_r1414881447



##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -1910,6 +1910,16 @@ package object config {
       .booleanConf
       .createWithDefault(true)
 
+  private[spark] val IO_COMPRESSION_ZSTD_WORKERS =
+    ConfigBuilder("spark.io.compression.zstd.workers")
+      .doc("Thread size spawned to compress in parallel when using Zstd. When value <= 0, " +
+        "no worker is spawned, it works in single-threaded mode. When value > 0, it triggers " +
+        "asynchronous mode, corresponding number of threads are spawned. More workers improve " +
+        "performance, but also increase memory cost.")
+      .version("4.0.0")
+      .intConf
+      .createWithDefault(8)

Review Comment:
   This does help IO-heavy workloads that use little CPU, but I think we should use a single thread by default. Otherwise, task shuffle writing would occupy additional CPU (threads), which IMO does not fit the Spark executor's thread-based parallel computing model.
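To make the concern concrete, here is a rough model of the thread count implied by the proposed default. It assumes (this is not stated in the PR) that each concurrently running task holds one active ZSTD compression stream, and that a stream configured with workers > 0 gets its own pool of that many threads on top of the task thread. The object and method names are hypothetical and not part of Spark.

```scala
// Hypothetical model of spark.io.compression.zstd.workers as described in the
// doc string above. Names are illustrative only, not Spark APIs.
object ZstdWorkersSketch {

  /** Extra compression threads one stream adds, per the doc string:
    * value <= 0 means single-threaded mode (no worker threads). */
  def extraWorkerThreads(zstdWorkers: Int): Int =
    if (zstdWorkers <= 0) 0 else zstdWorkers

  /** Upper bound on threads in one executor, assuming each of the
    * `executorCores` concurrent tasks has exactly one active ZSTD stream:
    * one task thread plus its stream's worker threads. */
  def peakThreads(executorCores: Int, zstdWorkers: Int): Int =
    executorCores * (1 + extraWorkerThreads(zstdWorkers))

  def main(args: Array[String]): Unit = {
    val cores = 4 // e.g. spark.executor.cores = 4
    for (workers <- Seq(0, 8)) {
      println(s"workers=$workers -> up to ${peakThreads(cores, workers)} threads on $cores cores")
    }
  }
}
```

Under these assumptions, 4 executor cores with the proposed default of 8 workers allow up to 4 * (1 + 8) = 36 threads, versus 4 in single-threaded mode (any value <= 0), which is why a single-threaded default keeps shuffle compression within the executor's one-thread-per-core budget.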



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
