pan3793 commented on code in PR #44172:
URL: https://github.com/apache/spark/pull/44172#discussion_r1414881447
##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -1910,6 +1910,16 @@ package object config {
       .booleanConf
       .createWithDefault(true)
 
+  private[spark] val IO_COMPRESSION_ZSTD_WORKERS =
+    ConfigBuilder("spark.io.compression.zstd.workers")
+      .doc("Thread size spawned to compress in parallel when using Zstd. When value <= 0, " +
+        "no worker is spawned, it works in single-threaded mode. When value > 0, it triggers " +
+        "asynchronous mode, corresponding number of threads are spawned. More workers improve " +
+        "performance, but also increase memory cost.")
+      .version("4.0.0")
+      .intConf
+      .createWithDefault(8)

Review Comment:
   This does help IO-heavy workloads that use little CPU, but I think we should default to single-threaded mode (a value <= 0). Otherwise, each task's shuffle write would occupy additional CPU (extra threads) beyond the task's own slot, which IMO does not fit the Spark executor's thread-based parallel computing model, where each task runs on one thread and is scheduled against spark.task.cpus cores.
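To make the oversubscription concern concrete, here is a back-of-the-envelope Scala sketch that counts how many threads may compete for an executor's cores while every task slot is writing shuffle output. It is a rough model, not Spark code: `ZstdThreadBudget` and `runnableThreads` are hypothetical names, and it assumes (a) an executor runs executorCores / spark.task.cpus tasks concurrently, (b) each Zstd stream in asynchronous mode (workers > 0) owns `workers` compression threads in addition to the task thread that feeds it, and (c) each task has `streamsPerTask` such streams open at once (1 unless stated).

```scala
// Hypothetical model for illustration only; these names are not Spark APIs.
object ZstdThreadBudget {

  /** Threads that may be runnable on one executor when every task slot is
    * compressing shuffle output. Assumes each async-mode Zstd stream
    * (workers > 0) owns `workers` extra threads; with workers <= 0,
    * compression runs on the task thread itself and adds no threads. */
  def runnableThreads(
      executorCores: Int,
      taskCpus: Int,
      workers: Int,
      streamsPerTask: Int = 1): Int = {
    val concurrentTasks = executorCores / taskCpus
    val extraPerTask = if (workers > 0) workers * streamsPerTask else 0
    concurrentTasks * (1 + extraPerTask) // task thread + its Zstd workers
  }

  def main(args: Array[String]): Unit = {
    val cores = 4
    val single = runnableThreads(cores, taskCpus = 1, workers = 0)
    val proposed = runnableThreads(cores, taskCpus = 1, workers = 8)
    println(s"workers=0: $single threads on $cores cores")   // 4 threads
    println(s"workers=8: $proposed threads on $cores cores") // 36 threads
  }
}
```

Under these assumptions, 4 executor cores with the proposed default of 8 workers give 36 threads contending for 4 cores, versus 4 in single-threaded mode. The real count depends on the shuffle writer and how many compressed streams a task keeps open, so treat the numbers as an illustration of the trend rather than a measurement.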