shanyu opened a new pull request #27781: [SPARK-31028] Add "-XX:ActiveProcessorCount" to Spark driver and executor in Yarn mode
URL: https://github.com/apache/spark/pull/27781

### What changes were proposed in this pull request?

When the Spark driver and executors start on a YARN cluster, the JVM process can discover all CPU cores on the system and size its thread pools and GC threads based on that value. We should limit the number of cores the JVM sees to the value set by the user (spark.driver.cores or spark.executor.cores) by passing "-XX:ActiveProcessorCount", which was introduced in Java 8u191.

In particular, when running Spark on YARN inside a Kubernetes container, the number of CPU cores discovered is sometimes 1, which means the JVM always uses a single thread for the default thread pool and for GC.

### Why are the changes needed?

Without this change, the number of available processors discovered by the JVM is not correct when running Spark on YARN. The user has assigned the driver and executors a number of cores to use, and we should honor that. A simple test is to run this Java code: `Runtime.getRuntime().availableProcessors()`

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

It is a simple change to the JVM start command, verified manually.
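The manual check described above could be sketched roughly as follows. This is a minimal illustration, not code from the patch: the class name `ProcessorCountCheck` and the helper `activeProcessorCountOption` are hypothetical, and the core count of 4 is an assumed example value standing in for spark.driver.cores or spark.executor.cores.

```java
// Hypothetical helper class for illustration; it is not part of the Spark patch.
class ProcessorCountCheck {

    // Builds the JVM option for a given core count, for example the value of
    // spark.executor.cores. The flag itself requires Java 8u191 or later.
    static String activeProcessorCountOption(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1, got " + cores);
        }
        return "-XX:ActiveProcessorCount=" + cores;
    }

    public static void main(String[] args) {
        // Number of processors the JVM believes it can use.
        int visible = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors = " + visible);

        // Example option string for 4 cores (the value 4 is an assumption).
        System.out.println("example option = " + activeProcessorCountOption(4));
    }
}
```

Running the class once without the option and once with, for example, `java -XX:ActiveProcessorCount=2 ProcessorCountCheck` should show whether the JVM honors the limit: on a JVM that supports the flag, the second run is expected to report 2.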
