HyukjinKwon commented on a change in pull request #25545: [SPARK-28843][PYTHON]
Set OMP_NUM_THREADS to executor cores for python
URL: https://github.com/apache/spark/pull/25545#discussion_r318681877
##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -106,6 +106,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
val startTime = System.currentTimeMillis
val env = SparkEnv.get
val localdir = env.blockManager.diskBlockManager.localDirs.map(f => f.getPath()).mkString(",")
+ // if OMP_NUM_THREADS is not explicitly set, override it with the number of cores
+ if (conf.getOption("spark.executorEnv.OMP_NUM_THREADS").isEmpty) {
+ // SPARK-28843: limit the OpenMP thread pool to the number of cores assigned to this executor
+ // this avoids high memory consumption with pandas/numpy because of a large OpenMP thread pool
+ // see https://github.com/numpy/numpy/issues/10455
+ conf.getOption("spark.executor.cores").foreach(envVars.put("OMP_NUM_THREADS", _))
Review comment:
Yes, so the problem here is that the number is somewhat arbitrary. It
should either be 1 or be left to the user's control.
Am I understanding correctly that, on a 16-core machine, the following is the
expected behaviour?
spark.executor.cores=1 -> all cores, OMP_NUM_THREADS=1 -> one process with 1 thread
spark.executor.cores=2 -> two cores, OMP_NUM_THREADS=2 -> one process with 2 threads
spark.executor.cores=3 -> three cores, OMP_NUM_THREADS=3 -> one process with 3 threads
...
spark.executor.cores=16 -> all cores, OMP_NUM_THREADS=16 -> one process with 16 threads
UDFs and RDD APIs cannot be optimized because we don't know what users are
going to do inside this box. And here we're making an assumption about that.
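To make the mapping in question concrete, here is a minimal sketch of the logic in the hunk, written in Python with a plain dict standing in for SparkConf. The function name resolve_omp_num_threads is hypothetical and not part of Spark. It also assumes that an explicit spark.executorEnv.OMP_NUM_THREADS reaches the Python worker unchanged by the usual route, which this hunk does not itself show.

```python
from typing import Dict, Optional


def resolve_omp_num_threads(conf: Dict[str, str]) -> Optional[str]:
    """Return the OMP_NUM_THREADS value the Python worker would effectively see.

    `conf` is a plain dict used as a stand-in for SparkConf (an assumption
    made for illustration only).
    """
    # An explicit user setting always wins; the hunk leaves it untouched.
    explicit = conf.get("spark.executorEnv.OMP_NUM_THREADS")
    if explicit is not None:
        return explicit
    # Otherwise the hunk falls back to spark.executor.cores when it is set.
    # If neither key is set, nothing is exported and None is returned.
    return conf.get("spark.executor.cores")


if __name__ == "__main__":
    # Walk through the cases listed in the comment above.
    for cores in ("1", "2", "3", "16"):
        value = resolve_omp_num_threads({"spark.executor.cores": cores})
        print(f"spark.executor.cores={cores} -> OMP_NUM_THREADS={value}")
```

Under this model a user who wants a single OpenMP thread regardless of core count would still have to set spark.executorEnv.OMP_NUM_THREADS=1 explicitly, which is the trade-off being questioned here.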