HyukjinKwon commented on a change in pull request #28085:
[SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
URL: https://github.com/apache/spark/pull/28085#discussion_r407802033
##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
##########
@@ -106,26 +104,41 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
// Authentication helper used when serving method calls via socket from Python side.
private lazy val authHelper = new SocketAuthHelper(conf)
+ // each python worker gets an equal part of the allocation. the worker pool will grow to the
+ // number of concurrent tasks, which is determined by the number of cores in this executor.
+ private def getWorkerMemoryMb(mem: Option[Long], cores: Int): Option[Long] = {
+   mem.map(_ / cores)
Review comment:
Yes ... let's just handle this in a separate JIRA if you don't mind.
I'm wary of this Python memory configuration: it needed a bunch of follow-ups
and separate investigation (see SPARK-25004) when it was first added, and it
made Spark 2.4.0 unusable on Windows (SPARK-26080). The configuration is also
incomplete and conflicts with the 'spark.python.worker.memory' configuration;
see also SPARK-26679.
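
For context, here is a minimal, self-contained sketch of the per-worker split
the diff above implements. It mirrors getWorkerMemoryMb for illustration only;
the object name and the memory/core figures are hypothetical, not from the PR:

// Hypothetical, standalone sketch (not the PR's code): each Python worker
// gets an equal share of the executor's pyspark memory, one share per core.
object WorkerMemorySketch {
  def getWorkerMemoryMb(mem: Option[Long], cores: Int): Option[Long] =
    mem.map(_ / cores)

  def main(args: Array[String]): Unit = {
    // e.g. spark.executor.pyspark.memory=4096m on an 8-core executor:
    println(getWorkerMemoryMb(Some(4096L), 8)) // Some(512) -> 512 MB per worker
    // integer division truncates, so any remainder goes unassigned:
    println(getWorkerMemoryMb(Some(4000L), 3)) // Some(1333)
    // with no pyspark memory configured, no per-worker limit is set:
    println(getWorkerMemoryMb(None, 8))        // None
  }
}

Note that the split uses integer division, so any remainder of the configured
memory is simply not assigned to any worker.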