[GitHub] [spark] tgravescs commented on a change in pull request #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests

GitBox Mon, 13 Apr 2020 09:32:40 -0700

tgravescs commented on a change in pull request #28085: 
[SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
URL: https://github.com/apache/spark/pull/28085#discussion_r407573038


 ##########
 File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
 ##########
 @@ -106,26 +104,41 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   // Authentication helper used when serving method calls via socket from 
Python side.
   private lazy val authHelper = new SocketAuthHelper(conf)
 
+  // each python worker gets an equal part of the allocation. the worker pool 
will grow to the
+  // number of concurrent tasks, which is determined by the number of cores in 
this executor.
+  private def getWorkerMemoryMb(mem: Option[Long], cores: Int): Option[Long] = 
{
+    mem.map(_ / cores)
 
 Review comment:
   this was preexisting functionality 
(https://github.com/apache/spark/pull/28085/files#diff-6bc32eb2bef385137d7c16fc2c75e8b4L88)
 , I just changed to make it work with the resource profiles. From my 
understanding its just splitting the memory equally because you get a python 
worker per task. I thought the comment did decent job of relaying that, but 
maybe we need to clarify?    Thinking about this some more, there might 
actually be a bug here (was here before my changes) if the spark.task.cpus is > 
1 because the max tasks you get couldn't be equal to number of cores, so your 
are splitting the memory to much.  I can file a separate jira to look at that 
though.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] tgravescs commented on a change in pull request #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests

Reply via email to