Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/23055#discussion_r234286173
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
   private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
   // each python worker gets an equal part of the allocation. the worker pool will grow to the
   // number of concurrent tasks, which is determined by the number of cores in this executor.
-  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+  private val memoryMb = if (Utils.isWindows) {
--- End diff ---
My point is that if the `resource` module can't be loaded for any reason, the code
shouldn't fail. As it is, the case where `resource` can't be loaded is handled, but
if the memory limit is set then the worker will still try to use it. That's
what I think is brittle. There should be a flag for whether to attempt to use
the resource API, based on whether the module was loaded.
If the worker operates as I described, then why make any changes on the JVM
side? Why avoid telling the worker how much memory it has?
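To illustrate, here's a rough sketch of what I mean on the worker side (names like
`has_resource` and `set_memory_limit` are just for illustration, not the actual
worker code):

```python
import logging

# Try to load the resource API once and remember whether it worked.
try:
    import resource
    has_resource = True
except ImportError:
    # e.g. Windows, where the resource module does not exist
    has_resource = False


def set_memory_limit(memory_limit_mb):
    """Apply the configured limit only when the resource API is available."""
    if not has_resource or memory_limit_mb is None:
        # No resource module or no limit configured: do nothing, never fail.
        return
    limit_bytes = memory_limit_mb * 1024 * 1024
    try:
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    except (ValueError, OSError) as e:
        # Some platforms reject RLIMIT_AS; log and continue instead of failing.
        logging.warning("Failed to set memory limit: %s", e)
```

With a flag like that, a missing `resource` module or a platform that rejects
`setrlimit` just means the limit is skipped, and the JVM can keep passing the
configured memory unconditionally.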
---