Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/23055#discussion_r236483417
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
  private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
  // each python worker gets an equal part of the allocation. the worker pool will grow to the
  // number of concurrent tasks, which is determined by the number of cores in this executor.
- private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+ private val memoryMb = if (Utils.isWindows) {
--- End diff --
> Strictly we should move it into JVM rather than adding more controls at workers.
There's a reason why the value is sent to the worker: it enables a feature that, when available, gives you better error information.
With the Python-side check, if the app runs over the specified limit, you'll get an error from Python saying it's using more memory than it should.
Without it, you'll get a generic error from the resource manager that your app exceeded its memory allocation, and you won't know exactly what caused it (Java? Python? something else?).
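To illustrate the point: a minimal sketch of worker-side enforcement, in the spirit of what the PySpark worker does when the memory limit is passed down (the 256 MiB cap and the allocation size here are hypothetical, and `resource.RLIMIT_AS` is a POSIX-only mechanism, hence the Windows special case in the diff). Capping the address space inside the Python process means an over-allocating task fails with a clear `MemoryError` from Python rather than an opaque kill from the resource manager.

```python
import subprocess
import sys

# Run the capped task in a child process so the parent's own
# memory usage is unaffected by the rlimit.
child = r"""
import resource

limit_bytes = 256 * 1024 * 1024  # hypothetical 256 MiB cap
# Cap the process address space; allocations beyond this fail.
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
try:
    buf = bytearray(512 * 1024 * 1024)  # deliberately exceed the cap
    print("allocation succeeded")
except MemoryError:
    # Python itself reports the failure, pinpointing the culprit.
    print("MemoryError: task exceeded its memory limit")
"""

result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
print(result.stdout.strip())
```

With the cap in place the failure surfaces as a Python-level `MemoryError` inside the task, which is exactly the diagnostic advantage described above; without it, the only signal is the container being killed from outside.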
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]