Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/21092#discussion_r192457908
--- Diff:
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
---
@@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
appId: String,
mainAppResource: Option[MainAppResource],
mainClass: String,
- appArgs: Array[String]):
KubernetesConf[KubernetesDriverSpecificConf] = {
+ appArgs: Array[String],
+ maybePyFiles: Option[String]):
KubernetesConf[KubernetesDriverSpecificConf] = {
val sparkConfWithMainAppJar = sparkConf.clone()
+ val additionalFiles = mutable.ArrayBuffer.empty[String]
mainAppResource.foreach {
- case JavaMainAppResource(res) =>
- val previousJars = sparkConf
- .getOption("spark.jars")
- .map(_.split(","))
- .getOrElse(Array.empty)
- if (!previousJars.contains(res)) {
- sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
- }
+ case JavaMainAppResource(res) =>
+ val previousJars = sparkConf
+ .getOption("spark.jars")
+ .map(_.split(","))
+ .getOrElse(Array.empty)
+ if (!previousJars.contains(res)) {
+ sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
+ }
+ // The function of this outer match is to account for multiple
nonJVM
+ // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to
0.4
+ case nonJVM: NonJVMResource =>
+ nonJVM match {
+ case PythonMainAppResource(res) =>
+ additionalFiles += res
+ maybePyFiles.foreach{maybePyFiles =>
+ additionalFiles.appendAll(maybePyFiles.split(","))}
+
sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
+ }
+ sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
--- End diff --
I think that power users would want the ability to try to overwrite this if
they have a specific amount of memory overhead that they want and know that
they need. Configurations should always be configurable, with defaults that are
sane. I agree that we can afford to set a better default for Kubernetes, but
there should always be a way to override default settings if the user knows the
characteristics of their job. For example if the user does memory profiling of
their container and sees that it's not using the full amount of memory, they
can afford to drop this value and leave more resources for other applications.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]