Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21977#discussion_r207635841

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala ---
    @@ -81,6 +82,17 @@ case class AggregateInPandasExec(
         val bufferSize = inputRDD.conf.getInt("spark.buffer.size", 65536)
         val reuseWorker = inputRDD.conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
    +    val memoryMb = {
    --- End diff --

    The other configuration options are already duplicated, so I was trying to make as few changes as possible. Since there are several duplicated options, I think it makes more sense to pass the SparkConf through to PythonRunner so it can extract its own configuration.

    @holdenk, would you like this refactor done in this PR, or should I do it in a follow-up?
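The refactor proposed above could be sketched as follows. This is a hypothetical, simplified illustration, not Spark's actual API: `SimpleConf` stands in for `SparkConf`, and `SketchPythonRunner` stands in for `PythonRunner`. The point is that the runner reads its own settings from the conf it is given, so callers like `AggregateInPandasExec` no longer each duplicate the `getInt`/`getBoolean` extraction.

```scala
// Stand-in for SparkConf (hypothetical; the real class lives in
// org.apache.spark.SparkConf and has a richer API).
class SimpleConf(settings: Map[String, String]) {
  def getInt(key: String, default: Int): Int =
    settings.get(key).map(_.toInt).getOrElse(default)
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

// Stand-in for PythonRunner: instead of callers extracting each option
// and passing them individually, the runner takes the conf once and
// pulls out everything it needs (bufferSize, reuseWorker, and any
// future options like memoryMb would be added here, in one place).
class SketchPythonRunner(conf: SimpleConf) {
  val bufferSize: Int = conf.getInt("spark.buffer.size", 65536)
  val reuseWorker: Boolean =
    conf.getBoolean("spark.python.worker.reuse", default = true)
}

object Demo extends App {
  val conf = new SimpleConf(Map("spark.buffer.size" -> "131072"))
  val runner = new SketchPythonRunner(conf)
  assert(runner.bufferSize == 131072) // overridden value is picked up
  assert(runner.reuseWorker)          // default applies when key is absent
  println("ok")
}
```

With this shape, adding a new option touches only the runner, which is the motivation for passing the conf through rather than continuing to duplicate extraction at every call site.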