GitHub user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21977#discussion_r207635841
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala ---
    @@ -81,6 +82,17 @@ case class AggregateInPandasExec(
     
         val bufferSize = inputRDD.conf.getInt("spark.buffer.size", 65536)
         val reuseWorker = inputRDD.conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
    +    val memoryMb = {
    --- End diff ---
    
    The other configuration options are already duplicated, so I was trying to make as few changes as possible.
    
    Since there are several duplicated options, I think it makes more sense to pass the SparkConf through to PythonRunner so it can extract its own configuration.
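    
    Roughly what I have in mind (a minimal sketch only: the class name, the field set, and the `memoryMb` parsing are illustrative, not the real `PythonRunner` signature):
    
    ```scala
    import org.apache.spark.SparkConf
    
    // Sketch: the runner takes the SparkConf and reads its own options,
    // so each option is extracted exactly once, in one place.
    class PythonRunnerSketch(conf: SparkConf) {
      private val bufferSize: Int = conf.getInt("spark.buffer.size", 65536)
      private val reuseWorker: Boolean =
        conf.getBoolean("spark.python.worker.reuse", defaultValue = true)
      // Hypothetical handling of the new option from this PR; real parsing
      // (e.g. size-string conversion) is elided for brevity.
      private val memoryMb: Option[Long] =
        conf.getOption("spark.executor.pyspark.memory").map(_.toLong)
    }
    ```
    
    Call sites would then just pass `inputRDD.conf` through instead of unpacking each option themselves.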
    
    @holdenk, would you like this refactor done in this PR, or should I do it in a follow-up?

