Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/21427#discussion_r197527013
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala
---
@@ -120,4 +121,19 @@ object ArrowUtils {
StructField(field.getName, dt, field.isNullable)
})
}
+
+ /** Return Map with conf settings to be used in ArrowPythonRunner */
+ def getPythonRunnerConfMap(conf: SQLConf): Map[String, String] = {
+ val timeZoneConf = if (conf.pandasRespectSessionTimeZone) {
+ Seq(SQLConf.SESSION_LOCAL_TIMEZONE.key -> conf.sessionLocalTimeZone)
+ } else {
+ Nil
+ }
+ val pandasColsByPosition = if (conf.pandasGroupedMapAssignColumnsByPosition) {
--- End diff --
Sorry, can you explain why it's easier to process in the worker?
Isn't it just removing the default value here:
https://github.com/apache/spark/pull/21427/files#diff-d33eea00c68dfd120f4ceae6381f34cdR99
Also, one thing that's not great about omitting the conf in the default case is
that you need to define the default value in two places (both Python and Java).
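To illustrate the point about defaults: if the conf key is always present in the map, the Python worker can read it unconditionally and the default only has to live on the JVM side. A minimal sketch, assuming a hypothetical object and conf key (not the actual Spark implementation):

```scala
// Hypothetical sketch -- the object name and conf key are illustrative
// assumptions, not the real Spark code.
object ConfMapSketch {
  // Illustrative conf key (assumption, not necessarily the real key name).
  val AssignByPositionKey =
    "spark.sql.execution.pandas.groupedMap.assignColumnsByPosition"

  // Always emit the key, so the worker never needs its own copy of the
  // default value -- it just looks up whatever the JVM side sent.
  def confMap(assignByPosition: Boolean): Map[String, String] =
    Map(AssignByPositionKey -> assignByPosition.toString)
}
```

The trade-off is a slightly larger conf map on every run versus keeping the default defined in a single place.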
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]