[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

BryanCutler Fri, 25 May 2018 09:59:13 -0700

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/21427
  
    I've been thinking about this and came to the same conclusion as 
@icexelloss 
(https://github.com/apache/spark/pull/21427#issuecomment-392070950): we could 
support both names and positions, and fix this without changing behavior.
    
    When the user defines a grouped map udf, the StructType has field names, so 
if the returned DataFrame has column names they should match.  If the user 
returns a DataFrame with positional columns only, pandas will label the columns 
with an integer index (actual integers, not strings of digits).  We could change 
the logic to do the following:
    ```
    Assign columns by name, catching a KeyError exception
    If the column names are all integers, then fall back to assigning by position
    Else raise the KeyError (most likely the user has a typo in the column name)
    ```
    I think that will solve this issue without changing the behavior, but I would 
need to check that this holds for different pandas versions.  How does that 
sound?
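
As a rough illustration of the fallback logic described above, here is a minimal Python sketch. The helper name `assign_columns` and its parameters are hypothetical and are not part of Spark or of the PR; it assumes `field_names` is the list of names taken from the udf's StructType, and it has not been checked against older pandas versions.

```python
import numbers

import pandas as pd


def assign_columns(result, field_names):
    """Hypothetical helper: return `result` with columns matching `field_names`."""
    try:
        # Assign by name: select the columns in schema order.
        # pandas raises KeyError if any requested label is missing.
        return result[list(field_names)]
    except KeyError:
        # Positional frames get integer labels (0, 1, ...), not strings.
        if all(isinstance(c, numbers.Integral) for c in result.columns):
            # Fall back to assigning by position.
            renamed = result.copy()
            renamed.columns = list(field_names)
            return renamed
        # Otherwise re-raise: most likely a typo in a column name.
        raise


# Columns given by name, in a different order than the schema.
by_name = assign_columns(pd.DataFrame({"v": [1.0], "id": [1]}), ["id", "v"])
# Columns given by position only.
by_position = assign_columns(pd.DataFrame([[1, 1.0]]), ["id", "v"])
```

In this sketch a frame with a misspelled column name (for example `idd` instead of `id`) still raises the original `KeyError`, since its labels are strings rather than integers.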


