[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

rxin Fri, 25 May 2018 10:09:45 -0700

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/21427
  
    If we can fix it without breaking existing behavior that would be awesome.
    
    On Fri, May 25, 2018 at 9:59 AM Bryan Cutler <[email protected]>
    wrote:
    
    > I've been thinking about this and came to the same conclusion as
    > @icexelloss <https://github.com/icexelloss> here #21427 (comment)
    > <https://github.com/apache/spark/pull/21427#issuecomment-392070950> that
    > we could really support both names and position, and fix this without
    > changing behavior.
    >
    > When the user defines a grouped map UDF, the StructType has field names
    > so if the returned DataFrame has column names they should match. If the
    > user returned a DataFrame with positional columns only, pandas will name
    > the columns with an integer index (not an integer string). We could change
    > the logic to do the following:
    >
    > 1. Assign columns by name, catching a KeyError exception.
    > 2. If the column names are all integers, fall back to assigning by position.
    > 3. Otherwise, re-raise the KeyError (most likely the user has a typo in the
    >    column name).
    >
    > I think that will solve this issue and not change the behavior, but I
    > would need to check that this will hold for different pandas versions. How
    > does that sound?
    >

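The fallback logic proposed in the quoted comment can be sketched roughly as follows. This is only an illustration of the idea, not the code from the pull request: the helper name `assign_columns` and its parameters are hypothetical, and it assumes the schema's field names are available as a plain list of strings.

```python
import pandas as pd
from pandas.api.types import is_integer


def assign_columns(result, field_names):
    """Hypothetical helper: align a UDF's returned DataFrame with schema field names."""
    try:
        # First attempt: select columns by name, in schema order.
        return result[field_names]
    except KeyError:
        # If every column label is an integer, pandas most likely generated
        # the labels itself, so treat the columns as positional.
        if all(is_integer(label) for label in result.columns):
            positional = result.copy()
            positional.columns = field_names
            return positional
        # Otherwise the labels were user-supplied and do not match the
        # schema (for example, a typo), so surface the original KeyError.
        raise


# Named columns are reordered to match the schema.
named = pd.DataFrame({"v": [1.0], "id": [1]})
by_name = assign_columns(named, ["id", "v"])

# Columns without names get integer labels and are assigned by position.
unnamed = pd.DataFrame([[1, 2.0]])
by_position = assign_columns(unnamed, ["id", "v"])
```

As the quoted comment notes, whether unnamed columns always receive integer labels would still need to be checked across pandas versions.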



