Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21427
If we can fix it without breaking existing behavior that would be awesome.
On Fri, May 25, 2018 at 9:59 AM Bryan Cutler <[email protected]>
wrote:
> I've been thinking about this and came to the same conclusion as
> @icexelloss <https://github.com/icexelloss> here #21427 (comment)
> <https://github.com/apache/spark/pull/21427#issuecomment-392070950> that
> we could really support both names and position, and fix this without
> changing behavior.
>
> When the user defines a grouped map UDF, the StructType has field names
> so if the returned DataFrame has column names they should match. If the
> user returned a DataFrame with positional columns only, pandas will name
> the columns with an integer index (not an integer string). We could change
> the logic to do the following:
>
> 1. Assign columns by name, catching a KeyError exception.
> 2. If the column names are all integers, fall back to assigning by
>    position.
> 3. Otherwise, re-raise the KeyError (most likely the user has a typo in
>    a column name).
>
> I think that will solve this issue and not change the behavior, but I
> would need to check that this holds for different pandas versions. How
> does that sound?
>
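For illustration, the three steps quoted above could be sketched roughly as
follows. This is only a sketch under stated assumptions, not the code in the
PR: the helper name `assign_columns` and its `field_names` parameter are
hypothetical, and `field_names` stands in for the field names of the UDF's
StructType. It uses only plain pandas calls.

```python
import pandas as pd


def assign_columns(result, field_names):
    """Hypothetical helper: order the columns of `result` to match `field_names`."""
    field_names = list(field_names)
    try:
        # Step 1: assign by name. Selecting a missing label raises KeyError.
        return result[field_names]
    except KeyError:
        # Step 2: a DataFrame built without column names gets integer labels,
        # so treat all-integer labels as "positional columns only".
        if all(pd.api.types.is_integer(c) for c in result.columns):
            if len(result.columns) != len(field_names):
                raise ValueError(
                    "expected %d columns, got %d"
                    % (len(field_names), len(result.columns))
                )
            renamed = result.copy()
            renamed.columns = field_names
            return renamed
        # Step 3: named columns that do not match, most likely a typo.
        raise


# Named columns in a different order are matched by name.
named = pd.DataFrame({"v": [1.0, 2.0], "id": [1, 2]})
by_name = assign_columns(named, ["id", "v"])

# Columns without names are matched by position.
positional = pd.DataFrame([[1, 1.0], [2, 2.0]])
by_position = assign_columns(positional, ["id", "v"])
```

A DataFrame whose column names contain a typo (say `idd` instead of `id`)
still raises the KeyError, so the existing failure mode for mistyped names
is kept.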