Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
I've been thinking about this and came to the same conclusion as
@icexelloss here
https://github.com/apache/spark/pull/21427#issuecomment-392070950 that we could
really support both names and position, and fix this without changing behavior.
When the user defines a grouped map UDF, the StructType has field names, so
if the returned DataFrame has column names they should match. If the user
returned a DataFrame with positional columns only, pandas will name the columns
with an integer index (not an integer string). We could change the logic to do
the following:
```
Assign columns by name, catching a KeyError exception
If the column names are all integers, then fall back to assigning by position
Else re-raise the KeyError (most likely the user has a typo in a column name)
```
I think that will solve this issue and not change the behavior, but I would
need to check that this holds for different pandas versions. How does that
sound?
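
To make the steps above concrete, here is a rough sketch in plain Python. It is not the actual Spark code: `resolve_columns` is a hypothetical helper, and it works only on the column labels (e.g. `list(pdf.columns)`) and the StructType field names, so it makes no claims about pandas internals. It also assumes the number of columns already matches the schema.

```python
from numbers import Integral


def resolve_columns(result_columns, field_names):
    """Hypothetical helper: return the labels to select, in schema order.

    result_columns: column labels of the DataFrame returned by the UDF.
    field_names: field names taken from the declared StructType.
    """
    labels = list(result_columns)
    try:
        # Step 1: assign by name. Every schema field must exist as a label.
        missing = [name for name in field_names if name not in labels]
        if missing:
            raise KeyError(missing)
        return list(field_names)
    except KeyError:
        # Step 2: if all labels are integers, the user did not name the
        # columns, so fall back to positional order (labels as returned).
        if all(isinstance(label, Integral) for label in labels):
            return labels
        # Step 3: otherwise this is most likely a typo in a column name.
        raise
```

With this, `resolve_columns(["v", "id"], ["id", "v"])` selects by name and reorders to schema order, `resolve_columns([0, 1], ["id", "v"])` falls back to position, and `resolve_columns(["id", "vv"], ["id", "v"])` re-raises the KeyError.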