Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21427#discussion_r191006873

    --- Diff: python/pyspark/worker.py ---
    @@ -111,9 +114,16 @@ def wrapped(key_series, value_series):
                             "Number of columns of the returned pandas.DataFrame "
                             "doesn't match specified schema. "
                             "Expected: {} Actual: {}".format(len(return_type), len(result.columns)))
    -        arrow_return_types = (to_arrow_type(field.dataType) for field in return_type)
    -        return [(result[result.columns[i]], arrow_type)
    -                for i, arrow_type in enumerate(arrow_return_types)]
    +        try:
    +            # Assign result columns by schema name
    +            return [(result[field.name], to_arrow_type(field.dataType)) for field in return_type]
    +        except KeyError:
    --- End diff --

I think it's possible for the column index to be many things; the user could even assign it themselves with `pdf.columns = ...`, right? As far as I can tell, using a string as a key should always raise a `KeyError` if the column isn't there. If a MultiIndex is involved it's a little more complicated, but I don't think that's allowed anyway.
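To illustrate the point about user-assigned column indexes — a minimal standalone sketch (the DataFrame and names here are hypothetical, not from the PR): once the user reassigns `pdf.columns`, a string-keyed lookup for a name that is no longer present raises `KeyError`, which is what the `except KeyError` in the diff relies on.

```python
import pandas as pd

# Hypothetical result DataFrame; column names match the schema initially.
pdf = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# The user reassigns the column index themselves, as the comment suggests
# is possible; the string names "a" and "b" are gone after this.
pdf.columns = [0, 1]

# String-keyed access for a missing column name raises KeyError.
try:
    _ = pdf["a"]
    raised = False
except KeyError:
    raised = True

print(raised)  # the lookup failed with KeyError
```

This is the behavior the diff's `try`/`except KeyError` branch is catching: any schema field name absent from the result's column index surfaces as a `KeyError` on lookup.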