zhengruifeng commented on code in PR #38624:
URL: https://github.com/apache/spark/pull/38624#discussion_r1375616619
##########
python/pyspark/worker.py:
##########
@@ -306,6 +308,33 @@ def verify_element(elem):
)
+def wrap_cogrouped_map_arrow_udf(f, return_type, argspec, runner_conf):
Review Comment:
Can we make `wrap_grouped_map_arrow_udf`, `wrap_cogrouped_map_arrow_udf` and
`verify_arrow_table_schema` more consistent with the pandas side:
`wrap_grouped_map_pandas_udf`, `wrap_cogrouped_map_pandas_udf` and
`verify_pandas_result`?
##########
python/pyspark/worker.py:
##########
@@ -330,6 +359,97 @@ def wrapped(left_key_series, left_value_series, right_key_series, right_value_se
return lambda kl, vl, kr, vr: [(wrapped(kl, vl, kr, vr), to_arrow_type(return_type))]
+def verify_arrow_table_schema(table, assign_cols_by_name, expected_cols_and_types):
+    import pyarrow as pa
+
+    if not isinstance(table, pa.Table):
+        raise TypeError(
+            "Return type of the user-defined function should be "
+            "pyarrow.Table, but is {}".format(type(table))
+        )
+
+    # the types of the fields have to be identical to return type
+    # an empty table can have no columns; if there are columns, they have to match
+    if len(table.columns) != 0 or table.num_rows != 0:
Review Comment:
nit:
```suggestion
if table.num_columns != 0 or table.num_rows != 0:
```
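For illustration only, here is a minimal sketch of the guard being discussed: a table with no columns and no rows skips the check, and any other table must match the expected column names. It is not the PR's implementation. `StubTable` and `check_column_names` are hypothetical names, the stub only mimics the `num_columns`, `num_rows` and `column_names` attributes that `pyarrow.Table` exposes so the snippet runs without pyarrow, and type comparison and `assign_cols_by_name` handling are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubTable:
    """Hypothetical stand-in for the pyarrow.Table attributes used below."""

    column_names: List[str] = field(default_factory=list)
    num_rows: int = 0

    @property
    def num_columns(self) -> int:
        # Same value as len(table.columns), without materializing the columns.
        return len(self.column_names)


def check_column_names(table, expected_names):
    """Assumed, simplified check: only column names are compared."""
    # A completely empty table (no columns, no rows) is accepted as-is.
    if table.num_columns != 0 or table.num_rows != 0:
        actual = list(table.column_names)
        if actual != list(expected_names):
            raise ValueError(
                "Column names of the returned table do not match: "
                "expected {}, got {}".format(list(expected_names), actual)
            )
    return True
```

With this sketch, `check_column_names(StubTable(), ["id", "v"])` passes because the table is fully empty, while a table that has columns but the wrong names raises `ValueError` even when it has zero rows.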
##########
python/pyspark/sql/pandas/group_ops.py:
##########
@@ -30,13 +30,15 @@
PandasGroupedMapFunction,
PandasGroupedMapFunctionWithState,
PandasCogroupedMapFunction,
+ ArrowGroupedMapFunction,
+ ArrowCogroupedMapFunction,
)
from pyspark.sql.group import GroupedData
class PandasGroupedOpsMixin:
"""
- Min-in for pandas grouped operations. Currently, only :class:`GroupedData`
+ Min-in for Pandas grouped operations. Currently, only :class:`GroupedData`
Review Comment:
+1. This PR is huge, so let's avoid unrelated changes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]