Kimahriman commented on code in PR #48038:
URL: https://github.com/apache/spark/pull/48038#discussion_r1779744350


##########
python/pyspark/sql/pandas/group_ops.py:
##########
@@ -803,27 +806,30 @@ def applyInArrow(
         Applies a function to each cogroup using Arrow and returns the result
         as a `DataFrame`.
 
-        The function should take two `pyarrow.Table`\\s and return another
-        `pyarrow.Table`. Alternatively, the user can pass a function that takes
-        a tuple of `pyarrow.Scalar` grouping key(s) and the two `pyarrow.Table`\\s.
-        For each side of the cogroup, all columns are passed together as a
-        `pyarrow.Table` to the user-function and the returned `pyarrow.Table` are combined as
-        a :class:`DataFrame`.
+        The function can take one of two forms: It can take two `pyarrow.Table`\\s and return a
+        `pyarrow.Table`, or it can take two iterators of `pyarrow.RecordBatch` and yield
+        `pyarrow.RecordBatch`. Alternatively, each form can take a tuple of `pyarrow.Scalar`
+        as the first argument in addition to the input type above. For each cogroup, all columns
+        are passed together in the `pyarrow.Table` or `pyarrow.RecordBatch`, and the returned
+        `pyarrow.Table` or iterator of `pyarrow.RecordBatch` are combined as a :class:`DataFrame`.
 
         The `schema` should be a :class:`StructType` describing the schema of the returned

Review Comment:
   @zhengruifeng this is a potential way to address your comment above: load
   the left side fully into memory, but read the right side iteratively, batch
   by batch, so you at least don't need both sides in memory at once.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
