icexelloss commented on issue #24981: [WIP][SPARK-27463][PYTHON] Support 
Dataframe Cogroup via Pandas UDFs- Arrow Stream Impl
URL: https://github.com/apache/spark/pull/24981#issuecomment-512462884
 
 
   @d80tb7 My personal preference is to keep this PR because there is already 
quite a bit of discussion and comments on this PR that would be lost in a new 
PR. But it's not a big deal.
   
   In terms of public API, I think either way we would need the CogroupedData 
class, so to me it seems to be just a matter of whether we want to provide an 
API wrapper on DataFrame:
   
   ```
   def cogroup(self, other, on):
       return self.groupby(on).cogroup(other.groupby(on))
   ```
   
   I think this is a small code change that we can deal with later without 
disrupting the bulk of the change. For now my two cents is: let us get this PR 
into a mergeable state, pending decisions on:
   (1) `df.cogroup(df2, on='id')` or 
`df.groupby('id').cogroup(df2.groupby('id'))`
   (2) pandas_udf decorator change w.r.t. SPARK-28264
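
To make option (1) vs. the explicit form concrete, here is a minimal pure-Python sketch showing that the wrapper is only a thin delegation. The `ToyDataFrame`, `ToyGroupedData` and `ToyCogroupedData` classes and the `apply(func)` signature are hypothetical stand-ins for illustration, not the PySpark classes or the API proposed in this PR:

```python
class ToyCogroupedData:
    """Hypothetical stand-in: pairs up the groups of two grouped datasets by key."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def apply(self, func):
        # Call func(key, left_rows, right_rows) once per key seen on either side.
        keys = sorted(set(self.left.groups) | set(self.right.groups))
        return [
            func(k, self.left.groups.get(k, []), self.right.groups.get(k, []))
            for k in keys
        ]


class ToyGroupedData:
    """Hypothetical stand-in: rows (dicts) bucketed by the value of one column."""

    def __init__(self, rows, on):
        self.groups = {}
        for row in rows:
            self.groups.setdefault(row[on], []).append(row)

    def cogroup(self, other):
        return ToyCogroupedData(self, other)


class ToyDataFrame:
    """Hypothetical stand-in: a list of dict rows."""

    def __init__(self, rows):
        self.rows = rows

    def groupby(self, on):
        return ToyGroupedData(self.rows, on)

    def cogroup(self, other, on):
        # The convenience wrapper from the comment: pure delegation.
        return self.groupby(on).cogroup(other.groupby(on))


def count_rows(key, left_rows, right_rows):
    return (key, len(left_rows), len(right_rows))


df = ToyDataFrame([{"id": 1, "v": 10}, {"id": 2, "v": 20}])
df2 = ToyDataFrame([{"id": 1, "w": 1.5}, {"id": 3, "w": 3.5}])

via_wrapper = df.cogroup(df2, on="id").apply(count_rows)
explicit = df.groupby("id").cogroup(df2.groupby("id")).apply(count_rows)
```

In this sketch both call styles go through the same cogrouped class and produce identical results, so the choice between them is one of API surface only.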
   
   @d80tb7 @HyukjinKwon @BryanCutler what do you guys think?
   

