icexelloss commented on issue #24981: [SPARK-27463][PYTHON] Support Dataframe 
Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24981#issuecomment-532335682
 
 
   > > In response to @ueshin's question, please correct me if I'm wrong @d80tb7
   > > > I'm just wondering what if the group keys are different between the 
two grouped data.
   > > > Is it okay to execute as is, or should we check that both keys have the 
same lengths and types?
   > > 
   > > 
   > > The group keys could be different, and then both are passed to the udf in 
the left and right dataframes. It currently does not require them to be the same 
length or type, so the user has to make sure the udf can handle this. For 
example, if using `pandas.merge_asof`, different keys can be used and it allows 
some flexibility for the comparison.
   > 
   > Hi, yes this is correct. Fundamentally, different key types/lengths are 
handled like any other case of disjoint key sets: the called udf will always 
have an empty dataframe passed as one of the arguments.
   > 
   > I think there is a good argument, however, for disallowing these cases as 
it's difficult to imagine a case where this would be what the user wants. I'd 
like to add this as a follow-up if that's ok because:
   > 
   > a) The current implementation isn't awful (i.e. it does what you tell it 
to do).
   > b) Restricting on key type/length isn't trivial (it would have to be done 
in the optimizer, I think, as nothing before that has the resolved key).
   > c) There might be some use case where you might want to have different key 
types/shapes (although I admit I am struggling to think of one).
   
   I am not sure I am following this. What do you mean by "different key 
types/lengths"?
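
To make the quoted claim concrete, here is a minimal plain-Python sketch of the semantics as described above: keys that differ in type or length never compare equal, so they behave like disjoint key sets and one side of every cogroup is empty. This is only an illustration under that reading, not Spark's implementation; `cogroup`, `by_id`, `by_id_and_w` and the sample rows are hypothetical names and data made up for the example.

```python
from collections import defaultdict


def cogroup(left_rows, right_rows, left_key, right_key):
    """Return (key, left_group, right_group) for every key seen on either side."""
    left_groups, right_groups = defaultdict(list), defaultdict(list)
    for row in left_rows:
        left_groups[left_key(row)].append(row)
    for row in right_rows:
        right_groups[right_key(row)].append(row)
    # Union of the keys, in first-seen order (left side first).
    keys = list(dict.fromkeys([*left_groups, *right_groups]))
    # A key missing on one side yields an empty group for that side.
    return [(k, left_groups.get(k, []), right_groups.get(k, [])) for k in keys]


def one_side_always_empty(groups):
    """True if every cogroup has an empty left or an empty right group."""
    return all(not l or not r for _, l, r in groups)


# Hypothetical sample data: same ids, once as ints and once as strings.
left = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
right_int = [{"id": 1, "w": 1.5}, {"id": 2, "w": 2.5}]
right_str = [{"id": "1", "w": 1.5}, {"id": "2", "w": 2.5}]


def by_id(row):
    return (row["id"],)  # one-column key


def by_id_and_w(row):
    return (row["id"], row["w"])  # two-column key


matched = cogroup(left, right_int, by_id, by_id)
mismatched_type = cogroup(left, right_str, by_id, by_id)
mismatched_length = cogroup(left, right_int, by_id, by_id_and_w)

print(one_side_always_empty(matched))            # False
print(one_side_always_empty(mismatched_type))    # True
print(one_side_always_empty(mismatched_length))  # True
```

In the two mismatched cases the function receiving the groups would need to cope with an empty input on one side for every key, which is the situation the quoted reply describes.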
