nvander1 commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] Rewrite 
ArraysOverlap Join
URL: https://github.com/apache/spark/pull/24563#issuecomment-490726294
 
 
   @viirya 
   
   Oops, thanks for pointing out the missing title! :)
   
   I’ve only used this when the arrays are several orders of magnitude 
smaller than the number of records on the larger side of the join. I don’t 
have any benchmarks to back this up yet (I’ll run some experiments and post 
the results here).
   
   The assumption is that the number of items in the largest array is several 
orders of magnitude smaller than the number of records on either side of the 
join. This feels similar to skew-join optimization, where the replication 
factor is also kept small.
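
   To make the size assumption concrete, here is a minimal plain-Python sketch (not Spark code, and not the PR's implementation). It assumes the rewrite works by exploding each array into one row per element, equi-joining on the element, and de-duplicating the matched pairs. The names `overlap_join_naive`, `overlap_join_exploded`, and `exploded_row_count` are hypothetical, and the null semantics of `arrays_overlap` are ignored.

```python
from collections import defaultdict


def overlap_join_naive(left, right):
    """Nested-loop join: compare every left row with every right row."""
    return sorted(
        (lid, rid)
        for lid, larr in left
        for rid, rarr in right
        if set(larr) & set(rarr)
    )


def overlap_join_exploded(left, right):
    """Assumed rewrite: explode arrays, hash-join on the element, de-duplicate."""
    # "Explode" the right side into an element -> row ids index.
    index = defaultdict(set)
    for rid, rarr in right:
        for item in rarr:
            index[item].add(rid)

    # Probe with each exploded left element; the set removes duplicate pairs
    # produced when two arrays share more than one element.
    pairs = set()
    for lid, larr in left:
        for item in larr:
            for rid in index.get(item, ()):
                pairs.add((lid, rid))
    return sorted(pairs)


def exploded_row_count(rows):
    """Rows produced by exploding: one per array element."""
    return sum(len(arr) for _, arr in rows)


# Toy data: (row id, array) pairs.
left = [(1, ["a", "b"]), (2, ["c"]), (3, [])]
right = [(10, ["b", "c"]), (11, ["d"]), (12, ["a", "b"])]

assert overlap_join_naive(left, right) == overlap_join_exploded(left, right)
print(exploded_row_count(left), exploded_row_count(right))
```

   In this sketch, exploding a side yields at most (number of rows) x (largest array size) rows, so the blow-up stays modest only while the arrays are small relative to the row counts, which is the case described above.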
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
