hvanhovell opened a new pull request #25167: [SPARK-27485] 
EnsureRequirements.reorder should handle duplicate expressions gracefully
URL: https://github.com/apache/spark/pull/25167
 
 
   ## What changes were proposed in this pull request?
   When reordering joins EnsureRequirements only checks if all the join keys 
are present in the partitioning expression seq. This is problematic when the 
joins keys and and partitioning expressions both contain duplicates but not the 
same number of duplicates for each expression, e.g. `Seq(a, a, b)` vs `Seq(a, 
b, b)`. This fails with an index lookup failure in the `reorder` function.
   
   This PR fixes this removing the equality checking logic from the 
`reorderJoinKeys` function, and by doing the multiset equality in the `reorder` 
function while building the reordered key sequences.
   
   ## How was this patch tested?
   Added a unit test to the `PlannerSuite` and added an integration test to 
`JoinSuite`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to