maryannxue commented on pull request #29797:
URL: https://github.com/apache/spark/pull/29797#issuecomment-728143402


   In LSR, we have a sanity check to make sure that the output partitioning 
requirement is not broken after this rule. For example, if there's a parent 
join sitting above the current BHJ for which we are trying to optimize the 
probe side to LSR, and that parent join has to take advantage of the output 
partitioning of the current BHJ. When applying the LSR rule, the check would 
fail, and we would back out of it.
   It's a similar situation here with repartition, only that repartition has 
been optimized away. So if we can introduce an idea like "required 
partitioning" (and in the future maybe even required sort order) of the query, 
then when applying the LSR rule we would know it could break the required 
partitioning. Hope it makes sense, @manuzhang 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to