maryannxue commented on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-728143402
In LSR, we have a sanity check to make sure that the output partitioning requirement is not broken after this rule is applied. For example, suppose there's a parent join sitting above the current BHJ whose probe side we are trying to optimize to LSR, and that parent join relies on the output partitioning of the current BHJ. When applying the LSR rule, the check would fail, and we would back out of it. It's a similar situation here with repartition, except that the repartition has been optimized away, so nothing is left in the plan to express the requirement. So if we can introduce an idea like "required partitioning" (and in the future maybe even "required sort order") of the query, then when applying the LSR rule we would know that it could break the required partitioning. Hope it makes sense, @manuzhang
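
To make the "required partitioning" idea concrete, here is a minimal toy sketch in plain Python. It is not Spark code: `Plan`, `HashPartitioning`, `UnknownPartitioning`, `satisfies`, `apply_local_reader` and `try_local_reader` are all hypothetical names made up for illustration, and the equality-based `satisfies` check is a deliberate simplification of how partitioning requirements would really be matched.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in: rows are hash-partitioned on the given keys."""
    keys: Tuple[str, ...]


@dataclass(frozen=True)
class UnknownPartitioning:
    """Toy stand-in: nothing is known about how rows are partitioned."""


Partitioning = Union[HashPartitioning, UnknownPartitioning]


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan node that only tracks its output partitioning."""
    name: str
    output_partitioning: Partitioning


def satisfies(actual: Partitioning, required: Optional[Partitioning]) -> bool:
    # Simplifying assumption: a requirement is met only by an equal partitioning.
    # No requirement (None) is always satisfied.
    return required is None or actual == required


def apply_local_reader(plan: Plan) -> Plan:
    # Assumption of this sketch: reading shuffle output locally discards
    # the hash partitioning of the original plan.
    return Plan(plan.name + " (local reader)", UnknownPartitioning())


def try_local_reader(plan: Plan, required: Optional[Partitioning]) -> Plan:
    """Apply the rewrite, then back out if it breaks the required partitioning."""
    candidate = apply_local_reader(plan)
    if satisfies(candidate.output_partitioning, required):
        return candidate
    return plan  # sanity check failed: keep the original plan


bhj = Plan("BHJ", HashPartitioning(("k",)))

# No requirement from a parent or from the query: the rewrite is kept.
unconstrained = try_local_reader(bhj, None)

# A parent join, or a user repartition that was optimized away, still
# requires hash partitioning on "k": the rewrite is backed out.
constrained = try_local_reader(bhj, HashPartitioning(("k",)))
```

The point of the sketch is that the requirement is passed in explicitly as `required`, so it survives even when the node that originally imposed it (such as the repartition) is no longer in the plan.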
