peter-toth commented on pull request #28885: URL: https://github.com/apache/spark/pull/28885#issuecomment-662911048
Thanks @squito for the review, very appreciated. > I discussed this offline with Peter a little bit. This is not my area of expertise, so take my comments with a grain of salt. > > * I think the internal details of `ReuseMap` are a little more complicated than they need to be, given the main purpose of this PR. I'd prefer to defer the optimizations to a separate change, and keep the initial implementation simple. I'm fine with this. I can revert it and some UTs to it in the next commit. If anyone has any comments on this, please let us know. > * I'd like to see some unit tests on ReuseMap. Even just really simple stuff, but at least making sure that if there are two things with a matching schema, you get the right one. Ok, I will add it soon. > * You mention a few cases when a ReusedExchange refers to something which no longer exists. Does spark outright fail in those cases? if so, I'd highlight that in the description, that these are bugs and not just performance improvements (maybe this obvious to everyone other than me) As I mentioned I don't know how to categorize this kind of issue, it doesn't affect the correctness of the result, but can cause serious performance issues: https://github.com/apache/spark/pull/28885#discussion_r459273107 > * I'm not totally clear which test covers part 3 of your description -- I guess its `ExchangeSuite."Exchange reuse across the whole plan"`? or maybe just part of it? A comment in the PR description at least, and perhaps even one on the test itself would be helpful. 3. is about whole plan reuse, this naturally comes from the 1 pass, whole plan, bottom-up traversal I suggest in this PR to fix 1. and 2. Please note that currently `ExchangeReuse` already does a limited version of multi-level exchange reuse to support DPP : ``` } transformAllExpressions { // Lookup inside subqueries for duplicate exchanges case in: InSubqueryExec => val newIn = in.plan.transformUp { case exchange: Exchange => reuse(exchange) } in.copy(plan = newIn.asInstanceOf[BaseSubqueryExec]) } ``` I don't see a reason why we couldn't extend this to other type of subqueries (`ScalarSubquery`) and to all levels of the plan which results the very clean new code in `ReuseExchangeAndSubquery`. I added test cases to `DynamicPartitionPruningSuite`, `SubquerySuite` and `ExchangeSuite` cover it. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
