peter-toth commented on pull request #28885:
URL: https://github.com/apache/spark/pull/28885#issuecomment-662911048


   Thanks @squito for the review, very appreciated.
   
   > I discussed this offline with Peter a little bit. This is not my area of 
expertise, so take my comments with a grain of salt.
   > 
   > * I think the internal details of `ReuseMap` are a little more complicated 
than they need to be, given the main purpose of this PR.  I'd prefer to defer 
the optimizations to a separate change, and keep the initial implementation 
simple.
   
   I'm fine with this. I can revert it and some UTs to it in the next commit. 
If anyone has any comments on this, please let us know.
   
   > * I'd like to see some unit tests on ReuseMap.  Even just really simple 
stuff, but at least making sure that if there are two things with a matching 
schema, you get the right one.
   
   Ok, I will add it soon.
   
   > * You mention a few cases when a ReusedExchange refers to something which 
no longer exists.  Does spark outright fail in those cases?  if so, I'd 
highlight that in the description, that these are bugs and not just performance 
improvements (maybe this obvious to everyone other than me)
   
   As I mentioned I don't know how to categorize this kind of issue, it doesn't 
affect the correctness of the result, but can cause serious performance issues: 
https://github.com/apache/spark/pull/28885#discussion_r459273107 
   
   > * I'm not totally clear which test covers part 3 of your description -- I 
guess its `ExchangeSuite."Exchange reuse across the whole plan"`? or maybe just 
part of it?  A comment in the PR description at least, and perhaps even one on 
the test itself would be helpful.
   
   3. is about whole plan reuse, this naturally comes from the 1 pass, whole 
plan, bottom-up traversal I suggest in this PR to fix 1. and 2. Please note 
that currently `ExchangeReuse` already does a limited version of multi-level 
exchange reuse to support DPP :
   ```
       } transformAllExpressions {
         // Lookup inside subqueries for duplicate exchanges
         case in: InSubqueryExec =>
           val newIn = in.plan.transformUp {
             case exchange: Exchange => reuse(exchange)
           }
           in.copy(plan = newIn.asInstanceOf[BaseSubqueryExec])
       }
   ```
   I don't see a reason why we couldn't extend this to other type of subqueries 
(`ScalarSubquery`) and to all levels of the plan which results the very clean 
new code in `ReuseExchangeAndSubquery`.
   I added test cases  to `DynamicPartitionPruningSuite`, `SubquerySuite` and 
`ExchangeSuite` cover it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to