Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/13906
  
    My feeling is that this optimization rule is mostly useful for binary plan 
nodes like inner join and intersection, where we can avoid scanning the output 
of the non-empty side.
    
    On the other hand, for unary plan nodes, it first doesn't bring much 
performance benefit, especially when whole-stage codegen is enabled; second, 
there are non-obvious and tricky corner cases, like `Aggregate` and `Generate`.
    
    That said, although this patch is not a big one, it does introduce 
non-trivial complexity. For example, I didn't immediately realize why 
`Aggregate` must be special-cased (`COUNT(x)` may return 0 for empty input). 
The `Generate` case is even trickier.
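
    For illustration (assuming a running `SparkSession` named `spark`; this 
snippet is not part of the patch), a global aggregate over an empty relation 
still produces a row, which is roughly what makes pruning `Aggregate` tricky:

    ```scala
    // An empty relation with a single column `id`.
    val empty = spark.range(0)

    // Global aggregate: still produces exactly ONE row, with count = 0,
    // so replacing the whole Aggregate with an empty relation would be wrong.
    empty.selectExpr("count(id)").show()

    // Grouped aggregate: produces ZERO rows, so pruning would be safe here.
    empty.groupBy("id").count().show()
    ```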
    
    So my suggestion is to implement this rule only for inner join and 
intersection, which are much simpler to handle. What do you think?
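
    To make the narrowed-down version concrete, here is a minimal, 
self-contained sketch (not the actual Catalyst code; `Plan`, `LocalRelation`, 
`InnerJoin` and `Intersect` below are simplified stand-ins for the 
corresponding logical plan nodes):

    ```scala
    sealed trait Plan
    case class LocalRelation(rows: Seq[Any] = Nil) extends Plan
    case class InnerJoin(left: Plan, right: Plan) extends Plan
    case class Intersect(left: Plan, right: Plan) extends Plan

    object PropagateEmptyRelation {
      // A node is provably empty only when it is a LocalRelation with no rows.
      private def isEmpty(p: Plan): Boolean = p match {
        case LocalRelation(rows) => rows.isEmpty
        case _                   => false
      }

      // For inner join and intersection, an empty side makes the whole node
      // empty, so the non-empty side never needs to be scanned.
      def apply(plan: Plan): Plan = plan match {
        case InnerJoin(l, r) if isEmpty(l) || isEmpty(r) => LocalRelation()
        case Intersect(l, r) if isEmpty(l) || isEmpty(r) => LocalRelation()
        case other => other
      }
    }
    ```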


