Github user zecevicp commented on the issue:

    https://github.com/apache/spark/pull/21109
  
    Hey Liang-Chi, thanks for looking into this.
    Yes, the problem can be circumvented by changing the join condition as you 
describe, but only in the benchmark case, because my "expensive function" was a 
bit misleading. 
    The problem is not in the function itself, but in the number of rows that 
are checked for each pair of matching equi-join keys. 
    I changed the benchmark test case now so to better demonstrate this. I 
completely removed the expensive function and I'm only doing a count on the 
matched rows. The results are the following.
    Without the optimization:
    ```
    AMD EPYC 7401 24-Core Processor
    sort merge join:                      Best/Avg Time(ms)    Rate(M/s)   Per 
Row(ns)   Relative
    
---------------------------------------------------------------------------------------------
    sort merge join wholestage off            30956 / 31374          0.0       
75575.5       1.0X
    sort merge join wholestage on             10864 / 11043          0.0       
26523.6       2.8X
    ```
    With the optimization:
    ```
    AMD EPYC 7401 24-Core Processor
    sort merge join:                      Best/Avg Time(ms)    Rate(M/s)   Per 
Row(ns)   Relative
    
---------------------------------------------------------------------------------------------
    sort merge join wholestage off            30734 / 31135          0.0       
75035.2       1.0X
    sort merge join wholestage on                959 / 1040          0.4        
2341.3      32.0X
    ```
    This shows a 10x improvement over the non-optimized case (as I already 
said, this depends on the range condition, number of matched rows, the 
calculated function, etc.).
    
    Regarding your second question as to why is the "wholestage off" case in 
the optimized version so slow, that is because the optimization is turned off 
when the wholestage code generation is turned off.
    And that is simply because it was too hard to debug it and I figured the 
wholestage generation is on by default, so I'm guessing (and hoping) that it 
would not be too hard of a requirement to have to turn wholestage codegen on if 
you want to use this optimization.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to