Zoltan Haindrich created HIVE-24376:
---------------------------------------

             Summary: SharedWorkOptimizer may retain the SJ filter condition during RemoveSemijoin mode
                 Key: HIVE-24376
                 URL: https://issues.apache.org/jira/browse/HIVE-24376
             Project: Hive
          Issue Type: Improvement
            Reporter: Zoltan Haindrich


The mode name is also a bit confusing, but here is what happens:

{code}
TS[A1] -> ...
TS[A2] -> JOIN
TS[B] -> JOIN
{code}

We have an SJ edge TS[B] -> TS[A2] that communicates information about
the join keys; let's assume the reduction ratio was r.


RemoveSemijoin right now does the following:
* removes the semijoin edge (so TS[A2] will become a full scan)
* merges TS[A1] and TS[A2]
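
The two steps above can be sketched on a toy plan. This is only an illustration, not Hive's SharedWorkOptimizer code: `ToyPlan` and `remove_semijoin_and_merge` are hypothetical names, and the plan is reduced to a map of scans plus a set of SJ edges.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPlan:
    # scan name -> table it reads (hypothetical, simplified plan model)
    scans: dict
    # (source scan, target scan): the source sends join-key info to the target
    sj_edges: set = field(default_factory=set)


def remove_semijoin_and_merge(plan, keep, drop):
    """Drop the SJ edges that target `drop`, then merge `drop` into `keep`."""
    if plan.scans[keep] != plan.scans[drop]:
        raise ValueError("can only merge scans of the same table")
    # Step 1: remove the semijoin edge(s); `drop` would now be a full scan.
    sj_edges = {(src, dst) for (src, dst) in plan.sj_edges if dst != drop}
    # Step 2: merge the two scans, so only `keep` is left to read the table.
    scans = {name: table for name, table in plan.scans.items() if name != drop}
    return ToyPlan(scans, sj_edges)


plan = ToyPlan({"A1": "A", "A2": "A", "B": "B"}, {("B", "A2")})
merged = remove_semijoin_and_merge(plan, keep="A1", drop="A2")
print(merged)
```

After the call, table A is read by a single scan and the SJ edge from TS[B] is gone, so its filtering effect is lost as well.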

With respect to reading data from disk, this is great: we accessed A twice, one
of which was a full scan, and now we only read it once.

But from a row-traffic perspective, TS[A2] emits more rows from now on, because
we no longer have the semijoin reduction of ratio r.
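
A rough way to see the trade-off is a back-of-the-envelope model. Everything here is an assumption made for illustration: `n_rows` is the row count of A, r is taken to be the fraction of rows that survive the semijoin filter (0 < r <= 1), and the third case (keeping the SJ condition as a filter on the A2 branch after the merge) is only one possible reading of the issue summary. The function name `compare_modes` is hypothetical.

```python
def compare_modes(n_rows, r):
    """Return scan count of A and rows the A2 branch sends to the join, per case."""
    return {
        # Before: A is scanned twice, the A2 scan is reduced by the semijoin.
        "with_semijoin": {"scans_of_A": 2, "rows_into_join": r * n_rows},
        # RemoveSemijoin today: one scan, but the A2 branch is unfiltered.
        "remove_semijoin": {"scans_of_A": 1, "rows_into_join": n_rows},
        # Assumed improvement: one scan, SJ condition kept on the A2 branch.
        "retain_filter": {"scans_of_A": 1, "rows_into_join": r * n_rows},
    }


costs = compare_modes(n_rows=1_000_000, r=0.25)
for mode, cost in costs.items():
    print(mode, cost)
```

Under these assumptions the merged plan reads A once but pushes 1/r times as many rows from the A2 branch into the join, unless the filter condition is retained.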

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
