Jesus Camacho Rodriguez created HIVE-23365:
----------------------------------------------

             Summary: Put RS deduplication optimization under cost based decision
                 Key: HIVE-23365
                 URL: https://issues.apache.org/jira/browse/HIVE-23365
             Project: Hive
          Issue Type: Improvement
          Components: Physical Optimizer
            Reporter: Jesus Camacho Rodriguez


Currently, RS (ReduceSink) deduplication is always executed whenever it is
semantically correct. However, it could be beneficial to leave both RS
operators in the plan, e.g., if the NDV of the second RS is very low. Thus, we
would like this decision to be cost-based. A simple heuristic should work for
most cases without introducing regressions for existing ones, e.g., if the NDV
of the partition column is less than the estimated parallelism of the second
RS, do not execute deduplication.
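
The heuristic proposed above could be sketched roughly as follows. This is an
illustration only, not Hive's optimizer code: the class name, the method name,
and both parameters are hypothetical, and treating missing statistics as
"keep the current behavior" is an assumption made here so that existing plans
would not change.

```java
/**
 * Hypothetical sketch of the proposed heuristic. None of these names exist
 * in Hive; they are placeholders for illustration.
 */
final class RsDedupHeuristic {

    private RsDedupHeuristic() {
        // static helper only
    }

    /**
     * Decides whether two RS operators should be merged.
     *
     * @param partitionColumnNdv   estimated number of distinct values (NDV) of
     *                             the partition column of the second RS
     * @param estimatedParallelism estimated parallelism of the second RS
     * @return true if deduplication should be executed, false if both RS
     *         operators should stay in the plan
     */
    static boolean shouldDeduplicate(long partitionColumnNdv, int estimatedParallelism) {
        // Assumption: a non-positive value means the estimate is unavailable.
        // In that case fall back to today's behavior (always deduplicate).
        if (partitionColumnNdv <= 0 || estimatedParallelism <= 0) {
            return true;
        }
        // Rule from the description: if the NDV is less than the estimated
        // parallelism of the second RS, do not execute deduplication.
        return partitionColumnNdv >= estimatedParallelism;
    }
}
```

For example, with an NDV of 3 and an estimated parallelism of 10, the sketch
keeps both RS operators; with an NDV of 1000 and the same parallelism, it
deduplicates as before.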



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
