Rajesh Balamohan created HIVE-24207:
---------------------------------------

             Summary: LimitOperator can leverage ObjectCache to bail out quickly
                 Key: HIVE-24207
                 URL: https://issues.apache.org/jira/browse/HIVE-24207
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan


{noformat}
select ss_sold_date_sk from store_sales, date_dim
where date_dim.d_year in (1998,1998+1,1998+2)
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
limit 100;

select distinct ss_sold_date_sk from store_sales, date_dim
where date_dim.d_year in (1998,1998+1,1998+2)
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
limit 100;

 {noformat}

Queries like the ones above generate a large number of map tasks. Currently, 
these tasks do not bail out once enough rows have been generated to satisfy the limit. 

It would be good to use ObjectCache to retain the number of records 
generated so far. LimitOperator/VectorLimitOperator could then bail out of later 
tasks in the operator's init phase itself. 

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
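
The proposal above could be sketched roughly as follows. This is only an illustration of the idea, not Hive code: SimpleObjectCache is a hypothetical stand-in for a cache shared by tasks that run in the same container, LimitTask is a simplified stand-in for LimitOperator/VectorLimitOperator, and the cache key and method names are made up for the example. It assumes that rows counted by earlier tasks sharing the cache are enough to decide that a later task can be skipped.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class LimitShortCircuitSketch {

    /** Hypothetical stand-in for a cache shared by tasks in one container. */
    static final class SimpleObjectCache {
        private final ConcurrentHashMap<String, Object> entries = new ConcurrentHashMap<>();

        /** Returns the cached value for key, creating it with loader on first use. */
        @SuppressWarnings("unchecked")
        <T> T retrieve(String key, Callable<T> loader) {
            return (T) entries.computeIfAbsent(key, k -> {
                try {
                    return loader.call();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
        }
    }

    /** Simplified limit operator for one task (hypothetical, not Hive's class). */
    static final class LimitTask {
        private final long limit;
        private final AtomicLong emitted;   // shared across tasks through the cache
        private final boolean skipped;      // decided once, in the "init phase"

        LimitTask(SimpleObjectCache cache, String cacheKey, long limit) {
            this.limit = limit;
            this.emitted = cache.retrieve(cacheKey, AtomicLong::new);
            // Bail out at init if earlier tasks already produced enough rows.
            this.skipped = emitted.get() >= limit;
        }

        boolean isSkipped() {
            return skipped;
        }

        /** Returns true if the row is forwarded, false if the limit is reached. */
        boolean process() {
            if (skipped) {
                return false;
            }
            // Reserve a slot atomically; slots beyond the limit are rejected.
            return emitted.incrementAndGet() <= limit;
        }
    }

    public static void main(String[] args) {
        SimpleObjectCache cache = new SimpleObjectCache();

        LimitTask first = new LimitTask(cache, "query-1.limit", 100);
        long forwarded = 0;
        for (int row = 0; row < 250; row++) {
            if (first.process()) {
                forwarded++;
            }
        }
        System.out.println("first task forwarded " + forwarded + " rows");

        // A later task sharing the same cache sees the counter during init.
        LimitTask later = new LimitTask(cache, "query-1.limit", 100);
        System.out.println("later task skipped at init: " + later.isSkipped());
    }
}
```

In this sketch the counter is keyed per query, so unrelated queries sharing the cache would not affect each other. How the real ObjectCache is scoped and when its entries are released would need to be checked against the Hive code.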



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
