[jira] [Created] (HIVE-26104) HIVE-19138 May block queries to compile

2022-03-31 Thread liuyan (Jira)
liuyan created HIVE-26104:
-

 Summary: HIVE-19138 May block queries to compile
 Key: HIVE-26104
 URL: https://issues.apache.org/jira/browse/HIVE-26104
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 3.1.2, 3.0.0
Reporter: liuyan


HIVE-19138 introduced a way for queries to wait in the compilation state 
while a placeholder for the same query is pending in the results cache. 
However, multiple queries may enter this waiting state at the same time and 
together use up all the parallel compilation slots allowed by 
hive.driver.parallel.compilation.global.limit. Although we can turn this 
feature off by setting hive.query.results.cache.wait.for.pending.results = 
false, doing so negates what HIVE-19138 was trying to resolve. We need a 
better solution for this situation.
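
For reference, the workaround mentioned above could be applied as sketched 
below. It assumes the property can be changed at session level on the 
deployment in question, and it gives up the wait-for-pending-results 
behaviour that HIVE-19138 added.

```sql
-- Workaround only: stop queries from waiting in compilation on a pending
-- results-cache entry (the behaviour introduced by HIVE-19138).
-- Assumption: the property is settable per session; if it is not, it would
-- have to be set in hive-site.xml instead.
set hive.query.results.cache.wait.for.pending.results=false;
```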



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-24712) hive.map.aggr=false and hive.optimize.reducededuplication=false provide incorrect result on order by with limit

2021-02-01 Thread liuyan (Jira)
liuyan created HIVE-24712:
-

 Summary: hive.map.aggr=false and 
hive.optimize.reducededuplication=false provide incorrect result on order by 
with limit
 Key: HIVE-24712
 URL: https://issues.apache.org/jira/browse/HIVE-24712
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 3.1.0
Reporter: liuyan


When both parameters are set to false, the result appears to be incorrect: 
only 35 rows are returned. This was tested on HDP 3.1.5.

set hive.map.aggr=false;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk 
limit 200;

--
VERTICES      MODE  STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--
Map 1 ..      llap  SUCCEEDED     33         33        0        0       0       0
Reducer 2 ..  llap  SUCCEEDED      4          4        0        0       0       0
Reducer 3 ..  llap  SUCCEEDED      4          4        0        0       0       0
Reducer 4 ..  llap  SUCCEEDED      1          1        0        0       0       0
--
VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 38.23 s
--
INFO : 
INFO : Task Execution Summary
INFO : 
--
INFO : VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO : 
--
INFO : Map 1          38097.00             0            0    143,997,065          57,447
INFO : Reducer 2       9003.00             0            0         57,447          13,108
INFO : Reducer 3          0.00             0            0         13,108              35
INFO : Reducer 4          0.00             0            0             35               0
INFO : 
--
INFO : 
INFO : LLAP IO Summary


set hive.map.aggr=true;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk 
limit 200;
--
VERTICES      MODE  STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--
Map 1 ..      llap  SUCCEEDED     33         33        0        0       0       0
Reducer 2 ..  llap  SUCCEEDED      4          4        0        0       0       0
Reducer 3 ..  llap  SUCCEEDED      2          2        0        0       0       0
Reducer 4 ..  llap  SUCCEEDED      1          1        0        0       0       0
--
VERTICES: 04/04 [==>>] 100% ELAPSED TIME: 36.24 s
--


INFO : 
--
INFO : VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO : 
--
INFO : Map 1          25595.00             0            0    143,997,065      16,703,757
INFO : Reducer 2      18556.00             0            0     16,703,757             800
INFO : Reducer 3       8018.00             0            0            800             200
INFO : Reducer 4          0.00             0            0            200               0
INFO : 
--
INFO :
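
One way to cross-check the expected row count is to count the distinct 
grouping keys directly. This is a suggested check, not part of the original 
test run, and it is best run with default settings since it also uses a 
distinct aggregate. If it reports 200 or more, the limit 200 query above 
should return 200 rows rather than 35.

```sql
-- Suggested cross-check (not from the original run): number of distinct
-- grouping keys, i.e. roughly how many rows the group-by yields before the
-- limit is applied.
-- Note: rows with a NULL cs_sold_date_sk form their own group in the
-- group-by but are not counted by count(distinct ...).
select count(distinct cs_sold_date_sk) from tpcds_orc.catalog_sales_orc;
```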





[jira] [Created] (HIVE-23777) hive.semantic.analyzer.hook missing documentation

2020-06-29 Thread liuyan (Jira)
liuyan created HIVE-23777:
-

 Summary: hive.semantic.analyzer.hook  missing documentation 
 Key: HIVE-23777
 URL: https://issues.apache.org/jira/browse/HIVE-23777
 Project: Hive
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
Reporter: liuyan


The hive.semantic.analyzer.hook property is not documented on the 
Configuration Properties page:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties





[jira] [Created] (HIVE-23749) Improve Hive Hook Documentation

2020-06-22 Thread liuyan (Jira)
liuyan created HIVE-23749:
-

 Summary: Improve Hive Hook Documentation 
 Key: HIVE-23749
 URL: https://issues.apache.org/jira/browse/HIVE-23749
 Project: Hive
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
Reporter: liuyan


There seems to be little documentation or examples around 
_"org.apache.hadoop.hive.ql.hooks"_ on how to develop a hook and use it with:

hive.exec.post.hooks
hive.exec.pre.hooks
hive.exec.failure.hooks
hive.query.lifetime.hooks
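
As a starting point, each of the properties above takes a comma-separated 
list of hook class names. The sketch below registers two hook classes from 
the org.apache.hadoop.hive.ql.hooks package. It assumes they are on the 
HiveServer2 classpath and that the deployment allows these properties to be 
changed at session level. A custom hook would be registered the same way, by 
its fully qualified class name.

```sql
-- Sketch: print query inputs/outputs before and after execution using
-- printer hooks from org.apache.hadoop.hive.ql.hooks.
-- Assumption: session-level changes to these properties are permitted.
set hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.PreExecutePrinter;
set hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.PostExecutePrinter;
```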





[jira] [Created] (HIVE-22157) Hive Pushing Aggr extension to Druid

2019-08-29 Thread liuyan (Jira)
liuyan created HIVE-22157:
-

 Summary: Hive Pushing Aggr extension to Druid
 Key: HIVE-22157
 URL: https://issues.apache.org/jira/browse/HIVE-22157
 Project: Hive
  Issue Type: Wish
  Components: Druid integration
Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
Reporter: liuyan


Currently Hive cannot push down an aggregation spec when one wants to use a 
custom Druid extension for the execution.

When using Hive, the query below is rewritten with no aggregation defined:

Explain  select  floor_day(`__time`),count(distinct visitor_id) as uv from 
druid group by floor_day(`__time`);

.

"limitSpec":{"type":"default"},

"aggregations":[],

"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]

.. 

 

But what one really needs is:

 

"aggregations": [ { "type": "distinctCount", "name": "uv", "fieldName": 
"visitor_id" } ]

 

where the aggregations spec uses the {{druid-distinctcount}} extension.

If we could call Druid's UDAFs from HiveQL and push that call into the 
execution plan, so that the UDAF runs on the Druid data source, it would be 
a nice way to strengthen the Hive-Druid integration.

 


