[jira] [Created] (HIVE-26104) HIVE-19138 May block queries to compile
liuyan created HIVE-26104:
-----------------------------

             Summary: HIVE-19138 May block queries to compile
                 Key: HIVE-26104
                 URL: https://issues.apache.org/jira/browse/HIVE-26104
             Project: Hive
          Issue Type: Bug
          Components: CBO
   Affects Versions: 3.1.2, 3.0.0
            Reporter: liuyan

HIVE-19138 introduced a way for other queries to wait in the compilation state while a placeholder for the same query exists in the result cache. However, multiple queries may enter this state at the same time and thereby use up all of the available parallel compilation slots, which are capped by hive.driver.parallel.compilation.global.limit.

We can turn this feature off by setting hive.query.results.cache.wait.for.pending.results = false, but that seems to negate all the effort HIVE-19138 put into resolving the problem. We need a better solution for this situation.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
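For HIVE-26104, the two settings named in the report can be inspected and changed per session. The following is only a sketch of the workaround the reporter describes, not a recommended fix; it assumes a Beeline or Hive CLI session in which neither property is restricted from being changed at the session level.

```sql
-- Print the current values; "set <name>;" with no value only displays it.
set hive.driver.parallel.compilation.global.limit;
set hive.query.results.cache.wait.for.pending.results;

-- Workaround from the report: do not wait on a pending result-cache entry.
-- Note that this gives up the behavior HIVE-19138 introduced.
set hive.query.results.cache.wait.for.pending.results=false;
```

With the second property set to false, a query whose result-cache entry is still pending is compiled and executed normally instead of waiting, so it should no longer hold a compilation slot for the duration of the wait.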
[jira] [Created] (HIVE-24712) hive.map.aggr=false and hive.optimize.reducededuplication=false provide incorrect result on order by with limit
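liuyan created HIVE-24712:
-----------------------------

             Summary: hive.map.aggr=false and hive.optimize.reducededuplication=false provide incorrect result on order by with limit
                 Key: HIVE-24712
                 URL: https://issues.apache.org/jira/browse/HIVE-24712
             Project: Hive
          Issue Type: Improvement
          Components: CBO
   Affects Versions: 3.1.0
            Reporter: liuyan

When both parameters are set to false, the result appears to be incorrect: only 35 rows are returned. This was tested on HDP 3.1.5.

With both parameters set to false:

set hive.map.aggr=false;
set hive.optimize.reducededuplication=false;
select cs_sold_date_sk,count(distinct cs_order_number) from tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk limit 200;

----------------------------------------------------------------------------------------
VERTICES        MODE     STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------
Map 1 ..        llap     SUCCEEDED      33         33        0        0       0       0
Reducer 2 ..    llap     SUCCEEDED       4          4        0        0       0       0
Reducer 3 ..    llap     SUCCEEDED       4          4        0        0       0       0
Reducer 4 ..    llap     SUCCEEDED       1          1        0        0       0       0
----------------------------------------------------------------------------------------
VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 38.23 s
----------------------------------------------------------------------------------------

INFO : Task Execution Summary
INFO : ------------------------------------------------------------------------------------
INFO : VERTICES     DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO : ------------------------------------------------------------------------------------
INFO : Map 1            38097.00             0            0    143,997,065          57,447
INFO : Reducer 2         9003.00             0            0         57,447          13,108
INFO : Reducer 3            0.00             0            0         13,108              35
INFO : Reducer 4            0.00             0            0             35               0
INFO : ------------------------------------------------------------------------------------
INFO :
INFO : LLAP IO Summary

With hive.map.aggr set to true:

set hive.map.aggr=true;
set hive.optimize.reducededuplication=false;
select cs_sold_date_sk,count(distinct cs_order_number) from tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk limit 200;

----------------------------------------------------------------------------------------
VERTICES        MODE     STATUS      TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------
Map 1 ..        llap     SUCCEEDED      33         33        0        0       0       0
Reducer 2 ..    llap     SUCCEEDED       4          4        0        0       0       0
Reducer 3 ..    llap     SUCCEEDED       2          2        0        0       0       0
Reducer 4 ..    llap     SUCCEEDED       1          1        0        0       0       0
----------------------------------------------------------------------------------------
VERTICES: 04/04  [==>>] 100%  ELAPSED TIME: 36.24 s
----------------------------------------------------------------------------------------

INFO : ------------------------------------------------------------------------------------
INFO : VERTICES     DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO : ------------------------------------------------------------------------------------
INFO : Map 1            25595.00             0            0    143,997,065      16,703,757
INFO : Reducer 2        18556.00             0            0     16,703,757             800
INFO : Reducer 3         8018.00             0            0            800             200
INFO : Reducer 4            0.00             0            0            200               0
INFO : ------------------------------------------------------------------------------------
INFO :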
[jira] [Created] (HIVE-23777) hive.semantic.analyzer.hook missing documentation
liuyan created HIVE-23777:
-----------------------------

             Summary: hive.semantic.analyzer.hook missing documentation
                 Key: HIVE-23777
                 URL: https://issues.apache.org/jira/browse/HIVE-23777
             Project: Hive
          Issue Type: Improvement
          Components: Documentation
   Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
            Reporter: liuyan

hive.semantic.analyzer.hook is not documented on the Configuration Properties page:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties
[jira] [Created] (HIVE-23749) Improve Hive Hook Documentation
liuyan created HIVE-23749:
-----------------------------

             Summary: Improve Hive Hook Documentation
                 Key: HIVE-23749
                 URL: https://issues.apache.org/jira/browse/HIVE-23749
             Project: Hive
          Issue Type: Improvement
          Components: Documentation
   Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
            Reporter: liuyan

There seems to be little documentation, and few examples, around "org.apache.hadoop.hive.ql.hooks" showing how to develop a hook and use it with:

    hive.exec.post.hooks
    hive.exec.pre.hooks
    hive.exec.failure.hooks
    hive.query.lifetime.hooks
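As a starting point for the kind of example HIVE-23749 asks for, the sketch below shows how a hook class might be registered through those four properties. The class names under com.example are hypothetical placeholders, not classes that ship with Hive. The sketch assumes that the classes are already on the Hive classpath, that the properties may be changed at the session level, and that the pre, post, and failure hooks implement org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext while the lifetime hook implements org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook.

```sql
-- Each property takes a comma-separated list of fully qualified class names.
-- com.example.* are hypothetical classes used only for illustration.
set hive.exec.pre.hooks=com.example.AuditPreHook;
set hive.exec.post.hooks=com.example.AuditPostHook;
set hive.exec.failure.hooks=com.example.AuditFailureHook;
set hive.query.lifetime.hooks=com.example.AuditLifeTimeHook;

-- Print one of the values to confirm what is configured for this session.
set hive.exec.pre.hooks;
```

The same properties can instead be placed in hive-site.xml so that they apply to every session.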
[jira] [Created] (HIVE-22157) Hive Pushing Aggr extension to Druid
liuyan created HIVE-22157:
-----------------------------

             Summary: Hive Pushing Aggr extension to Druid
                 Key: HIVE-22157
                 URL: https://issues.apache.org/jira/browse/HIVE-22157
             Project: Hive
          Issue Type: Wish
          Components: Druid integration
   Affects Versions: 3.1.2, 3.1.1, 3.1.0, 3.0.0
            Reporter: liuyan

Currently Hive cannot push down an aggregation spec when one wants to use a custom Druid extension for execution. When run through Hive, the query below is rewritten with no aggregations defined:

Explain select floor_day(`__time`),count(distinct visitor_id) as uv from druid group by floor_day(`__time`);

.
"limitSpec":{"type":"default"},
"aggregations":[],
"intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"]
..

What one really needs is:

"aggregations": [
  {
    "type": "distinctCount",
    "name": "uv",
    "fieldName": "visitor_id"
  }
]

This aggregations spec uses the {{druid-distinctcount}} extension.

If we could call Druid's UDAF from HiveSQL and push that call into the execution plan, so that the UDAF runs on the Druid data source, it would be a nice way to strengthen the Hive-Druid integration.