[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
[ https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537537#comment-14537537 ] Mostafa Mokhtar commented on HIVE-10244: [~jpullokkaran] Can you please take a look? Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled --- Key: HIVE-10244 URL: https://issues.apache.org/jira/browse/HIVE-10244 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: explain_q80_vectorized_reduce_on.txt Query {code} set hive.vectorized.execution.reduce.enabled=true; with ssr as (select s_store_id as store_id, sum(ss_ext_sales_price) as sales, sum(coalesce(sr_return_amt, 0)) as returns, sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit from store_sales left outer join store_returns on (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number), date_dim, store, item, promotion where ss_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ss_store_sk = s_store_sk and ss_item_sk = i_item_sk and i_current_price 50 and ss_promo_sk = p_promo_sk and p_channel_tv = 'N' group by s_store_id) , csr as (select cp_catalog_page_id as catalog_page_id, sum(cs_ext_sales_price) as sales, sum(coalesce(cr_return_amount, 0)) as returns, sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit from catalog_sales left outer join catalog_returns on (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number), date_dim, catalog_page, item, promotion where cs_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and cs_catalog_page_sk = cp_catalog_page_sk and cs_item_sk = i_item_sk and i_current_price 50 and cs_promo_sk = p_promo_sk and p_channel_tv = 'N' group by cp_catalog_page_id) , wsr as (select web_site_id, sum(ws_ext_sales_price) as sales, sum(coalesce(wr_return_amt, 0)) as returns, sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit from web_sales left outer join web_returns on (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number), date_dim, web_site, item, promotion where ws_sold_date_sk = d_date_sk and d_date between cast('1998-08-04' as date) and (cast('1998-09-04' as date)) and ws_web_site_sk = web_site_sk and ws_item_sk = i_item_sk and i_current_price 50 and ws_promo_sk = p_promo_sk and p_channel_tv = 'N' group by web_site_id) select channel , id , sum(sales) as sales , sum(returns) as returns , sum(profit) as profit from (select 'store channel' as channel , concat('store', store_id) as id , sales , returns , profit from ssr union all select 'catalog channel' as channel , concat('catalog_page', catalog_page_id) as id , sales , returns , profit from csr union all select 'web channel' as channel , concat('web_site', web_site_id) as id , sales , returns , profit from wsr ) x group by channel, id with rollup order by channel ,id limit 100 {code} Exception {code} Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179) at
[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10609: Attachment: HIVE-10609.02.patch Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch, HIVE-10609.02.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk =
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537484#comment-14537484 ] Matt McCline commented on HIVE-10609: - [~mmokhtar] Thanks for verifying it! Fix left join filtered code. [Note: the Tez results for vector_left_outer_join2.q are wrong. This is a known issue solved by pending HIVE-10565] Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch, HIVE-10609.02.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537518#comment-14537518 ] Hive QA commented on HIVE-10609: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731842/HIVE-10609.02.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3843/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3843/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3843/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731842 - PreCommit-HIVE-TRUNK-Build Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch, HIVE-10609.02.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled
[ https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537525#comment-14537525 ] Matt McCline commented on HIVE-10244: - This information is from TPC-DS 67, but I think it is probably the same problem: The exception occurs in (vectorized) Reducer 3 fed from Map 2. The type shown in the Explain plan for Map 2 shows _col8 as type string. And, Reducer 3 shows _col8 as type string. Yet, we seem to create the Vectorized Row Batch in the Tez Reduce code as Decimal? The COALESCE in the last column of Select Operator expressions shows type decimal(18,2). Yet, the following Group By operator shows the last column with a funny name 0 and type string. Something is not right here. Partial explain output from the end of Map 2: {noformat} Select Operator expressions: _col3 (type: string), _col2 (type: string), _col1 (type: string), _col4 (type: string), _col12 (type: int), _col14 (type: int), _col13 (type: int), _col16 (type: string), COALESCE((_col8 * CAST( _col7 AS decimal(10,0))),0) (type: decimal(18,2)) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8 Statistics: Num rows: 915185 Data size: 1236889305 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(_col8) keys: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), '0' (type: string) mode: hash outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9 Statistics: Num rows: 8236665 Data size: 11132003745 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), _col8 (type: string) sort order: + Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), _col8 (type: string) Statistics: Num rows: 8236665 Data size: 11132003745 Basic stats: COMPLETE Column stats: NONE value expressions: _col9 (type: decimal(28,2)) {noformat} Full explain output: {noformat} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 2 - Map 1 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE) Reducer 3 - Map 2 (SIMPLE_EDGE) Reducer 4 - Reducer 3 (SIMPLE_EDGE) Reducer 5 - Reducer 4 (SIMPLE_EDGE) A masked pattern was here Vertices: Map 1 Map Operator Tree: TableScan alias: item Statistics: Num rows: 18000 Data size: 29671008 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: i_item_sk is not null (type: boolean) Statistics: Num rows: 9000 Data size: 14835504 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: i_item_sk (type: int), i_brand (type: string), i_class (type: string), i_category (type: string), i_product_name (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4 Statistics: Num rows: 9000 Data size: 14835504 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 9000 Data size: 14835504 Basic stats: COMPLETE Column stats: NONE value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string) Map 2 Map Operator Tree: TableScan alias: store_sales Statistics: Num rows: 2750370 Data size: 3717170032 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: (ss_sold_date_sk is not null and ss_item_sk is not null) (type: boolean)
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537532#comment-14537532 ] Matt McCline commented on HIVE-10609: - Test failures are not related to the changes here. Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch, HIVE-10609.02.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk
[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10565: Attachment: HIVE-10565.092.patch LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, HIVE-10565.09.patch, HIVE-10565.091.patch, HIVE-10565.092.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers
[ https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537540#comment-14537540 ] Selina Zhang commented on HIVE-10036: - The above two unit test failures seem irrelevant to this patch. Writing ORC format big table causes OOM - too many fixed sized stream buffers - Key: HIVE-10036 URL: https://issues.apache.org/jira/browse/HIVE-10036 Project: Hive Issue Type: Improvement Reporter: Selina Zhang Assignee: Selina Zhang Labels: orcfile Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch ORC writer keeps multiple out steams for each column. Each output stream is allocated fixed size ByteBuffer (configurable, default to 256K). For a big table, the memory cost is unbearable. Specially when HCatalog dynamic partition involves, several hundreds files may be open and writing at the same time (same problems for FileSinkOperator). Global ORC memory manager controls the buffer size, but it only got kicked in at 5000 rows interval. An enhancement could be done here, but the problem is reducing the buffer size introduces worse compression and more IOs in read path. Sacrificing the read performance is always not a good choice. I changed the fixed size ByteBuffer to a dynamic growth buffer which up bound to the existing configurable buffer size. Most of the streams does not need large buffer so the performance got improved significantly. Comparing to Facebook's hive-dwrf, I monitored 2x performance gain with this fix. Solving OOM for ORC completely maybe needs lots of effort , but this is definitely a low hanging fruit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly
[ https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537611#comment-14537611 ] Hive QA commented on HIVE-10565: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731848/HIVE-10565.092.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8935 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3844/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3844/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3844/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731848 - PreCommit-HIVE-TRUNK-Build LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly Key: HIVE-10565 URL: https://issues.apache.org/jira/browse/HIVE-10565 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 1.3.0 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, HIVE-10565.09.patch, HIVE-10565.091.patch, HIVE-10565.092.patch Filtering can knock out some of the rows for a repeated key, but those knocked out rows need to be included in the LEFT OUTER JOIN result and are currently not when only some rows are filtered out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers
[ https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Selina Zhang updated HIVE-10036: Attachment: HIVE-10036.9.patch Fixed the unit tests. Writing ORC format big table causes OOM - too many fixed sized stream buffers - Key: HIVE-10036 URL: https://issues.apache.org/jira/browse/HIVE-10036 Project: Hive Issue Type: Improvement Reporter: Selina Zhang Assignee: Selina Zhang Labels: orcfile Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch ORC writer keeps multiple out steams for each column. Each output stream is allocated fixed size ByteBuffer (configurable, default to 256K). For a big table, the memory cost is unbearable. Specially when HCatalog dynamic partition involves, several hundreds files may be open and writing at the same time (same problems for FileSinkOperator). Global ORC memory manager controls the buffer size, but it only got kicked in at 5000 rows interval. An enhancement could be done here, but the problem is reducing the buffer size introduces worse compression and more IOs in read path. Sacrificing the read performance is always not a good choice. I changed the fixed size ByteBuffer to a dynamic growth buffer which up bound to the existing configurable buffer size. Most of the streams does not need large buffer so the performance got improved significantly. Comparing to Facebook's hive-dwrf, I monitored 2x performance gain with this fix. Solving OOM for ORC completely maybe needs lots of effort , but this is definitely a low hanging fruit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9392: -- Attachment: HIVE-9392.6.patch patch without q files updated. JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName Key: HIVE-9392 URL: https://issues.apache.org/jira/browse/HIVE-9392 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Priority: Critical Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch In JoinStatsRule.process the join column statistics are stored in HashMap joinedColStats, the key used which is the ColStatistics.fqColName is duplicated between join column in the same vertex, as a result distinctVals ends up having duplicated values which negatively affects the join cardinality estimation. The duplicate keys are usually named KEY.reducesinkkey0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537338#comment-14537338 ] Mostafa Mokhtar commented on HIVE-10609: [~mmccline] I tried the patch on TPC-DS 200GB and query runs fine and returns correct results. Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON
[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan
[ https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-10568: -- Labels: TODOC1.2 (was: ) Select count(distinct()) can have more optimal execution plan - Key: HIVE-10568 URL: https://issues.apache.org/jira/browse/HIVE-10568 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Mostafa Mokhtar Assignee: Ashutosh Chauhan Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, HIVE-10568.patch, HIVE-10568.patch {code:sql} select count(distinct ss_ticket_number) from store_sales; {code} can be rewritten as {code:sql} select count(1) from (select distinct ss_ticket_number from store_sales) a; {code} which may run upto 3x faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10568) Select count(distinct()) can have more optimal execution plan
[ https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537371#comment-14537371 ] Lefty Leverenz commented on HIVE-10568: --- Doc note: This adds *hive.optimize.distinct.rewrite* to HiveConf.java, so it needs to be documented in the wiki. Perhaps it belongs in the UDFs doc as well as Configs. * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] * [Built-in Aggregate Functions (UDAF) | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)] Select count(distinct()) can have more optimal execution plan - Key: HIVE-10568 URL: https://issues.apache.org/jira/browse/HIVE-10568 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Mostafa Mokhtar Assignee: Ashutosh Chauhan Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, HIVE-10568.patch, HIVE-10568.patch {code:sql} select count(distinct ss_ticket_number) from store_sales; {code} can be rewritten as {code:sql} select count(1) from (select distinct ss_ticket_number from store_sales) a; {code} which may run upto 3x faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers
[ https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537415#comment-14537415 ] Hive QA commented on HIVE-10036: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731823/HIVE-10036.9.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3841/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3841/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3841/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731823 - PreCommit-HIVE-TRUNK-Build Writing ORC format big table causes OOM - too many fixed sized stream buffers - Key: HIVE-10036 URL: https://issues.apache.org/jira/browse/HIVE-10036 Project: Hive Issue Type: Improvement Reporter: Selina Zhang Assignee: Selina Zhang Labels: orcfile Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch ORC writer keeps multiple out steams for each column. Each output stream is allocated fixed size ByteBuffer (configurable, default to 256K). For a big table, the memory cost is unbearable. Specially when HCatalog dynamic partition involves, several hundreds files may be open and writing at the same time (same problems for FileSinkOperator). Global ORC memory manager controls the buffer size, but it only got kicked in at 5000 rows interval. An enhancement could be done here, but the problem is reducing the buffer size introduces worse compression and more IOs in read path. Sacrificing the read performance is always not a good choice. I changed the fixed size ByteBuffer to a dynamic growth buffer which up bound to the existing configurable buffer size. Most of the streams does not need large buffer so the performance got improved significantly. Comparing to Facebook's hive-dwrf, I monitored 2x performance gain with this fix. Solving OOM for ORC completely maybe needs lots of effort , but this is definitely a low hanging fruit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName
[ https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537492#comment-14537492 ] Hive QA commented on HIVE-9392: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731833/HIVE-9392.6.patch {color:red}ERROR:{color} -1 due to 384 failed/errored test(s), 8921 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_complex_alias org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_rearrange org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_when org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppd org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_identity_project_remove_skip org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10609: Attachment: HIVE-10609.01.patch Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN
[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-10609: Attachment: (was: HIVE-10609.01.patch) Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and i_current_price between 35 and 35 + 10 and i_current_price between 35 + 1 and 35 + 15 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year ) cs1 JOIN (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
[jira] [Commented] (HIVE-10646) ColumnValue does not handle NULL_TYPE
[ https://issues.apache.org/jira/browse/HIVE-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537055#comment-14537055 ] Hive QA commented on HIVE-10646: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731702/HIVE-10646.1.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 8921 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit org.apache.hive.jdbc.TestSSL.testSSLFetchHttp org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testSyncRpc {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3837/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3837/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3837/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731702 - PreCommit-HIVE-TRUNK-Build ColumnValue does not handle NULL_TYPE - Key: HIVE-10646 URL: https://issues.apache.org/jira/browse/HIVE-10646 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10646.1.patch This will cause NPE if the thrift client use protocol V5 or older: {noformat} 1:46:07.199 PMERROR org.apache.thrift.server.TThreadPoolServer Error occurred during processing of message. java.lang.NullPointerException at org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388) at org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338) at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288) at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605) at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525) at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455) at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550) at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486) at org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13272) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13236) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13187) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:677) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Reproduce: Run: select NULL as col, * from jsmall limit 5; from a V5 client (for example some version of Hue). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10646) ColumnValue does not handle NULL_TYPE
[ https://issues.apache.org/jira/browse/HIVE-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537089#comment-14537089 ] Yongzhi Chen commented on HIVE-10646: - All the 9 failures are not related: 1. The following 4 tests all passed on the first pre-commit running of this PATCH (3821). And my change will not cause random behavior(if it causes failures, should always fails), so the failures are not related. org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit 2. org.apache.hive.jdbc.TestSSL.testSSLFetchHttp failure is network issue? org.apache.http.conn.HttpHostConnectException: Connect to localhost:60355 [localhost/127.0.0.1, localhost/127.0.0.1] failed: Connection refused. 3. org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static ages 35 4. All 3 spark failures are java.util.concurrent.TimeoutException: null [~szehon], could you commit the patch, thanks. ColumnValue does not handle NULL_TYPE - Key: HIVE-10646 URL: https://issues.apache.org/jira/browse/HIVE-10646 Project: Hive Issue Type: Bug Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-10646.1.patch This will cause NPE if the thrift client use protocol V5 or older: {noformat} 1:46:07.199 PMERROR org.apache.thrift.server.TThreadPoolServer Error occurred during processing of message. java.lang.NullPointerException at org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388) at org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338) at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288) at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605) at org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525) at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455) at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550) at org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486) at org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13272) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13236) at org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13187) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:677) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} Reproduce: Run: select NULL as col, * from jsmall limit 5; from a V5 client (for example some version of Hue). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException
[ https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537122#comment-14537122 ] Hive QA commented on HIVE-10609: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731779/HIVE-10609.01.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8919 tests executed *Failed tests:* {noformat} TestHiveAuthFactory - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_left_outer_join2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_left_outer_join2 org.apache.hive.jdbc.TestSSL.testSSLFetchHttp {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3839/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3839/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3839/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731779 - PreCommit-HIVE-TRUNK-Build Vectorization : Q64 fails with ClassCastException - Key: HIVE-10609 URL: https://issues.apache.org/jira/browse/HIVE-10609 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 1.2.0 Reporter: Mostafa Mokhtar Assignee: Matt McCline Attachments: HIVE-10609.01.patch TPC-DS Q64 fails with ClassCastException. Query {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price)2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status cd2.cd_marital_status and i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and
[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC
[ https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537049#comment-14537049 ] Lefty Leverenz commented on HIVE-10591: --- Should this be documented? {quote} Following type promotions can be supported without any casting smallint - int smallint - bigint int - bigint {quote} Support limited integer type promotion in ORC - Key: HIVE-10591 URL: https://issues.apache.org/jira/browse/HIVE-10591 Project: Hive Issue Type: New Feature Affects Versions: 1.3.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: 1.2.0, 1.3.0 Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, HIVE-10591.2.patch, HIVE-10591.3.patch, HIVE-10591.3.patch, HIVE-10591.3.patch ORC currently does not support schema-on-read. If we alter an ORC table with 'int' type to 'bigint' and if we query the altered table ClassCastException will be thrown as the schema on read from table descriptor will expect LongWritable whereas ORC will return IntWritable based on file schema stored within ORC file. OrcSerde currently doesn't do any type conversions or type promotions for performance reasons in inner loop. Since smallints, ints and bigints are stored in the same way in ORC, it will be possible be allow such type promotions without hurting performance. Following type promotions can be supported without any casting smallint - int smallint - bigint int - bigint Tinyint promotion is not possible without casting as tinyints are stored using RLE byte writer whereas smallints, ints and bigints are stored using RLE integer writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10664) Unit tests run fail in windows because of illegal escape character in file path
[ https://issues.apache.org/jira/browse/HIVE-10664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537084#comment-14537084 ] Hive QA commented on HIVE-10664: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731709/HIVE-10664.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3838/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3838/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3838/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731709 - PreCommit-HIVE-TRUNK-Build Unit tests run fail in windows because of illegal escape character in file path Key: HIVE-10664 URL: https://issues.apache.org/jira/browse/HIVE-10664 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10664.1.patch {code:title=In windows we hit errors as shown below } [ERROR] /D:/w/hv/itests/qtest/target/generated-test-sources/java/org/apache/hadoop/hive/cli/TestHBaseNegativeCliDriver.java:[97,54] illegal escape character {code} Specifically, the lines it is complaining about in the log, look like this: {code:title=line 97 of /itests/qtest/target/generated-test-sources/java/org/apache/hadoop/hive/cli/TestHBaseNegativeCliDriver.java} line 97: QTestUtil.addTestsToSuiteFromQfileNames(D:\w\hv\itests\qtest\target\generated-test-sources\java\org\apache\hadoop\hive\cli\TestHBaseNegativeCliDriverQFileNames.txt, qFilesToExecute, {code} It is executing the itests/qtests directory on the windows platform. It seems as if it is hitting a fairly simple portability problem, of the type that a small change will touch up all of them quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10327) Remove ExprNodeNullDesc
[ https://issues.apache.org/jira/browse/HIVE-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537299#comment-14537299 ] Gopal V commented on HIVE-10327: [~ashutoshc]: sure, added to to my lists. Remove ExprNodeNullDesc --- Key: HIVE-10327 URL: https://issues.apache.org/jira/browse/HIVE-10327 Project: Hive Issue Type: Task Components: Query Planning Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-10327.1.patch, HIVE-10327.patch Its purpose can be served by ExprNodeConstantDesc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537239#comment-14537239 ] Hive QA commented on HIVE-10190: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12731797/HIVE-10190.12.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8919 tests executed *Failed tests:* {noformat} TestCompareCliDriver - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3840/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3840/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3840/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12731797 - PreCommit-HIVE-TRUNK-Build CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, HIVE-10190.10.patch, HIVE-10190.11.patch, HIVE-10190.12.patch {code} public static boolean validateASTForUnsupportedTokens(ASTNode ast) { String astTree = ast.toStringTree(); // if any of following tokens are present in AST, bail out String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE }; for (String token : tokens) { if (astTree.contains(token)) { return false; } } return true; } {code} This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: HIVE-10190.12.patch CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, HIVE-10190.10.patch, HIVE-10190.11.patch, HIVE-10190.12.patch {code} public static boolean validateASTForUnsupportedTokens(ASTNode ast) { String astTree = ast.toStringTree(); // if any of following tokens are present in AST, bail out String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE }; for (String token : tokens) { if (astTree.contains(token)) { return false; } } return true; } {code} This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)
[ https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reuben Kuhnert updated HIVE-10190: -- Attachment: (was: HIVE-10190.12.patch) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE) - Key: HIVE-10190 URL: https://issues.apache.org/jira/browse/HIVE-10190 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Reuben Kuhnert Priority: Trivial Labels: perfomance Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, HIVE-10190.10.patch, HIVE-10190.11.patch, HIVE-10190.12.patch {code} public static boolean validateASTForUnsupportedTokens(ASTNode ast) { String astTree = ast.toStringTree(); // if any of following tokens are present in AST, bail out String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE }; for (String token : tokens) { if (astTree.contains(token)) { return false; } } return true; } {code} This is an issue for a SQL query which is bigger in AST form than in text (~700kb). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan
[ https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10568: Fix Version/s: (was: 1.3.0) 1.2.0 Select count(distinct()) can have more optimal execution plan - Key: HIVE-10568 URL: https://issues.apache.org/jira/browse/HIVE-10568 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0 Reporter: Mostafa Mokhtar Assignee: Ashutosh Chauhan Fix For: 1.2.0 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, HIVE-10568.patch, HIVE-10568.patch {code:sql} select count(distinct ss_ticket_number) from store_sales; {code} can be rewritten as {code:sql} select count(1) from (select distinct ss_ticket_number from store_sales) a; {code} which may run upto 3x faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10568) Select count(distinct()) can have more optimal execution plan
[ https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537313#comment-14537313 ] Ashutosh Chauhan commented on HIVE-10568: - Committed to 1.2 branch as well. Select count(distinct()) can have more optimal execution plan - Key: HIVE-10568 URL: https://issues.apache.org/jira/browse/HIVE-10568 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Mostafa Mokhtar Assignee: Ashutosh Chauhan Fix For: 1.2.0 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, HIVE-10568.patch, HIVE-10568.patch {code:sql} select count(distinct ss_ticket_number) from store_sales; {code} can be rewritten as {code:sql} select count(1) from (select distinct ss_ticket_number from store_sales) a; {code} which may run upto 3x faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan
[ https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10568: Affects Version/s: (was: 1.2.0) Select count(distinct()) can have more optimal execution plan - Key: HIVE-10568 URL: https://issues.apache.org/jira/browse/HIVE-10568 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.1.0 Reporter: Mostafa Mokhtar Assignee: Ashutosh Chauhan Fix For: 1.2.0 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, HIVE-10568.patch, HIVE-10568.patch {code:sql} select count(distinct ss_ticket_number) from store_sales; {code} can be rewritten as {code:sql} select count(1) from (select distinct ss_ticket_number from store_sales) a; {code} which may run upto 3x faster -- This message was sent by Atlassian JIRA (v6.3.4#6332)