[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-10 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537537#comment-14537537
 ] 

Mostafa Mokhtar commented on HIVE-10244:


[~jpullokkaran]
Can you please take a look?

 Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when 
 hive.vectorized.execution.reduce.enabled is enabled
 ---

 Key: HIVE-10244
 URL: https://issues.apache.org/jira/browse/HIVE-10244
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: explain_q80_vectorized_reduce_on.txt


 Query 
 {code}
 set hive.vectorized.execution.reduce.enabled=true;
 with ssr as
  (select  s_store_id as store_id,
   sum(ss_ext_sales_price) as sales,
   sum(coalesce(sr_return_amt, 0)) as returns,
   sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
   from store_sales left outer join store_returns on
  (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
  date_dim,
  store,
  item,
  promotion
  where ss_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date) 
   and (cast('1998-09-04' as date))
and ss_store_sk = s_store_sk
and ss_item_sk = i_item_sk
and i_current_price > 50
and ss_promo_sk = p_promo_sk
and p_channel_tv = 'N'
  group by s_store_id)
  ,
  csr as
  (select  cp_catalog_page_id as catalog_page_id,
   sum(cs_ext_sales_price) as sales,
   sum(coalesce(cr_return_amount, 0)) as returns,
   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
   from catalog_sales left outer join catalog_returns on
  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
  date_dim,
  catalog_page,
  item,
  promotion
  where cs_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and cs_catalog_page_sk = cp_catalog_page_sk
and cs_item_sk = i_item_sk
and i_current_price > 50
and cs_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by cp_catalog_page_id)
  ,
  wsr as
  (select  web_site_id,
   sum(ws_ext_sales_price) as sales,
   sum(coalesce(wr_return_amt, 0)) as returns,
   sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
   from web_sales left outer join web_returns on
  (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
  date_dim,
  web_site,
  item,
  promotion
  where ws_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and ws_web_site_sk = web_site_sk
and ws_item_sk = i_item_sk
and i_current_price > 50
and ws_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by web_site_id)
   select  channel
 , id
 , sum(sales) as sales
 , sum(returns) as returns
 , sum(profit) as profit
  from 
  (select 'store channel' as channel
 , concat('store', store_id) as id
 , sales
 , returns
 , profit
  from   ssr
  union all
  select 'catalog channel' as channel
 , concat('catalog_page', catalog_page_id) as id
 , sales
 , returns
 , profit
  from  csr
  union all
  select 'web channel' as channel
 , concat('web_site', web_site_id) as id
 , sales
 , returns
 , profit
  from   wsr
  ) x
  group by channel, id with rollup
  order by channel
  ,id
  limit 100
 {code}
 Exception 
 {code}
 Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
 diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing vector batch (tag=0) 
 \N \N 0 9.285817653506076E8 4.639990363237801E7 -1.1814318134887291E8
 \N \N 0 4.682909323885761E8 2.2415242712669864E7 -5.966176123188091E7
 \N \N 0 1.2847032699693155E9 6.300096113768728E7 -5.94963316209578E8
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at 
 

[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10609:

Attachment: HIVE-10609.02.patch

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: HIVE-10609.01.patch, HIVE-10609.02.patch


 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = 

[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537484#comment-14537484
 ] 

Matt McCline commented on HIVE-10609:
-

[~mmokhtar] Thanks for verifying it!

Fix left join filtered code.

[Note: the Tez results for vector_left_outer_join2.q are wrong.  This is a 
known issue solved by pending HIVE-10565]


[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537518#comment-14537518
 ] 

Hive QA commented on HIVE-10609:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731842/HIVE-10609.02.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3843/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3843/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3843/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731842 - PreCommit-HIVE-TRUNK-Build


[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537525#comment-14537525
 ] 

Matt McCline commented on HIVE-10244:
-


This information is from TPC-DS Q67, but I think it is probably the same problem:

The exception occurs in (vectorized) Reducer 3, which is fed from Map 2.

The Explain plan for Map 2 shows _col8 as type string, and Reducer 3 also
shows _col8 as type string.  Yet we seem to create the Vectorized Row Batch
in the Tez reduce code as decimal?

The COALESCE in the last column of the Select Operator expressions shows type 
decimal(18,2).
Yet the following Group By operator shows its last key column with a funny
name '0' and type string.
Something is not right here.

Partial explain output from the end of Map 2:
{noformat}
  Select Operator
    expressions: _col3 (type: string), _col2 (type: string), _col1 (type: string), _col4 (type: string), _col12 (type: int), _col14 (type: int), _col13 (type: int), _col16 (type: string), COALESCE((_col8 * CAST( _col7 AS decimal(10,0))),0) (type: decimal(18,2))
    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
    Statistics: Num rows: 915185 Data size: 1236889305 Basic stats: COMPLETE Column stats: NONE
    Group By Operator
      aggregations: sum(_col8)
      keys: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), '0' (type: string)
      mode: hash
      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
      Statistics: Num rows: 8236665 Data size: 11132003745 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
        key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), _col8 (type: string)
        sort order: +
        Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: int), _col6 (type: int), _col7 (type: string), _col8 (type: string)
        Statistics: Num rows: 8236665 Data size: 11132003745 Basic stats: COMPLETE Column stats: NONE
        value expressions: _col9 (type: decimal(28,2))
{noformat}
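
For illustration only, here is a minimal Java sketch of how this kind of disagreement can surface as a ClassCastException. It is not Hive code: the classes `Column`, `StringColumn`, and `DecimalColumn` and the methods `allocate` and `readKeyAsString` are hypothetical stand-ins. The assumption is that one side allocates the batch column from one type name (decimal) while the operator reading it was set up for the type the plan declares (string).

```java
import java.math.BigDecimal;

// Hypothetical stand-ins; these are NOT Hive's vectorization classes.
final class ColumnTypeMismatchSketch {

    static abstract class Column { }

    static final class StringColumn extends Column {
        final String[] values = new String[1024];
    }

    static final class DecimalColumn extends Column {
        final BigDecimal[] values = new BigDecimal[1024];
    }

    // Allocates a column from a type name, the way a batch builder might.
    static Column allocate(String typeName) {
        if (typeName.startsWith("decimal")) {
            return new DecimalColumn();
        }
        if (typeName.equals("string")) {
            return new StringColumn();
        }
        throw new IllegalArgumentException("unsupported type: " + typeName);
    }

    // An operator set up for a string key column downcasts without checking.
    static String readKeyAsString(Column col, int row) {
        return ((StringColumn) col).values[row];
    }

    public static void main(String[] args) {
        // Assumption: the batch is built as decimal although the plan says string.
        Column col8 = allocate("decimal(18,2)");
        try {
            readKeyAsString(col8, 0);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: plan type and batch type disagree");
        }
    }
}
```

If something like this is what happens, the fix has to make both sides derive the column type from the same source rather than guard the cast.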

Full explain output:

{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 2 <- Map 1 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
        Reducer 3 <- Map 2 (SIMPLE_EDGE)
        Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
        Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: item
                  Statistics: Num rows: 18000 Data size: 29671008 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: i_item_sk is not null (type: boolean)
                    Statistics: Num rows: 9000 Data size: 14835504 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: i_item_sk (type: int), i_brand (type: string), i_class (type: string), i_category (type: string), i_product_name (type: string)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4
                      Statistics: Num rows: 9000 Data size: 14835504 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 9000 Data size: 14835504 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: string)
        Map 2
            Map Operator Tree:
                TableScan
                  alias: store_sales
                  Statistics: Num rows: 2750370 Data size: 3717170032 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: (ss_sold_date_sk is not null and ss_item_sk is not null) (type: boolean)
   

[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537532#comment-14537532
 ] 

Matt McCline commented on HIVE-10609:
-

Test failures are not related to the changes here.


[jira] [Updated] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly

2015-05-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10565:

Attachment: HIVE-10565.092.patch

 LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT 
 OUTER JOIN repeated key correctly
 

 Key: HIVE-10565
 URL: https://issues.apache.org/jira/browse/HIVE-10565
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0

 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, 
 HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, 
 HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, 
 HIVE-10565.09.patch, HIVE-10565.091.patch, HIVE-10565.092.patch


 Filtering can knock out some of the rows for a repeated key, but those 
 knocked out rows need to be included in the LEFT OUTER JOIN result and are 
 currently not when only some rows are filtered out.
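
 To make the expected semantics concrete, here is a small Java sketch (not the patch, and not Hive code). The class `LeftOuterFilterSketch` and its method are hypothetical, and the sketch assumes the filter is an ON-clause condition on the left (big table) side. A left row that fails the filter, or has no match, still appears once in the result with a NULL right side, even when other rows with the same key pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical illustration only; none of these names come from Hive.
final class LeftOuterFilterSketch {

    // Each left row is {key, amount}. Rows that pass the filter and have a
    // match are joined; every other left row is kept with a NULL right side.
    static List<String> leftOuterJoin(int[][] leftRows,
                                      Map<Integer, List<String>> rightByKey,
                                      IntPredicate onFilter) {
        List<String> out = new ArrayList<>();
        for (int[] row : leftRows) {
            int key = row[0];
            int amount = row[1];
            List<String> matches = rightByKey.get(key);
            if (matches == null || !onFilter.test(amount)) {
                // No match, or knocked out by the filter: still emit the row.
                out.add(key + "," + amount + ",NULL");
                continue;
            }
            for (String right : matches) {
                out.add(key + "," + amount + "," + right);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Key 7 repeats three times; the filter rejects only the middle row.
        int[][] left = { {7, 20}, {7, 5}, {7, 30} };
        Map<Integer, List<String>> right = new HashMap<>();
        right.put(7, Arrays.asList("x"));
        for (String line : leftOuterJoin(left, right, amount -> amount > 10)) {
            System.out.println(line);
        }
    }
}
```

 With the repeated key 7 above, all three left rows come back: two joined to "x" and the filtered one padded with NULL. Dropping the filtered row is the bug described here.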



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537540#comment-14537540
 ] 

Selina Zhang commented on HIVE-10036:
-

The above two unit test failures seem unrelated to this patch. 

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch


 The ORC writer keeps multiple output streams for each column. Each output stream is 
 allocated a fixed-size ByteBuffer (configurable, 256K by default). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved: several hundred files may be open and being written at the 
 same time (FileSinkOperator has the same problem). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be made there, but 
 reducing the buffer size leads to worse compression and more I/O on the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer that is bounded 
 by the existing configurable buffer size. Most of the streams do not need a 
 large buffer, so performance improved significantly. Compared to 
 Facebook's hive-dwrf, I measured a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 
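
 As a rough sketch of the idea (this is not the patch; `BoundedGrowableBuffer` and its methods are hypothetical names), a buffer can start small, double on demand, and never exceed the configured cap. The 4K starting size used in `main` is an assumption chosen for illustration; the 256K cap mirrors the default mentioned above.

```java
import java.util.Arrays;

// Hypothetical illustration of a growable buffer with an upper bound.
final class BoundedGrowableBuffer {
    private final int maxCapacity;
    private byte[] data;
    private int size;

    BoundedGrowableBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity <= 0 || initialCapacity > maxCapacity) {
            throw new IllegalArgumentException("need 0 < initialCapacity <= maxCapacity");
        }
        this.maxCapacity = maxCapacity;
        this.data = new byte[initialCapacity];
    }

    // Appends as many bytes as fit under the cap and returns how many were
    // accepted. A caller would flush and reset when fewer than length fit.
    int write(byte[] src, int offset, int length) {
        int accepted = Math.min(length, maxCapacity - size);
        ensureCapacity(size + accepted);
        System.arraycopy(src, offset, data, size, accepted);
        size += accepted;
        return accepted;
    }

    // Doubles the backing array until it can hold needed bytes, capped at maxCapacity.
    private void ensureCapacity(int needed) {
        if (needed <= data.length) {
            return;
        }
        int newCapacity = data.length;
        while (newCapacity < needed) {
            newCapacity = (int) Math.min((long) newCapacity * 2, maxCapacity);
        }
        data = Arrays.copyOf(data, newCapacity);
    }

    int size() { return size; }

    int capacity() { return data.length; }

    boolean isFull() { return size == maxCapacity; }

    void reset() { size = 0; }

    public static void main(String[] args) {
        BoundedGrowableBuffer buf = new BoundedGrowableBuffer(4 * 1024, 256 * 1024);
        buf.write(new byte[10000], 0, 10000);
        System.out.println("size=" + buf.size() + " capacity=" + buf.capacity());
    }
}
```

 A stream that only ever holds a few kilobytes then costs a few kilobytes, while a busy stream still ends up with the full configured buffer and the same compression behaviour as before.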





[jira] [Commented] (HIVE-10565) LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT OUTER JOIN repeated key correctly

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537611#comment-14537611
 ] 

Hive QA commented on HIVE-10565:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731848/HIVE-10565.092.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8935 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3844/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3844/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3844/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731848 - PreCommit-HIVE-TRUNK-Build

 LLAP: Native Vector Map Join doesn't handle filtering and matching on LEFT 
 OUTER JOIN repeated key correctly
 

 Key: HIVE-10565
 URL: https://issues.apache.org/jira/browse/HIVE-10565
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0

 Attachments: HIVE-10565.01.patch, HIVE-10565.02.patch, 
 HIVE-10565.03.patch, HIVE-10565.04.patch, HIVE-10565.05.patch, 
 HIVE-10565.06.patch, HIVE-10565.07.patch, HIVE-10565.08.patch, 
 HIVE-10565.09.patch, HIVE-10565.091.patch, HIVE-10565.092.patch


 Filtering can knock out some of the rows for a repeated key, but those 
 knocked-out rows still need to be included in the LEFT OUTER JOIN result. 
 Currently they are not included when only some of the rows are filtered out.
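
The semantics described above can be sketched as follows. This is only an illustration of the expected LEFT OUTER JOIN result, not Hive's vectorized implementation; it assumes the filter applies to rows on the preserved (left) side, and the function and variable names are hypothetical.

```python
def left_outer_join(left_rows, right_index, keep):
    """Join left_rows against right_index, emitting every left row.

    left_rows   -- list of (key, payload) tuples from the preserved side
    right_index -- dict mapping key -> list of right-side values
    keep        -- predicate on a left row; rows that fail it may not match,
                   but must still appear in the output with a NULL right side
    """
    out = []
    for key, payload in left_rows:
        matches = right_index.get(key, []) if keep((key, payload)) else []
        if matches:
            out.extend((key, payload, r) for r in matches)
        else:
            # Filtered-out or unmatched rows are preserved, not dropped.
            out.append((key, payload, None))
    return out


# Three rows share the repeated key 7; the filter knocks out only one of them.
left = [(7, "a"), (7, "b"), (7, "c")]
right = {7: ["x"]}
result = left_outer_join(left, right, keep=lambda row: row[1] != "b")
```

In this sketch the row with payload "b" is emitted with None on the right side, while the other two rows for the same key still match.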





[jira] [Updated] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-10036:

Attachment: HIVE-10036.9.patch

Fixed the unit tests. 

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch


 The ORC writer keeps multiple output streams for each column. Each output stream is 
 allocated a fixed-size ByteBuffer (configurable, 256K by default). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved: several hundred files may be open and being written at the 
 same time (FileSinkOperator has the same problem). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be made there, but the problem is that 
 reducing the buffer size leads to worse compression and more I/O in the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer whose upper 
 bound is the existing configurable buffer size. Most of the streams do not need 
 a large buffer, so performance improved significantly. Compared to 
 Facebook's hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 
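
A minimal sketch of the idea of a growable buffer capped at the configured size. It is not taken from the patch: the class name, the doubling policy, and the 4096-byte initial size are assumptions chosen for illustration.

```python
class BoundedGrowableBuffer:
    """Byte buffer that starts small and doubles on demand up to max_size."""

    def __init__(self, max_size, initial_size=4096):
        self.max_size = max_size
        # Start with a small allocation instead of reserving max_size up front.
        self._buf = bytearray(max(1, min(initial_size, max_size)))
        self._pos = 0

    @property
    def capacity(self):
        return len(self._buf)

    def write(self, data):
        """Append data, growing if needed; return the number of bytes accepted."""
        n = min(len(data), self.max_size - self._pos)
        needed = self._pos + n
        if needed > len(self._buf):
            new_cap = len(self._buf)
            while new_cap < needed:
                new_cap = min(new_cap * 2, self.max_size)
            self._buf.extend(bytes(new_cap - len(self._buf)))
        self._buf[self._pos:needed] = data[:n]
        self._pos = needed
        return n


buf = BoundedGrowableBuffer(max_size=256 * 1024)
buf.write(b"x" * 100)
small_capacity = buf.capacity
buf.write(b"y" * 10_000)
grown_capacity = buf.capacity
accepted = buf.write(b"z" * (1024 * 1024))
capped_capacity = buf.capacity
```

A stream that only ever holds a few hundred bytes keeps the small allocation, while a busy stream grows until it reaches the cap. When write accepts fewer bytes than it was given, a caller would presumably flush, as it would with a full fixed-size buffer.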





[jira] [Updated] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9392:
--
Attachment: HIVE-9392.6.patch

patch without q files updated.

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch


 In JoinStatsRule.process, the join column statistics are stored in the HashMap 
 joinedColStats. The key used, ColStatistics.fqColName, is 
 duplicated between join columns in the same vertex; as a result, distinctVals 
 ends up with duplicated values, which negatively affects the join 
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.
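
The effect of the key collision can be sketched as follows. This is an illustration only: the NDV and row-count figures are made up, the estimate uses the textbook formula |L| * |R| / max(NDV) rather than whatever JoinStatsRule actually computes, and keying by position is just one hypothetical way to make keys unique.

```python
def estimate_join_rows(rows_left, rows_right, ndvs):
    """Textbook equi-join estimate: |L| * |R| / max(NDV of the join columns)."""
    return rows_left * rows_right // max(ndvs)


# Hypothetical per-column NDVs; both columns surface under the same name.
columns = [("KEY.reducesinkkey0", 1_000), ("KEY.reducesinkkey0", 50)]

# Keyed by the non-unique name, the second entry overwrites the first.
by_name = {}
for name, ndv in columns:
    by_name[name] = ndv
ndvs_colliding = [by_name[name] for name, _ in columns]

# Keyed by position (one possible unique key), both NDVs survive.
by_position = {i: ndv for i, (_, ndv) in enumerate(columns)}
ndvs_unique = [by_position[i] for i in range(len(columns))]

wrong = estimate_join_rows(10_000, 10_000, ndvs_colliding)
right = estimate_join_rows(10_000, 10_000, ndvs_unique)
```

With the colliding key, the list of distinct-value counts holds the same value twice, so the larger NDV is lost and the estimate is inflated.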





[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537338#comment-14537338
 ] 

Mostafa Mokhtar commented on HIVE-10609:


[~mmccline]
I tried the patch on TPC-DS 200GB; the query runs fine and returns correct 
results.

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: HIVE-10609.01.patch


 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON 

[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan

2015-05-10 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-10568:
--
Labels: TODOC1.2  (was: )

 Select count(distinct()) can have more optimal execution plan
 -

 Key: HIVE-10568
 URL: https://issues.apache.org/jira/browse/HIVE-10568
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 
 0.13.0, 0.14.0, 1.0.0, 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Ashutosh Chauhan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, 
 HIVE-10568.patch, HIVE-10568.patch


 {code:sql}
 select count(distinct ss_ticket_number) from store_sales;
 {code}
 can be rewritten as
 {code:sql}
 select count(1) from (select distinct ss_ticket_number from store_sales) a;
 {code}
 which may run up to 3x faster
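
The equivalence of the two forms can be checked on a toy table. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in for Hive, with made-up rows; it says nothing about performance or about how Hive's optimizer implements the rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (ss_ticket_number INTEGER)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (3,)],
)

ORIGINAL = "SELECT count(DISTINCT ss_ticket_number) FROM store_sales"
REWRITTEN = (
    "SELECT count(1) FROM "
    "(SELECT DISTINCT ss_ticket_number FROM store_sales) a"
)


def run(sql):
    """Return the single scalar produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


# With no NULLs in the column, both forms count the same distinct values.
original, rewritten = run(ORIGINAL), run(REWRITTEN)

# count(DISTINCT col) skips NULLs, while SELECT DISTINCT keeps one NULL row
# that count(1) then counts, so the two forms diverge once a NULL is present.
conn.execute("INSERT INTO store_sales VALUES (NULL)")
original_with_null, rewritten_with_null = run(ORIGINAL), run(REWRITTEN)
```

Under standard SQL semantics the two forms agree only when the column has no NULLs; counting the column itself in the outer query, rather than count(1), would keep them aligned. Whether the Hive rewrite accounts for this is not stated in the issue.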





[jira] [Commented] (HIVE-10568) Select count(distinct()) can have more optimal execution plan

2015-05-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537371#comment-14537371
 ] 

Lefty Leverenz commented on HIVE-10568:
---

Doc note:  This adds *hive.optimize.distinct.rewrite* to HiveConf.java, so it 
needs to be documented in the wiki.  Perhaps it belongs in the UDFs doc as well 
as Configs.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]
* [Built-in Aggregate Functions (UDAF) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)]

 Select count(distinct()) can have more optimal execution plan
 -

 Key: HIVE-10568
 URL: https://issues.apache.org/jira/browse/HIVE-10568
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 
 0.13.0, 0.14.0, 1.0.0, 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Ashutosh Chauhan
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, 
 HIVE-10568.patch, HIVE-10568.patch


 {code:sql}
 select count(distinct ss_ticket_number) from store_sales;
 {code}
 can be rewritten as
 {code:sql}
 select count(1) from (select distinct ss_ticket_number from store_sales) a;
 {code}
 which may run up to 3x faster





[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537415#comment-14537415
 ] 

Hive QA commented on HIVE-10036:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731823/HIVE-10036.9.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3841/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3841/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3841/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731823 - PreCommit-HIVE-TRUNK-Build

 Writing ORC format big table causes OOM - too many fixed sized stream buffers
 -

 Key: HIVE-10036
 URL: https://issues.apache.org/jira/browse/HIVE-10036
 Project: Hive
  Issue Type: Improvement
Reporter: Selina Zhang
Assignee: Selina Zhang
  Labels: orcfile
 Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, 
 HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch, 
 HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch


 The ORC writer keeps multiple output streams for each column. Each output stream is 
 allocated a fixed-size ByteBuffer (configurable, 256K by default). For a big 
 table, the memory cost is unbearable, especially when HCatalog dynamic 
 partitioning is involved: several hundred files may be open and being written at the 
 same time (FileSinkOperator has the same problem). 
 The global ORC memory manager controls the buffer size, but it only kicks in 
 at 5000-row intervals. An enhancement could be made there, but the problem is that 
 reducing the buffer size leads to worse compression and more I/O in the read 
 path. Sacrificing read performance is never a good choice. 
 I changed the fixed-size ByteBuffer to a dynamically growing buffer whose upper 
 bound is the existing configurable buffer size. Most of the streams do not need 
 a large buffer, so performance improved significantly. Compared to 
 Facebook's hive-dwrf, I observed a 2x performance gain with this fix. 
 Solving OOM for ORC completely may need a lot of effort, but this is 
 definitely low-hanging fruit. 





[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537492#comment-14537492
 ] 

Hive QA commented on HIVE-9392:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731833/HIVE-9392.6.patch

{color:red}ERROR:{color} -1 due to 384 failed/errored test(s), 8921 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_complex_alias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_rearrange
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_when
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_cube1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppd
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_rollup1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_identity_project_remove_skip
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_implicit_cast1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables

[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10609:

Attachment: HIVE-10609.01.patch

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: HIVE-10609.01.patch


 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN 

[jira] [Updated] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10609:

Attachment: (was: HIVE-10609.01.patch)

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: HIVE-10609.01.patch


 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  i_current_price between 35 and 35 + 10 and
  i_current_price between 35 + 1 and 35 + 15
 group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year 
 ,d3.d_year
 ) cs1
 JOIN
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 

[jira] [Commented] (HIVE-10646) ColumnValue does not handle NULL_TYPE

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537055#comment-14537055
 ] 

Hive QA commented on HIVE-10646:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731702/HIVE-10646.1.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 8921 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3837/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3837/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3837/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731702 - PreCommit-HIVE-TRUNK-Build

 ColumnValue does not handle NULL_TYPE
 -

 Key: HIVE-10646
 URL: https://issues.apache.org/jira/browse/HIVE-10646
 Project: Hive
  Issue Type: Bug
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10646.1.patch


 This causes an NPE if the Thrift client uses protocol V5 or older:
 {noformat}
 1:46:07.199 PM  ERROR  org.apache.thrift.server.TThreadPoolServer  
 Error occurred during processing of message.
 java.lang.NullPointerException
   at 
 org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388)
   at 
 org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338)
   at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288)
   at 
 org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605)
   at 
 org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525)
   at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455)
   at 
 org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550)
   at 
 org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486)
   at 
 org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13272)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13236)
   at 
 org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13187)
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
   at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:677)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}
 Reproduce: Run: select NULL as col, * from jsmall limit 5; from a V5 client 
 (for example some version of Hue).
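
 The failure mode described above can be modelled in a few lines. This is only a sketch under stated assumptions: the class, enum and method names below are hypothetical and do not mirror Hive's ColumnValue or the generated thrift classes, and the way the missing case is filled in is an assumption, not a description of the attached patch. It shows how a type switch with no NULL_TYPE branch hands a null wrapper to a serializer that dereferences it.

```java
// Hypothetical model of the failure; names do not mirror Hive's classes.
final class NullTypeSketch {
  enum Type { INT_TYPE, STRING_TYPE, NULL_TYPE }

  // Without a NULL_TYPE case the converter falls through and returns null.
  static String convertMissingCase(Type type, Object value) {
    switch (type) {
      case INT_TYPE: return "i32:" + value;
      case STRING_TYPE: return "string:" + value;
      default: return null;
    }
  }

  // Assumed remedy: an explicit case yields a non-null wrapper carrying a null value.
  static String convertWithNullCase(Type type, Object value) {
    switch (type) {
      case NULL_TYPE: return "string:null";
      default: return convertMissingCase(type, value);
    }
  }

  // Stands in for the serializer, which dereferences every column wrapper.
  static int write(String column) {
    return column.length();
  }
}
```

 Calling write(convertMissingCase(Type.NULL_TYPE, null)) throws a NullPointerException in this model, while the variant with the explicit case does not.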



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10646) ColumnValue does not handle NULL_TYPE

2015-05-10 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537089#comment-14537089
 ] 

Yongzhi Chen commented on HIVE-10646:
-

None of the 9 failures is related to this patch:
1. The following 4 tests all passed on the first pre-commit run of this 
patch (3821). My change cannot cause random behavior (if it caused 
failures, they would occur on every run), so these failures are not related.
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
2. The org.apache.hive.jdbc.TestSSL.testSSLFetchHttp failure looks like a network issue: 
org.apache.http.conn.HttpHostConnectException: Connect to localhost:60355 
[localhost/127.0.0.1, localhost/127.0.0.1] failed: Connection refused.
3. 
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
 has been failing for 35 builds (age 35).
4. All 3 Spark failures are java.util.concurrent.TimeoutException: null

[~szehon], could you commit the patch? Thanks. 




[jira] [Commented] (HIVE-10609) Vectorization : Q64 fails with ClassCastException

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537122#comment-14537122
 ] 

Hive QA commented on HIVE-10609:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731779/HIVE-10609.01.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8919 tests executed
*Failed tests:*
{noformat}
TestHiveAuthFactory - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_left_outer_join2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_left_outer_join2
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3839/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3839/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3839/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731779 - PreCommit-HIVE-TRUNK-Build

 Vectorization : Q64 fails with ClassCastException
 -

 Key: HIVE-10609
 URL: https://issues.apache.org/jira/browse/HIVE-10609
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: HIVE-10609.01.patch


 TPC-DS Q64 fails with ClassCastException.
 Query
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in ('maroon','burnished','dim','steel','navajo','chocolate') 
 and
  

[jira] [Commented] (HIVE-10591) Support limited integer type promotion in ORC

2015-05-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537049#comment-14537049
 ] 

Lefty Leverenz commented on HIVE-10591:
---

Should this be documented?

{quote}
Following type promotions can be supported without any casting
smallint -> int
smallint -> bigint
int -> bigint
{quote}

 Support limited integer type promotion in ORC
 -

 Key: HIVE-10591
 URL: https://issues.apache.org/jira/browse/HIVE-10591
 Project: Hive
  Issue Type: New Feature
Affects Versions: 1.3.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10591.1.patch, HIVE-10591.2.patch, 
 HIVE-10591.2.patch, HIVE-10591.3.patch, HIVE-10591.3.patch, HIVE-10591.3.patch


 ORC currently does not support schema-on-read. If we alter an ORC table 
 column from 'int' to 'bigint' and then query the altered table, a 
 ClassCastException is thrown: the schema-on-read from the table descriptor 
 expects LongWritable, whereas ORC returns IntWritable based on the file 
 schema stored within the ORC file. OrcSerde currently doesn't do any type 
 conversions or type promotions, for performance reasons in the inner loop. 
 Since smallints, ints and bigints are stored in the same way in ORC, it is 
 possible to allow such type promotions without hurting performance. The 
 following type promotions can be supported without any casting:
 smallint -> int
 smallint -> bigint
 int -> bigint
 Tinyint promotion is not possible without casting, as tinyints are stored 
 using the RLE byte writer whereas smallints, ints and bigints are stored 
 using the RLE integer writer.
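
 The promotion rule in the description can be sketched as a small lookup. This is an illustration only: IntegerPromotionSketch and canPromoteWithoutCast are hypothetical names, and nothing here is taken from OrcSerde or the attached patches. The sketch assumes that a promotion is allowed exactly when both types use the RLE integer writer and the reader type is the wider one.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the rule in the description; not the OrcSerde implementation.
final class IntegerPromotionSketch {
  // Types the description says are stored with the RLE integer writer, ordered by width.
  private static final List<String> RLE_INTEGER_TYPES =
      Arrays.asList("smallint", "int", "bigint");

  // True when a column written as fileType can be read as readerType without a cast:
  // both types use the RLE integer writer and the reader type is strictly wider.
  static boolean canPromoteWithoutCast(String fileType, String readerType) {
    int from = RLE_INTEGER_TYPES.indexOf(fileType);
    int to = RLE_INTEGER_TYPES.indexOf(readerType);
    return from >= 0 && to > from;
  }

  public static void main(String[] args) {
    System.out.println(canPromoteWithoutCast("int", "bigint"));
    System.out.println(canPromoteWithoutCast("tinyint", "int"));
  }
}
```

 In this model tinyint is rejected simply because it is absent from the list, which matches the statement that tinyints use a different (RLE byte) writer.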





[jira] [Commented] (HIVE-10664) Unit tests run fail in windows because of illegal escape character in file path

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537084#comment-14537084
 ] 

Hive QA commented on HIVE-10664:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731709/HIVE-10664.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3838/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3838/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3838/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731709 - PreCommit-HIVE-TRUNK-Build

 Unit tests run fail in windows because of  illegal escape character in file 
 path
 

 Key: HIVE-10664
 URL: https://issues.apache.org/jira/browse/HIVE-10664
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-10664.1.patch


 {code:title=In windows we hit errors as shown below }
 [ERROR] 
 /D:/w/hv/itests/qtest/target/generated-test-sources/java/org/apache/hadoop/hive/cli/TestHBaseNegativeCliDriver.java:[97,54]
  illegal escape character
 {code}
 Specifically, the lines it complains about in the log look like this:
 {code:title=line 97 of 
 /itests/qtest/target/generated-test-sources/java/org/apache/hadoop/hive/cli/TestHBaseNegativeCliDriver.java}
 line 97:  
 QTestUtil.addTestsToSuiteFromQfileNames(D:\w\hv\itests\qtest\target\generated-test-sources\java\org\apache\hadoop\hive\cli\TestHBaseNegativeCliDriverQFileNames.txt,
  qFilesToExecute,
 {code}
 This happens when it runs the itests/qtest directory on the Windows 
 platform. It appears to be a fairly simple portability problem, of the kind 
 that one small change should fix in all affected places at once.
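
 One way to picture the portability problem: a Windows path written verbatim into generated Java source contains sequences such as \w and \h, which are not valid Java escape sequences. The sketch below shows two common remedies when emitting a path into a string literal. PathLiteralSketch and its methods are hypothetical names, and the attached patch may take a different approach.

```java
// Hypothetical helper; the actual patch may take a different approach.
final class PathLiteralSketch {
  // Doubles each backslash so the path is valid inside a Java string literal.
  static String escapeForJavaLiteral(String path) {
    return path.replace("\\", "\\\\");
  }

  // Alternative: use forward slashes, which Java file APIs also accept on Windows.
  static String toForwardSlashes(String path) {
    return path.replace('\\', '/');
  }

  public static void main(String[] args) {
    String windowsPath = "D:\\w\\hv\\itests\\qtest";
    // Print each form as it would appear in generated source, quotes included.
    System.out.println("\"" + escapeForJavaLiteral(windowsPath) + "\"");
    System.out.println("\"" + toForwardSlashes(windowsPath) + "\"");
  }
}
```

 Either form compiles on both platforms. The forward-slash variant also avoids sequences such as \t being silently read as a tab.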





[jira] [Commented] (HIVE-10327) Remove ExprNodeNullDesc

2015-05-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537299#comment-14537299
 ] 

Gopal V commented on HIVE-10327:


[~ashutoshc]: sure, added to my lists.

 Remove ExprNodeNullDesc
 ---

 Key: HIVE-10327
 URL: https://issues.apache.org/jira/browse/HIVE-10327
 Project: Hive
  Issue Type: Task
  Components: Query Planning
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-10327.1.patch, HIVE-10327.patch


 Its purpose can be served by ExprNodeConstantDesc.





[jira] [Commented] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537239#comment-14537239
 ] 

Hive QA commented on HIVE-10190:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12731797/HIVE-10190.12.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8919 tests executed
*Failed tests:*
{noformat}
TestCompareCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3840/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3840/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3840/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12731797 - PreCommit-HIVE-TRUNK-Build

 CBO: AST mode checks for TABLESAMPLE with 
 AST.toString().contains(TOK_TABLESPLITSAMPLE)
 -

 Key: HIVE-10190
 URL: https://issues.apache.org/jira/browse/HIVE-10190
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Reuben Kuhnert
Priority: Trivial
  Labels: perfomance
 Attachments: HIVE-10190-querygen.py, HIVE-10190.01.patch, 
 HIVE-10190.02.patch, HIVE-10190.03.patch, HIVE-10190.04.patch, 
 HIVE-10190.05.patch, HIVE-10190.05.patch, HIVE-10190.06.patch, 
 HIVE-10190.07.patch, HIVE-10190.08.patch, HIVE-10190.09.patch, 
 HIVE-10190.10.patch, HIVE-10190.11.patch, HIVE-10190.12.patch


 {code}
   public static boolean validateASTForUnsupportedTokens(ASTNode ast) {
     String astTree = ast.toStringTree();
     // if any of following tokens are present in AST, bail out
     String[] tokens = { TOK_CHARSETLITERAL, TOK_TABLESPLITSAMPLE };
     for (String token : tokens) {
       if (astTree.contains(token)) {
         return false;
       }
     }
     return true;
   }
 {code}
 This is an issue for a SQL query which is bigger in AST form than in text 
 (~700kb).
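
 One possible alternative to searching the string form of the tree is to walk the nodes and compare token names directly, stopping at the first match. The sketch below is an assumption-laden illustration, not the attached patch: SketchNode is a hypothetical stand-in for an AST node, Hive's ASTNode API is deliberately not used, and the token names are plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an AST node; Hive's ASTNode API is not used here.
final class SketchNode {
  final String token;
  final List<SketchNode> children = new ArrayList<>();

  SketchNode(String token, SketchNode... kids) {
    this.token = token;
    children.addAll(Arrays.asList(kids));
  }
}

final class UnsupportedTokenCheckSketch {
  private static final Set<String> UNSUPPORTED =
      new HashSet<>(Arrays.asList("TOK_CHARSETLITERAL", "TOK_TABLESPLITSAMPLE"));

  // Walks the tree iteratively and returns false at the first unsupported token,
  // so no string form of the whole tree is ever built.
  static boolean validate(SketchNode root) {
    Deque<SketchNode> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      SketchNode node = stack.pop();
      if (UNSUPPORTED.contains(node.token)) {
        return false;
      }
      for (SketchNode child : node.children) {
        stack.push(child);
      }
    }
    return true;
  }
}
```

 The explicit stack avoids deep recursion on very large trees, and comparing whole token names avoids accidental substring matches.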





[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-10 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: HIVE-10190.12.patch



[jira] [Updated] (HIVE-10190) CBO: AST mode checks for TABLESAMPLE with AST.toString().contains(TOK_TABLESPLITSAMPLE)

2015-05-10 Thread Reuben Kuhnert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Kuhnert updated HIVE-10190:
--
Attachment: (was: HIVE-10190.12.patch)



[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan

2015-05-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10568:

Fix Version/s: (was: 1.3.0)
   1.2.0

 Select count(distinct()) can have more optimal execution plan
 -

 Key: HIVE-10568
 URL: https://issues.apache.org/jira/browse/HIVE-10568
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 
 0.13.0, 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Mostafa Mokhtar
Assignee: Ashutosh Chauhan
 Fix For: 1.2.0

 Attachments: HIVE-10568.1.patch, HIVE-10568.2.patch, 
 HIVE-10568.patch, HIVE-10568.patch


 {code:sql}
 select count(distinct ss_ticket_number) from store_sales;
 {code}
 can be rewritten as
 {code:sql}
 select count(1) from (select distinct ss_ticket_number from store_sales) a;
 {code}
 which may run up to 3x faster





[jira] [Commented] (HIVE-10568) Select count(distinct()) can have more optimal execution plan

2015-05-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537313#comment-14537313
 ] 

Ashutosh Chauhan commented on HIVE-10568:
-

Committed to 1.2 branch as well.



[jira] [Updated] (HIVE-10568) Select count(distinct()) can have more optimal execution plan

2015-05-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-10568:

Affects Version/s: (was: 1.2.0)
