[jira] [Commented] (HIVE-25671) Hybrid Grace Hash Join NullPointer When query RCFile

2021-11-05 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17439093#comment-17439093
 ] 

Nemon Lou commented on HIVE-25671:
--

I have created a demo for it.

https://github.com/loudongfeng/kryo-bug-demo

Even the latest kryo, 5.2.0, has the same issue.

https://github.com/EsotericSoftware/kryo/issues/863

> Hybrid Grace Hash Join NullPointer When query RCFile
> 
>
> Key: HIVE-25671
> URL: https://issues.apache.org/jira/browse/HIVE-25671
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Nemon Lou
>Priority: Major
> Attachments: rcfile_kryo.patch
>
>
> Hive 3.1.0 kryo 3.0.3 tez engine
> the following sql can reproduce this issue
> {code:sql}
> CREATE TABLE `nemon.rt_dm_lpc_customer_sum_tmp3_3`( 
>`logo` string,   
>`customer_code` string,  
>`brand_name` string, 
>`business_code` string,  
>`discount` double,   
>`creation_date` string,  
>`etl_time` string)stored as rcfile; 
>  
> CREATE TABLE `nemon.rt_dm_lpc_customer_sum_tmp4_1`( 
>`customer_code` string,  
>`etl_time` string) stored as rcfile; 
>
> insert into nemon.rt_dm_lpc_customer_sum_tmp3_3 values 
> ("logo","customer_code","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code1","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code2","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code3","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code4","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code5","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code6","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code7","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code8","brand_name","business_code",1,"creation_date","etl_time")
>
> ,("logo","customer_code9","brand_name","business_code",1,"creation_date","etl_time");
> insert into  nemon.rt_dm_lpc_customer_sum_tmp4_1  values 
> ("customer_code","etl_time")
>,("customer_code1","etl_time")
>,("customer_code2","etl_time")
>,("customer_code3","etl_time")
>;
> set hive.auto.convert.join.noconditionaltask.size=10;
> set hive.mapjoin.hybridgrace.hashtable=true;
> SELECT
> tt1.logo,
> tt1.customer_code,
> tt1.brand_name,
> tt1.business_code,
> tt1.discount,
> tt1.creation_date,
> date_format(from_utc_timestamp(unix_timestamp()*1000,'Asia/Shanghai'),'-MM-dd
>  HH:mm:ss') etl_time
> from
> (
> SELECT
> t1.logo,
> t1.customer_code,
> t1.brand_name,
> t1.business_code,
> t1.discount,
> t1.creation_date,
> row_number() over(partition by t1.customer_code,t1.logo order by 
> t1.creation_date desc) as discount_rank
> from nemon.rt_dm_lpc_customer_sum_tmp3_3 t1
> join nemon.rt_dm_lpc_customer_sum_tmp4_1 t2
> on t2.customer_code = t1.customer_code
> ) tt1
> where tt1.discount_rank = 1;
> {code}
> Error log from tez task:
> {noformat}
> 2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid 
> Grace Hash Join: Deserializing spilled hash partition...
> 2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid 
> Grace Hash Join: Number of rows in hashmap: 1
> 2021-11-04 10:02:47,554 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid 
> Grace Hash Join: Going to process spilled big table rows in partition 5. 
> Number of rows: 1
> 2021-11-04 10:02:47,561 [ERROR] [TezChild] |exec.MapJoinOperator|: Unexpected 
> exception from MapJoinOperator : null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:114)
>   at 
> org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(ColumnarStructBase.java:172)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ColumnarStructObjectInspector.getStructFieldData(ColumnarStructObjectInspector.java:67)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromRow(MapJoinBytesTableContainer.java:552)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.setMapJoinKey(MapJoinOperator.java:415)

[jira] [Updated] (HIVE-25671) Hybrid Grace Hash Join NullPointer When query RCFile

2021-11-05 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-25671:
-
External issue URL: https://github.com/EsotericSoftware/kryo/issues/863


[jira] [Commented] (HIVE-25671) Hybrid Grace Hash Join NullPointer When query RCFile

2021-11-04 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17439012#comment-17439012
 ] 

Nemon Lou commented on HIVE-25671:
--

This bug seems to be related to kryo:
The NullPointerException occurs when the JVM tries to invoke the getLength
method.
The invoker is ColumnarStructBase$FieldInfo.uncheckedGetField(), while the
actual method implementation is ColumnarStruct.getLength(), which overrides
ColumnarStructBase.getLength().
The ColumnarStruct object is created by the kryo deserializer.

Adding a reference to ColumnarStructBase can fix this issue. I am uploading a
patch to demonstrate this fix.

 [^rcfile_kryo.patch] 
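
As a rough illustration of the failure mode described above, the sketch below
shows how a helper object that calls back into its owner can hit a
NullPointerException after a serialization round trip if the serializer does
not restore the owner reference. This is an analogy only: it uses plain
java.io serialization with a transient field instead of kryo, the class names
SkippedFieldDemo and Owner are hypothetical, and the thread does not confirm
that this is the exact mechanism inside kryo or Hive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SkippedFieldDemo {
    /** Hypothetical stand-in for the enclosing struct; not Hive's class. */
    static class Owner implements Serializable {
        int getLength() { return 7; }
    }

    /** Hypothetical stand-in for a per-field helper that calls back into its owner. */
    static class FieldInfo implements Serializable {
        // Assumption for the demo: the serializer skips this reference.
        // "transient" makes java.io serialization skip it, so it is null after a round trip.
        transient Owner owner;

        FieldInfo(Owner owner) { this.owner = owner; }

        int length() { return owner.getLength(); } // throws NPE when owner is null

        void reattach(Owner newOwner) { this.owner = newOwner; }
    }

    /** Serializes and deserializes the object in memory. */
    static FieldInfo roundTrip(FieldInfo in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream is =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (FieldInfo) is.readObject();
        }
    }

    /** Returns true if calling length() fails with a NullPointerException. */
    static boolean throwsNpe(FieldInfo f) {
        try {
            f.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        FieldInfo copy = roundTrip(new FieldInfo(new Owner()));
        System.out.println("NPE after round trip: " + throwsNpe(copy));
        copy.reattach(new Owner()); // restoring the reference removes the NPE
        System.out.println("length after re-attaching owner: " + copy.length());
    }
}
```

Restoring the reference explicitly, as reattach() does here, is loosely
comparable to the "adding a reference to ColumnarStructBase" idea in the
comment; the actual change is in the attached rcfile_kryo.patch.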


[jira] [Updated] (HIVE-25671) Hybrid Grace Hash Join NullPointer When query RCFile

2021-11-04 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-25671:
-
Attachment: rcfile_kryo.patch


[jira] [Updated] (HIVE-25671) Hybrid Grace Hash Join NullPointer When query RCFile

2021-11-04 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-25671:
-
Description: 
Hive 3.1.0 kryo 3.0.3 tez engine
the following sql can reproduce this issue
{code:sql}
CREATE TABLE `nemon.rt_dm_lpc_customer_sum_tmp3_3`( 
   `logo` string,   
   `customer_code` string,  
   `brand_name` string, 
   `business_code` string,  
   `discount` double,   
   `creation_date` string,  
   `etl_time` string)stored as rcfile; 
 
CREATE TABLE `nemon.rt_dm_lpc_customer_sum_tmp4_1`( 
   `customer_code` string,  
   `etl_time` string) stored as rcfile; 
   
insert into nemon.rt_dm_lpc_customer_sum_tmp3_3 values 
("logo","customer_code","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code1","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code2","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code3","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code4","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code5","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code6","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code7","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code8","brand_name","business_code",1,"creation_date","etl_time")
   
,("logo","customer_code9","brand_name","business_code",1,"creation_date","etl_time");
insert into  nemon.rt_dm_lpc_customer_sum_tmp4_1  values 
("customer_code","etl_time")
   ,("customer_code1","etl_time")
   ,("customer_code2","etl_time")
   ,("customer_code3","etl_time")
   ;
set hive.auto.convert.join.noconditionaltask.size=10;
set hive.mapjoin.hybridgrace.hashtable=true;
SELECT
tt1.logo,
tt1.customer_code,
tt1.brand_name,
tt1.business_code,
tt1.discount,
tt1.creation_date,
date_format(from_utc_timestamp(unix_timestamp()*1000,'Asia/Shanghai'),'-MM-dd
 HH:mm:ss') etl_time
from
(
SELECT
t1.logo,
t1.customer_code,
t1.brand_name,
t1.business_code,
t1.discount,
t1.creation_date,
row_number() over(partition by t1.customer_code,t1.logo order by 
t1.creation_date desc) as discount_rank
from nemon.rt_dm_lpc_customer_sum_tmp3_3 t1
join nemon.rt_dm_lpc_customer_sum_tmp4_1 t2
on t2.customer_code = t1.customer_code
) tt1
where tt1.discount_rank = 1;
{code}

Error log from tez task:
{noformat}
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Deserializing spilled hash partition...
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Number of rows in hashmap: 1
2021-11-04 10:02:47,554 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Going to process spilled big table rows in partition 5. Number of 
rows: 1
2021-11-04 10:02:47,561 [ERROR] [TezChild] |exec.MapJoinOperator|: Unexpected 
exception from MapJoinOperator : null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:114)
at 
org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(ColumnarStructBase.java:172)
at 
org.apache.hadoop.hive.serde2.objectinspector.ColumnarStructObjectInspector.getStructFieldData(ColumnarStructObjectInspector.java:67)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:95)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
at 
org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromRow(MapJoinBytesTableContainer.java:552)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.setMapJoinKey(MapJoinOperator.java:415)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:466)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.reProcessBigTable(MapJoinOperator.java:755)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:671)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:604)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:757)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:477)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
at 

[jira] [Updated] (HIVE-25671) Hybrid Grace Hash Join NullPointer When query RCFile

2021-11-04 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-25671:
-
Description: 
{noformat}
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Deserializing spilled hash partition...
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Number of rows in hashmap: 1
2021-11-04 10:02:47,554 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Going to process spilled big table rows in partition 5. Number of 
rows: 1
2021-11-04 10:02:47,561 [ERROR] [TezChild] |exec.MapJoinOperator|: Unexpected 
exception from MapJoinOperator : null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:114)
at 
org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(ColumnarStructBase.java:172)
at 
org.apache.hadoop.hive.serde2.objectinspector.ColumnarStructObjectInspector.getStructFieldData(ColumnarStructObjectInspector.java:67)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:95)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
at 
org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$GetAdaptor.setFromRow(MapJoinBytesTableContainer.java:552)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.setMapJoinKey(MapJoinOperator.java:415)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:466)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.reProcessBigTable(MapJoinOperator.java:755)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.continueProcess(MapJoinOperator.java:671)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:604)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:757)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:477)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}

  was:
{format}
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Deserializing spilled hash partition...
2021-11-04 10:02:47,553 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Number of rows in hashmap: 1
2021-11-04 10:02:47,554 [INFO] [TezChild] |exec.MapJoinOperator|: Hybrid Grace 
Hash Join: Going to process spilled big table rows in partition 5. Number of 
rows: 1
2021-11-04 10:02:47,561 [ERROR] [TezChild] |exec.MapJoinOperator|: Unexpected 
exception from MapJoinOperator : null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:114)
at 
org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase.getField(ColumnarStructBase.java:172)
at 
org.apache.hadoop.hive.serde2.objectinspector.ColumnarStructObjectInspector.getStructFieldData(ColumnarStructObjectInspector.java:67)
at 

[jira] [Assigned] (HIVE-22294) ConditionalWork cannot be cast to MapredWork When both skew.join and auto.convert is on.

2021-09-24 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou reassigned HIVE-22294:


Assignee: Nemon Lou  (was: Rui Li)

> ConditionalWork cannot be cast to MapredWork  When both skew.join and 
> auto.convert is on.  
> ---
>
> Key: HIVE-22294
> URL: https://issues.apache.org/jira/browse/HIVE-22294
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.0, 2.3.4, 3.1.1
>Reporter: Qiang.Kang
>Assignee: Nemon Lou
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Our hive version is 1.2.1 which has merged some patches (including patches 
> mentioned  in https://issues.apache.org/jira/browse/HIVE-14557, 
> https://issues.apache.org/jira/browse/HIVE-16155 ) .
>  
> My sql query string is like this:
> {code:java}
> // code placeholder
> set hive.auto.convert.join = true;
> set hive.optimize.skewjoin=true;
>  
> SELECT a.*
> FROM
> a
> JOIN b
> ON a.id=b.id AND a.uid = b.uid 
> LEFT JOIN c
> ON b.id=c.id AND b.uid=c.uid;
>  
> {code}
>  
> And we hit this error:
> FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.ConditionalWork 
> cannot be cast to org.apache.hadoop.hive.ql.plan.MapredWork
>  
> The main reason is that there is a conditional task (*MapJoin*) in the task
> list of another conditional task (*SkewJoin*). Here is the code snippet
> where the exception is thrown, in
> `org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver`:
>  
> {code:java}
> // code placeholder
> public Object dispatch(Node nd, Stack<Node> stack, Object... nodeOutputs)
>     throws SemanticException {
>   Task<? extends Serializable> currTask = (Task<? extends Serializable>) nd;
>   // not map reduce task or not conditional task, just skip
>   if (currTask.isMapRedTask()) {
>     if (currTask instanceof ConditionalTask) {
>       // get the list of task
>       List<Task<? extends Serializable>> taskList =
>           ((ConditionalTask) currTask).getListTasks();
>       for (Task<? extends Serializable> tsk : taskList) {
>         if (tsk.isMapRedTask()) {
>           // ATTENTION: tsk may be a ConditionalTask !!!
>           this.processCurrentTask(tsk, ((ConditionalTask) currTask));
>         }
>       }
>     } else {
>       this.processCurrentTask(currTask, null);
>     }
>   }
>   return null;
> }
>
> private void processCurrentTask(Task<? extends Serializable> currTask,
>     ConditionalTask conditionalTask) throws SemanticException {
>   // get current mapred work and its local work
>   MapredWork mapredWork = (MapredWork) currTask.getWork(); // WRONG!!
>   MapredLocalWork localwork = mapredWork.getMapWork().getMapRedLocalWork();
>  
> {code}
>  
> Here is some detailed information about the query plan:
>  * *-- set hive.auto.convert.join = true; set hive.optimize.skewjoin=false;*
> {code:java}
> // code placeholder
> Stage-1 is a root stage [a join b]
>  Stage-12 [map join]depends on stages: Stage-1 , consists of Stage-13, Stage-2
>  Stage-13 has a backup stage: Stage-2
>  Stage-11 depends on stages: Stage-13
>  Stage-8 depends on stages: Stage-2, Stage-11 , consists of Stage-5, Stage-4, 
> Stage-6
>  Stage-5
>  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>  Stage-14 depends on stages: Stage-0
>  Stage-3 depends on stages: Stage-14
>  Stage-4
>  Stage-6
>  Stage-7 depends on stages: Stage-6
>  Stage-2
>  
> {code}
>  * *-- set hive.auto.convert.join = false; set hive.optimize.skewjoin=true;*
> {code:java}
> // code placeholder
> STAGE DEPENDENCIES:
>  Stage-1 is a root stage
>  Stage-12 depends on stages: Stage-1 , consists of Stage-13, Stage-2
>  Stage-13 [skew Join map local task]
>  Stage-11 depends on stages: Stage-13
>  Stage-2 depends on stages: Stage-11
>  Stage-8 depends on stages: Stage-2 , consists of Stage-5, Stage-4, Stage-6
>  Stage-5
>  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>  Stage-14 depends on stages: Stage-0
>  Stage-3 depends on stages: Stage-14
>  Stage-4
>  Stage-6
>  Stage-7 depends on stages: Stage-6
> {code}
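
The nesting problem described in this issue can be sketched in isolation. The
example below is a minimal, self-contained model, not Hive code: Task,
ConditionalTask, MapredWork and ConditionalWork are simplified hypothetical
stand-ins that only share names with the Hive classes, and the recursive
descent is one possible way to avoid the blind cast, not necessarily the fix
adopted for HIVE-22294.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedConditionalSketch {
    // Simplified stand-ins; the real Hive classes carry far more state.
    interface Work {}
    static class MapredWork implements Work {}
    static class ConditionalWork implements Work {}

    static class Task {
        private final Work work;
        Task(Work work) { this.work = work; }
        Work getWork() { return work; }
    }

    static class ConditionalTask extends Task {
        private final List<Task> listTasks;
        ConditionalTask(Task... tasks) {
            super(new ConditionalWork());
            this.listTasks = Arrays.asList(tasks);
        }
        List<Task> getListTasks() { return listTasks; }
    }

    /** Gathers every MapredWork reachable from the task, at any nesting depth. */
    static List<MapredWork> collectMapredWork(Task task) {
        List<MapredWork> out = new ArrayList<>();
        collect(task, out);
        return out;
    }

    private static void collect(Task task, List<MapredWork> out) {
        if (task instanceof ConditionalTask) {
            // Descend instead of casting: a child may itself be conditional.
            for (Task child : ((ConditionalTask) task).getListTasks()) {
                collect(child, out);
            }
        } else if (task.getWork() instanceof MapredWork) {
            out.add((MapredWork) task.getWork());
        }
    }

    /** A skew-join conditional task whose first child is a map-join conditional task. */
    static Task sampleSkewJoinPlan() {
        Task mapJoin = new ConditionalTask(new Task(new MapredWork()),
                                           new Task(new MapredWork()));
        return new ConditionalTask(mapJoin, new Task(new MapredWork()));
    }

    public static void main(String[] args) {
        Task skewJoin = sampleSkewJoinPlan();
        Task nested = ((ConditionalTask) skewJoin).getListTasks().get(0);
        // A blind (MapredWork) cast on this work would throw ClassCastException.
        System.out.println(nested.getWork() instanceof MapredWork); // prints false
        System.out.println(collectMapredWork(skewJoin).size());     // prints 3
    }
}
```

The instanceof check before each cast is the key point: a one-level loop over
getListTasks() assumes children are never conditional, which is exactly the
assumption that breaks when skew join and auto-convert join are both enabled.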



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-22 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418562#comment-17418562
 ] 

Nemon Lou commented on HIVE-24579:
--

[~kkasa] Good job!
After reading your PR, I have some concerns.
1. Does the added sorting stage cause compatibility problems? For example, the
returned rows differ from the original output after the sort
(there are many examples in the .q.out files). This seems to be less of a
problem than the incorrect result.
2. Which is faster: TopN + order by, or removing one stage (no TopN + no
order by)? Do different solutions need to be selected for different scenarios?
3. In the scenario where cbo=false, do we need to fix it as well?
Thanks.

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> 

[jira] [Commented] (HIVE-22294) ConditionalWork cannot be cast to MapredWork When both skew.join and auto.convert is on.

2021-09-15 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415338#comment-17415338
 ] 

Nemon Lou commented on HIVE-22294:
--

The following SQL reproduces this issue with TPC-DS (factor 2) data on Hive 2.3.0:
{code:sql}
use hive_tpcds_text;
set hive.optimize.skewjoin=true;
set hive.auto.convert.join.noconditionaltask.size=1000;
set hive.mapjoin.smalltable.filesize=2500;
select  i_item_id, 
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4 
 from store_sales, customer_demographics, date_dim, item, promotion
 where ss_sold_date_sk = d_date_sk and
   ss_item_sk = i_item_sk and
   ss_cdemo_sk = cd_demo_sk and
   ss_promo_sk = p_promo_sk and
   cd_gender = 'F' and 
   cd_marital_status = 'W' and
   cd_education_status = 'College' and
   (p_channel_email = 'N' or p_channel_event = 'N') and
   d_year = 2001 
 group by i_item_id
 order by i_item_id
 limit 100;
{code}

Error log:
{noformat}
2021-09-15 10:15:36,602 | ERROR | 43f5fc4c-2294-443e-897e-9c73261d4ccb HiveServer2-Handler-Pool: Thread-100 | FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.ConditionalWork cannot be cast to org.apache.hadoop.hive.ql.plan.MapredWork
java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ConditionalWork cannot be cast to org.apache.hadoop.hive.ql.plan.MapredWork
at org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.processCurrentTask(MapJoinResolver.java:102)
at org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver$LocalMapJoinTaskDispatcher.dispatch(MapJoinResolver.java:239)
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
at org.apache.hadoop.hive.ql.optimizer.physical.MapJoinResolver.resolve(MapJoinResolver.java:81)
at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:114)
at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:271)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:292)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11289)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:513)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1318)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1296)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:321)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
at sun.reflect.GeneratedMethodAccessor77.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy38.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:761)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 

[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2021-09-13 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414659#comment-17414659
 ] 

Nemon Lou commented on HIVE-3562:
-

[~girishsk] I reported the same issue, along with some analysis: HIVE-24579

> Some limit can be pushed down to map stage
> --
>
> Key: HIVE-3562
> URL: https://issues.apache.org/jira/browse/HIVE-3562
> Project: Hive
>  Issue Type: Bug
>Reporter: Navis Ryu
>Assignee: Navis Ryu
>Priority: Trivial
> Fix For: 0.12.0
>
> Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
> HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
> HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
> HIVE-3562.D5967.9.patch
>
>
> Queries with a limit clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> produce the operator tree
> TS-SEL-RS-EXT-LIMIT-FS
> But LIMIT can be partially evaluated in RS, reducing the amount of data shuffled:
> TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
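
As a rough illustration of the RS(TOP-N) idea for an `order by key limit N` query, the sketch below keeps only the N smallest keys on the map side using a bounded max-heap, so at most N rows per mapper need to be shuffled. The class and method names are hypothetical, and this is an assumption-laden sketch rather than Hive's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class MapSideTopN {
    // Keep only the n smallest keys seen so far. The heap is ordered
    // largest-first, so poll() always evicts the current largest key.
    static List<String> smallestN(Iterable<String> keys, int n) {
        PriorityQueue<String> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (String key : keys) {
            heap.offer(key);
            if (heap.size() > n) {
                heap.poll(); // drop the largest key; it cannot be in the top n
            }
        }
        List<String> result = new ArrayList<>(heap);
        Collections.sort(result); // emit in ascending key order
        return result;
    }

    public static void main(String[] args) {
        // Only two keys survive on the map side for "order by key limit 2".
        System.out.println(smallestN(Arrays.asList("d", "a", "c", "b"), 2));
    }
}
```

This pruning is safe here because the query orders by the same key that the heap compares, so a key evicted on the map side can never appear in the final result.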





[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-13 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414657#comment-17414657
 ] 

Nemon Lou commented on HIVE-24579:
--

Another user has also reported the same issue:
https://issues.apache.org/jira/browse/HIVE-3562?focusedCommentId=17170367=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17170367

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   file.inputformat 
> org.apache.hadoop.mapred.TextInputFormat
>   file.outputformat 
> 

[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-13 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414151#comment-17414151
 ] 

Nemon Lou commented on HIVE-24579:
--

I think the TopN key operator has the same issue. What's your opinion? [~kkasa]

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   file.inputformat 
> org.apache.hadoop.mapred.TextInputFormat
>   file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   location file:/user/hive/warehouse/test
>   

[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-13 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17414006#comment-17414006
 ] 

Nemon Lou commented on HIVE-24579:
--

After debugging, I find the bug is quite intuitive:

No ordering is guaranteed in the final result, but TopN in the mapper filters
out part of the data, causing incorrect aggregation results for some keys. For
example:

Assume that key1 is among the top 10 keys at first and is later squeezed out by
other keys, but some of its data has already been transmitted downstream. As a
result, key1 gets an incorrect aggregation result in the reduce phase.
However, the final result is not taken from the top 10 keys but from the
output of multiple reducers. Therefore, key1 may be returned, causing an
error in the final result.
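
The effect described above can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: each mapper applies TopN to its own partial counts, TopN keeps the N smallest keys, and all names (`TopNGroupByDemo`, `run`, and so on) are hypothetical rather than Hive code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopNGroupByDemo {
    // Map-side hash aggregation: count rows per key within one mapper.
    static TreeMap<Integer, Long> partialCounts(int[] keys) {
        TreeMap<Integer, Long> counts = new TreeMap<>();
        for (int k : keys) {
            counts.merge(k, 1L, Long::sum);
        }
        return counts;
    }

    // Map-side TopN: keep only the n smallest keys of this mapper's output
    // and drop the partial counts of every other key.
    static TreeMap<Integer, Long> topN(TreeMap<Integer, Long> partial, int n) {
        TreeMap<Integer, Long> kept = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : partial.entrySet()) {
            if (kept.size() == n) {
                break;
            }
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    // Reduce side: sum whatever partial counts were shuffled for each key.
    static TreeMap<Integer, Long> reduce(List<TreeMap<Integer, Long>> partials) {
        TreeMap<Integer, Long> merged = new TreeMap<>();
        for (TreeMap<Integer, Long> p : partials) {
            p.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }

    static TreeMap<Integer, Long> run(int[][] mapperInputs, int n, boolean mapSideTopN) {
        List<TreeMap<Integer, Long>> shuffled = new ArrayList<>();
        for (int[] input : mapperInputs) {
            TreeMap<Integer, Long> partial = partialCounts(input);
            shuffled.add(mapSideTopN ? topN(partial, n) : partial);
        }
        return reduce(shuffled);
    }

    public static void main(String[] args) {
        // Key 5 is in the top 2 of the first mapper but not of the second.
        int[][] inputs = { {5, 5, 9}, {1, 2, 5} };
        System.out.println("without map-side TopN: " + run(inputs, 2, false));
        System.out.println("with map-side TopN:    " + run(inputs, 2, true));
    }
}
```

With these inputs, key 5 has a true count of 3, but with the map-side TopN enabled the second mapper drops its partial count and the reducer reports 2. Because the query has no ORDER BY, the final LIMIT may return the row for key 5 from any reducer, which exposes the wrong count.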

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> 

[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-11 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413502#comment-17413502
 ] 

Nemon Lou commented on HIVE-24579:
--

I have reproduced this issue, but the data is too big to upload (more than
30 MB). Any suggestions? [~kkasa]

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   file.inputformat 
> org.apache.hadoop.mapred.TextInputFormat
>   file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   location 

[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-11 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-
Attachment: testdata.tar.7z.007

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   file.inputformat 
> org.apache.hadoop.mapred.TextInputFormat
>   file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   location file:/user/hive/warehouse/test
>   name default.test
>   numFiles 0
>

[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-11 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-
Attachment: (was: testdata.tar.7z.007)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   file.inputformat 
> org.apache.hadoop.mapred.TextInputFormat
>   file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   location file:/user/hive/warehouse/test
>   name default.test
>   numFiles 0
> 

[jira] [Comment Edited] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-11 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413481#comment-17413481
 ] 

Nemon Lou edited comment on HIVE-24579 at 9/11/21, 7:11 AM:


Thanks [~kkasa] for your attention.

This issue only happens on a customer's cluster, and I could not get the data.

The simplified reproduction steps do not seem to match the customer's issue.

Here is the original issue (with the table name changed):

The query result differs for the same store_id when limit 10 is changed to
limit 100:
{code:sql}
SELECT store_id store_id_hive
, count(1) device_cnt_bound_30day
FROM db_name.table_name
WHERE i_rep_date <= 20201226
AND i_rep_date >= 
cast(from_unixtime(unix_timestamp('20201226','MMdd')-86400*29,'MMdd') 
as int)
AND nvl(is_curr_bound,1) = 1
group by store_id limit 10;
{code}
query plan :
  
{code:sql}
|  Explain   |
++
| Plan optimized by CBO. |
||
| Vertex dependency in root stage|
| Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
||
| Stage-0|
|   Fetch Operator   |
| limit:10   |
| Stage-1|
|   Reducer 2|
|   File Output Operator [FS_8]  |
| Limit [LIM_7] (rows=10 width=39)   |
|   Number of rows:10|
|   Group By Operator [GBY_5] (rows=5618832 width=39) |
| 
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
|   <-Map 1 [SIMPLE_EDGE]|
| SHUFFLE [RS_4] |
|   PartitionCols:_col0  |
|   Group By Operator [GBY_3] (rows=11237665 width=39) |
| 
Output:["_col0","_col1"],aggregations:["count()"],keys:store_id |
| Select Operator [SEL_2] (rows=11237665 width=39) |
|   Output:["store_id"]  |
|   Filter Operator [FIL_9] (rows=11237665 width=39) |
| predicate:(NVL(is_curr_bound,1) = 1) |
| TableScan [TS_0] (rows=22475330 width=39) |
|   
db_name@table_name,table_name,Tbl:COMPLETE,Col:NONE,Output:["store_id","is_curr_bound"]
 |
{code}
 part of the extended plan:
{code:sql}
++
|  Explain   |
++
| STAGE DEPENDENCIES:|
|   Stage-1 is a root stage  |
|   Stage-0 depends on stages: Stage-1   |
||
| STAGE PLANS:   |
|   Stage: Stage-1   |
| Tez|
|   DagId: omm_20201228025339_1ef293cf-c508-431a-bf00-6df95178c6e8:3229 |
|   Edges:   |
| Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
|   DagName: omm_20201228025339_1ef293cf-c508-431a-bf00-6df95178c6e8:3229 |
|   Vertices:|
| Map 1  |
| Map Operator Tree: |
| TableScan  |
|   alias: table_name  |
|   Statistics: Num rows: 22475330 Data size: 876537870 Basic 
stats: COMPLETE Column stats: NONE |
|   GatherStats: false   |
|   Filter Operator  |
| isSamplingPred: false  |
| predicate: (NVL(is_curr_bound,1) = 1) (type: boolean) |
| Statistics: Num rows: 11237665 Data size: 438268935 Basic 
stats: COMPLETE Column stats: NONE |
| Select Operator|
|   expressions: store_id (type: string) |
|   outputColumnNames: store_id  |
|   Statistics: Num rows: 11237665 Data size: 438268935 
Basic stats: COMPLETE Column stats: NONE |
|   Group By Operator|
| aggregations: count()  |
| keys: store_id (type: string) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 11237665 Data size: 438268935 
Basic stats: COMPLETE Column stats: NONE |
|

[jira] [Comment Edited] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-11 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413481#comment-17413481
 ] 

Nemon Lou edited comment on HIVE-24579 at 9/11/21, 7:02 AM:


Thanks [~kkasa] for your attention.

This issue only happens on a customer's cluster, and I could not get the data.

The simplified reproduction steps do not seem to match the customer's issue.

Here is the original issue (with the table name changed):

The query result for the same store_id differs when limit 10 is changed to
limit 100:
{code:sql}
SELECT store_id store_id_hive
, count(1) device_cnt_bound_30day
FROM db_name.table_name
WHERE i_rep_date <= 20201226
AND i_rep_date >= cast(from_unixtime(unix_timestamp('20201226','yyyyMMdd')-86400*29,'yyyyMMdd') as int)
AND nvl(is_curr_bound,1) = 1
group by store_id limit 10;
{code}
query plan :
  
{code:sql}
|  Explain   |
++
| Plan optimized by CBO. |
||
| Vertex dependency in root stage|
| Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
||
| Stage-0|
|   Fetch Operator   |
| limit:10   |
| Stage-1|
|   Reducer 2|
|   File Output Operator [FS_8]  |
| Limit [LIM_7] (rows=10 width=39)   |
|   Number of rows:10|
|   Group By Operator [GBY_5] (rows=5618832 width=39) |
| 
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
|   <-Map 1 [SIMPLE_EDGE]|
| SHUFFLE [RS_4] |
|   PartitionCols:_col0  |
|   Group By Operator [GBY_3] (rows=11237665 width=39) |
| 
Output:["_col0","_col1"],aggregations:["count()"],keys:store_id |
| Select Operator [SEL_2] (rows=11237665 width=39) |
|   Output:["store_id"]  |
|   Filter Operator [FIL_9] (rows=11237665 width=39) |
| predicate:(NVL(is_curr_bound,1) = 1) |
| TableScan [TS_0] (rows=22475330 width=39) |
|   
db_name@table_name,table_name,Tbl:COMPLETE,Col:NONE,Output:["store_id","is_curr_bound"]
 |
{code}
 part of the extended plan:
{code:sql}
 | Reduce Output Operator |
|   key expressions: _col0 (type: string) |
|   null sort order: a   |
|   sort order: +|
|   Map-reduce partition columns: _col0 (type: string) |
|   Statistics: Num rows: 11237665 Data size: 438268935 
Basic stats: COMPLETE Column stats: NONE |
|   tag: -1  |
|   TopN: 10 |
|   TopN Hash Memory Usage: 0.1 |
|   value expressions: _col1 (type: bigint) |
|   auto parallelism: true   |
{code}
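
The following is only an illustrative sketch, not Hive's implementation and not a confirmed root cause. It assumes (hypothetically) that each map task keeps just the N smallest group keys of its partial aggregates, and that rows are spread over two reducers by `key % 2`. Under those assumptions, a key can reach the reducer with only some of its partial counts, so its total changes with the limit. All function names and the sample data are made up for the illustration.

```python
from collections import Counter

def map_side(rows, top_n):
    """Partial count per key, then keep only the top_n smallest keys (assumed map-side TopN)."""
    partial = Counter(rows)
    kept = sorted(partial)[:top_n]
    return {k: partial[k] for k in kept}

def reduce_side(map_outputs, num_reducers, reducer_id, limit):
    """Merge the partial counts routed to one reducer and emit at most `limit` groups."""
    merged = Counter()
    for out in map_outputs:
        for key, count in out.items():
            if key % num_reducers == reducer_id:  # assumed partitioning scheme
                merged[key] += count
    return [(k, merged[k]) for k in sorted(merged)[:limit]]

def run(limit):
    # Two hypothetical map tasks; key 3 occurs three times in total.
    mappers = [[1, 1, 3, 3, 4], [1, 2, 3]]
    outputs = [map_side(rows, limit) for rows in mappers]
    return reduce_side(outputs, num_reducers=2, reducer_id=1, limit=limit)

small_limit = run(2)    # the second mapper drops its partial count for key 3
large_limit = run(100)  # nothing is dropped
```

In this toy setup, key 3 is reported with a count of 2 under the small limit and 3 under the large one, which mirrors the symptom described above (a different count for the same store_id when the limit changes).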


was (Author: nemon):
Thanks [~kkasa] for your attention.

This issue only happens on a customer's cluster, and I could not get the data.

The simplified reproduction steps do not seem to match the customer's issue.

Here is the original SQL (with the table name changed):

 {code:sql}
SELECT store_id store_id_hive
, count(1) device_cnt_bound_30day
FROM db_name.table_name
WHERE i_rep_date <= 20201226
AND i_rep_date >= cast(from_unixtime(unix_timestamp('20201226','yyyyMMdd')-86400*29,'yyyyMMdd') as int)
AND nvl(is_curr_bound,1) = 1
group by store_id limit 10;
{code}

query plan :
 {code:sql}
|  Explain   |
++
| Plan optimized by CBO. |
||
| Vertex dependency in root stage|
| Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
||
| Stage-0|
|   Fetch Operator   |
| limit:10   |
| Stage-1|
|   Reducer 2|
|   File Output Operator [FS_8]  |
| Limit [LIM_7] (rows=10 width=39)   |
|   Number of rows:10|
|   Group By Operator [GBY_5] (rows=5618832 width=39) |
| 

[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-09-11 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413481#comment-17413481
 ] 

Nemon Lou commented on HIVE-24579:
--

Thanks [~kkasa] for your attention.

This issue only happens on a customer's cluster, and I could not get the data.

The simplified reproduction steps do not seem to match the customer's issue.

Here is the original SQL (with the table name changed):

 {code:sql}
SELECT store_id store_id_hive
, count(1) device_cnt_bound_30day
FROM db_name.table_name
WHERE i_rep_date <= 20201226
AND i_rep_date >= cast(from_unixtime(unix_timestamp('20201226','yyyyMMdd')-86400*29,'yyyyMMdd') as int)
AND nvl(is_curr_bound,1) = 1
group by store_id limit 10;
{code}

query plan :
 {code:sql}
|  Explain   |
++
| Plan optimized by CBO. |
||
| Vertex dependency in root stage|
| Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
||
| Stage-0|
|   Fetch Operator   |
| limit:10   |
| Stage-1|
|   Reducer 2|
|   File Output Operator [FS_8]  |
| Limit [LIM_7] (rows=10 width=39)   |
|   Number of rows:10|
|   Group By Operator [GBY_5] (rows=5618832 width=39) |
| 
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0 |
|   <-Map 1 [SIMPLE_EDGE]|
| SHUFFLE [RS_4] |
|   PartitionCols:_col0  |
|   Group By Operator [GBY_3] (rows=11237665 width=39) |
| 
Output:["_col0","_col1"],aggregations:["count()"],keys:store_id |
| Select Operator [SEL_2] (rows=11237665 width=39) |
|   Output:["store_id"]  |
|   Filter Operator [FIL_9] (rows=11237665 width=39) |
| predicate:(NVL(is_curr_bound,1) = 1) |
| TableScan [TS_0] (rows=22475330 width=39) |
|   
db_name@table_name,table_name,Tbl:COMPLETE,Col:NONE,Output:["store_id","is_curr_bound"]
 |
{code}
 part of the extended plan:
{code:sql}
  Reduce Output Operator |
   key expressions: _col0 (type: string) |
   null sort order: a   |
   sort order: +|
   Map-reduce partition columns: _col0 (type: string) |
   Statistics: Num rows: 11237665 Data size: 438268935 
Basic stats: COMPLETE Column stats: NONE |
   tag: -1  |
   TopN: 100|
   TopN Hash Memory Usage: 0.1 |
   value expressions: _col1 (type: bigint) |
   auto parallelism: true   |
{code}


> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes an incorrect result.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> 

[jira] [Assigned] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-22 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou reassigned HIVE-24902:


Assignee: Nemon Lou

> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following SQL returns only one record (20210308), but two are expected
> (20210308 and 20210309).
> {code:sql}
> select * from (
> select 
>   case when b.a=1
>   then  
>   cast (from_unixtime(unix_timestamp(cast(20210309 as string),'yyyyMMdd') - 86400,'yyyyMMdd') as bigint)
>   else 
>   20210309 
>  end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> The query plan has an incorrect predicate: 
>  predicate: COALESCE((col0 = 1),false) (type: boolean)
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: _dummy_table
>   Row Limit Per Split: 1
>   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Select Operator
> expressions: 2 (type: int), 1 (type: int), 2 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> UDTF Operator
>   Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   function name: stack
>   Filter Operator
> predicate: COALESCE((col0 = 1),false) (type: boolean)
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
> (20210309L) END (type: bigint)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-22 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305966#comment-17305966
 ] 

Nemon Lou commented on HIVE-24902:
--

Thank you, [~kgyrtkirk].
FYI, Hive 2.3.7 can remove the whole CASE expression through
ProjectReduceExpressionsRule, among others.
An example query:
{code:sql}
create table b(a int);
insert into b values (1),(2);

select * from (
select 
case when b.a=1
then  
cast (from_unixtime(unix_timestamp(cast(20210309 as string),'yyyyMMdd') - 86400,'yyyyMMdd') as bigint)
else 
  20210309 
   end 
as col
from 
b
) t 
where t.col is not null;
{code}

I cannot tell why.

A direct fix is to validate the booleans on both branches during the COALESCE
rewrite. I will submit a PR following this proposal.
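
As a rough sketch of what "validating booleans on both branches" could look like, the snippet below models the rewrite on a tiny tuple-based expression form. It is not Hive's code: the function name `rewrite_boolean_case` and the tuple representation are assumptions made purely for illustration. The rewrite to COALESCE is applied only when the THEN branch is true and the ELSE branch is false. The mirrored case gets a negation, and identical branches fold to the constant.

```python
from typing import Tuple, Union

# Hypothetical expression form: a bool constant or a nested tuple such as
# ("COALESCE", "<condition text>", False).
Expr = Union[bool, Tuple]

def rewrite_boolean_case(cond: str, then_val: bool, else_val: bool) -> Expr:
    """Rewrite CASE WHEN cond THEN then_val ELSE else_val END for boolean constants."""
    if then_val is True and else_val is False:
        # A NULL condition falls through to ELSE (false), which COALESCE(cond, false) matches.
        return ("COALESCE", cond, False)
    if then_val is False and else_val is True:
        # Mirrored branches: the result is the negation of the case above.
        return ("NOT", ("COALESCE", cond, False))
    # Both branches carry the same constant, so the condition is irrelevant.
    return then_val

same_branches = rewrite_boolean_case("col0 = 1", True, True)
true_false = rewrite_boolean_case("col0 = 1", True, False)
false_true = rewrite_boolean_case("col0 = 1", False, True)
```

With this guard, `CASE(=($0, 1), true, true)` folds to the constant `true` instead of `COALESCE(=($0, 1), false)`.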

> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major





[jira] [Commented] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-19 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304780#comment-17304780
 ] 

Nemon Lou commented on HIVE-24902:
--

Here is how the filter expression goes wrong:
Before optimization (good):
{code:sql}
IS NOT NULL(CASE(=($0, 1), 
CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT, 
20210309))
{code}

After pushing the predicate into CASE (good):
{code:sql}
CASE(=($0, 1), IS NOT 
NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT), true)
{code}

After constant folding (good):
{code:sql}
CASE(=($0, 1), true, true)
{code}

After rewriting CASE into COALESCE (bad):
{code:sql}
COALESCE(=($0, 1),false)
{code}

The related code of COALESCE rewrite:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L1079
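
To make the difference between the last two steps concrete, here is a small sketch that evaluates both forms under SQL three-valued logic, with `None` standing in for NULL. The helper names (`eq`, `case_when`, `coalesce`) are illustrative stand-ins, not Hive or Calcite APIs.

```python
from typing import Optional

def eq(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL equality: NULL if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def case_when(cond: Optional[bool], then_val, else_val):
    """CASE WHEN cond THEN then_val ELSE else_val END; a NULL or false cond takes ELSE."""
    return then_val if cond is True else else_val

def coalesce(*args):
    """First non-NULL argument, or NULL if there is none."""
    for arg in args:
        if arg is not None:
            return arg
    return None

rows = [1, 2, None]  # sample values for $0, including a NULL

# After constant folding: CASE(=($0, 1), true, true)
folded = [case_when(eq(a, 1), True, True) for a in rows]

# After the rewrite: COALESCE(=($0, 1), false)
rewritten = [coalesce(eq(a, 1), False) for a in rows]
```

The folded CASE keeps every row, while the COALESCE form keeps only the row where `$0 = 1`, which matches the single record (20210308) the query returns.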

> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major





[jira] [Updated] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-19 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24902:
-
Description: 
The following SQL returns only one record (20210308), but two are expected
(20210308 and 20210309).

{code:sql}
select * from (
select 
case when b.a=1
then  
cast (from_unixtime(unix_timestamp(cast(20210309 as string),'yyyyMMdd') - 86400,'yyyyMMdd') as bigint)
else 
20210309 
   end 
as col
from 
(select stack(2,1,2) as (a))
 as b
) t 
where t.col is not null;
{code}
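
For reference, the expected values can be worked out with a short sketch. It assumes the date pattern is yyyyMMdd (the folded constant 20210308L in the plan suggests this) and uses UTC for simplicity, whereas Hive would use the session time zone. The function names are illustrative only.

```python
from datetime import datetime, timezone

def previous_day(yyyymmdd: int) -> int:
    """Mimic from_unixtime(unix_timestamp(str, 'yyyyMMdd') - 86400, 'yyyyMMdd') in UTC."""
    parsed = datetime.strptime(str(yyyymmdd), "%Y%m%d").replace(tzinfo=timezone.utc)
    shifted = datetime.fromtimestamp(parsed.timestamp() - 86400, tz=timezone.utc)
    return int(shifted.strftime("%Y%m%d"))

def case_col(a: int) -> int:
    """The CASE expression from the query: previous day when a = 1, else 20210309."""
    return previous_day(20210309) if a == 1 else 20210309

# stack(2,1,2) yields two rows with a = 1 and a = 2.
expected = [case_col(a) for a in (1, 2)]
```

Neither value is NULL, so the `t.col is not null` filter should keep both rows.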

The query plan has an incorrect predicate: 
 predicate: COALESCE((col0 = 1),false) (type: boolean)
{code:sql}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: _dummy_table
  Row Limit Per Split: 1
  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
stats: COMPLETE
  Select Operator
expressions: 2 (type: int), 1 (type: int), 2 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column 
stats: COMPLETE
UDTF Operator
  Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
  function name: stack
  Filter Operator
predicate: COALESCE((col0 = 1),false) (type: boolean)
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
(20210309L) END (type: bigint)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
  ListSink

Time taken: 0.155 seconds, Fetched: 28 row(s)

{code}


  was:
The following SQL returns only one record (20210308), but we expect two
(20210308 and 20210309).

{code:sql}
select * from (
select 
case when b.a=1
   then  
cast (from_unixtime(unix_timestamp(cast(20210309 as string),'yyyyMMdd') - 86400,'yyyyMMdd') as bigint)
  else 
  20210309 
   end 
as col
from 
(select stack(2,1,2) as (a))
 as b
) t 
where t.col is not null;
{code}

After debugging, I found that ReduceExpressionsRule changes the expression
incorrectly.
Original expression:

{code:sql}
IS NOT NULL(CASE(=($0, 1), 
CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT, 
20210309))
{code}

After reducing expressions:
{code:sql}
CASE(=($0, 1), IS NOT 
NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
 CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
_UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT), true)
{code}

The query plan in main branch:
{code:sql}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: _dummy_table
  Row Limit Per Split: 1
  Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
stats: COMPLETE
  Select Operator
expressions: 2 (type: int), 1 (type: int), 2 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE Column 
stats: COMPLETE
UDTF Operator
  Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
  function name: stack
  Filter Operator
predicate: COALESCE((col0 = 1),false) (type: boolean)
Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
(20210309L) END (type: bigint)
  outputColumnNames: _col0
  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
  ListSink

Time taken: 0.155 seconds, Fetched: 28 row(s)

{code}



> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>

[jira] [Updated] (HIVE-24902) Incorrect result after fold CASE into COALESCE

2021-03-19 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24902:
-
Summary: Incorrect result after fold CASE into COALESCE  (was: Incorrect 
result after fold CASE into NVL)

> Incorrect result after fold CASE into COALESCE
> --
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major





[jira] [Updated] (HIVE-24902) Incorrect result after fold CASE into NVL

2021-03-19 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24902:
-
Summary: Incorrect result after fold CASE into NVL  (was: Incorrect result 
due to ReduceExpressionsRule)

> Incorrect result after fold CASE into NVL
> -
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major





[jira] [Commented] (HIVE-24902) Incorrect result due to ReduceExpressionsRule

2021-03-18 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304572#comment-17304572
 ] 

Nemon Lou commented on HIVE-24902:
--

Sorry if I offended you, and thanks for your response.
I'm trying to figure out what causes this bug, and the Calcite part is
difficult for me right now. I will dig further.
If it is a common issue, any help from the community is appreciated.

> Incorrect result due to ReduceExpressionsRule
> -
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major





[jira] [Commented] (HIVE-24902) Incorrect result due to ReduceExpressionsRule

2021-03-18 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303874#comment-17303874
 ] 

Nemon Lou commented on HIVE-24902:
--

[~julianhyde] Would you mind taking a look? Calcite version is 1.21.0

> Incorrect result due to ReduceExpressionsRule
> -
>
> Key: HIVE-24902
> URL: https://issues.apache.org/jira/browse/HIVE-24902
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> The following sql returns only one record (20210308)but we expect two(20210308
> 20210309).
> {code:sql}
> select * from (
> select 
>   case when b.a=1
>  then  
>   cast 
> (from_unixtime(unix_timestamp(cast(20210309 as string),'MMdd') - 
> 86400,'MMdd') as bigint)
> else 
> 20210309 
>  end 
> as col
> from 
> (select stack(2,1,2) as (a))
>  as b
> ) t 
> where t.col is not null;
> {code}
> After debuging, i find the ReduceExpressionsRule changes expression in the 
> wrong way.
> Original expression:
> {code:sql}
> IS NOT NULL(CASE(=($0, 1), 
> CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'MMdd'), CAST(86400):BIGINT), _UTF-16LE'MMdd')):BIGINT, 
> 20210309))
> {code}
> After reducing expressions:
> {code:sql}
> CASE(=($0, 1), IS NOT 
> NULL(CAST(FROM_UNIXTIME(-(UNIX_TIMESTAMP(CAST(_UTF-16LE'20210309'):VARCHAR(2147483647)
>  CHARACTER SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", 
> _UTF-16LE'yyyyMMdd'), CAST(86400):BIGINT), _UTF-16LE'yyyyMMdd')):BIGINT), 
> true)
> {code}
> The query plan on the main branch:
> {code:sql}
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> TableScan
>   alias: _dummy_table
>   Row Limit Per Split: 1
>   Statistics: Num rows: 1 Data size: 10 Basic stats: COMPLETE Column 
> stats: COMPLETE
>   Select Operator
> expressions: 2 (type: int), 1 (type: int), 2 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> UDTF Operator
>   Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   function name: stack
>   Filter Operator
> predicate: COALESCE((col0 = 1),false) (type: boolean)
> Statistics: Num rows: 1 Data size: 12 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: CASE WHEN ((col0 = 1)) THEN (20210308L) ELSE 
> (20210309L) END (type: bigint)
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   ListSink
> Time taken: 0.155 seconds, Fetched: 28 row(s)
> {code}





[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-
Priority: Major  (was: Critical)

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
>   DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: test
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   GatherStats: false
>   Select Operator
> expressions: id (type: int)
> outputColumnNames: id
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> Group By Operator
>   aggregations: count()
>   keys: id (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: int)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 1 Data size: 13500 Basic stats: 
> COMPLETE Column stats: NONE
> tag: -1
> TopN: 10
> TopN Hash Memory Usage: 0.1
> value expressions: _col1 (type: bigint)
> auto parallelism: true
> Execution mode: vectorized
> Path -> Alias:
>   file:/user/hive/warehouse/test [test]
> Path -> Partition:
>   file:/user/hive/warehouse/test 
> Partition
>   base file name: test
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   properties:
> COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
> bucket_count -1
> bucketing_version 2
> column.name.delimiter ,
> columns id
> columns.comments 
> columns.types int
> file.inputformat org.apache.hadoop.mapred.TextInputFormat
> file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> location file:/user/hive/warehouse/test
> name default.test
> numFiles 0
> numRows 0
> rawDataSize 0
> serialization.ddl struct test { i32 id}
> serialization.format 1
> serialization.lib 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> totalSize 0
> transient_lastDdlTime 1609730190
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> properties:
>   COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
>   bucket_count -1
>   bucketing_version 2
>   column.name.delimiter ,
>   columns id
>   columns.comments 
>   columns.types int
>   file.inputformat 
> org.apache.hadoop.mapred.TextInputFormat
>   file.outputformat 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   location file:/user/hive/warehouse/test
>   name default.test
>   numFiles 0
>   

[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-
Description: 
{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}

There is an unexpected TopN in the map phase, which causes incorrect results.


{code:sql}
STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: root_20210104141527_c599c0cd-ca2f-4c7d-a3cc-3a01d65c49a1:5
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: test
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  GatherStats: false
  Select Operator
expressions: id (type: int)
outputColumnNames: id
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count()
  keys: id (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
tag: -1
TopN: 10
TopN Hash Memory Usage: 0.1
value expressions: _col1 (type: bigint)
auto parallelism: true
Execution mode: vectorized
Path -> Alias:
  file:/user/hive/warehouse/test [test]
Path -> Partition:
  file:/user/hive/warehouse/test 
Partition
  base file name: test
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns id
columns.comments 
columns.types int
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/user/hive/warehouse/test
name default.test
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct test { i32 id}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1609730190
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns id
  columns.comments 
  columns.types int
  file.inputformat org.apache.hadoop.mapred.TextInputFormat
  file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  location file:/user/hive/warehouse/test
  name default.test
  numFiles 0
  numRows 0
  rawDataSize 0
  serialization.ddl struct test { i32 id}
  serialization.format 1
  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  totalSize 0
  transient_lastDdlTime 1609730190
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.test
  name: default.test
Truncated Path -> Alias:
  /test [test]
Reducer 

[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-
Description: 
{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}

There is an unexpected TopN in the map phase, which causes incorrect results.


{code:sql}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: root_20210104140946_940cd4ce-8bb5-41ac-91ec-1185245da009:4
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: root_20210104140946_940cd4ce-8bb5-41ac-91ec-1185245da009:4
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: test
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  GatherStats: false
  Select Operator
expressions: id (type: int)
outputColumnNames: id
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Top N Key Operator
  sort order: +
  keys: id (type: int)
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  top n: 10
  Group By Operator
aggregations: count()
keys: id (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: int)
  null sort order: a
  sort order: +
  Map-reduce partition columns: _col0 (type: int)
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  tag: -1
  TopN: 10
  TopN Hash Memory Usage: 0.1
  value expressions: _col1 (type: bigint)
  auto parallelism: true
Execution mode: vectorized
Path -> Alias:
  file:/user/hive/warehouse/test [test]
Path -> Partition:
  file:/user/hive/warehouse/test 
Partition
  base file name: test
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns id
columns.comments 
columns.types int
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/user/hive/warehouse/test
name default.test
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct test { i32 id}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1609730190
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns id
  columns.comments 
  columns.types int
  file.inputformat org.apache.hadoop.mapred.TextInputFormat
  file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  location file:/user/hive/warehouse/test
  name default.test
  numFiles 0
  numRows 0
  rawDataSize 0
  serialization.ddl struct test { i32 id}
  serialization.format 1
  serialization.lib 

[jira] [Updated] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24579:
-
Description: 
{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}

There is an unexpected TopN in the map phase, which causes incorrect results.


{code:sql}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  DagId: root_20210104113831_2451d621-8f77-4a29-9da6-3a65bc4d9e56:2
  Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
  DagName: root_20210104113831_2451d621-8f77-4a29-9da6-3a65bc4d9e56:2
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: test
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  GatherStats: false
  Select Operator
expressions: id (type: int)
outputColumnNames: id
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  aggregations: count()
  keys: id (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1 Data size: 13500 Basic stats: 
COMPLETE Column stats: NONE
tag: -1
value expressions: _col1 (type: bigint)
auto parallelism: true
Execution mode: vectorized
Path -> Alias:
  file:/user/hive/warehouse/test [test]
Path -> Partition:
  file:/user/hive/warehouse/test 
Partition
  base file name: test
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  properties:
COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
bucket_count -1
bucketing_version 2
column.name.delimiter ,
columns id
columns.comments 
columns.types int
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
location file:/user/hive/warehouse/test
name default.test
numFiles 0
numRows 0
rawDataSize 0
serialization.ddl struct test { i32 id}
serialization.format 1
serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
totalSize 0
transient_lastDdlTime 1609730190
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
properties:
  COLUMN_STATS_ACCURATE 
{"BASIC_STATS":"true","COLUMN_STATS":{"id":"true"}}
  bucket_count -1
  bucketing_version 2
  column.name.delimiter ,
  columns id
  columns.comments 
  columns.types int
  file.inputformat org.apache.hadoop.mapred.TextInputFormat
  file.outputformat 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  location file:/user/hive/warehouse/test
  name default.test
  numFiles 0
  numRows 0
  rawDataSize 0
  serialization.ddl struct test { i32 id}
  serialization.format 1
  serialization.lib 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  totalSize 0
  transient_lastDdlTime 1609730190
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.test
  name: default.test
Truncated Path -> Alias:
  /test [test]
Reducer 

[jira] [Commented] (HIVE-24579) Incorrect Result For Groupby With Limit

2021-01-03 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257932#comment-17257932
 ] 

Nemon Lou commented on HIVE-24579:
--

A workaround is to set hive.limit.pushdown.memory.usage=0.
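
As a minimal sketch of that workaround, assuming it is applied as a session-level override rather than in hive-site.xml, the setting can be issued before re-running the EXPLAIN from the description quoted below:

{code:sql}
-- Assumption: session-level override; a value of 0 is expected to turn off the
-- TopN (limit pushdown) hash in the map-side Reduce Output Operator.
set hive.limit.pushdown.memory.usage=0;

-- Re-check the plan from the description: the Reduce Output Operator should
-- no longer list "TopN: 10" / "TopN Hash Memory Usage: 0.1".
explain extended select id,count(*) from test group by id limit 10;
{code}

If the TopN entries still appear in the plan, the override did not take effect for that session.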

 

> Incorrect Result For Groupby With Limit
> ---
>
> Key: HIVE-24579
> URL: https://issues.apache.org/jira/browse/HIVE-24579
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7, 3.1.2, 4.0.0
>Reporter: Nemon Lou
>Priority: Critical
>
> {code:sql}
> create table test(id int);
> explain extended select id,count(*) from test group by id limit 10;
> {code}
> There is an unexpected TopN in the map phase, which causes incorrect results.
> {code:sql}
> STAGE PLANS:
>  Stage: Stage-1
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  alias: test
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  GatherStats: false
>  Select Operator
>  expressions: id (type: int)
>  outputColumnNames: id
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  Group By Operator
>  aggregations: count()
>  keys: id (type: int)
>  mode: hash
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  Reduce Output Operator
>  key expressions: _col0 (type: int)
>  null sort order: a
>  sort order: +
>  Map-reduce partition columns: _col0 (type: int)
>  Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column 
> stats: NONE
>  tag: -1
>  TopN: 10
>  TopN Hash Memory Usage: 0.1
>  value expressions: _col1 (type: bigint)
>  auto parallelism: false
>  Path -> Alias:
>  file:/user/hive/warehouse/test [test]
>  Path -> Partition:
>  file:/user/hive/warehouse/test 
>  Partition
>  base file name: test
>  input format: org.apache.hadoop.mapred.TextInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  properties:
>  COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
>  bucket_count -1
>  column.name.delimiter ,
>  columns id
>  columns.comments 
>  columns.types int
>  file.inputformat org.apache.hadoop.mapred.TextInputFormat
>  file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  location file:/user/hive/warehouse/test
>  name default.test
>  numFiles 0
>  numRows 0
>  rawDataSize 0
>  serialization.ddl struct test { i32 id}
>  serialization.format 1
>  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  totalSize 0
>  transient_lastDdlTime 1609730036
>  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  
>  input format: org.apache.hadoop.mapred.TextInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  properties:
>  COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
>  bucket_count -1
>  column.name.delimiter ,
>  columns id
>  columns.comments 
>  columns.types int
>  file.inputformat org.apache.hadoop.mapred.TextInputFormat
>  file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>  location file:/user/hive/warehouse/test
>  name default.test
>  numFiles 0
>  numRows 0
>  rawDataSize 0
>  serialization.ddl struct test { i32 id}
>  serialization.format 1
>  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  totalSize 0
>  transient_lastDdlTime 1609730036
>  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>  name: default.test
>  name: default.test
>  Truncated Path -> Alias:
>  /test [test]
>  Needs Tagging: false
>  Reduce Operator Tree:
>  Group By Operator
>  aggregations: count(VALUE._col0)
>  keys: KEY._col0 (type: int)
>  mode: mergepartial
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 168 Data size: 672 Basic stats: COMPLETE Column stats: 
> NONE
>  Limit
>  Number of rows: 10
>  Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: 
> NONE
>  File Output Operator
>  compressed: false
>  GlobalTableId: 0
>  directory: 
> file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002
>  NumFilesPerFileSink: 1
>  Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: 
> NONE
>  Stats Publishing Key Prefix: 
> file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002/
>  table:
>  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  properties:
>  columns _col0,_col1
>  columns.types int:bigint
>  escape.delim \
>  hive.serialization.extend.additional.nesting.levels true
>  serialization.escape.crlf true
>  

[jira] [Commented] (HIVE-18537) [Calcite-CBO] Queries with a nested distinct clause and a windowing function seem to fail with calcite Assertion error

2020-10-23 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-18537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219498#comment-17219498
 ] 

Nemon Lou commented on HIVE-18537:
--

This issue is fixed after upgrading Calcite to 1.17.0 or higher.

It can no longer be reproduced on the master branch.
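
For deployments that cannot upgrade Calcite yet, the description quoted below notes that the failure does not occur with CBO disabled. A minimal session-level sketch follows. It is a stopgap only, on the assumption that turning CBO off changes planning for every query in the session.

{code:sql}
-- Stopgap taken from the issue description: the assertion is not hit when CBO is off.
-- Assumption: applied per session, not cluster-wide.
set hive.cbo.enable=false;
{code}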

> [Calcite-CBO] Queries with a nested distinct clause and a windowing function 
> seem to fail with calcite Assertion error
> --
>
> Key: HIVE-18537
> URL: https://issues.apache.org/jira/browse/HIVE-18537
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0, 2.3.2, 3.1.2
>Reporter: Amruth Sampath
>Priority: Critical
>
> Sample test case to reproduce the issue. The issue does not occur if 
> *hive.cbo.enable=false*.
> {code:java}
> create table test_cbo (
>  `a` BIGINT,
>  `b` STRING,
>  `c` TIMESTAMP,
>  `d` STRING
>  );
> SELECT 1
>  FROM
>  (SELECT
>  DISTINCT
>  a AS a_,
>  b AS b_,
>  rank() over (partition BY a ORDER BY c DESC) AS c_,
>  d AS d_
>  FROM test_cbo
>  WHERE b = 'some_filter' ) n
>  WHERE c_ = 1;
> {code}
> Fails with, 
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Internal error: Cannot 
> add expression of different type to set:
> set type is RecordType(BIGINT a_, INTEGER c_, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" d_) NOT NULL
> expression type is RecordType(BIGINT a_, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" c_, INTEGER d_) NOT NULL
> set is rel#112:HiveAggregate.HIVE.[](input=HepRelVertex#121,group={0, 2, 3})
> expression is HiveProject#123
> {code}
> This might be related to https://issues.apache.org/jira/browse/CALCITE-1868.
>   





[jira] [Resolved] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-10-23 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou resolved HIVE-24165.
--
Resolution: Invalid

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24165.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
> null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
> different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
> "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT 
> $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
> 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at 
> org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at 
> 

[jira] [Commented] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-10-23 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219496#comment-17219496
 ] 

Nemon Lou commented on HIVE-24165:
--

I am not able to reproduce this on the master branch.

After upgrading Calcite from 1.16.0 to 1.17.0, the bug is also gone on branch3 
with multi distinct rewrite enabled.

It may have been fixed in CALCITE-2232.

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24165.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
> null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
> different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
> "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT 
> $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
> 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at 
> org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>   at 

[jira] [Updated] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-09-14 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24165:
-
Attachment: HIVE-24165.patch

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Nemon Lou
>Priority: Major
> Attachments: HIVE-24165.patch
>
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
> null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
> different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
> "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT 
> $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
> 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at 
> org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
>   at 
> 

[jira] [Commented] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-09-14 Thread Nemon Lou (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195807#comment-17195807
 ] 

Nemon Lou commented on HIVE-24165:
--

In fact, I reproduced this issue by applying HIVE-22448 back to Hive branch 3.1.2. 
The master branch should have the same issue.

AggregateProjectPullUpConstantsRule expects the groupSet in Aggregate to be ordered 
and to start with 0, like \{0,1,2}, but after the multiple distinct rewrite the groupSet 
is \{3,4,5}.
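
To illustrate the mismatch, here is a minimal sketch in plain Java with no Calcite dependency. It relies only on the fact that an Aggregate's output row lists the group keys first, in group-set order, so a key's output position is its rank within the group set rather than its input ordinal. The names GroupSetDemo, outputPosition and ordinalsMatchPositions are hypothetical and are not taken from Calcite or Hive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration only; this is not Calcite or Hive code. */
class GroupSetDemo {

    /** Output position of a group key: its rank within the sorted group set. */
    static int outputPosition(List<Integer> groupSet, int inputOrdinal) {
        List<Integer> sorted = groupSet.stream().sorted().collect(Collectors.toList());
        int pos = sorted.indexOf(inputOrdinal);
        if (pos < 0) {
            throw new IllegalArgumentException("not a group key: " + inputOrdinal);
        }
        return pos;
    }

    /** True only when every group key's input ordinal equals its output position. */
    static boolean ordinalsMatchPositions(List<Integer> groupSet) {
        for (int ordinal : groupSet) {
            if (outputPosition(groupSet, ordinal) != ordinal) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Group set that starts at 0: ordinals and positions coincide.
        System.out.println(ordinalsMatchPositions(Arrays.asList(0, 1, 2)));
        // Group set produced after the rewrite: keys 3,4,5 sit at positions 0,1,2.
        System.out.println(ordinalsMatchPositions(Arrays.asList(3, 4, 5)));
    }
}
```

Under these assumptions, code that uses an input ordinal as an output position only works for group sets such as \{0,1,2}; for \{3,4,5} the two numberings diverge.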

 

> CBO: Query fails after multiple count distinct rewrite 
> ---
>
> Key: HIVE-24165
> URL: https://issues.apache.org/jira/browse/HIVE-24165
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Nemon Lou
>Priority: Major
>
> One way to reproduce:
>  
> {code:sql}
>  CREATE TABLE test(
>  `device_id` string, 
>  `level` string, 
>  `site_id` string, 
>  `user_id` string, 
>  `first_date` string, 
>  `last_date` string,
>  `dt` string) ;
>  set hive.execution.engine=tez;
>  set hive.optimize.distinct.rewrite=true;
>  set hive.cli.print.header=true;
>  select 
>  dt,
>  site_id,
>  count(DISTINCT t1.device_id) as device_tol_cnt,
>  count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
> null end) as device_add_cnt 
>  from test t1 where dt='2020-09-15' 
>  group by
>  dt,
>  site_id
>  ;
> {code}
>  
> Error log:  
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
> different type to set:
> set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
> "ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
> expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT 
> $f3_0) NOT NULL
> set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
> 3},agg#0=count($0),agg#1=count($1))
> expression is HiveProject#95
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
>   at 
> org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
>   at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>   at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>   at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>   at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
>   at 

[jira] [Updated] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-09-14 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24165:
-
Description: 
One way to reproduce:
 
{code:sql}

 CREATE TABLE test(
 `device_id` string, 
 `level` string, 
 `site_id` string, 
 `user_id` string, 
 `first_date` string, 
 `last_date` string,
 `dt` string) ;

 set hive.execution.engine=tez;
 set hive.optimize.distinct.rewrite=true;
 set hive.cli.print.header=true;

 select 
 dt,
 site_id,
 count(DISTINCT t1.device_id) as device_tol_cnt,
 count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
null end) as device_add_cnt 
 from test t1 where dt='2020-09-15' 
 group by
 dt,
 site_id
 ;
{code}
 

Error log:  

{code:java}
Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
different type to set:
set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
"ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT $f3_0) 
NOT NULL
set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
3},agg#0=count($0),agg#1=count($1))
expression is HiveProject#95
at 
org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
at 
org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
at 
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
at 
org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

[jira] [Updated] (HIVE-24165) CBO: Query fails after multiple count distinct rewrite

2020-09-14 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-24165:
-
Description: 
One way to reproduce:
 
{code:sql}

 CREATE TABLE test(
 `device_id` string, 
 `level` string, 
 `site_id` string, 
 `user_id` string, 
 `first_date` string, 
 `last_date` string,
 `dt` string) ;

 set hive.execution.engine=tez;
 set hive.optimize.distinct.rewrite=true;
 set hive.cli.print.header=true;

 select 
 dt,
 site_id,
 count(DISTINCT t1.device_id) as device_tol_cnt,
 count(DISTINCT case when t1.first_date='2020-09-15' then t1.device_id else 
null end) as device_add_cnt 
 from test t1 where dt='2020-09-15' 
 group by
 dt,
 site_id
 ;
{code}
 

Error log:  

{code:java}
Exception in thread "main" java.lang.AssertionError: Cannot add expression of 
different type to set:
set type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" COLLATE 
"ISO-8859-1$en_US$primary" $f2, VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary" $f3, BIGINT $f2_0, BIGINT $f3_0) NOT NULL
expression type is RecordType(VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary" $f2, BIGINT $f3, BIGINT $f2_0, BIGINT $f3_0) 
NOT NULL
set is rel#85:HiveAggregate.HIVE.[](input=HepRelVertex#84,group={2, 
3},agg#0=count($0),agg#1=count($1))
expression is HiveProject#95
at 
org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:411)
at 
org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
at 
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:234)
at 
org.apache.calcite.rel.rules.AggregateProjectPullUpConstantsRule.onMatch(AggregateProjectPullUpConstantsRule.java:186)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:317)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:415)
at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:280)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:211)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:198)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:2273)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:2002)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1709)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1609)
at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1414)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1430)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:450)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12164)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

[jira] [Updated] (HIVE-18537) [Calcite-CBO] Queries with a nested distinct clause and a windowing function seem to fail with calcite Assertion error

2020-09-14 Thread Nemon Lou (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-18537:
-
Affects Version/s: 3.1.2

> [Calcite-CBO] Queries with a nested distinct clause and a windowing function 
> seem to fail with calcite Assertion error
> --
>
> Key: HIVE-18537
> URL: https://issues.apache.org/jira/browse/HIVE-18537
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0, 2.3.2, 3.1.2
>Reporter: Amruth Sampath
>Priority: Critical
>
> Sample test case to re-produce the issue. The issue does not occur if 
> *hive.cbo.enable=false*
> {code:java}
> create table test_cbo (
>  `a` BIGINT,
>  `b` STRING,
>  `c` TIMESTAMP,
>  `d` STRING
>  );
> SELECT 1
>  FROM
>  (SELECT
>  DISTINCT
>  a AS a_,
>  b AS b_,
>  rank() over (partition BY a ORDER BY c DESC) AS c_,
>  d AS d_
>  FROM test_cbo
>  WHERE b = 'some_filter' ) n
>  WHERE c_ = 1;
> {code}
> Fails with, 
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Internal error: Cannot 
> add expression of different type to set:
> set type is RecordType(BIGINT a_, INTEGER c_, VARCHAR(2147483647) CHARACTER 
> SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" d_) NOT NULL
> expression type is RecordType(BIGINT a_, VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary" c_, INTEGER d_) NOT NULL
> set is rel#112:HiveAggregate.HIVE.[](input=HepRelVertex#121,group={0, 2, 3})
> expression is HiveProject#123{code}
> This might be related to https://issues.apache.org/jira/browse/CALCITE-1868.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2017-06-23 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061759#comment-16061759
 ] 

Nemon Lou commented on HIVE-16839:
--

Our system does not support concurrency. 
When users accidentally submit a drop partition and a modification of the same 
partition concurrently, the result is an uncommitted transaction.
With PostgreSQL as the backend, a connection is left in the "idle in 
transaction" state.

> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> 
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nemon Lou
>Assignee: Vihang Karajgaonkar
>
> SQL to reproduce:
> prepare:
> {noformat}
>  hdfs dfs -mkdir -p 
> /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp 
> string,ld string);
> {noformat}
> Open one beeline session and run these two statements many times:
> {noformat}
>  2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION 
> (cp=2017060513,ld=2017060610);
>  3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET 
> LOCATION 
> 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> Open another beeline session and run this statement many times at the same time:
> {noformat}
>  4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>   at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>   at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>   at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>   at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>   at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> 

[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-06-15 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050173#comment-16050173
 ] 

Nemon Lou commented on HIVE-16907:
--

Refer to this method:
https://github.com/apache/hive/blob/release-2.0.0/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L322
Here tdb.t1 is treated as a single table name.
--> 'tdb.tdb.t1' is put into insertIntoTables of QBParseInfo 
--> QBParseInfo.isInsertIntoTable('tdb.t1') returns false
--> LoadTableDesc.setReplace(!qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(),
dest_tab.getTableName())) sets replace to true.
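
The chain above can be modelled in a few lines of plain Java. This is only a sketch of the key mismatch as described in this comment, not Hive's actual code: the class InsertIntoKeyDemo and its methods are hypothetical stand-ins for the bookkeeping in QBParseInfo, and it assumes the stored key is the current database name joined to the parsed table name.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the bookkeeping described above; not Hive code. */
class InsertIntoKeyDemo {
    private final Set<String> insertIntoTables = new HashSet<>();

    /** Records an INSERT INTO target, qualifying the parsed name with the current database. */
    void addInsertIntoTable(String currentDb, String parsedTableName) {
        insertIntoTables.add(currentDb + "." + parsedTableName);
    }

    /** Looks the target up by the resolved database and table name. */
    boolean isInsertIntoTable(String dbName, String tableName) {
        return insertIntoTables.contains(dbName + "." + tableName);
    }

    /** Mirrors setReplace(!isInsertIntoTable(...)): true means existing data is overwritten. */
    boolean replace(String dbName, String tableName) {
        return !isInsertIntoTable(dbName, tableName);
    }

    public static void main(String[] args) {
        // Backquoted `tdb.t1` is parsed as one table name, so the key becomes tdb.tdb.t1.
        InsertIntoKeyDemo backquoted = new InsertIntoKeyDemo();
        backquoted.addInsertIntoTable("tdb", "tdb.t1");
        System.out.println(backquoted.replace("tdb", "t1"));

        // Assumed unquoted case: the table name is t1, so the key is tdb.t1.
        InsertIntoKeyDemo plain = new InsertIntoKeyDemo();
        plain.addInsertIntoTable("tdb", "t1");
        System.out.println(plain.replace("tdb", "t1"));
    }
}
```

In this model the backquoted case ends up with replace set to true because the stored key and the lookup key differ, which matches the overwrite behaviour reported in the issue.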



>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> |   Stage-1 is a root stage |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> |   Stage-3 |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> |   Stage-2 |
> |   Stage-4 |
> |   Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-1 |
> |     Map Reduce |
> |       Map Operator Tree: |
> |           TableScan |
> |             alias: t2 |
> |             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |             Select Operator |
> |               expressions: id (type: int) |
> |               outputColumnNames: _col0 |
> |               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |               File Output Operator |

[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-06-15 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050146#comment-16050146
 ] 

Nemon Lou commented on HIVE-16907:
--

AST with backquote:
{noformat}
| TOK_QUERY |
|    TOK_FROM |
|       TOK_TABREF |
|          TOK_TABNAME |
|             t2 |
|    TOK_INSERT |
|       TOK_INSERT_INTO |
|          TOK_TAB |
|             TOK_TABNAME |
|                tdb.t1 |
|       TOK_SELECT |
|          TOK_SELEXPR |
|             TOK_ALLCOLREF |
{noformat}

AST without backquote:
{noformat}
| TOK_QUERY |
|    TOK_FROM |
|       TOK_TABREF |

[jira] [Commented] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2017-06-07 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042020#comment-16042020
 ] 

Nemon Lou commented on HIVE-16839:
--

I have assigned it to you. Thanks.

> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> 
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nemon Lou
>Assignee: Vihang Karajgaonkar
>
> SQL to reproduce:
> prepare:
> {noformat}
>  hdfs dfs -mkdir -p 
> /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  1,create external table tb_ltgsm_external (id int) PARTITIONED by (cp 
> string,ld string);
> {noformat}
> Open one beeline session and run these two statements many times:
> {noformat}
>  2,ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION 
> (cp=2017060513,ld=2017060610);
>  3,ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET 
> LOCATION 
> 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> Open another beeline session and run this statement many times at the same time:
> {noformat}
>  4,ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>   at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>   at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>   at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>   at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>   at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 

[jira] [Assigned] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2017-06-07 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou reassigned HIVE-16839:


Assignee: Vihang Karajgaonkar

> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> 
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nemon Lou
>Assignee: Vihang Karajgaonkar
>
> Steps to reproduce.
> Prepare:
> {noformat}
>  hdfs dfs -mkdir -p /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  create external table tb_ltgsm_external (id int) PARTITIONED by (cp string,ld string);
> {noformat}
> Open one beeline session and run these two statements many times:
> {noformat}
>  ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION (cp=2017060513,ld=2017060610);
>  ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET LOCATION 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> Open another beeline session and run this statement many times at the same time:
> {noformat}
>  ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>   at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>   at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>   at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>   at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>   at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> NestedThrowablesStackTrace:
> No such database row
> 

[jira] [Commented] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2017-06-06 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040007#comment-16040007
 ] 

Nemon Lou commented on HIVE-16839:
--

It seems that ObjectStore.getPartition needs a rollbackTransaction call. If
convertToPart throws (as in the stack trace, where the storage descriptor row
has already been deleted), commitTransaction is never reached and the
openTransaction/commitTransaction calls become unbalanced:
{code:java}
  @Override
  public Partition getPartition(String dbName, String tableName,
      List<String> part_vals) throws NoSuchObjectException, MetaException {
    openTransaction();
    Partition part = convertToPart(getMPartition(dbName, tableName, part_vals));
    commitTransaction();
    if (part == null) {
      throw new NoSuchObjectException("partition values="
          + part_vals.toString());
    }
    part.setValues(part_vals);
    return part;
  }
{code}
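
For illustration only, the unbalanced-call problem and the rollback pattern
suggested above can be sketched in a self-contained form. Everything in this
sketch is hypothetical (the class TxnBalanceSketch, its counter, and the load
helper); it is not Hive's ObjectStore and not the eventual patch. It only
assumes that an exception thrown between openTransaction and commitTransaction
leaves the open count raised unless a rollback runs.

```java
// Hypothetical stand-in for transaction bookkeeping; NOT Hive's ObjectStore.
class TxnBalanceSketch {
  // Number of openTransaction calls not yet matched by a commit or rollback.
  private int openCount = 0;

  void openTransaction() {
    openCount++;
  }

  boolean commitTransaction() {
    if (openCount <= 0) {
      throw new IllegalStateException("commitTransaction without openTransaction");
    }
    openCount--;
    return true;
  }

  void rollbackTransaction() {
    // Simplified: undo one outstanding open call, if any.
    if (openCount > 0) {
      openCount--;
    }
  }

  int openCount() {
    return openCount;
  }

  // Hypothetical lookup that fails the way a concurrently dropped row might.
  private String load(String key, boolean rowDeleted) {
    if (rowDeleted) {
      throw new IllegalStateException("No such database row");
    }
    return "partition:" + key;
  }

  // Same shape as the method quoted in the comment: no rollback on failure.
  String getUnguarded(String key, boolean rowDeleted) {
    openTransaction();
    String part = load(key, rowDeleted);
    commitTransaction();
    return part;
  }

  // Guarded variant: roll back in finally whenever the commit was not reached.
  String getGuarded(String key, boolean rowDeleted) {
    boolean committed = false;
    try {
      openTransaction();
      String part = load(key, rowDeleted);
      committed = commitTransaction();
      return part;
    } finally {
      if (!committed) {
        rollbackTransaction();
      }
    }
  }

  public static void main(String[] args) {
    TxnBalanceSketch unguarded = new TxnBalanceSketch();
    try {
      unguarded.getUnguarded("cp=2017060513/ld=2017060610", true);
    } catch (IllegalStateException expected) {
      // The lookup failed after openTransaction; nothing closed the transaction.
    }
    System.out.println("unguarded open count: " + unguarded.openCount());

    TxnBalanceSketch guarded = new TxnBalanceSketch();
    try {
      guarded.getGuarded("cp=2017060513/ld=2017060610", true);
    } catch (IllegalStateException expected) {
      // The lookup failed, but the finally block rolled the transaction back.
    }
    System.out.println("guarded open count: " + guarded.openCount());
  }
}
```

In the unguarded variant the failed lookup leaves one open call outstanding,
while the guarded variant returns the count to zero before the exception
propagates to the caller.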

> Unbalanced calls to openTransaction/commitTransaction when alter the same 
> partition concurrently
> 
>
> Key: HIVE-16839
> URL: https://issues.apache.org/jira/browse/HIVE-16839
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Nemon Lou
>
> Steps to reproduce.
> Prepare:
> {noformat}
>  hdfs dfs -mkdir -p /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
>  create external table tb_ltgsm_external (id int) PARTITIONED by (cp string,ld string);
> {noformat}
> Open one beeline session and run these two statements many times:
> {noformat}
>  ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION (cp=2017060513,ld=2017060610);
>  ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET LOCATION 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
> {noformat}
> Open another beeline session and run this statement many times at the same time:
> {noformat}
>  ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
> {noformat}
> MetaStore logs:
> {noformat}
> 2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler 
> after 2000 ms (attempt 1 of 10) with error: 
> javax.jdo.JDOObjectNotFoundException: No such database row
> FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
>   at 
> org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
>   at 
> org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
>   at 
> org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
>   at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
>   at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
>   at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
>   at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> 

[jira] [Updated] (HIVE-16839) Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently

2017-06-06 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-16839:
-
Description: 
Steps to reproduce.
Prepare:
{noformat}
 hdfs dfs -mkdir -p /hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627
 create external table tb_ltgsm_external (id int) PARTITIONED by (cp string,ld string);
{noformat}
Open one beeline session and run these two statements many times:
{noformat}
 ALTER TABLE tb_ltgsm_external ADD IF NOT EXISTS PARTITION (cp=2017060513,ld=2017060610);
 ALTER TABLE tb_ltgsm_external PARTITION (cp=2017060513,ld=2017060610) SET LOCATION 'hdfs://hacluster/hzsrc/external/writing_dc/ltgsm/16e7a9b2-21a1-3f4f-8061-bc3395281627';
{noformat}
Open another beeline session and run this statement many times at the same time:
{noformat}
 ALTER TABLE tb_ltgsm_external DROP PARTITION (cp=2017060513,ld=2017060610);
{noformat}

MetaStore logs:
{noformat}
2017-06-06 21:58:34,213 | ERROR | pool-6-thread-197 | Retrying HMSHandler after 
2000 ms (attempt 1 of 10) with error: javax.jdo.JDOObjectNotFoundException: No 
such database row
FailedObject:49[OID]org.apache.hadoop.hive.metastore.model.MStorageDescriptor
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:475)
at 
org.datanucleus.api.jdo.JDOAdapter.getApiExceptionForNucleusException(JDOAdapter.java:1158)
at 
org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3231)
at 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetcd(MStorageDescriptor.java)
at 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getCD(MStorageDescriptor.java:184)
at 
org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1282)
at 
org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:1299)
at 
org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:1680)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:1586)
at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy0.getPartition(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:538)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions(HiveMetaStore.java:3317)
at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:102)
at com.sun.proxy.$Proxy12.alter_partitions(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9963)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions.getResult(ThriftHiveMetastore.java:9947)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
NestedThrowablesStackTrace:
No such database row
org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
at 
org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
at 
org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
at 
org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
at 
org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916)
at 
org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219)
at 

[jira] [Commented] (HIVE-12614) RESET command does not close spark session

2017-04-10 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963745#comment-15963745
 ] 

Nemon Lou commented on HIVE-12614:
--

[~stakiar] Thanks for taking it over. :)

> RESET command does not close spark session
> --
>
> Key: HIVE-12614
> URL: https://issues.apache.org/jira/browse/HIVE-12614
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Nemon Lou
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-12614.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled

2017-03-15 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14557:
-
Status: Patch Available  (was: Open)

> Nullpointer When both SkewJoin  and Mapjoin Enabled
> ---
>
> Key: HIVE-14557
> URL: https://issues.apache.org/jira/browse/HIVE-14557
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.1.0, 1.1.0
>Reporter: Nemon Lou
> Attachments: HIVE-14557.patch
>
>
> The following SQL fails with return code 2 on the MapReduce engine.
> {noformat}
> create table a(id int,id1 int);
> create table b(id int,id1 int);
> create table c(id int,id1 int);
> set hive.optimize.skewjoin=true;
> select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
> {noformat}
> Error log as follows:
> {noformat}
> 2016-08-17 21:13:42,081 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> <MAP>Id =0
>   <Children>
> <TS>Id =21
>   <Children>
> <MAPJOIN>Id =28
>   <Children>
> <FS>Id =16
>   <Children>
>   <\Children>
>   <Parent>Id = 28 null<\Parent>
> <\FS>
>   <\Children>
>   <Parent>Id = 21 nullId = 33
> <HASHTABLEDUMMY>Id =33
>   <Children>null
>   <\Children>
>   <Parent><\Parent>
> <\HASHTABLEDUMMY><\Parent>
> <\MAPJOIN>
>   <\Children>
>   <Parent>Id = 0 null<\Parent>
> <\TS>
>   <\Children>
>   <Parent><\Parent>
> <\MAP>
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
> 2016-08-17 21:13:42,086 INFO [main] 
> org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, 
> RECORDS_IN:0, 
> 2016-08-17 21:13:42,087 ERROR [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing 
> operators - failing tree
> 2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
>   ... 8 more
> {noformat}





[jira] [Updated] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled

2017-03-15 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14557:
-
Attachment: HIVE-14557.patch

HIVE-6520 gives a detailed explanation of this issue.
This patch runs the skew join optimization after the map join conversion, and
that is enough to make the query work.
If a join can be executed as a map join, there is no need for a skew join.
Triggering QA to see what happens.

> Nullpointer When both SkewJoin  and Mapjoin Enabled
> ---
>
> Key: HIVE-14557
> URL: https://issues.apache.org/jira/browse/HIVE-14557
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 1.1.0, 2.1.0
>Reporter: Nemon Lou
> Attachments: HIVE-14557.patch
>
>
> The following SQL fails with return code 2 on the MapReduce engine.
> {noformat}
> create table a(id int,id1 int);
> create table b(id int,id1 int);
> create table c(id int,id1 int);
> set hive.optimize.skewjoin=true;
> select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
> {noformat}
> Error log as follows:
> {noformat}
> 2016-08-17 21:13:42,081 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
> <MAP>Id =0
>   <Children>
> <TS>Id =21
>   <Children>
> <MAPJOIN>Id =28
>   <Children>
> <FS>Id =16
>   <Children>
>   <\Children>
>   <Parent>Id = 28 null<\Parent>
> <\FS>
>   <\Children>
>   <Parent>Id = 21 nullId = 33
> <HASHTABLEDUMMY>Id =33
>   <Children>null
>   <\Children>
>   <Parent><\Parent>
> <\HASHTABLEDUMMY><\Parent>
> <\MAPJOIN>
>   <\Children>
>   <Parent>Id = 0 null<\Parent>
> <\TS>
>   <\Children>
>   <Parent><\Parent>
> <\MAP>
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
> 2016-08-17 21:13:42,084 INFO [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
> 2016-08-17 21:13:42,086 INFO [main] 
> org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, 
> RECORDS_IN:0, 
> 2016-08-17 21:13:42,087 ERROR [main] 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing 
> operators - failing tree
> 2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
>   ... 8 more
> {noformat}





[jira] [Commented] (HIVE-15638) ArrayIndexOutOfBoundsException when output Columns for UDTF are pruned

2017-01-17 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827388#comment-15827388
 ] 

Nemon Lou commented on HIVE-15638:
--

The following query passes (a 'select *' is wrapped around the UDTF hwrl call):
{noformat}
set hive.auto.convert.join=false;
select substring(c.start_time,1,10) create_date, 
tt.data_id,tt.word_type,tt.primary_word,tt.primary_nature,tt.primary_offset,tt.related_word,tt.related_nature,tt.related_offset
 
from (
select * from (
select hwrl(data_dt,src,data_id,tag_id,entity_src,pos_tagging)
as 
(data_dt,data_src,data_id,word_type,primary_word,primary_nature,primary_offset,related_word,related_nature,related_offset)
from (
select a.data_dt,a.src,a.data_id,a.tag_id,a.entity_src,b.pos_tagging
from tb_a a, tb_b b
where a.key like 'CP%' 
and a.data_dt='20160901'
and a.data_id=b.data_id
and b.src='04'
) t
) ttt
) tt, (select key,start_time from tb_c where data_dt='20160901') c 
where tt.data_id=c.key 
;
{noformat}

> ArrayIndexOutOfBoundsException when output Columns for UDTF are pruned 
> ---
>
> Key: HIVE-15638
> URL: https://issues.apache.org/jira/browse/HIVE-15638
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Nemon Lou
>
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row [Error getting row data with exception 
> java.lang.ArrayIndexOutOfBoundsException: 151
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:183)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:364)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:200)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:186)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:525)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:494)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:180)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:174)
>  ]
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>   ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 151
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:878)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
>   ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 151
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:183)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
>   at 
> org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
>   at 
> 

[jira] [Updated] (HIVE-14662) Wrong Class Instance When Using Custom SERDE

2016-10-09 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14662:
-
Attachment: HIVE-14662.patch

> Wrong Class Instance When Using Custom SERDE
> 
>
> Key: HIVE-14662
> URL: https://issues.apache.org/jira/browse/HIVE-14662
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-14662.patch
>
>
> Using  [SERDE for 
> mongoDB|https://github.com/mongodb/mongo-hadoop/blob/master/hive/src/main/java/com/mongodb/hadoop/hive/BSONSerDe.java]
> DDL
> {noformat}
> create external table mytable (ID STRING..) 
> ROW FORMAT SERDE  'com.mongodb.hadoop.hive.BSONSerDe' 
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id",.. }')
> STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
> OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
> LOCATION 'hdfs:///mypath'; 
> {noformat}
> Open beeline and run the following query, then open another beeline session
> and run it again. The second run fails.
> {noformat}
> add jar hdfs:///tmp/mongo-hadoop-hive-1.4.2_new.jar;
> add jar hdfs:///tmp/mongo-java-driver-3.0.4.jar;
> add jar hdfs:///tmp/mongo-hadoop-core-1.4.2_new.jar;
> select * from mytable limit 1;
> {noformat}
> Error log :
> {noformat}
> 2016-08-25 09:30:34,475 | WARN  | HiveServer2-Handler-Pool: Thread-11972 | 
> Error fetching results:  | 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1058)
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.serde2.SerDeException: class 
> com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass 
> com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:366)
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:251)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:710)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1049)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: 
> class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, 
> notclass com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1756)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:361)
> ... 24 more
> Caused by: 

[jira] [Updated] (HIVE-14662) Wrong Class Instance When Using Custom SERDE

2016-10-09 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14662:
-
Status: Patch Available  (was: Open)

> Wrong Class Instance When Using Custom SERDE
> 
>
> Key: HIVE-14662
> URL: https://issues.apache.org/jira/browse/HIVE-14662
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-14662.patch
>
>
> Using  [SERDE for 
> mongoDB|https://github.com/mongodb/mongo-hadoop/blob/master/hive/src/main/java/com/mongodb/hadoop/hive/BSONSerDe.java]
> DDL
> {noformat}
> create external table mytable (ID STRING..) 
> ROW FORMAT SERDE  'com.mongodb.hadoop.hive.BSONSerDe' 
> WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id",.. }')
> STORED AS INPUTFORMAT 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
> OUTPUTFORMAT 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
> LOCATION 'hdfs:///mypath'; 
> {noformat}
> Open beeline and run the following query, then open another beeline session and run 
> it again. The query then fails.
> {noformat}
> add jar hdfs:///tmp/mongo-hadoop-hive-1.4.2_new.jar;
> add jar hdfs:///tmp/mongo-java-driver-3.0.4.jar;
> add jar hdfs:///tmp/mongo-hadoop-core-1.4.2_new.jar;
> select * from mytable limit 1;
> {noformat}
> Error log :
> {noformat}
> 2016-08-25 09:30:34,475 | WARN  | HiveServer2-Handler-Pool: Thread-11972 | 
> Error fetching results:  | 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1058)
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> org.apache.hadoop.hive.serde2.SerDeException: class 
> com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, notclass 
> com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:366)
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:251)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:710)
> at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1673)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> at com.sun.proxy.$Proxy20.fetchResults(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:1049)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: 
> class com.mongodb.hadoop.hive.BSONSerDerequires a BSONWritable object, 
> notclass com.mongodb.hadoop.io.BSONWritable
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1756)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:361)
> ... 24 more
> 

[jira] [Commented] (HIVE-14390) Wrong Table alias when CBO is on

2016-08-06 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410830#comment-15410830
 ] 

Nemon Lou commented on HIVE-14390:
--

Thanks a lot, [~pxiong] and [~ashutoshc]. I haven't managed to update these qtest 
results yet.

> Wrong Table alias when CBO is on
> 
>
> Key: HIVE-14390
> URL: https://issues.apache.org/jira/browse/HIVE-14390
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14390.patch, explain.rar
>
>
> There are 5 web_sales references in query95 of TPC-DS, with aliases ws1-ws5.
> But the query plan only has ws1 when CBO is on.
> query95 :
> {noformat}
> SELECT count(distinct ws1.ws_order_number) as order_count,
>sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
>sum(ws1.ws_net_profit) as total_net_profit
> FROM web_sales ws1
> JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
> JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
> JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
> LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
>FROM web_sales ws2 JOIN web_sales ws3
>ON (ws2.ws_order_number = ws3.ws_order_number)
>WHERE ws2.ws_warehouse_sk <> 
> ws3.ws_warehouse_sk
> ) ws_wh1
> ON (ws1.ws_order_number = ws_wh1.ws_order_number)
> LEFT SEMI JOIN (SELECT wr_order_number
>FROM web_returns wr
>JOIN (SELECT ws4.ws_order_number as 
> ws_order_number
>   FROM web_sales ws4 JOIN web_sales 
> ws5
>   ON (ws4.ws_order_number = 
> ws5.ws_order_number)
>  WHERE ws4.ws_warehouse_sk <> 
> ws5.ws_warehouse_sk
> ) ws_wh2
>ON (wr.wr_order_number = 
> ws_wh2.ws_order_number)) tmp1
> ON (ws1.ws_order_number = tmp1.wr_order_number)
> WHERE d.d_date between '2002-05-01' and '2002-06-30' and
>ca.ca_state = 'GA' and
>s.web_company_name = 'pri';
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10153) CBO (Calcite Return Path): TPC-DS Q15 in-efficient join order

2016-08-04 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407306#comment-15407306
 ] 

Nemon Lou commented on HIVE-10153:
--

This filter prevents joining date_dim first:
{noformat}
( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
 '85392', '85460', '80348', '81792')
 or customer_address.ca_state in ('CA','WA','GA')
 or catalog_sales.cs_sales_price > 500)
{noformat}
With this filter, table date_dim cannot be combined into the same MultiJoin 
RelNode as the other 3 tables. 
{code}
  private boolean canCombine(RelNode input, boolean nullGenerating) {
    return input instanceof MultiJoin
        && !((MultiJoin) input).isFullOuterJoin()
        && !((MultiJoin) input).containsOuter()
        && !nullGenerating;
  }
{code}
The input is a filter RelNode instead of MultiJoin.
{noformat}
2016-08-04 14:23:38,637 | DEBUG | HiveServer2-Handler-Pool: Thread-123 | Original Plan:
HiveSort(fetch=[100])
  HiveSort(sort0=[$0], dir0=[ASC])
    HiveProject(ca_zip=[$0], _o__c1=[$1])
      HiveAggregate(group=[{0}], agg#0=[sum($1)])
        HiveProject($f0=[$67], $f1=[$20])
          HiveFilter(condition=[AND(=($2, $37), =($41, $58), =($33, $74), OR(in(substr($67, 1, 5), '85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', '81792'), in($66, 'CA', 'WA', 'GA'), >($20, 5E2)), =($84, 2), =($80, 2000))])
            HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
              HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
                HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
                  HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.catalog_sales]])
                  HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.customer]])
                HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.customer_address]])
              HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.date_dim]])
 | org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:911)
2016-08-04 14:23:38,654 | DEBUG | HiveServer2-Handler-Pool: Thread-123 | Plan After PPD, PartPruning, ColumnPruning:
HiveSort(fetch=[100])
  HiveSort(sort0=[$0], dir0=[ASC])
    HiveAggregate(group=[{0}], agg#0=[sum($1)])
      HiveProject($f0=[$7], $f1=[$1])
        HiveJoin(condition=[=($2, $8)], joinType=[inner], algorithm=[none], cost=[not available])
          HiveFilter(condition=[OR(in(substr($7, 1, 5), '85669', '86197', '88274', '83405', '86475', '85392', '85460', '80348', '81792'), in($6, 'CA', 'WA', 'GA'), >($1, 5E2))])
            HiveJoin(condition=[=($4, $5)], joinType=[inner], algorithm=[none], cost=[not available])
              HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], cost=[not available])
                HiveProject(cs_bill_customer_sk=[$2], cs_sales_price=[$20], cs_sold_date_sk=[$33])
                  HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.catalog_sales]])
                HiveProject(c_customer_sk=[$0], c_current_addr_sk=[$4])
                  HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.customer]])
              HiveProject(ca_address_sk=[$0], ca_state=[$8], ca_zip=[$9])
                HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.customer_address]])
          HiveProject(d_date_sk=[$0], d_year=[$6], d_qoy=[$10])
            HiveFilter(condition=[AND(=($10, 2), =($6, 2000))])
              HiveTableScan(table=[[tpcds_bin_partitioned_orc_10.date_dim]])
 | org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:912)
{noformat}
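
To make the effect of that check concrete, here is a minimal sketch of how canCombine behaves once a filter sits between two joins. RelNode, MultiJoin and Filter below are hypothetical stand-ins written for this example, not the real Calcite or Hive classes; only the body of canCombine is taken from the snippet quoted above.

```java
/** Hypothetical stand-in for Calcite's RelNode; not the real interface. */
interface RelNode {}

/** Hypothetical stand-in exposing only the two methods canCombine calls. */
final class MultiJoin implements RelNode {
  private final boolean fullOuter;
  private final boolean containsOuter;

  MultiJoin(boolean fullOuter, boolean containsOuter) {
    this.fullOuter = fullOuter;
    this.containsOuter = containsOuter;
  }

  boolean isFullOuterJoin() { return fullOuter; }
  boolean containsOuter() { return containsOuter; }
}

/** Hypothetical filter node wrapping its input, as HiveFilter does in the plan. */
final class Filter implements RelNode {
  final RelNode input;

  Filter(RelNode input) { this.input = input; }
}

final class CombineSketch {
  /** Same condition as the canCombine snippet quoted in this comment. */
  static boolean canCombine(RelNode input, boolean nullGenerating) {
    return input instanceof MultiJoin
        && !((MultiJoin) input).isFullOuterJoin()
        && !((MultiJoin) input).containsOuter()
        && !nullGenerating;
  }
}
```

With these stand-ins, an inner MultiJoin passed directly is accepted, while the same MultiJoin wrapped in a Filter is rejected, because the instanceof test only looks at the top node.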

Replacing 'or' with 'and' can help.
{noformat}
 ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
 '85392', '85460', '80348', '81792')
 and customer_address.ca_state in ('CA','WA','GA')
 and catalog_sales.cs_sales_price > 500)
{noformat}
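
A rough sketch of why the rewrite helps, under simplifying assumptions: a predicate can be pushed below a join only if it reads columns from one side, and only a top-level AND can be split into independent pieces. Pred, Leaf, BoolOp and PushDownSketch are hypothetical names made up for this illustration; the real logic lives in Calcite's and Hive's push-down rules and is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical predicate tree; not a Calcite or Hive class. */
abstract class Pred {
  /** Names of the tables whose columns this predicate reads. */
  abstract Set<String> tables();
}

/** A comparison that reads columns of a single table. */
final class Leaf extends Pred {
  final String table;
  final String text;

  Leaf(String table, String text) {
    this.table = table;
    this.text = text;
  }

  @Override
  Set<String> tables() {
    Set<String> result = new TreeSet<>();
    result.add(table);
    return result;
  }
}

/** An AND or OR over several operands. */
final class BoolOp extends Pred {
  final boolean isAnd;
  final List<Pred> operands;

  BoolOp(boolean isAnd, Pred... operands) {
    this.isAnd = isAnd;
    this.operands = Arrays.asList(operands);
  }

  @Override
  Set<String> tables() {
    Set<String> result = new TreeSet<>();
    for (Pred operand : operands) {
      result.addAll(operand.tables());
    }
    return result;
  }
}

final class PushDownSketch {
  /** Flattens nested ANDs into a list of conjuncts; an OR is kept whole. */
  static List<Pred> conjuncts(Pred p) {
    List<Pred> out = new ArrayList<>();
    if (p instanceof BoolOp && ((BoolOp) p).isAnd) {
      for (Pred child : ((BoolOp) p).operands) {
        out.addAll(conjuncts(child));
      }
    } else {
      out.add(p);
    }
    return out;
  }

  /** Counts conjuncts that read more than one table and so stay above the join. */
  static int stuckAboveJoin(Pred p) {
    int count = 0;
    for (Pred conjunct : conjuncts(p)) {
      if (conjunct.tables().size() > 1) {
        count++;
      }
    }
    return count;
  }
}
```

Applied to the three conditions above, the OR form yields one conjunct that reads both customer_address and catalog_sales and therefore stays as a filter above their join, whereas the AND form splits into three single-table conjuncts. Note that swapping 'or' for 'and' also changes which rows qualify, so it is a diagnostic experiment rather than an equivalent rewrite.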

> CBO (Calcite Return Path): TPC-DS Q15 in-efficient join order 
> --
>
> Key: HIVE-10153
> URL: https://issues.apache.org/jira/browse/HIVE-10153
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Mostafa Mokhtar
>Assignee: Laljo John Pullokkaran
> Fix For: cbo-branch
>
>
> TPC-DS Q15 joins catalog_sales with date_dim last where it should be the 
> first join.
> Query 
> {code}
> select  ca_zip
>,sum(cs_sales_price)
>  from catalog_sales
>  ,customer
>  ,customer_address
>  ,date_dim
>  where catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>   and customer.c_current_addr_sk = customer_address.ca_address_sk 
>   and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
>'85392', '85460', '80348', 

[jira] [Commented] (HIVE-14390) Wrong Table alias when CBO is on

2016-08-02 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404176#comment-15404176
 ] 

Nemon Lou commented on HIVE-14390:
--

[~pxiong] The query plans for union15.q and union.9.q in SparkCliDriver look good 
to me. They are the same plans as in branch-1.2.

> Wrong Table alias when CBO is on
> 
>
> Key: HIVE-14390
> URL: https://issues.apache.org/jira/browse/HIVE-14390
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14390.patch, explain.rar
>
>
> There are 5 web_sales references in query95 of TPC-DS, with aliases ws1-ws5.
> But the query plan only has ws1 when CBO is on.
> query95 :
> {noformat}
> SELECT count(distinct ws1.ws_order_number) as order_count,
>sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
>sum(ws1.ws_net_profit) as total_net_profit
> FROM web_sales ws1
> JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
> JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
> JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
> LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
>FROM web_sales ws2 JOIN web_sales ws3
>ON (ws2.ws_order_number = ws3.ws_order_number)
>WHERE ws2.ws_warehouse_sk <> 
> ws3.ws_warehouse_sk
> ) ws_wh1
> ON (ws1.ws_order_number = ws_wh1.ws_order_number)
> LEFT SEMI JOIN (SELECT wr_order_number
>FROM web_returns wr
>JOIN (SELECT ws4.ws_order_number as 
> ws_order_number
>   FROM web_sales ws4 JOIN web_sales 
> ws5
>   ON (ws4.ws_order_number = 
> ws5.ws_order_number)
>  WHERE ws4.ws_warehouse_sk <> 
> ws5.ws_warehouse_sk
> ) ws_wh2
>ON (wr.wr_order_number = 
> ws_wh2.ws_order_number)) tmp1
> ON (ws1.ws_order_number = tmp1.wr_order_number)
> WHERE d.d_date between '2002-05-01' and '2002-06-30' and
>ca.ca_state = 'GA' and
>s.web_company_name = 'pri';
> {noformat}





[jira] [Commented] (HIVE-14374) BeeLine argument, and configuration handling cleanup

2016-08-01 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402036#comment-15402036
 ] 

Nemon Lou commented on HIVE-14374:
--

For my part, it would be fine to remove it. 

> BeeLine argument, and configuration handling cleanup
> 
>
> Key: HIVE-14374
> URL: https://issues.apache.org/jira/browse/HIVE-14374
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> BeeLine uses reflection, to set the BeeLineOpts attributes when parsing 
> command line arguments, and when loading the configuration file.
> This means, that creating a setXXX, getXXX method in BeeLineOpts is a 
> potential risk of exposing an attribute for the user unintentionally. There 
> is a possibility to exclude an attribute from saving the value in the 
> configuration file with the Ignore annotation. This does not restrict the 
> loading or command line setting of these parameters which means there are 
> many undocumented "features" as-is, like setting the lastConnectedUrl, 
> allowMultilineCommand, maxHeight, trimScripts, etc. from command line.
> This part of the code needs a little cleanup.
> I think we should make this exposure more explicit, and be able to 
> differentiate the configurable options depending on the source (command line, 
> and configuration file), so I propose to create a mechanism to tell 
> explicitly which BeeLineOpts attributes are settable by command line, and 
> configuration file, and every other attribute should be inaccessible by the 
> user of the beeline cli.
> One possible solution could be two annotations like these:
> - CommandLineOption - there could be a mandatory text parameter here, so the 
> developer had to provide the help text for it which could be displayed to the 
> user
> - ConfigurationFileOption - no text is required here
> Something like this:
> - This attribute could be provided by command line, and from a configuration 
> file too:
> {noformat}
> @CommandLineOption("automatically save preferences")
> @ConfigurationFileOption
> public void setAutosave(boolean autosave) {
>   this.autosave = autosave;
> }
> public boolean getAutosave() {
>   return this.autosave;
> }
> {noformat}
> - This attribute could be set through the configuration only
> {noformat}
> @ConfigurationFileOption
> public void setLastConnectedUrl(String lastConnectedUrl) {
>   this.lastConnectedUrl = lastConnectedUrl;
> }
> public String getLastConnectedUrl() {
>   return lastConnectedUrl;
> }
> {noformat}
> - Attribute could be set through command line only - I think this is not too 
> relevant, but possible
> {noformat}
> @CommandLineOption("specific command line option")
> public void setSpecificCommandLineOption(String specificCommandLineOption) {
>   this.specificCommandLineOption = specificCommandLineOption;
> }
> public String getSpecificCommandLineOption() {
>   return specificCommandLineOption;
> }
> {noformat}
> - Attribute could not be set
> {noformat}
> public static Env getEnv() {
>   return env;
> }
> public static void setEnv(Env envToUse) {
>   env = envToUse;
> }
> {noformat}
> According to our previous conversations, I think you might be interested in: 
> [~spena], [~vihangk1], [~aihuaxu], [~ngangam], [~ychena], [~xuefuz]
> but anyone is welcome to discuss this.
> What do you think about the proposed solution?
> Any better ideas, or extensions?
> Thanks,
> Peter
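
For readers following the proposal quoted above, this is one way the two annotations and a reflection-based filter could look. It is only a sketch: the annotation names come from the proposal, but their definitions, SampleOpts, OptionScanner and setInternalState are hypothetical and were written for this illustration; nothing here is taken from the BeeLine code base or from a patch.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Marks a setter as settable from the command line; the value is the help text. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CommandLineOption {
  String value();
}

/** Marks a setter as settable from the configuration file. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ConfigurationFileOption {}

/** Hypothetical options class with three setters and different exposure levels. */
class SampleOpts {
  private boolean autosave;
  private String lastConnectedUrl;
  private String internalState;

  @CommandLineOption("automatically save preferences")
  @ConfigurationFileOption
  public void setAutosave(boolean autosave) { this.autosave = autosave; }

  public boolean getAutosave() { return autosave; }

  @ConfigurationFileOption
  public void setLastConnectedUrl(String url) { this.lastConnectedUrl = url; }

  public String getLastConnectedUrl() { return lastConnectedUrl; }

  // No annotation: not reachable from either source.
  public void setInternalState(String state) { this.internalState = state; }

  public String getInternalState() { return internalState; }
}

final class OptionScanner {
  /** Returns the property names whose setters carry the given source annotation. */
  static Set<String> settableFrom(Class<?> opts, Class<? extends Annotation> source) {
    Set<String> names = new TreeSet<>();
    for (Method m : opts.getMethods()) {
      if (m.getName().startsWith("set")
          && m.getParameterCount() == 1
          && m.isAnnotationPresent(source)) {
        String property = m.getName().substring(3);
        names.add(Character.toLowerCase(property.charAt(0)) + property.substring(1));
      }
    }
    return names;
  }
}
```

Scanning SampleOpts for CommandLineOption yields only autosave, while scanning for ConfigurationFileOption yields autosave and lastConnectedUrl; internalState appears in neither set, which is the explicit opt-in behaviour the proposal asks for.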





[jira] [Commented] (HIVE-14374) BeeLine argument, and configuration handling cleanup

2016-08-01 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402020#comment-15402020
 ] 

Nemon Lou commented on HIVE-14374:
--

[~pvary] Thanks for the reminder. If it was removed by accident, then it would 
be good to reintroduce it. We already use this in production. 

> BeeLine argument, and configuration handling cleanup
> 
>
> Key: HIVE-14374
> URL: https://issues.apache.org/jira/browse/HIVE-14374
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> BeeLine uses reflection, to set the BeeLineOpts attributes when parsing 
> command line arguments, and when loading the configuration file.
> This means, that creating a setXXX, getXXX method in BeeLineOpts is a 
> potential risk of exposing an attribute for the user unintentionally. There 
> is a possibility to exclude an attribute from saving the value in the 
> configuration file with the Ignore annotation. This does not restrict the 
> loading or command line setting of these parameters which means there are 
> many undocumented "features" as-is, like setting the lastConnectedUrl, 
> allowMultilineCommand, maxHeight, trimScripts, etc. from command line.
> This part of the code needs a little cleanup.
> I think we should make this exposure more explicit, and be able to 
> differentiate the configurable options depending on the source (command line, 
> and configuration file), so I propose to create a mechanism to tell 
> explicitly which BeeLineOpts attributes are settable by command line, and 
> configuration file, and every other attribute should be inaccessible by the 
> user of the beeline cli.
> One possible solution could be two annotations like these:
> - CommandLineOption - there could be a mandatory text parameter here, so the 
> developer had to provide the help text for it which could be displayed to the 
> user
> - ConfigurationFileOption - no text is required here
> Something like this:
> - This attribute could be provided by command line, and from a configuration 
> file too:
> {noformat}
> @CommandLineOption("automatically save preferences")
> @ConfigurationFileOption
> public void setAutosave(boolean autosave) {
>   this.autosave = autosave;
> }
> public boolean getAutosave() {
>   return this.autosave;
> }
> {noformat}
> - This attribute could be set through the configuration only
> {noformat}
> @ConfigurationFileOption
> public void setLastConnectedUrl(String lastConnectedUrl) {
>   this.lastConnectedUrl = lastConnectedUrl;
> }
> public String getLastConnectedUrl() {
>   return lastConnectedUrl;
> }
> {noformat}
> - Attribute could be set through command line only - I think this is not too 
> relevant, but possible
> {noformat}
> @CommandLineOption("specific command line option")
> public void setSpecificCommandLineOption(String specificCommandLineOption) {
>   this.specificCommandLineOption = specificCommandLineOption;
> }
> public String getSpecificCommandLineOption() {
>   return specificCommandLineOption;
> }
> {noformat}
> - Attribute could not be set
> {noformat}
> public static Env getEnv() {
>   return env;
> }
> public static void setEnv(Env envToUse) {
>   env = envToUse;
> }
> {noformat}
> According to our previous conversations, I think you might be interested in: 
> [~spena], [~vihangk1], [~aihuaxu], [~ngangam], [~ychena], [~xuefuz]
> but anyone is welcome to discuss this.
> What do you think about the proposed solution?
> Any better ideas, or extensions?
> Thanks,
> Peter





[jira] [Assigned] (HIVE-14390) Wrong Table alias when CBO is on

2016-07-31 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou reassigned HIVE-14390:


Assignee: Nemon Lou

> Wrong Table alias when CBO is on
> 
>
> Key: HIVE-14390
> URL: https://issues.apache.org/jira/browse/HIVE-14390
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14390.patch, explain.rar
>
>
> There are 5 web_sales references in query95 of TPC-DS, with aliases ws1-ws5.
> But the query plan only has ws1 when CBO is on.
> query95 :
> {noformat}
> SELECT count(distinct ws1.ws_order_number) as order_count,
>sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
>sum(ws1.ws_net_profit) as total_net_profit
> FROM web_sales ws1
> JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
> JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
> JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
> LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
>FROM web_sales ws2 JOIN web_sales ws3
>ON (ws2.ws_order_number = ws3.ws_order_number)
>WHERE ws2.ws_warehouse_sk <> 
> ws3.ws_warehouse_sk
> ) ws_wh1
> ON (ws1.ws_order_number = ws_wh1.ws_order_number)
> LEFT SEMI JOIN (SELECT wr_order_number
>FROM web_returns wr
>JOIN (SELECT ws4.ws_order_number as 
> ws_order_number
>   FROM web_sales ws4 JOIN web_sales 
> ws5
>   ON (ws4.ws_order_number = 
> ws5.ws_order_number)
>  WHERE ws4.ws_warehouse_sk <> 
> ws5.ws_warehouse_sk
> ) ws_wh2
>ON (wr.wr_order_number = 
> ws_wh2.ws_order_number)) tmp1
> ON (ws1.ws_order_number = tmp1.wr_order_number)
> WHERE d.d_date between '2002-05-01' and '2002-06-30' and
>ca.ca_state = 'GA' and
>s.web_company_name = 'pri';
> {noformat}





[jira] [Updated] (HIVE-14390) Wrong Table alias when CBO is on

2016-07-31 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14390:
-
Status: Patch Available  (was: Open)

> Wrong Table alias when CBO is on
> 
>
> Key: HIVE-14390
> URL: https://issues.apache.org/jira/browse/HIVE-14390
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14390.patch, explain.rar
>
>
> There are 5 web_sales references in query95 of TPC-DS, with aliases ws1-ws5.
> But the query plan only has ws1 when CBO is on.
> query95 :
> {noformat}
> SELECT count(distinct ws1.ws_order_number) as order_count,
>sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
>sum(ws1.ws_net_profit) as total_net_profit
> FROM web_sales ws1
> JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
> JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
> JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
> LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
>FROM web_sales ws2 JOIN web_sales ws3
>ON (ws2.ws_order_number = ws3.ws_order_number)
>WHERE ws2.ws_warehouse_sk <> 
> ws3.ws_warehouse_sk
> ) ws_wh1
> ON (ws1.ws_order_number = ws_wh1.ws_order_number)
> LEFT SEMI JOIN (SELECT wr_order_number
>FROM web_returns wr
>JOIN (SELECT ws4.ws_order_number as 
> ws_order_number
>   FROM web_sales ws4 JOIN web_sales 
> ws5
>   ON (ws4.ws_order_number = 
> ws5.ws_order_number)
>  WHERE ws4.ws_warehouse_sk <> 
> ws5.ws_warehouse_sk
> ) ws_wh2
>ON (wr.wr_order_number = 
> ws_wh2.ws_order_number)) tmp1
> ON (ws1.ws_order_number = tmp1.wr_order_number)
> WHERE d.d_date between '2002-05-01' and '2002-06-30' and
>ca.ca_state = 'GA' and
>s.web_company_name = 'pri';
> {noformat}





[jira] [Updated] (HIVE-14390) Wrong Table alias when CBO is on

2016-07-30 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14390:
-
Attachment: HIVE-14390.patch

HIVE-14390.patch can fix this, but I'm not sure it's the right approach.
[~pxiong] Would you mind taking a look?

> Wrong Table alias when CBO is on
> 
>
> Key: HIVE-14390
> URL: https://issues.apache.org/jira/browse/HIVE-14390
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14390.patch, explain.rar
>
>
> There are 5 web_sales references in query95 of TPC-DS, with aliases ws1-ws5.
> But the query plan only has ws1 when CBO is on.
> query95 :
> {noformat}
> SELECT count(distinct ws1.ws_order_number) as order_count,
>sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
>sum(ws1.ws_net_profit) as total_net_profit
> FROM web_sales ws1
> JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
> JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
> JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
> LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
>FROM web_sales ws2 JOIN web_sales ws3
>ON (ws2.ws_order_number = ws3.ws_order_number)
>WHERE ws2.ws_warehouse_sk <> 
> ws3.ws_warehouse_sk
> ) ws_wh1
> ON (ws1.ws_order_number = ws_wh1.ws_order_number)
> LEFT SEMI JOIN (SELECT wr_order_number
>FROM web_returns wr
>JOIN (SELECT ws4.ws_order_number as 
> ws_order_number
>   FROM web_sales ws4 JOIN web_sales 
> ws5
>   ON (ws4.ws_order_number = 
> ws5.ws_order_number)
>  WHERE ws4.ws_warehouse_sk <> 
> ws5.ws_warehouse_sk
> ) ws_wh2
>ON (wr.wr_order_number = 
> ws_wh2.ws_order_number)) tmp1
> ON (ws1.ws_order_number = tmp1.wr_order_number)
> WHERE d.d_date between '2002-05-01' and '2002-06-30' and
>ca.ca_state = 'GA' and
>s.web_company_name = 'pri';
> {noformat}





[jira] [Updated] (HIVE-14390) Wrong Table alias when CBO is on

2016-07-30 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14390:
-
Attachment: explain.rar

> Wrong Table alias when CBO is on
> 
>
> Key: HIVE-14390
> URL: https://issues.apache.org/jira/browse/HIVE-14390
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Priority: Minor
> Attachments: explain.rar
>
>
> There are 5 web_sales references in query95 of TPC-DS, with aliases ws1-ws5.
> But the query plan only has ws1 when CBO is on.
> query95 :
> {noformat}
> SELECT count(distinct ws1.ws_order_number) as order_count,
>sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
>sum(ws1.ws_net_profit) as total_net_profit
> FROM web_sales ws1
> JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
> JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
> JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
> LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
>FROM web_sales ws2 JOIN web_sales ws3
>ON (ws2.ws_order_number = ws3.ws_order_number)
>WHERE ws2.ws_warehouse_sk <> 
> ws3.ws_warehouse_sk
> ) ws_wh1
> ON (ws1.ws_order_number = ws_wh1.ws_order_number)
> LEFT SEMI JOIN (SELECT wr_order_number
>FROM web_returns wr
>JOIN (SELECT ws4.ws_order_number as 
> ws_order_number
>   FROM web_sales ws4 JOIN web_sales 
> ws5
>   ON (ws4.ws_order_number = 
> ws5.ws_order_number)
>  WHERE ws4.ws_warehouse_sk <> 
> ws5.ws_warehouse_sk
> ) ws_wh2
>ON (wr.wr_order_number = 
> ws_wh2.ws_order_number)) tmp1
> ON (ws1.ws_order_number = tmp1.wr_order_number)
> WHERE d.d_date between '2002-05-01' and '2002-06-30' and
>ca.ca_state = 'GA' and
>s.web_company_name = 'pri';
> {noformat}





[jira] [Updated] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-28 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14353:
-
Attachment: q46_explain_cbo_vs_nocbo.tar.gz

> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
> Attachments: q46_cbo_no_projection_prune_explain.txt, 
> q46_cbo_projection_prune_explain.rar, q46_explain_cbo_vs_nocbo.tar.gz
>
>
> TPC-DS with factor 1024.
> Hive on Spark. 
> With and without projection pruning, the time spent differs considerably.
> To disable projection pruning, disable HiveRelFieldTrimmer in the code 
> and compile a new jar.
> ||queries||CBO_no_projection_prune||CBO||
> |q27|160|251|
> |q7|200|312|
> |q88|701|1092|
> |q68|234|345|
> |q39|53|78|
> |q73|160|228|
> |q31|463|659|
> |q79|242|343|
> |q46|256|363|
> |q60|271|382|
> |q66|198|278|
> |q34|155|217|
> |q19|184|256|
> |q26|154|214|
> |q56|262|364|
> |q75|942|1303|
> |q71|288|388|
> |q25|329|442|
> |q52|142|190|
> |q42|142|189|
> |q3|139|185|
> |q98|153|203|
> |q89|187|248|
> |q58|264|340|
> |q43|127|162|
> |q32|174|221|
> |q96|156|197|
> |q70|320|404|
> |q29|499|629|
> |q18|266|329|
> |q21|76|92|
> |q90|139|165|





[jira] [Commented] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-28 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398540#comment-15398540
 ] 

Nemon Lou commented on HIVE-14353:
--

The motivation for this Jira ticket is that I found query46 was slower with 
CBO on than off, even though the join order is the same. (I changed the join order 
in the SQL manually when CBO was off.)
After comparing the two query plans, the major difference is the select 
operator introduced by CBO's projection pruning.


> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
> Attachments: q46_cbo_no_projection_prune_explain.txt, 
> q46_cbo_projection_prune_explain.rar
>
>
> TPC-DS with factor 1024.
> Hive on Spark. 
> With and without projection pruning, the time spent differs considerably.
> To disable projection pruning, disable HiveRelFieldTrimmer in the code 
> and compile a new jar.
> ||queries||CBO_no_projection_prune||CBO||
> |q27|160|251|
> |q7|200|312|
> |q88|701|1092|
> |q68|234|345|
> |q39|53|78|
> |q73|160|228|
> |q31|463|659|
> |q79|242|343|
> |q46|256|363|
> |q60|271|382|
> |q66|198|278|
> |q34|155|217|
> |q19|184|256|
> |q26|154|214|
> |q56|262|364|
> |q75|942|1303|
> |q71|288|388|
> |q25|329|442|
> |q52|142|190|
> |q42|142|189|
> |q3|139|185|
> |q98|153|203|
> |q89|187|248|
> |q58|264|340|
> |q43|127|162|
> |q32|174|221|
> |q96|156|197|
> |q70|320|404|
> |q29|499|629|
> |q18|266|329|
> |q21|76|92|
> |q90|139|165|





[jira] [Comment Edited] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-28 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396893#comment-15396893
 ] 

Nemon Lou edited comment on HIVE-14353 at 7/29/16 1:08 AM:
---

||queries||CBO_total_time||CBO_time_in_SelectOP||
|q27|   266.494|80.5 | 
|q7 |   328.259|98.8 |
|q68|   369.159|105 |
|q46|   392.777|91.75|

I ran only a few of them because of time limits. The time spent in SelectOP is 
calculated by adding up the total time spent in SelectOP in one executor and 
then dividing by the number of cores (4 in my case).
I also ran q46 without projection pruning: the total time is 266.226, and the 
time spent in SelectOP is 0.125 seconds.
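
A minimal sketch of the calculation described above, for readers who want to reproduce it. The class and method names are hypothetical, and the per-task timings in `main` are made-up inputs chosen only to illustrate the arithmetic; the q27 totals are taken from the table above.

```java
// Hypothetical helper; not part of Hive or Spark.
final class SelectOpTime {
    /** Sums SelectOP time over all tasks of one executor and divides by its core count. */
    static double perCoreSeconds(double[] taskSeconds, int cores) {
        double total = 0.0;
        for (double s : taskSeconds) {
            total += s;
        }
        return total / cores;
    }

    /** Fraction of the query's total time attributed to SelectOP. */
    static double shareOfTotal(double selectSeconds, double totalSeconds) {
        return selectSeconds / totalSeconds;
    }

    public static void main(String[] args) {
        // Made-up per-task timings for a 4-core executor.
        double perCore = perCoreSeconds(new double[] {100.0, 90.0, 72.0, 60.0}, 4);
        System.out.printf("per-core SelectOP time: %.2f s%n", perCore);
        // q27 figures from the table: 80.5 s in SelectOP out of 266.494 s total.
        System.out.printf("share of q27: %.1f%%%n", 100 * shareOfTotal(80.5, 266.494));
    }
}
```

Dividing by the core count assumes the tasks of one executor run fully in parallel, so the result is an estimate of elapsed time rather than an exact measurement.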


was (Author: nemon):
||queries||CBO_total_time||CBO_time_in_SelectOP||
|q27|   266.494|251 | 
|q7 |   328.259|98.8 |
|q68|   369.159|105 |
|q46|   392.777|91.75|

I just run a few of them because of time limit. The time spent in selectOP is 
calculated by adding up total times spent for selectOP  in one executor ,and 
then divide number of cores.(4 in my case).
Also,I have run q46 without projection pruning.And total time is 266.226,time 
spent in selectOP is 0.125 seconds.

> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
> Attachments: q46_cbo_no_projection_prune_explain.txt, 
> q46_cbo_projection_prune_explain.rar
>
>


[jira] [Commented] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-28 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397041#comment-15397041
 ] 

Nemon Lou commented on HIVE-14353:
--

A preliminary analysis:
Hive has a built-in column pruner, and column pruning has already been pushed 
down to the InputFormat layer.
CBO adds a projection above the table scan, which is very costly, especially 
when the projection runs before a join.
In most TPC-DS cases the join filters out a lot of rows, so a projection applied 
after the join would process far fewer rows.
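
The cost argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: rows are plain `int[]` arrays whose first element is the join key, the class and method names are hypothetical, and nothing here uses Hive or Calcite classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: counts how many rows a projection has to handle.
final class ProjectionPlacement {
    /** A projection directly above the table scan handles every scanned row. */
    static long rowsProjectedBeforeJoin(List<int[]> scannedRows) {
        return scannedRows.size();
    }

    /** A projection above the join handles only rows whose key survives the join. */
    static long rowsProjectedAfterJoin(List<int[]> scannedRows, Set<Integer> matchingKeys) {
        long count = 0;
        for (int[] row : scannedRows) {
            if (matchingKeys.contains(row[0])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<int[]> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(new int[] {i % 100, i});
        }
        Set<Integer> keys = new HashSet<>(Arrays.asList(1, 2));
        System.out.println("before join: " + rowsProjectedBeforeJoin(rows));
        System.out.println("after join:  " + rowsProjectedAfterJoin(rows, keys));
    }
}
```

The more selective the join, the larger the gap between the two counts, which matches the observation that the extra select operator hurts most on join-heavy TPC-DS queries.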

> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
> Attachments: q46_cbo_no_projection_prune_explain.txt, 
> q46_cbo_projection_prune_explain.rar
>
>


[jira] [Updated] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-28 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14353:
-
Attachment: q46_cbo_projection_prune_explain.rar

> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
> Attachments: q46_cbo_no_projection_prune_explain.txt, 
> q46_cbo_projection_prune_explain.rar
>
>


[jira] [Updated] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-28 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14353:
-
Attachment: q46_cbo_no_projection_prune_explain.txt

> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
> Attachments: q46_cbo_no_projection_prune_explain.txt
>
>


[jira] [Commented] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-27 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396922#comment-15396922
 ] 

Nemon Lou commented on HIVE-14353:
--

[~pxiong] Sorry for the confusion. The performance degradation is at run time 
(an application running on YARN), not at compile time.
HiveRelFieldTrimmer adds a projection rel node above the table scan. The 
projection node is then compiled to a select operator in Hive.
That's why I recorded the time spent in the select operator at run time.

> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>


[jira] [Commented] (HIVE-14353) Performance degradation after Projection Pruning in CBO

2016-07-27 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396893#comment-15396893
 ] 

Nemon Lou commented on HIVE-14353:
--

||queries||CBO_total_time||CBO_time_in_SelectOP||
|q27|   266.494|251 | 
|q7 |   328.259|98.8 |
|q68|   369.159|105 |
|q46|   392.777|91.75|

I ran only a few of them because of time limits. The time spent in SelectOP is 
calculated by adding up the total time spent in SelectOP in one executor and 
then dividing by the number of cores (4 in my case).
I also ran q46 without projection pruning: the total time is 266.226, and the 
time spent in SelectOP is 0.125 seconds.

> Performance degradation  after Projection Pruning in CBO
> 
>
> Key: HIVE-14353
> URL: https://issues.apache.org/jira/browse/HIVE-14353
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>


[jira] [Assigned] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-09 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou reassigned HIVE-14143:


Assignee: Nemon Lou  (was: Abhishek)

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.1.patch, HIVE-14143.patch
>
>
> After running the following analyze command, rawDataSize becomes zero for 
> RCFile tables.
> {noformat}
>  analyze table RCFILE_TABLE compute statistics ;
> {noformat}





[jira] [Commented] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-04 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15361156#comment-15361156
 ] 

Nemon Lou commented on HIVE-14143:
--

[~pxiong] In many places the "ids" passed in correspond exactly to 
"sizeOfColumnsInTableScan", so "ids.size() != sizeOfColumnsInTableScan" will 
always be false.
{code}
 ColumnProjectionUtils.appendReadColumns(
  jobConf, ts.getNeededColumnIDs(), ts.getNeededColumns());
{code}
In the case of count(1) or stats gathering, "sizeOfColumnsInTableScan" is zero. 
We need to find a way to distinguish these two cases.
For count(1), READ_ALL_COLUMNS should be set to false.
For stats gathering on RCFile, READ_ALL_COLUMNS should be set to true in order 
to read all columns and then calculate rawDataSize.
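
The distinction described above could be expressed roughly as follows. This is only a sketch: the class, the method, and the `gatheringStats` flag are hypothetical, and how Hive would actually tell the two cases apart is exactly the open question in this comment.

```java
// Hypothetical decision helper; not Hive code and not the attached patch.
final class ReadAllColumnsChoice {
    /**
     * With an empty needed-column list, read everything only when statistics
     * are being gathered; a plain count(1) needs no column data at all.
     */
    static boolean readAllColumns(int neededColumnCount, boolean gatheringStats) {
        return neededColumnCount == 0 && gatheringStats;
    }

    public static void main(String[] args) {
        System.out.println("count(1):       " + readAllColumns(0, false));
        System.out.println("stats gather:   " + readAllColumns(0, true));
        System.out.println("normal select:  " + readAllColumns(3, false));
    }
}
```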



> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Abhishek
>Priority: Minor
> Attachments: HIVE-14143.1.patch, HIVE-14143.patch
>
>


[jira] [Issue Comment Deleted] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-02 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14143:
-
Comment: was deleted

(was: Patch updated.)

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.1.patch, HIVE-14143.patch
>
>


[jira] [Updated] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-02 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14143:
-
Attachment: HIVE-14143.1.patch

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.1.patch, HIVE-14143.patch
>
>


[jira] [Updated] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-02 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14143:
-
Attachment: HIVE-14143.1.patch

Patch updated.

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.patch
>
>


[jira] [Updated] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-02 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14143:
-
Attachment: (was: HIVE-14143.1.patch)

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.patch
>
>


[jira] [Commented] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-02 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360057#comment-15360057
 ] 

Nemon Lou commented on HIVE-14143:
--

For ORC and LazySimpleSerde, rawDataSize is calculated without regard to 
column projection.
So rawDataSize for RCFile can be calculated the same way, right?

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.patch
>
>


[jira] [Commented] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-02 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360049#comment-15360049
 ] 

Nemon Lou commented on HIVE-14143:
--

Agreed. As described in TableScanDesc.java:
{code} 
  // Both neededColumnIDs and neededColumns should never be null.
  // When neededColumnIDs is an empty list,
  // it means no needed column (e.g. we do not need any column to evaluate
  // SELECT count(*) FROM t).
  private List neededColumnIDs;
 {code} 
 
I must have been misled by the following code in HiveInputFormat.java:
{code}
  private void pushProjection(final JobConf newjob, final StringBuilder readColumnsBuffer,
      final StringBuilder readColumnNamesBuffer) {
    String readColIds = readColumnsBuffer.toString();
    String readColNames = readColumnNamesBuffer.toString();
    boolean readAllColumns = readColIds.isEmpty() ? true : false;
    newjob.setBoolean(ColumnProjectionUtils.READ_ALL_COLUMNS, readAllColumns);
    ...
  }
{code}
The solution is not clear to me. Any suggestions?

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.patch
>
>


[jira] [Commented] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-01 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359940#comment-15359940
 ] 

Nemon Lou commented on HIVE-14143:
--

[~pxiong] Thanks for your attention.

rawDataSize for RCFile is the sum of the serialized sizes of the selected columns:
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java#L229
{code}
  public long getRawDataSerializedSize() {
    long serializedSize = 0;
    for (int i = 0; i < fieldInfoList.length; ++i) {
      serializedSize += fieldInfoList[i].getSerializedSize();
    }
    return serializedSize;
  }
{code}

During projection pushdown, READ_ALL_COLUMNS is always set to false, regardless 
of whether the specified columns are empty.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L656
{code}
    for (String alias : aliases) {
      Operator op = this.mrwork.getAliasToWork().get(alias);
      if (op instanceof TableScanOperator) {
        TableScanOperator ts = (TableScanOperator) op;
        // push down projections.
        ColumnProjectionUtils.appendReadColumns(
            jobConf, ts.getNeededColumnIDs(), ts.getNeededColumns());
        // push down filters
        pushFilters(jobConf, ts);

        AcidUtils.setTransactionalTableScan(job, ts.getConf().isAcidTable());
      }
    }
{code}
The specified column IDs are empty for analyze, which is supposed to mean 
reading all columns.

As a result, no column is read:
https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java#L104
{code}
    List notSkipIDs = new ArrayList();
    if (conf == null || ColumnProjectionUtils.isReadAllColumns(conf)) {
      for (int i = 0; i < size; i++ ) {
        notSkipIDs.add(i);
      }
    } else {
      notSkipIDs = ColumnProjectionUtils.getReadColumnIDs(conf);
    }
    cachedLazyStruct = new ColumnarStruct(
        cachedObjectInspector, notSkipIDs, serdeParams.getNullSequence());
{code}
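
To make the chain of effects above concrete, here is a simplified, self-contained model of it. It does not use Hive classes: `RawDataSizeModel`, its methods, and the column sizes are hypothetical stand-ins, and the model assumes that a skipped column contributes nothing to the raw data size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the ColumnarSerDe branch quoted above.
final class RawDataSizeModel {
    /** All column ids when readAllColumns is set, otherwise only the listed ids. */
    static List<Integer> notSkipIds(boolean readAllColumns, List<Integer> readColumnIds,
                                    int columnCount) {
        if (readAllColumns) {
            List<Integer> all = new ArrayList<>();
            for (int i = 0; i < columnCount; i++) {
                all.add(i);
            }
            return all;
        }
        return readColumnIds;
    }

    /** Sums the serialized sizes of the columns that were not skipped. */
    static long rawDataSize(long[] columnSizes, List<Integer> notSkipIds) {
        long size = 0;
        for (int id : notSkipIds) {
            size += columnSizes[id];
        }
        return size;
    }

    public static void main(String[] args) {
        long[] sizes = {10, 20, 30}; // made-up per-column serialized sizes
        List<Integer> none = Collections.emptyList();
        // analyze: empty id list with READ_ALL_COLUMNS=false, so nothing is read.
        System.out.println(rawDataSize(sizes, notSkipIds(false, none, 3)));
        // Same empty id list with READ_ALL_COLUMNS=true reads every column.
        System.out.println(rawDataSize(sizes, notSkipIds(true, none, 3)));
    }
}
```

In this model the only difference between a zero and a non-zero raw data size is the value of the read-all flag, which is why the flag's handling during projection pushdown matters for analyze.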

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.patch
>
>


[jira] [Updated] (HIVE-14143) RawDataSize of RCFile is zero after analyze

2016-07-01 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-14143:
-
Attachment: HIVE-14143.patch

> RawDataSize of RCFile is zero after analyze 
> 
>
> Key: HIVE-14143
> URL: https://issues.apache.org/jira/browse/HIVE-14143
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-14143.patch
>
>


[jira] [Updated] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-06-12 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-5999:

Status: Open  (was: Patch Available)

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Nemon Lou
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
> Attachments: HIVE-5999.1.patch, HIVE-5999.patch
>
>
> LINES TERMINATED BY only supports newline '\n' right now.
> It would be nice to loosen this constraint and allow other characters.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
> The DDL definition in the Hive Language Manual shows this as a configurable 
> property, whereas it is not. This may mislead users into believing that they 
> can choose the delimiter.
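
For illustration, splitting records on an arbitrary single-character terminator can be sketched as below. This is not how Hive reads text tables and not what the attached patches do; the class and method names are hypothetical, and the sketch only shows the behaviour the request asks for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; independent of Hive and Hadoop input formats.
final class CustomLineTerminator {
    /** Splits data into records using the given terminator instead of '\n'. */
    static List<String> splitRecords(String data, char terminator) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = data.indexOf(terminator, start)) >= 0) {
            records.add(data.substring(start, end));
            start = end + 1;
        }
        if (start < data.length()) {
            // Trailing record that is not followed by a terminator.
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(splitRecords("a,b|c,d|", '|'));
    }
}
```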





[jira] [Commented] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-06-05 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316127#comment-15316127
 ] 

Nemon Lou commented on HIVE-10815:
--

It seems that none of the test failures are related. [~thejas], would you 
review it again? Thanks.
Compared with other builds, such as 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/7/testReport/
the following are common test failures:
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}
This one fails even without this patch :
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
{noformat}
This one doesn't fail on my local build :
{noformat}
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.org.apache.hive.service.TestHS2ImpersonationWithRemoteMS
{noformat} 

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.1.patch, HIVE-10815.2.patch, HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
> Choosing a MetaStore randomly would be good for load balancing.
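
The idea can be sketched as shuffling the configured URI list before trying the entries in order. This is an illustration only, not the attached patch: the class and method names are hypothetical, the host names in `main` are made up, and the input is assumed to be a comma-separated list such as the value of `hive.metastore.uris`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper; not HiveMetaStoreClient code.
final class MetastoreUriOrder {
    /** Parses a comma-separated URI list and returns it in random order. */
    static List<String> shuffled(String commaSeparatedUris, Random random) {
        List<String> uris = new ArrayList<>();
        for (String uri : commaSeparatedUris.split(",")) {
            String trimmed = uri.trim();
            if (!trimmed.isEmpty()) {
                uris.add(trimmed);
            }
        }
        // Each client gets its own order, spreading connections across metastores.
        Collections.shuffle(uris, random);
        return uris;
    }

    public static void main(String[] args) {
        String configured = "thrift://ms1:9083,thrift://ms2:9083,thrift://ms3:9083";
        System.out.println(shuffled(configured, new Random()));
    }
}
```

A client would still fall back to the next URI in the shuffled list if a connection attempt fails, so failover behaviour is unchanged and only the starting point differs per client.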





[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-06-01 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Attachment: HIVE-10815.2.patch

There are fewer test failures now, so I am triggering the tests again.

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.1.patch, HIVE-10815.2.patch, HIVE-10815.patch
>
>


[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-06-01 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Status: Patch Available  (was: Open)

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.1.patch, HIVE-10815.2.patch, HIVE-10815.patch
>
>


[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-06-01 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Status: Open  (was: Patch Available)

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.1.patch, HIVE-10815.patch
>
>


[jira] [Commented] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-06-01 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311563#comment-15311563
 ] 

Nemon Lou commented on HIVE-5999:
-

The test failures seem unrelated. This patch is ready for review.

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Nemon Lou
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
> Attachments: HIVE-5999.1.patch, HIVE-5999.patch
>
>


[jira] [Updated] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-05-30 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-5999:

Status: Patch Available  (was: Open)

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Nemon Lou
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
> Attachments: HIVE-5999.1.patch, HIVE-5999.patch
>
>
> LINES TERMINATED BY only supports newline '\n' right now.
> It would be nice to loosen this constraint and allow other characters.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
> The DDL definition in the Hive Language Manual shows this as a configurable 
> property, whereas it is not. This may mislead users into believing that they 
> can choose the delimiter freely.





[jira] [Updated] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-05-30 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-5999:

Attachment: HIVE-5999.1.patch

Fixing the failing test.

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Nemon Lou
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
> Attachments: HIVE-5999.1.patch, HIVE-5999.patch
>
>
> LINES TERMINATED BY only supports newline '\n' right now.
> It would be nice to loosen this constraint and allow other characters.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
> The DDL definition in the Hive Language Manual shows this as a configurable 
> property, whereas it is not. This may mislead users into believing that they 
> can choose the delimiter freely.





[jira] [Updated] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-05-28 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-5999:

Status: Patch Available  (was: Open)

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Nemon Lou
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
> Attachments: HIVE-5999.patch
>
>
> LINES TERMINATED BY only supports newline '\n' right now.
> It would be nice to loosen this constraint and allow other characters.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
> The DDL definition in the Hive Language Manual shows this as a configurable 
> property, whereas it is not. This may mislead users into believing that they 
> can choose the delimiter freely.





[jira] [Updated] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-05-28 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-5999:

Attachment: HIVE-5999.patch

Limitations:
1. "LINES TERMINATED BY" only works with text files.
2. Multiple tables with the same data path but different delimiters are not 
supported in the same query, because the path is bound to a tableDesc (see 
Map pathToPartitionInfo).
3. The line delimiter "10" is treated as the string "10" instead of "\n", 
because line delimiters are now supported as strings.
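
Limitation 3 can be illustrated with a small sketch. This is not code from the 
patch: the class and method names are hypothetical, and the "numeric code" 
reading is only assumed here to be how a single-byte delimiter value such as 
"10" might otherwise be interpreted.

```java
import java.nio.charset.StandardCharsets;

public class LineDelimiterDemo {
    /**
     * Hypothetical single-byte reading: a value that parses as a decimal
     * byte code is taken as that byte, otherwise its first byte is used.
     * Assumes a non-empty value.
     */
    static byte asByteCode(String value) {
        try {
            return Byte.parseByte(value);
        } catch (NumberFormatException e) {
            return value.getBytes(StandardCharsets.UTF_8)[0];
        }
    }

    /** String reading: the delimiter is the literal bytes of the value. */
    static byte[] asLiteral(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Under the byte-code reading "10" would mean the newline byte.
        System.out.println(asByteCode("10") == (byte) '\n');
        // Under the string reading "10" is the two characters '1' and '0'.
        System.out.println(asLiteral("10").length);
    }
}
```

With string delimiters the second reading applies, so a table declared with 
the delimiter "10" splits rows on the two-character sequence "10" rather than 
on a newline.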

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Nemon Lou
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
> Attachments: HIVE-5999.patch
>
>
> LINES TERMINATED BY only supports newline '\n' right now.
> It would be nice to loosen this constraint and allow other characters.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
> The DDL definition in the Hive Language Manual shows this as a configurable 
> property, whereas it is not. This may mislead users into believing that they 
> can choose the delimiter freely.





[jira] [Comment Edited] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-05-26 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303389#comment-15303389
 ] 

Nemon Lou edited comment on HIVE-5999 at 5/27/16 3:05 AM:
--

[~ashutoshc] Do you plan to work on this? I have implemented a version for 
text files and would appreciate a review from the Hive community. :)


was (Author: nemon):
[~ashutoshc] Do you plan to work on this? I have implemented one based on text 
file.And nee some review from hive community. :)

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Ashutosh Chauhan
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
>
> LINES TERMINATED BY only supports newline '\n' right now.
> It would be nice to loosen this constraint and allow other characters.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
> The DDL definition in the Hive Language Manual shows this as a configurable 
> property, whereas it is not. This may mislead users into believing that they 
> can choose the delimiter freely.





[jira] [Commented] (HIVE-5999) Allow other characters for LINES TERMINATED BY

2016-05-26 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303389#comment-15303389
 ] 

Nemon Lou commented on HIVE-5999:
-

[~ashutoshc] Do you plan to work on this? I have implemented a version for 
text files and would appreciate a review from the Hive community. :)

> Allow other characters for LINES TERMINATED BY 
> ---
>
> Key: HIVE-5999
> URL: https://issues.apache.org/jira/browse/HIVE-5999
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, Database/Schema, Hive
>Affects Versions: 0.12.0
>Reporter: Mariano Dominguez
>Assignee: Ashutosh Chauhan
>Priority: Critical
>  Labels: Delimiter, Hive, Row, SerDe
>
> LINES TERMINATED BY only supports newline '\n' right now.
> It would be nice to loosen this constraint and allow other characters.
> This limitation seems to be hardcoded here:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java#L171
> The DDL definition in the Hive Language Manual shows this as a configurable 
> property, whereas it is not. This may mislead users into believing that they 
> can choose the delimiter freely.





[jira] [Updated] (HIVE-10417) Parallel Order By return wrong results for partitioned tables

2016-05-24 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10417:
-
Status: Open  (was: Patch Available)

> Parallel Order By return wrong results for partitioned tables
> -
>
> Key: HIVE-10417
> URL: https://issues.apache.org/jira/browse/HIVE-10417
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0, 0.13.1, 0.14.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10417.patch
>
>
> The following script reproduces this bug.
> set hive.optimize.sampling.orderby=true;
> set mapreduce.job.reduces=10;
> select * from src order by key desc limit 10;
> +----------+------------+
> | src.key  | src.value  |
> +----------+------------+
> | 98       | val_98     |
> | 98       | val_98     |
> | 97       | val_97     |
> | 97       | val_97     |
> | 96       | val_96     |
> | 95       | val_95     |
> | 95       | val_95     |
> | 92       | val_92     |
> | 90       | val_90     |
> | 90       | val_90     |
> +----------+------------+
> 10 rows selected (47.916 seconds)
> reset;
> create table src_orc_p (key string ,value string )
> partitioned by (kp string)
> stored as orc
> tblproperties("orc.compress"="SNAPPY");
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.max.dynamic.partitions.pernode=1;
> set hive.exec.max.dynamic.partitions=1;
> insert into table src_orc_p partition(kp) select *,substring(key,1) from src 
> distribute by substring(key,1);
> set mapreduce.job.reduces=10;
> set hive.optimize.sampling.orderby=true;
> select * from src_orc_p order by key desc limit 10;
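> +----------------+------------------+-----------------+
> | src_orc_p.key  | src_orc_p.value  | src_orc_p.kend  |
> +----------------+------------------+-----------------+
> | 0              | val_0            | 0               |
> | 0              | val_0            | 0               |
> | 0              | val_0            | 0               |
> +----------------+------------------+-----------------+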
> 3 rows selected (39.861 seconds)





[jira] [Updated] (HIVE-13791) Fix failure Unit Test TestHiveSessionImpl.testLeakOperationHandle

2016-05-19 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-13791:
-
Attachment: HIVE-13791.patch

I have run TestHiveSessionImpl successfully in my local environment.

> Fix  failure Unit Test TestHiveSessionImpl.testLeakOperationHandle
> --
>
> Key: HIVE-13791
> URL: https://issues.apache.org/jira/browse/HIVE-13791
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.1.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
>Priority: Minor
> Attachments: HIVE-13791.patch
>
>






[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-05-17 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Status: Patch Available  (was: Open)

Patch rebased to master.

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.1.patch, HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
> Choosing a MetaStore randomly would be good for load balancing.
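
The idea in the description can be sketched as follows. This is a minimal 
illustration, not the attached patch: the class name, method name and example 
URIs are hypothetical, and the only assumption is that the configured 
metastores arrive as a comma-separated list of URIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class MetastoreUriShuffle {
    /**
     * Splits a comma-separated URI list and returns it in random order,
     * so that different clients do not all try the same metastore first.
     */
    static List<String> shuffledUris(String commaSeparatedUris, Random rng) {
        List<String> uris = new ArrayList<>();
        for (String part : commaSeparatedUris.split(",")) {
            String uri = part.trim();
            if (!uri.isEmpty()) {
                uris.add(uri);
            }
        }
        // Randomize the connection order; every URI is still present once.
        Collections.shuffle(uris, rng);
        return uris;
    }

    public static void main(String[] args) {
        // Hypothetical example hosts.
        String configured = "thrift://ms1:9083,thrift://ms2:9083,thrift://ms3:9083";
        System.out.println(shuffledUris(configured, new Random()));
    }
}
```

A client would then try the URIs in the returned order, so failover still 
covers every configured metastore while the first choice varies per client.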





[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-05-17 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Attachment: HIVE-10815.1.patch

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.1.patch, HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
> Choosing a MetaStore randomly would be good for load balancing.





[jira] [Updated] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-05-17 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-10815:
-
Status: Open  (was: Patch Available)

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
> Choosing a MetaStore randomly would be good for load balancing.





[jira] [Commented] (HIVE-10815) Let HiveMetaStoreClient Choose MetaStore Randomly

2016-05-16 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285941#comment-15285941
 ] 

Nemon Lou commented on HIVE-10815:
--

It seems that this has failed to attract any volunteer reviewers. 
Shall I rebase it onto master?

> Let HiveMetaStoreClient Choose MetaStore Randomly
> -
>
> Key: HIVE-10815
> URL: https://issues.apache.org/jira/browse/HIVE-10815
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.2.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-10815.patch
>
>
> Currently HiveMetaStoreClient uses a fixed order to choose MetaStore URIs 
> when multiple metastores are configured.
> Choosing a MetaStore randomly would be good for load balancing.





[jira] [Commented] (HIVE-13602) TPCH q16 return wrong result when CBO is on

2016-05-15 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284045#comment-15284045
 ] 

Nemon Lou commented on HIVE-13602:
--

Thanks, [~pxiong]. It would be nice to provide a patch for branch-1 too, if 
there will be a branch-1 release in the future.

> TPCH q16 return wrong result when CBO is on
> ---
>
> Key: HIVE-13602
> URL: https://issues.apache.org/jira/browse/HIVE-13602
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.0.0, 1.2.2
>Reporter: Nemon Lou
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13602.01.patch, HIVE-13602.03.patch, 
> HIVE-13602.04.patch, HIVE-13602.05.patch, HIVE-13602.final.patch, 
> calcite_cbo_bad.out, calcite_cbo_good.out, explain_cbo_bad_part1.out, 
> explain_cbo_bad_part2.out, explain_cbo_bad_part3.out, 
> explain_cbo_good(rewrite)_part1.out, explain_cbo_good(rewrite)_part2.out, 
> explain_cbo_good(rewrite)_part3.out
>
>
> Running TPC-H with scale factor 2, 
> q16 returns 1,160 rows when CBO is on,
> but 24,581 rows when CBO is off.
> See the attachments for details.





[jira] [Commented] (HIVE-13602) TPCH q16 return wrong result when CBO is on

2016-04-25 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257398#comment-15257398
 ] 

Nemon Lou commented on HIVE-13602:
--

It's 24581 on my computer. I must have checked the wrong stages in the 
MapReduce job UI.
After setting hive.optimize.constant.propagation=false,
the result is correct:
INFO  : Table tpch_flat_orc_2.q16_cbo_debug2 stats: [numFiles=1, numRows=24581, 
totalSize=803640, rawDataSize=786232]


> TPCH q16 return wrong result when CBO is on
> ---
>
> Key: HIVE-13602
> URL: https://issues.apache.org/jira/browse/HIVE-13602
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Affects Versions: 1.3.0, 2.0.0, 1.2.2
>Reporter: Nemon Lou
>Assignee: Pengcheng Xiong
> Attachments: calcite_cbo_bad.out, calcite_cbo_good.out, 
> explain_cbo_bad_part1.out, explain_cbo_bad_part2.out, 
> explain_cbo_bad_part3.out, explain_cbo_good(rewrite)_part1.out, 
> explain_cbo_good(rewrite)_part2.out, explain_cbo_good(rewrite)_part3.out
>
>
> Running TPC-H with scale factor 2, 
> q16 returns 1,160 rows when CBO is on,
> but 59,616 rows when CBO is off.
> See the attachments for details.




