[jira] [Assigned] (HIVE-14204) Optimize loading dynamic partitions

2016-07-22 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-14204:
--

Assignee: Vineet Garg  (was: Rajesh Balamohan)

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Vineet Garg
>Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch
>
>
> A lot of time is spent on the driver side loading a dynamically partitioned 
> dataset sequentially. E.g., a simple dynamic-partition load such as the 
> following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}
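
One way to picture the improvement the issue title asks for: if the work done per partition is independent, the driver could hand each partition to a small thread pool instead of looping over them one by one. The following is a minimal sketch of that idea only, not the approach taken in the attached patches. The class ParallelPartitionLoadSketch, the method loadPartition, the partition specs, and the pool size are all hypothetical and are not Hive APIs. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPartitionLoadSketch {

    // Hypothetical stand-in for the per-partition work (for example moving
    // files and registering the partition). This is not a Hive API.
    public static String loadPartition(String partitionSpec) {
        return "loaded:" + partitionSpec;
    }

    // Submits one task per partition to a fixed-size pool and collects the
    // results in submission order.
    public static List<String> loadAll(List<String> partitionSpecs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String spec : partitionSpecs) {
                Callable<String> task = () -> loadPartition(spec);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that partition is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> specs = new ArrayList<>();
        specs.add("ws_sold_date_sk=1"); // made-up partition values
        specs.add("ws_sold_date_sk=2");
        specs.add("ws_sold_date_sk=3");
        System.out.println(loadAll(specs, 2));
    }
}
```

Whether this helps in practice depends on where the 300+ seconds are actually spent; if the per-partition calls serialize on a shared resource, a pool alone would not shorten the load.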



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14204) Optimize loading dynamic partitions

2016-07-22 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-14204:
--

Assignee: Rajesh Balamohan  (was: Vineet Garg)

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch
>
>





[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-22 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Attachment: HIVE-14320.1.patch

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch
>
>
> With the returnpath feature on, this test fails with a NullPointerException.
> This is because TableAccessAnalyzer expects the join operator to have a list of 
> underlying table references (baseSrc), but during conversion of the Calcite plan 
> to the Hive operator tree this information is not propagated and is lost.





[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-22 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Status: Patch Available  (was: Open)

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch
>
>





[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Attachment: HIVE-14320.2.patch

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch, HIVE-14320.2.patch
>
>





[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Status: Patch Available  (was: Open)

Updated one golden file for table_access_keys_stats for TestSparkCliDriver

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch, HIVE-14320.2.patch
>
>





[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Status: Open  (was: Patch Available)

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch, HIVE-14320.2.patch
>
>





[jira] [Commented] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-11 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417726#comment-15417726
 ] 

Vineet Garg commented on HIVE-14396:


None of the tests are reproducible on my local machine

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch
>
>
> Currently there are three different failures.
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) The first case is a wrong result for the following query:
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes: while looking up the 
> corresponding expression for an aggregate function's argument, the wrong index 
> is used.
> 2) An out-of-bounds exception for the following:
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting a Calcite aggregate to Hive's group-by 
> operator.
> 3) Once the above exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that, 
> while creating the expression for an aggregate function's argument, we end up 
> with the wrong column info from the underlying reduce sink operator. 
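
The first failure is attributed above to using the wrong index when looking up the expression for an aggregate function's argument. The sketch below shows one plausible shape of that class of bug, under the assumption that each aggregate call carries a list of positions into its input's expression list. It is not the actual code of HiveCalciteUtil.getExprNodes, and AggArgIndexSketch, exprsForArgs, and exprsForArgsBuggy are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AggArgIndexSketch {

    // Intended lookup: each entry of argPositions is itself the index into
    // inputExprs.
    public static List<String> exprsForArgs(List<Integer> argPositions,
                                            List<String> inputExprs) {
        List<String> out = new ArrayList<>();
        for (int pos : argPositions) {
            out.add(inputExprs.get(pos));
        }
        return out;
    }

    // Buggy variant for comparison: uses the loop counter as the index, so it
    // silently returns the first N input expressions instead of the
    // referenced ones.
    public static List<String> exprsForArgsBuggy(List<Integer> argPositions,
                                                 List<String> inputExprs) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < argPositions.size(); i++) {
            out.add(inputExprs.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> inputExprs = Arrays.asList("a", "b", "c", "d");
        // Assumed argument positions for something like count(distinct c,d).
        List<Integer> argPositions = Arrays.asList(2, 3);
        System.out.println(exprsForArgs(argPositions, inputExprs));      // [c, d]
        System.out.println(exprsForArgsBuggy(argPositions, inputExprs)); // [a, b]
    }
}
```

A mistake of this kind raises no exception as long as the index stays in range, which would be consistent with the symptom being a wrong result rather than a crash.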





[jira] [Comment Edited] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-11 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417726#comment-15417726
 ] 

Vineet Garg edited comment on HIVE-14396 at 8/11/16 6:40 PM:
-

None of the test failures are reproducible on my local machine


was (Author: vgarg):
None of the tests are reproducible on my local machine

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch
>
>





[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-11 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417946#comment-15417946
 ] 

Vineet Garg commented on HIVE-14522:


Right outer and full outer joins produce wrong results as well.

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}





[jira] [Updated] (HIVE-12924) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

2016-08-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-12924:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Marking this as a duplicate since this is the same issue as HIVE-14396 (issue 3).

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> groupby_ppr_multi_distinct.q failure
> 
>
> Key: HIVE-12924
> URL: https://issues.apache.org/jira/browse/HIVE-12924
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Vineet Garg
> Attachments: HIVE-12924.1.patch, HIVE-12924.2.patch, 
> HIVE-12924.3.patch
>
>
> {code}
> EXPLAIN EXTENDED
> FROM srcpart src
> INSERT OVERWRITE TABLE dest1
> SELECT substr(src.key,1,1), count(DISTINCT substr(src.value,5)), 
> concat(substr(src.key,1,1),sum(substr(src.value,5))), sum(DISTINCT 
> substr(src.value, 5)), count(DISTINCT src.value)
> WHERE src.ds = '2008-04-08'
> GROUP BY substr(src.key,1,1)
> {code}
> Ended Job = job_local968043618_0742 with errors
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask





[jira] [Resolved] (HIVE-12803) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver count.q failure

2016-08-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg resolved HIVE-12803.

Resolution: Duplicate

The same issue is captured by HIVE-14396.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> MiniTezCliDriver count.q failure
> --
>
> Key: HIVE-12803
> URL: https://issues.apache.org/jira/browse/HIVE-12803
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Vineet Garg
>
> {code}
> select a, count(distinct b), count(distinct c), sum(d) from abcd group by a;
> {code}
> Set hive.cbo.returnpath.hiveop=true;
> {code}
> java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
> at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[?:1.7.0_79]
> at java.util.ArrayList.get(ArrayList.java:411) ~[?:1.7.0_79]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.genReduceSideGB1NoMapGB(HiveGBOpConvUtil.java:1060)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.genNoMapSideGBNoSkew(HiveGBOpConvUtil.java:473)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.translateGB(HiveGBOpConvUtil.java:304)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:398)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:181)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.convert(HiveOpConverter.java:154)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedHiveOPDag(CalcitePlanner.java:688)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:266)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10094)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:231)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1149) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> {code}





[jira] [Commented] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-11 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417920#comment-15417920
 ] 

Vineet Garg commented on HIVE-14396:


Created: https://reviews.apache.org/r/51006/

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch
>
>





[jira] [Commented] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-12 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418231#comment-15418231
 ] 

Vineet Garg commented on HIVE-14522:


This isn't actually a wrong result; it is the correct result. This test is for 
HIVE-1534, where outer join semantics were fixed and a Hive configuration 
parameter was added to maintain backward compatibility (with the wrong result). 
This test exercises that backward compatibility. It seems that when converting 
from the Calcite tree to the Hive operator tree this flag is ignored. I am not 
sure it is worth supporting this backward compatibility in the return path, 
since it's been almost 6 years now.
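
For readers unfamiliar with the semantics under discussion: in standard SQL, a predicate in the ON clause of a LEFT OUTER JOIN only decides which right-hand rows match. A left-hand row with no match is still emitted, padded with NULLs. The sketch below illustrates just that standard behaviour on made-up rows, using the ON condition from the quoted query. It does not model the legacy behaviour selected by hive.outerjoin.supports.filters=false, it ignores NULL handling inside the predicate, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeftOuterJoinOnFilterSketch {

    // ON condition from the quoted query; a row is {key, value}.
    public static boolean onCondition(int[] a, int[] b) {
        return a[0] > 40 && a[1] > 50 && a[0] == a[1]
            && b[0] > 40 && b[1] > 50 && b[0] == b[1];
    }

    // Standard LEFT OUTER JOIN: every left row appears at least once; when no
    // right row satisfies the ON condition, the right side is null.
    public static List<int[][]> leftOuterJoin(List<int[]> left, List<int[]> right) {
        List<int[][]> out = new ArrayList<>();
        for (int[] a : left) {
            boolean matched = false;
            for (int[] b : right) {
                if (onCondition(a, b)) {
                    out.add(new int[][] {a, b});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new int[][] {a, null});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows, not the contents of in1.txt or in2.txt.
        List<int[]> rows = Arrays.asList(
            new int[] {10, 10}, new int[] {50, 60}, new int[] {60, 60});
        for (int[][] pair : leftOuterJoin(rows, rows)) {
            System.out.println(Arrays.toString(pair[0]) + " | " + Arrays.toString(pair[1]));
        }
    }
}
```

With these rows, the first two left rows fail the left-side conjuncts and are still emitted with a null right side; only the row {60, 60} finds a match.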

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Status: Patch Available  (was: Open)

Addressed review comments

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Attachment: HIVE-14396.2.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Status: Open  (was: Patch Available)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-10 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Attachment: HIVE-14396.1.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch
>
>
> Currently there are three different failures
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) First case is wrong result for following query
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking up the 
> corresponding expression for an aggregate function's argument, the wrong index 
> is being used.
> 2) An out-of-bounds exception for the following:
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting Calcite Aggregation to Hive's group by 
> operator.
> 3) Once the exception in the above case is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that, 
> while creating the expression for an aggregate function's argument, we end up 
> with the wrong column info from the underlying reduce sink operator.
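
When checking whether one of the queries above returns wrong results, a small reference computation outside Hive can serve as an oracle. The following is a minimal sketch and not part of any attached patch. It assumes the rows of abcd are available as Python tuples (a, b, c, d) with None standing in for NULL, and it relies on the documented Hive behaviour that count(col) and count(distinct ...) skip rows in which any argument is NULL. The helper names and the sample rows are hypothetical; the sample is not the contents of in4.txt.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[Optional[int], ...]


def count_all(rows: Sequence[Row]) -> int:
    """count(1) / count(*): every row counts, NULLs included."""
    return len(rows)


def count_col(rows: Sequence[Row], col: int) -> int:
    """count(col): rows whose value in that column is not NULL."""
    return sum(1 for r in rows if r[col] is not None)


def count_distinct(rows: Sequence[Row], cols: Sequence[int]) -> int:
    """count(distinct c1, c2, ...): distinct combinations with no NULL member."""
    seen = set()
    for r in rows:
        key = tuple(r[c] for c in cols)
        if all(v is not None for v in key):
            seen.add(key)
    return len(seen)


# Column positions in a row of table abcd.
A, B, C, D = range(4)

# Hypothetical sample rows, used only to exercise the helpers.
SAMPLE = [
    (1, 10, 100, 1000),
    (1, 10, 100, 1001),
    (2, 10, None, 1000),
    (None, 20, 200, 1000),
]
```

Comparing these values with the query output for hive.map.aggr=true and hive.map.aggr=false is one way to confirm which of the three failures a given build still shows.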





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-10 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Status: Patch Available  (was: Open)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch
>
>





[jira] [Assigned] (HIVE-10328) Enable new return path for cbo

2016-07-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-10328:
--

Assignee: Vineet Garg  (was: Ashutosh Chauhan)

> Enable new return path for cbo
> --
>
> Key: HIVE-10328
> URL: https://issues.apache.org/jira/browse/HIVE-10328
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Vineet Garg
> Attachments: HIVE-10328.1.patch, HIVE-10328.10.patch, 
> HIVE-10328.11.patch, HIVE-10328.12.patch, HIVE-10328.13.patch, 
> HIVE-10328.2.patch, HIVE-10328.3.patch, HIVE-10328.4.patch, 
> HIVE-10328.4.patch, HIVE-10328.5.patch, HIVE-10328.6.patch, 
> HIVE-10328.7.patch, HIVE-10328.8.patch, HIVE-10328.9.patch, HIVE-10328.patch
>
>






[jira] [Updated] (HIVE-10328) Enable new return path for cbo

2016-07-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-10328:
---
Status: Patch Available  (was: Open)

Turning the return path flag on to see which tests are failing.

> Enable new return path for cbo
> --
>
> Key: HIVE-10328
> URL: https://issues.apache.org/jira/browse/HIVE-10328
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Vineet Garg
> Attachments: HIVE-10328.1.patch, HIVE-10328.10.patch, 
> HIVE-10328.11.patch, HIVE-10328.12.patch, HIVE-10328.13.patch, 
> HIVE-10328.14.patch, HIVE-10328.2.patch, HIVE-10328.3.patch, 
> HIVE-10328.4.patch, HIVE-10328.4.patch, HIVE-10328.5.patch, 
> HIVE-10328.6.patch, HIVE-10328.7.patch, HIVE-10328.8.patch, 
> HIVE-10328.9.patch, HIVE-10328.patch
>
>






[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Description: 
Reproducer

{code} set hive.cbo.returnpath.hiveop=true
 set hive.map.aggr=false

create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator
  aggregations: count(DISTINCT KEY._col1:0._col0)
  keys: KEY._col0 (type: int)
  mode: complete
  outputColumnNames: b, $f1
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: $f1 (type: bigint)
outputColumnNames: _o__c0
Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code} explain select count(distinct a) from abcd group by c; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator
  aggregations: count(DISTINCT KEY._col1:0._col0)
  keys: KEY._col0 (type: int)
  mode: complete
  outputColumnNames: c, $f1
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: $f1 (type: bigint)
outputColumnNames: _o__c0
Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code}

The above two cases have the wrong keys in the map-side Reduce Output Operator 
(both have a, a instead of b, a and c, a respectively).

  was:
Reproducer

{code} set hive.cbo.returnpath.hiveop=true {code}
 set hive.map.aggr=false {code}

{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator
 

[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Description: 
Reproducer

{code} set hive.cbo.returnpath.hiveop=true
 set hive.map.aggr=false

create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: b, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}

{code} explain select count(distinct a) from abcd group by c; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: c, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}

The above two cases have the wrong keys in the map-side Reduce Output Operator 
(both have a, a instead of b, a and c, a respectively).

  was:
Reproducer

{code} set hive.cbo.returnpath.hiveop=true
 set hive.map.aggr=false

create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator
  

[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Description: 
Reproducer

{code} set hive.cbo.returnpath.hiveop=true {code}
 set hive.map.aggr=false {code}

{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator
  aggregations: count(DISTINCT KEY._col1:0._col0)
  keys: KEY._col0 (type: int)
  mode: complete
  outputColumnNames: b, $f1
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: $f1 (type: bigint)
outputColumnNames: _o__c0
Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code} explain select count(distinct a) from abcd group by c; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator
  aggregations: count(DISTINCT KEY._col1:0._col0)
  keys: KEY._col0 (type: int)
  mode: complete
  outputColumnNames: c, $f1
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: $f1 (type: bigint)
outputColumnNames: _o__c0
Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
stats: NONE
File Output Operator
  compressed: false
  Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
Column stats: NONE
  table:
  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code}

The above two cases have the wrong keys in the map-side Reduce Output Operator 
(both have a, a instead of b, a and c, a respectively).

  was:
Reproducer

{code} set hive.cbo.returnpath.hiveop=true {code}
{code} set hive.map.aggr=false {code}

{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:

[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Patch Available  (was: Open)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch
>
>
> Reproducer
> {code} set hive.cbo.returnpath.hiveop=true
>  set hive.map.aggr=false
> create table abcd (a int, b int, c int, d int);
> LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
> {code}
> {code} explain select count(distinct a) from abcd group by b; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: b, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> {code} explain select count(distinct a) from abcd group by c; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: c, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> The above two cases have the wrong keys in the map-side Reduce Output Operator 
> (both have a, a instead of b, a and c, a respectively).
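
The wrong-key symptom described above can be checked mechanically against saved EXPLAIN output. The sketch below is an illustration under stated assumptions, not part of the patch: it assumes the plan is available as plain text, that each Reduce Output Operator prints one "key expressions:" line in the form shown in the plans above, and that the keys are simple column names. The function names are hypothetical.

```python
import re
from typing import List, Sequence

# Matches a "key expressions:" line, tolerating leading spaces or "> " quoting.
KEY_LINE = re.compile(r"^[>\s]*key expressions:\s*(.*)$")
# Captures the column name in entries such as "a (type: int)".
COLUMN = re.compile(r"(\S+) \(type: [^)]*\)")


def reduce_sink_keys(plan_text: str) -> List[List[str]]:
    """Return the key column names of every 'key expressions:' line in the plan."""
    keys = []
    for line in plan_text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            keys.append(COLUMN.findall(match.group(1)))
    return keys


def has_expected_keys(plan_text: str, expected: Sequence[str]) -> bool:
    """True if some reduce sink in the plan uses exactly the expected key columns."""
    return any(found == list(expected) for found in reduce_sink_keys(plan_text))
```

For the first reproducer, has_expected_keys(plan, ["b", "a"]) would be expected to hold once the issue is fixed; for the plans quoted above, the extracted keys are a, a in both cases.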





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.1.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch
>
>





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.2.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch
>
>





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Open  (was: Patch Available)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch
>
>
> Reproducer
> {code} set hive.cbo.returnpath.hiveop=true
>  set hive.map.aggr=false
> create table abcd (a int, b int, c int, d int);
> LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
> {code}
> {code} explain select count(distinct a) from abcd group by b; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: abcd
>             Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: a (type: int)
>               outputColumnNames: a
>               Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: a (type: int), a (type: int)
>                 sort order: ++
>                 Map-reduce partition columns: a (type: int)
>                 Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(DISTINCT KEY._col1:0._col0)
>           keys: KEY._col0 (type: int)
>           mode: complete
>           outputColumnNames: b, $f1
>           Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: $f1 (type: bigint)
>             outputColumnNames: _o__c0
>             Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> {code} explain select count(distinct a) from abcd group by c; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: abcd
>             Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: a (type: int)
>               outputColumnNames: a
>               Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: a (type: int), a (type: int)
>                 sort order: ++
>                 Map-reduce partition columns: a (type: int)
>                 Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(DISTINCT KEY._col1:0._col0)
>           keys: KEY._col0 (type: int)
>           mode: complete
>           outputColumnNames: c, $f1
>           Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: $f1 (type: bigint)
>             outputColumnNames: _o__c0
>             Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> Both cases above have wrong keys in the map-side Reduce Output Operator (both have 
> a, a instead of b, a and c, a respectively).
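
For readers following the report, the expected key layout can be sketched in a few lines of Python. This is only an illustrative model, not Hive code: the function names expected_reduce_sink_keys and plan_keys_match are hypothetical, and the rule "group-by columns first, then the DISTINCT arguments" is taken from the expected values quoted in the description (b,a and c,a).

```python
def expected_reduce_sink_keys(group_by_cols, distinct_cols):
    """Model of the expected key order: group-by columns, then DISTINCT arguments.

    Hypothetical helper for illustration; it mirrors the "b,a" / "c,a"
    expectation stated in the issue, not Hive's actual planner code.
    """
    return list(group_by_cols) + list(distinct_cols)


def plan_keys_match(group_by_cols, distinct_cols, plan_keys):
    """Return True when the keys printed in a plan match the expected order."""
    return list(plan_keys) == expected_reduce_sink_keys(group_by_cols, distinct_cols)


if __name__ == "__main__":
    # count(distinct a) ... group by b: the plan in the report shows keys a, a.
    print(expected_reduce_sink_keys(["b"], ["a"]))
    print(plan_keys_match(["b"], ["a"], ["a", "a"]))
```

With the keys from the two plans in the report (a, a in both), plan_keys_match returns False for both the group by b and the group by c query, which is the mismatch the description points out.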





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: (was: HIVE-14442.2.patch)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch
>
>





[jira] [Commented] (HIVE-12924) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

2016-08-07 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411065#comment-15411065
 ] 

Vineet Garg commented on HIVE-12924:


This has the same issue as case 3) of HIVE-14396, i.e. a bug in the column info 
lookup for a non-distinct parameter. The group-by translation ends up creating a 
wrong column name for one of the parameters, and the execution engine later bails 
out because it cannot find that column.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> groupby_ppr_multi_distinct.q failure
> 
>
> Key: HIVE-12924
> URL: https://issues.apache.org/jira/browse/HIVE-12924
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Vineet Garg
> Attachments: HIVE-12924.1.patch, HIVE-12924.2.patch, 
> HIVE-12924.3.patch
>
>
> {code}
> EXPLAIN EXTENDED
> FROM srcpart src
> INSERT OVERWRITE TABLE dest1
> SELECT substr(src.key,1,1), count(DISTINCT substr(src.value,5)), 
> concat(substr(src.key,1,1),sum(substr(src.value,5))), sum(DISTINCT 
> substr(src.value, 5)), count(DISTINCT src.value)
> WHERE src.ds = '2008-04-08'
> GROUP BY substr(src.key,1,1)
> {code}
> Ended Job = job_local968043618_0742 with errors
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.2.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch
>
>





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Patch Available  (was: Open)

Missed the Spark golden file update.

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch, 
> HIVE-14442.3.patch
>
>





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.3.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch, 
> HIVE-14442.3.patch
>
>





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Open  (was: Patch Available)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch, 
> HIVE-14442.3.patch
>
>





[jira] [Commented] (HIVE-12301) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix test failure for udf_percentile.q

2016-08-01 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403195#comment-15403195
 ] 

Vineet Garg commented on HIVE-12301:


I am working on HIVE-14396, which is running into a similar issue. It looks like 
this fix wasn't completely correct. I have a reproducer which I believe shows 
what is incorrect with the fix, but that reproducer has also exposed another 
wrong-result issue which needs to be fixed first.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix test 
> failure for udf_percentile.q
> ---
>
> Key: HIVE-12301
> URL: https://issues.apache.org/jira/browse/HIVE-12301
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12301.01.patch, HIVE-12301.02.patch, 
> HIVE-12301.03.patch
>
>
> The position in argList is mapped to a wrong column from RS operator





[jira] [Assigned] (HIVE-12924) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver groupby_ppr_multi_distinct.q failure

2016-08-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-12924:
--

Assignee: Vineet Garg  (was: Hari Sankar Sivarama Subramaniyan)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> groupby_ppr_multi_distinct.q failure
> 
>
> Key: HIVE-12924
> URL: https://issues.apache.org/jira/browse/HIVE-12924
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Vineet Garg
> Attachments: HIVE-12924.1.patch, HIVE-12924.2.patch, 
> HIVE-12924.3.patch
>
>





[jira] [Assigned] (HIVE-12803) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver count.q failure

2016-08-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-12803:
--

Assignee: Vineet Garg

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> MiniTezCliDriver count.q failure
> --
>
> Key: HIVE-12803
> URL: https://issues.apache.org/jira/browse/HIVE-12803
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Vineet Garg
>
> {code}
> select a, count(distinct b), count(distinct c), sum(d) from abcd group by a;
> {code}
> Set hive.cbo.returnpath.hiveop=true;
> {code}
> java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
> at java.util.ArrayList.rangeCheck(ArrayList.java:635) ~[?:1.7.0_79]
> at java.util.ArrayList.get(ArrayList.java:411) ~[?:1.7.0_79]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.genReduceSideGB1NoMapGB(HiveGBOpConvUtil.java:1060)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.genNoMapSideGBNoSkew(HiveGBOpConvUtil.java:473)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.translateGB(HiveGBOpConvUtil.java:304)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:398)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:181)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.convert(HiveOpConverter.java:154)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedHiveOPDag(CalcitePlanner.java:688)
>  ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:266)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10094)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:231)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
>  [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1149) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237) 
> [hive-exec-2.1.0-SNAPSHOT.jar:?]
> {code}





[jira] [Assigned] (HIVE-12203) CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results

2016-08-15 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-12203:
--

Assignee: Vineet Garg  (was: Jesus Camacho Rodriguez)

> CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results
> ---
>
> Key: HIVE-12203
> URL: https://issues.apache.org/jira/browse/HIVE-12203
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Vineet Garg
> Attachments: HIVE-12203.patch
>
>






[jira] [Commented] (HIVE-12203) CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results

2016-08-15 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421433#comment-15421433
 ] 

Vineet Garg commented on HIVE-12203:


Interestingly, I am seeing a NullPointerException on my local system. I am going 
to take a look at this.

> CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results
> ---
>
> Key: HIVE-12203
> URL: https://issues.apache.org/jira/browse/HIVE-12203
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12203.patch
>
>






[jira] [Assigned] (HIVE-12806) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver vector_auto_smb_mapjoin_14.q failure

2016-08-15 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-12806:
--

Assignee: Vineet Garg

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> MiniTezCliDriver vector_auto_smb_mapjoin_14.q failure
> ---
>
> Key: HIVE-12806
> URL: https://issues.apache.org/jira/browse/HIVE-12806
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Vineet Garg
> Attachments: HIVE-12806.1.patch
>
>
> Step to reproduce:
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=vector_auto_smb_mapjoin_14.q 
> -Dhive.cbo.returnpath.hiveop=true -Dtest.output.overwrite=true
> Query :
> {code}
> select count(*) from (
>   select a.key as key, a.value as val1, b.value as val2 from tbl1 a join tbl2 
> b on a.key = b.key
> ) subq1
> {code}
> Stack trace :
> {code}
> 2016-01-07T14:08:04,803 ERROR [da534038-d792-4d16-86e9-87b9f971adda main[]]: 
> SessionState (SessionState.java:printError(1010)) - Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1452204324051_0001_33_00, 
> diagnostics=[Vertex vertex_1452204324051_0001_33_00 [Map 1] 
> killed/failed due to:AM_USERCODE_FAILURE, Exception in VertexManager, 
> vertex:vertex_1452204324051_0001_33_00 [Map 1], java.lang.RuntimeException: 
> java.lang.RuntimeException: Failed to load plan: null: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Relative path in absolute URI: subq1:amerge.xml
> at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.onRootVertexInitialized(CustomPartitionVertex.java:314)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventRootInputInitialized.invoke(VertexManager.java:624)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:645)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:640)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:640)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:629)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to load plan: null: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: subq1:amerge.xml
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMergeWork(Utilities.java:339)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.populateMapWork(SplitGrouper.java:260)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:172)
> at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.onRootVertexInitialized(CustomPartitionVertex.java:277)
> ... 12 more
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Relative path in absolute URI: subq1:amerge.xml
> at org.apache.hadoop.fs.Path.initialize(Path.java:206)
> at org.apache.hadoop.fs.Path.<init>(Path.java:172)
> at org.apache.hadoop.fs.Path.<init>(Path.java:94)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:387)
> ... 16 more
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> subq1:amerge.xml
> at java.net.URI.checkPath(URI.java:1804)
> at java.net.URI.<init>(URI.java:752)
> at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> ... 20 more
> ]
> {code}





[jira] [Commented] (HIVE-12806) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver vector_auto_smb_mapjoin_14.q failure

2016-08-15 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421505#comment-15421505
 ] 

Vineet Garg commented on HIVE-12806:


Still failing. I'll take a look.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> MiniTezCliDriver vector_auto_smb_mapjoin_14.q failure
> ---
>
> Key: HIVE-12806
> URL: https://issues.apache.org/jira/browse/HIVE-12806
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12806.1.patch
>
>
> Step to reproduce:
> mvn test -Dtest=TestMiniTezCliDriver -Dqfile=vector_auto_smb_mapjoin_14.q 
> -Dhive.cbo.returnpath.hiveop=true -Dtest.output.overwrite=true
> Query :
> {code}
> select count(*) from (
>   select a.key as key, a.value as val1, b.value as val2 from tbl1 a join tbl2 
> b on a.key = b.key
> ) subq1
> {code}
> Stack trace :
> {code}
> 2016-01-07T14:08:04,803 ERROR [da534038-d792-4d16-86e9-87b9f971adda main[]]: 
> SessionState (SessionState.java:printError(1010)) - Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1452204324051_0001_33_00, 
> diagnostics=[Vertex vertex_1452204324051_0001_33_00 [Map 1] 
> killed/failed due to:AM_USERCODE_FAILURE, Exception in VertexManager, 
> vertex:vertex_1452204324051_0001_33_00 [Map 1], java.lang.RuntimeException: 
> java.lang.RuntimeException: Failed to load plan: null: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Relative path in absolute URI: subq1:amerge.xml
> at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.onRootVertexInitialized(CustomPartitionVertex.java:314)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventRootInputInitialized.invoke(VertexManager.java:624)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:645)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:640)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:640)
> at 
> org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:629)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to load plan: null: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI: subq1:amerge.xml
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMergeWork(Utilities.java:339)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.populateMapWork(SplitGrouper.java:260)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:172)
> at 
> org.apache.hadoop.hive.ql.exec.tez.CustomPartitionVertex.onRootVertexInitialized(CustomPartitionVertex.java:277)
> ... 12 more
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Relative path in absolute URI: subq1:amerge.xml
> at org.apache.hadoop.fs.Path.initialize(Path.java:206)
> at org.apache.hadoop.fs.Path.<init>(Path.java:172)
> at org.apache.hadoop.fs.Path.<init>(Path.java:94)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getPlanPath(Utilities.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:387)
> ... 16 more
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
> subq1:amerge.xml
> at java.net.URI.checkPath(URI.java:1804)
> at java.net.URI.<init>(URI.java:752)
> at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> ... 20 more
> ]
> {code}
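
The innermost cause in the trace above can be reproduced in isolation. The sketch below assumes that the plan path string {{subq1:amerge.xml}} ends up being split at the colon, so that {{java.net.URI}} receives {{subq1}} as the scheme and {{amerge.xml}} as the path; how Hive actually builds that string is not shown in this issue. The class name {{RelativePathUriDemo}} and the helper {{tryBuild}} are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RelativePathUriDemo {
    // Returns the failure reason reported by java.net.URI,
    // or null if the URI was built successfully.
    static String tryBuild(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // A scheme combined with a path that does not start with '/' is rejected.
        System.out.println(tryBuild("subq1", "amerge.xml"));
        // The same scheme with an absolute path is accepted.
        System.out.println(tryBuild("subq1", "/amerge.xml"));
        // Without a scheme, a relative path is accepted.
        System.out.println(tryBuild(null, "amerge.xml"));
    }
}
```

If that assumption holds, the alias prefix {{subq1:}} in the work name is what gets mistaken for a URI scheme, which would match the failure in {{Utilities.getPlanPath}}.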





[jira] [Updated] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-16 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14522:
---
Status: Patch Available  (was: Open)

Removed hive.outerjoin.supports.filters. Submitting for pre-commit testing to 
see which tests need to be updated.

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14522.1.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14522) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure for auto_join_filters

2016-08-16 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14522:
---
Attachment: HIVE-14522.1.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Fix test failure 
> for auto_join_filters
> ---
>
> Key: HIVE-14522
> URL: https://issues.apache.org/jira/browse/HIVE-14522
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14522.1.patch
>
>
> {code}
> CREATE TABLE smb_input1(key int, value int) CLUSTERED BY (key) SORTED BY 
> (key) INTO 2 BUCKETS; 
> CREATE TABLE smb_input2(key int, value int) CLUSTERED BY (value) SORTED BY 
> (value) INTO 2 BUCKETS; 
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input1;
> LOAD DATA LOCAL INPATH '../../data/files/in1.txt' into table smb_input2;
> LOAD DATA LOCAL INPATH '../../data/files/in2.txt' into table smb_input2;
> SET hive.optimize.bucketmapjoin = true;
> SET hive.optimize.bucketmapjoin.sortedmerge = true;
> SET hive.input.format = 
> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> SET hive.outerjoin.supports.filters = false;
> {code}
> {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) FROM myinput1 a LEFT 
> OUTER JOIN myinput1 b on a.key > 40 AND a.value > 50 AND a.key = a.value AND 
> b.key > 40 AND b.value > 50 AND b.key = b.value; {code}
> {code} Expected result: 3078400 Actual result: 4937935 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-03 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852147#comment-15852147
 ] 

Vineet Garg commented on HIVE-15458:


As a fix, we have decided to add a Project on top of the Join in the SemiJoin 
conversion rule (instead of doing this in ASTConverter, as Ashutosh suggested).

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15458:
---
Attachment: HIVE-15458.1.patch

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.





[jira] [Updated] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15458:
---
Status: Patch Available  (was: Open)

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.





[jira] [Commented] (HIVE-14445) upgrade maven surefire to 2.19.1

2017-02-03 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851913#comment-15851913
 ] 

Vineet Garg commented on HIVE-14445:


I have also been facing the timeout issue. My vote is to revert this change until 
we figure out what's causing it and have a resolution.

> upgrade maven surefire to 2.19.1
> 
>
> Key: HIVE-14445
> URL: https://issues.apache.org/jira/browse/HIVE-14445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14445.1.patch
>
>
> newer maven surefire has a great feature:
> * it is possible to select testmethods by regular expressions...and there are 
> also improvements in using '#' to address testmethods
> i've looked into this earlier...the upgrade is "almost" seamless...i'm 
> already using 2.19.1, but the spark modules don't really like the empty 
> spark.home variable





[jira] [Commented] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-01 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849292#comment-15849292
 ] 

Vineet Garg commented on HIVE-15458:


This is not really a subquery issue. It is also reproducible with the following 
query:
{code:SQL} select part.p_type from part join (select p1.p_name from part p1, 
part p2 group by p1.p_name) pp where pp.p_name = part.p_name; {code}

This throws the following exception in the Hive log:
{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid table 
alias or column reference '$hdt$_0': (possible column names are: 
$hdt$_1.p_name, $hdt$_2.dummy)
{noformat}

Note that after throwing this exception, Hive falls back to the non-CBO path and 
executes the query successfully, so beeline/hivecli won't see this error.

The issue occurs during conversion of the Calcite plan to an AST, specifically in 
the following code in {{ASTConverter.java}}:

{code}
else if (r instanceof Join) {
  Join join = (Join) r;
  QueryBlockInfo left = convertSource(join.getLeft());
  QueryBlockInfo right = convertSource(join.getRight());
  s = new Schema(left.schema, right.schema);
  ASTNode cond = join.getCondition().accept(new RexVisitor(s));
  boolean semiJoin = join instanceof SemiJoin;
  if (join.getRight() instanceof Join) {
    // Invert join inputs; this is done because otherwise the SemanticAnalyzer
    // methods to merge joins will not kick in
    JoinRelType type;
    if (join.getJoinType() == JoinRelType.LEFT) {
      type = JoinRelType.RIGHT;
    } else if (join.getJoinType() == JoinRelType.RIGHT) {
      type = JoinRelType.LEFT;
    } else {
      type = join.getJoinType();
    }
    ast = ASTBuilder.join(right.ast, left.ast, type, cond, semiJoin);
  } else {
    ast = ASTBuilder.join(left.ast, right.ast, join.getJoinType(), cond, semiJoin);
  }
  if (semiJoin) {
    s = left.schema;
  }
{code}

We should not invert the join inputs for a SEMI join, since doing so changes its 
semantics.
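
To illustrate why the input order matters, here is a minimal, self-contained sketch of a left semi-join over plain integer lists. It is not Hive or Calcite code; the class {{SemiJoinOrderDemo}} and the method {{leftSemiJoin}} are made up for this example, and the join condition is assumed to be simple equality.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SemiJoinOrderDemo {
    // LEFT SEMI JOIN on equality: keep each left row that has at least
    // one matching row on the right; right-side columns are never emitted.
    static List<Integer> leftSemiJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (right.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2, 2, 3);
        List<Integer> b = Arrays.asList(2, 4);
        System.out.println(leftSemiJoin(a, b)); // prints [2, 2]
        System.out.println(leftSemiJoin(b, a)); // prints [2]
    }
}
```

Swapping the inputs changes both which side's rows are returned and how many there are, so unlike an inner join, a semi-join cannot simply have its sides exchanged.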

Bypassing the inversion for semi-joins produces a correct AST, but an exception 
is then thrown while generating the join tree from the AST in 
{{SemanticAnalyzer::genJoinTree()}}.

The plan after semi-join optimization looks as follows:

{code}
HiveProject(p_type=[$1])
  HiveSemiJoin(condition=[=($0, $2)], joinType=[inner])
    HiveProject(p_name=[$1], p_type=[$4])
      HiveFilter(condition=[IS NOT NULL($1)])
        HiveTableScan(table=[[default.part]], table:alias=[part])
    HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not available])
      HiveProject(p_name=[$1])
        HiveFilter(condition=[IS NOT NULL($1)])
          HiveTableScan(table=[[default.part]], table:alias=[p1])
      HiveProject(DUMMY=[0])
        HiveTableScan(table=[[default.part]], table:alias=[p2])
{code}


Since {{HiveSemiJoin}} has a {{HiveJoin}} as its right input, the following code 
in {{SemanticAnalyzer::genJoinTree()}} throws an error:

{code}
ASTNode left = (ASTNode) joinParseTree.getChild(0);
ASTNode right = (ASTNode) joinParseTree.getChild(1);
boolean isValidLeftToken = isValidJoinSide(left);
boolean isJoinLeftToken = !isValidLeftToken && isJoinToken(left);
boolean isValidRightToken = isValidJoinSide(right);
boolean isJoinRightToken = !isValidRightToken && isJoinToken(right);
// TODO: if we didn't care about the column order, we could switch join sides here
//   for TOK_JOIN and TOK_FULLOUTERJOIN.
if (!isValidLeftToken && !isJoinLeftToken) {
  throw new SemanticException("Invalid token on the left side of the join: "
      + left.getToken().getText() + "; please rewrite your query");
} else if (!isValidRightToken) {
  String advice = "";
  if (isJoinRightToken && !isJoinLeftToken) {
    advice = "; for example, put the nested join on the left side, or nest joins differently";
  } else if (isJoinRightToken) {
    advice = "; for example, nest joins differently";
  }
  throw new SemanticException("Invalid token on the right side of the join: "
      + right.getToken().getText() + "; please rewrite your query" + advice);
}
{code}

{{genJoinTree}} does not expect its right input to be another join:

{code}
private static boolean isValidJoinSide(ASTNode right) {
  return (right.getToken().getType() == HiveParser.TOK_TABREF)
      || (right.getToken().getType() == HiveParser.TOK_SUBQUERY)
      || (right.getToken().getType() == HiveParser.TOK_PTBLFUNCTION);
}
{code}
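
The asymmetry between the two sides can be seen in a simplified standalone model of the checks quoted above. This is only a sketch: the enum {{Tok}} is a hypothetical stand-in for the {{HiveParser}} token types, a single {{TOK_JOIN}} value stands in for everything {{isJoinToken}} accepts, and {{check}} returns a string instead of throwing a {{SemanticException}}.

```java
class JoinSideCheckDemo {
    // Hypothetical stand-ins for the HiveParser token types used in the snippets above.
    enum Tok { TOK_TABREF, TOK_SUBQUERY, TOK_PTBLFUNCTION, TOK_JOIN }

    // Mirrors isValidJoinSide: only table refs, subqueries and PTFs are plain join sides.
    static boolean isValidJoinSide(Tok t) {
        return t == Tok.TOK_TABREF || t == Tok.TOK_SUBQUERY || t == Tok.TOK_PTBLFUNCTION;
    }

    // Returns null when the pair is accepted, otherwise the side that is rejected.
    static String check(Tok left, Tok right) {
        boolean validLeft = isValidJoinSide(left);
        boolean joinLeft = !validLeft && left == Tok.TOK_JOIN;
        if (!validLeft && !joinLeft) {
            return "left";
        }
        // The right side has no "nested join" escape hatch.
        if (!isValidJoinSide(right)) {
            return "right";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check(Tok.TOK_JOIN, Tok.TOK_TABREF)); // prints null (accepted)
        System.out.println(check(Tok.TOK_TABREF, Tok.TOK_JOIN)); // prints right (rejected)
    }
}
```

A nested join is tolerated on the left but never on the right, which is why a semi-join whose right input is a join cannot pass this check as is.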

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>
> Subquery code in *CalcitePlanner* turns 

[jira] [Commented] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-01 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849294#comment-15849294
 ] 

Vineet Garg commented on HIVE-15458:


cc [~ashutoshc]
I tried the following query with CBO off
{code:SQL} select p1.p_name from part p1 left semi join (select p.* from part 
p, part pp where p.p_size = pp.p_size) p2 on p1.p_type = p2.p_type; {code}

to produce a similar AST and observe how this is handled on the non-CBO path, but 
Hive doesn't produce another JOIN underneath the LEFT SEMI JOIN node. We probably 
never get into this situation with CBO off.

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.





[jira] [Updated] (HIVE-15812) Scalar subquery with having throws exception

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15812:
---
Attachment: HIVE-15812.1.patch

> Scalar subquery with having throws exception
> 
>
> Key: HIVE-15812
> URL: https://issues.apache.org/jira/browse/HIVE-15812
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15812.1.patch
>
>
> Following query throws an exception
> {code:SQL}
> select sum(p_retailprice) from part group by p_type having sum(p_retailprice) 
> > (select max(pp.p_retailprice) from part pp);
> {code}
> {noformat}
> SemanticException [Error 10004]: Line 3:40 Invalid table alias or column 
> reference 'pp': (possible column names are: p_partkey, p_name, p_mfgr, 
> p_brand, p_type, p_size, p_container, p_retailprice, p_comment)
> {noformat}





[jira] [Updated] (HIVE-15812) Scalar subquery with having throws exception

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15812:
---
Status: Patch Available  (was: Open)

> Scalar subquery with having throws exception
> 
>
> Key: HIVE-15812
> URL: https://issues.apache.org/jira/browse/HIVE-15812
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15812.1.patch
>
>
> Following query throws an exception
> {code:SQL}
> select sum(p_retailprice) from part group by p_type having sum(p_retailprice) 
> > (select max(pp.p_retailprice) from part pp);
> {code}
> {noformat}
> SemanticException [Error 10004]: Line 3:40 Invalid table alias or column 
> reference 'pp': (possible column names are: p_partkey, p_name, p_mfgr, 
> p_brand, p_type, p_size, p_container, p_retailprice, p_comment)
> {noformat}





[jira] [Commented] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-03 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852517#comment-15852517
 ] 

Vineet Garg commented on HIVE-15458:


[~ashutoshc] Incorporated the left join optimization in the latest patch.

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch, HIVE-15458.2.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.





[jira] [Updated] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15458:
---
Attachment: HIVE-15458.2.patch

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch, HIVE-15458.2.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.





[jira] [Updated] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15458:
---
Status: Patch Available  (was: Open)

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch, HIVE-15458.2.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.





[jira] [Updated] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15458:
---
Status: Open  (was: Patch Available)

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch, HIVE-15458.2.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.





[jira] [Assigned] (HIVE-15812) Scalar subquery with having throws exception

2017-02-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-15812:
--


> Scalar subquery with having throws exception
> 
>
> Key: HIVE-15812
> URL: https://issues.apache.org/jira/browse/HIVE-15812
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
>
> Following query throws an exception
> {code:SQL}
> select sum(p_retailprice) from part group by p_type having sum(p_retailprice) 
> > (select max(pp.p_retailprice) from part pp);
> {code}
> {noformat}
> SemanticException [Error 10004]: Line 3:40 Invalid table alias or column 
> reference 'pp': (possible column names are: p_partkey, p_name, p_mfgr, 
> p_brand, p_type, p_size, p_container, p_retailprice, p_comment)
> {noformat}





[jira] [Updated] (HIVE-15763) Subquery in both LHS and RHS of IN/NOT IN throws misleading error

2017-01-31 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15763:
---
Attachment: HIVE-15763.1.patch

> Subquery in both LHS and RHS of IN/NOT IN throws misleading error
> -
>
> Key: HIVE-15763
> URL: https://issues.apache.org/jira/browse/HIVE-15763
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15763.1.patch
>
>
> Following query throws an error
> {code}select * from part where (select max(p_size) from part) IN (select 
> p_size from part);{code}
> Error
> {noformat}
> SemanticException [Error 10249]: Line 1:79 Unsupported SubQuery Expression 
> 'p_size': Only 1 SubQuery expression is supported.
> {noformat}
> Such queries should either be supported or should be detected and an 
> appropriate error message should be thrown.





[jira] [Updated] (HIVE-15763) Subquery in both LHS and RHS of IN/NOT IN throws misleading error

2017-01-31 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15763:
---
Status: Patch Available  (was: Open)

> Subquery in both LHS and RHS of IN/NOT IN throws misleading error
> -
>
> Key: HIVE-15763
> URL: https://issues.apache.org/jira/browse/HIVE-15763
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15763.1.patch
>
>
> Following query throws an error
> {code}select * from part where (select max(p_size) from part) IN (select 
> p_size from part);{code}
> Error
> {noformat}
> SemanticException [Error 10249]: Line 1:79 Unsupported SubQuery Expression 
> 'p_size': Only 1 SubQuery expression is supported.
> {noformat}
> Such queries should either be supported or should be detected and an 
> appropriate error message should be thrown.





[jira] [Updated] (HIVE-15763) Subquery in both LHS and RHS of IN/NOT IN throws misleading error

2017-01-31 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15763:
---
Summary: Subquery in both LHS and RHS of IN/NOT IN throws misleading error  
(was: Subquery in both lhs and rsh of IN/NOT IN throws misleading error)

> Subquery in both LHS and RHS of IN/NOT IN throws misleading error
> -
>
> Key: HIVE-15763
> URL: https://issues.apache.org/jira/browse/HIVE-15763
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
>
> Following query throws an error
> {code}select * from part where (select max(p_size) from part) IN (select 
> p_size from part);{code}
> Error
> {noformat}
> SemanticException [Error 10249]: Line 1:79 Unsupported SubQuery Expression 
> 'p_size': Only 1 SubQuery expression is supported.
> {noformat}
> Such queries should either be supported or should be detected and an 
> appropriate error message should be thrown.





[jira] [Commented] (HIVE-14445) upgrade maven surefire to 2.19.1

2017-02-07 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856524#comment-15856524
 ] 

Vineet Garg commented on HIVE-14445:


I am also using IntelliJ.

> upgrade maven surefire to 2.19.1
> 
>
> Key: HIVE-14445
> URL: https://issues.apache.org/jira/browse/HIVE-14445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14445.1.patch
>
>
> Newer Maven Surefire has a great feature:
> * it is possible to select test methods by regular expressions, and there are 
> also improvements in using '#' to address test methods
> I've looked into this earlier. The upgrade is "almost" seamless: I'm 
> already using 2.19.1, but the Spark modules don't really like the empty 
> spark.home variable



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15560) clean up out files that do not correspond to any q files

2017-02-06 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854732#comment-15854732
 ] 

Vineet Garg commented on HIVE-15560:


[~hagleitn] would you mind sharing the script? Curious how you found such 
files.

> clean up out files that do not correspond to any q files
> 
>
> Key: HIVE-15560
> URL: https://issues.apache.org/jira/browse/HIVE-15560
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Gunther Hagleitner
> Fix For: 2.2.0
>
> Attachments: HIVE-15560.1.patch
>
>
> I can see some for schema evolution, there may be others.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-04 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15458:
---
Status: Open  (was: Patch Available)

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch, HIVE-15458.2.patch, 
> HIVE-15458.3.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15458) Fix semi-join conversion rule for subquery

2017-02-04 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15458:
---
Attachment: HIVE-15458.3.patch

> Fix semi-join conversion rule for subquery
> --
>
> Key: HIVE-15458
> URL: https://issues.apache.org/jira/browse/HIVE-15458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15458.1.patch, HIVE-15458.2.patch, 
> HIVE-15458.3.patch
>
>
> Subquery code in *CalcitePlanner* turns off *hive.enable.semijoin.conversion* 
> since it doesn't work for subqueries.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15763) Subquery in both LHS and RHS of IN/NOT IN throws misleading error

2017-02-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15763:
---
Attachment: HIVE-15763.2.patch

> Subquery in both LHS and RHS of IN/NOT IN throws misleading error
> -
>
> Key: HIVE-15763
> URL: https://issues.apache.org/jira/browse/HIVE-15763
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15763.1.patch, HIVE-15763.2.patch
>
>
> Following query throws an error
> {code}select * from part where (select max(p_size) from part) IN (select 
> p_size from part);{code}
> Error
> {noformat}
> SemanticException [Error 10249]: Line 1:79 Unsupported SubQuery Expression 
> 'p_size': Only 1 SubQuery expression is supported.
> {noformat}
> Such queries should either be supported or should be detected and an 
> appropriate error message should be thrown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15763) Subquery in both LHS and RHS of IN/NOT IN throws misleading error

2017-02-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15763:
---
Status: Patch Available  (was: Open)

> Subquery in both LHS and RHS of IN/NOT IN throws misleading error
> -
>
> Key: HIVE-15763
> URL: https://issues.apache.org/jira/browse/HIVE-15763
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15763.1.patch, HIVE-15763.2.patch
>
>
> Following query throws an error
> {code}select * from part where (select max(p_size) from part) IN (select 
> p_size from part);{code}
> Error
> {noformat}
> SemanticException [Error 10249]: Line 1:79 Unsupported SubQuery Expression 
> 'p_size': Only 1 SubQuery expression is supported.
> {noformat}
> Such queries should either be supported or should be detected and an 
> appropriate error message should be thrown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15763) Subquery in both LHS and RHS of IN/NOT IN throws misleading error

2017-02-01 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15763:
---
Status: Open  (was: Patch Available)

> Subquery in both LHS and RHS of IN/NOT IN throws misleading error
> -
>
> Key: HIVE-15763
> URL: https://issues.apache.org/jira/browse/HIVE-15763
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15763.1.patch, HIVE-15763.2.patch
>
>
> Following query throws an error
> {code}select * from part where (select max(p_size) from part) IN (select 
> p_size from part);{code}
> Error
> {noformat}
> SemanticException [Error 10249]: Line 1:79 Unsupported SubQuery Expression 
> 'p_size': Only 1 SubQuery expression is supported.
> {noformat}
> Such queries should either be supported or should be detected and an 
> appropriate error message should be thrown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-31 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15703:
---
Status: Open  (was: Patch Available)

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Vineet Garg
> Attachments: HIVE-15703.01.patch, HIVE-15703.2.patch, 
> HIVE-15703.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-31 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15703:
---
Status: Patch Available  (was: Open)

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Vineet Garg
> Attachments: HIVE-15703.01.patch, HIVE-15703.2.patch, 
> HIVE-15703.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-31 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15703:
---
Attachment: HIVE-15703.3.patch

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Vineet Garg
> Attachments: HIVE-15703.01.patch, HIVE-15703.2.patch, 
> HIVE-15703.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-24 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Attachment: HIVE-15721.1.patch

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15721.1.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were disabled 
> since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur if the subquery produces zero rows: since an aggregate 
> always produces a result, lowering such a query into a LEFT JOIN or SEMI JOIN 
> would not take this case into consideration.
> We propose to allow such queries with an added runtime check which will 
> throw an error/exception if the subquery produces zero rows.
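
The zero-row case described above can be reproduced outside Hive. The following is a minimal sketch using SQLite through Python's sqlite3 module. SQLite, the table names outer_t and inner_t, and the hand-written join rewrite are all assumptions made for illustration; this is not the rewrite Hive or Calcite actually produces.

```python
import sqlite3

# In-memory SQLite database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (k INTEGER, x INTEGER);
    CREATE TABLE inner_t (k INTEGER);
    INSERT INTO outer_t VALUES (1, 2), (2, 0);
    INSERT INTO inner_t VALUES (1), (1);
""")

# Correlated IN subquery with an aggregate. For k = 2 the subquery scans
# zero rows, yet count(*) still yields one row containing 0.
subquery_form = conn.execute("""
    SELECT o.k FROM outer_t o
    WHERE o.x IN (SELECT count(*) FROM inner_t i WHERE i.k = o.k)
    ORDER BY o.k
""").fetchall()

# Naive join rewrite: aggregate per key first, then join. Keys with no
# inner rows have no group at all, so the k = 2 row silently disappears.
join_form = conn.execute("""
    SELECT o.k FROM outer_t o
    JOIN (SELECT k, count(*) AS c FROM inner_t GROUP BY k) agg
      ON agg.k = o.k AND agg.c = o.x
    ORDER BY o.k
""").fetchall()

print(subquery_form, join_form)
```

The subquery form keeps both outer rows, while the join form loses the row whose correlated input is empty, which is the wrong-result scenario the description refers to.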



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-24 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Status: Patch Available  (was: Open)

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15721.1.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were disabled 
> since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur if the subquery produces zero rows: since an aggregate 
> always produces a result, lowering such a query into a LEFT JOIN or SEMI JOIN 
> would not take this case into consideration.
> We propose to allow such queries with an added runtime check which will 
> throw an error/exception if the subquery produces zero rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] (HIVE-15160) Can't order by an unselected column

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg commented on HIVE-15160

Re: Can't order by an unselected column

Pengcheng Xiong Can you create RB for this? I would like to take a look at the patch.

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15753) subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15753
subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

Change By: Vineet Garg

== Simple reproducer==
* Create table {{part}} using {{q_test_init.sql}}
{code}
explain SELECT p1.p_name FROM part p1 LEFT JOIN (select p_type as p_col from part ) p2 WHERE NOT EXISTS+(select pp1.p_type as p_col from part pp1 where pp1.p_partkey = p2.p_col);
{code}

Following query is failing with SemanticException

Query:
SELECT DISTINCT
  t1.smallint_col_11
FROM table_21 t1
LEFT JOIN (
  SELECT
    smallint_col_45,
    (-224) - (COALESCE(MIN(665) OVER (ORDER BY smallint_col_45 DESC, varchar0170_col_23 DESC), NULL, -631)) AS int_col,
    AVG((GREATEST(CAST(806 AS int), CAST(-606 AS int))) * (39)) OVER (PARTITION BY smallint_col_45 ORDER BY smallint_col_45 DESC, varchar0170_col_23 ASC ROWS BETWEEN 24 FOLLOWING AND UNBOUNDED FOLLOWING) AS float_col,
    COALESCE(338, (965) + (-335), MAX(544) OVER (PARTITION BY varchar0170_col_23)) AS int_col_1,
    varchar0170_col_23
  FROM table_20
) t2 ON (((t2.int_col_1) = (t1.smallint_col_3)) AND ((t2.smallint_col_45) = (t1.smallint_col_11))) AND ((t2.smallint_col_45) = (t1.smallint_col_11))
WHERE
  NOT EXISTS (SELECT
    COALESCE(tt1.smallint_col_11, tt2.smallint_col_3, tt1.smallint_col_11) AS int_col
    FROM table_21 tt1
    INNER JOIN table_21 tt2 ON (tt2.smallint_col_11) = (tt1.smallint_col_3)
    WHERE
    ((tt2.smallint_col_11) >= (tt1.smallint_col_3)) AND ((t2.int_col) = (tt2.smallint_col_3)))

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15753) subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15753
subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

Change By: Vineet Garg
Attachment: HIVE-15753.1.patch

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15753) subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15753
subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

Change By: Vineet Garg

==   Simple reproducer ==
- * Create table {{part}} using {{q_test_init.sql}}
* Run the following query
{code}
explain SELECT p1.p_name FROM part p1 LEFT JOIN (select p_type as p_col from part ) p2 WHERE NOT EXISTS+(select pp1.p_type as p_col from part pp1 where pp1.p_partkey = p2.p_col);
{code}
- Following query is failing with SemanticException

Query:
SELECT DISTINCT
  t1.smallint_col_11
FROM table_21 t1
LEFT JOIN (
  SELECT
    smallint_col_45,
    (-224) - (COALESCE(MIN(665) OVER (ORDER BY smallint_col_45 DESC, varchar0170_col_23 DESC), NULL, -631)) AS int_col,
    AVG((GREATEST(CAST(806 AS int), CAST(-606 AS int))) * (39)) OVER (PARTITION BY smallint_col_45 ORDER BY smallint_col_45 DESC, varchar0170_col_23 ASC ROWS BETWEEN 24 FOLLOWING AND UNBOUNDED FOLLOWING) AS float_col,
    COALESCE(338, (965) + (-335), MAX(544) OVER (PARTITION BY varchar0170_col_23)) AS int_col_1,
    varchar0170_col_23
  FROM table_20
) t2 ON (((t2.int_col_1) = (t1.smallint_col_3)) AND ((t2.smallint_col_45) = (t1.smallint_col_11))) AND ((t2.smallint_col_45) = (t1.smallint_col_11))
WHERE
  NOT EXISTS (SELECT
    COALESCE(tt1.smallint_col_11, tt2.smallint_col_3, tt1.smallint_col_11) AS int_col
    FROM table_21 tt1
    INNER JOIN table_21 tt2 ON (tt2.smallint_col_11) = (tt1.smallint_col_3)
    WHERE
    ((tt2.smallint_col_11) >= (tt1.smallint_col_3)) AND ((t2.int_col) = (tt2.smallint_col_3)))

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)


[jira] (HIVE-15753) subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15753
subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

Change By: Vineet Garg
Labels: sub-query

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15757) Allow EXISTS/NOT EXISTS correlated subquery with aggregates

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg assigned an issue to Vineet Garg

Hive / HIVE-15757
Allow EXISTS/NOT EXISTS correlated subquery with aggregates

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15759) Allow correlated subqueries with windowing clause

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg assigned an issue to Vineet Garg

Hive / HIVE-15759
Allow correlated subqueries with windowing clause

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15759) Allow correlated subqueries with windowing clause

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15759
Allow correlated subqueries with windowing clause

Change By: Vineet Garg
Labels: sub-query

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15757) Allow EXISTS/NOT EXISTS correlated subquery with aggregates

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15757
Allow EXISTS/NOT EXISTS correlated subquery with aggregates

Change By: Vineet Garg
Labels: sub-query

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15703
HiveSubQRemoveRelBuilder should use Hive's own factories

Change By: Vineet Garg
Attachment: HIVE-15703.2.patch

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated HIVE-15703

Hive / HIVE-15703
HiveSubQRemoveRelBuilder should use Hive's own factories

Change By: Vineet Garg
Status: Patch Available Open

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated HIVE-15703

Hive / HIVE-15703
HiveSubQRemoveRelBuilder should use Hive's own factories

Change By: Vineet Garg
Status: Open Patch Available

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15758) Allow correlated scalar subqueries with aggregates which has non-equi join predicates

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg assigned an issue to Vineet Garg

Hive / HIVE-15758
Allow correlated scalar subqueries with aggregates which has non-equi join predicates

Change By: Vineet Garg
Assignee: Vineet Garg

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15758) Allow correlated scalar subqueries with aggregates which has non-equi join predicates

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15758
Allow correlated scalar subqueries with aggregates which has non-equi join predicates

Change By: Vineet Garg
Labels: sub-query

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15160) Can't order by an unselected column

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg commented on HIVE-15160

Re: Can't order by an unselected column

I applied your patch locally and tried running the following tests, but none of them worked. The following queries have the pattern observed in TPC-DS queries:

-- order by has aggregate which is in select
select sum(p_retailprice) from part order by sum(p_retailprice);

-- order by has item which isn't in select
select sum(p_retailprice) from part group by p_type order by p_type;

-- order by isn't using alias from select
select sum(p_retailprice) as ps from part order by sum(p_retailprice);

I haven't looked at your patch yet. Hopefully the above queries will be helpful.

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15160) Can't order by an unselected column

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg edited a comment on HIVE-15160

Re: Can't order by an unselected column

I applied your patch locally and tried running the following tests, but none of them worked. The following queries have the pattern observed in TPC-DS queries:
{code}
-- order by has aggregate which is in select
select sum(p_retailprice) from part order by sum(p_retailprice);
-- order by has item which isn't in select
select sum(p_retailprice) from part group by p_type order by p_type;
-- order by isn't using alias from select
select sum(p_retailprice) as ps from part order by sum(p_retailprice);
{code}
I haven't looked at your patch yet. Hopefully the above queries will be helpful in figuring out the issues.

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15753) subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15753
subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

Change By: Vineet Garg
Attachment: HIVE-15753.2.patch

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15753) subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated HIVE-15753

Hive / HIVE-15753
subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

Change By: Vineet Garg
Status: Open Patch Available

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15753) subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated HIVE-15753

Hive / HIVE-15753
subquery failing with org.apache.hadoop.hive.ql.parse.SemanticException

Change By: Vineet Garg
Status: Patch Available Open

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15763) Subquery in both lhs and rsh of IN/NOT IN throws misleading error

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg updated an issue

Hive / HIVE-15763
Subquery in both lhs and rsh of IN/NOT IN throws misleading error

Change By: Vineet Garg
Labels: sub-query

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (HIVE-15763) Subquery in both lhs and rsh of IN/NOT IN throws misleading error

2017-01-30 Thread Vineet Garg (JIRA)

Vineet Garg assigned an issue to Vineet Garg

Hive / HIVE-15763
Subquery in both lhs and rsh of IN/NOT IN throws misleading error

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] [Commented] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-26 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840416#comment-15840416
 ] 

Vineet Garg commented on HIVE-15703:


HIVE-15737 will get rid of {{HiveSubQRemoveRelBuilder}}

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15703.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15703) HiveSubQRemoveRelBuilder should use Hive's own factories

2017-01-26 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840408#comment-15840408
 ] 

Vineet Garg commented on HIVE-15703:


HiveSubQRemoveRelBuilder kept the default factories instead of Hive's factories for a 
reason. I don't recall the exact reason, but I remember getting wrong plans because of 
this.
Anyway, we plan to get rid of it and replace it with RelBuilder, so I don't think 
it's worth the change.

> HiveSubQRemoveRelBuilder should use Hive's own factories
> 
>
> Key: HIVE-15703
> URL: https://issues.apache.org/jira/browse/HIVE-15703
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15703.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-01-26 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840675#comment-15840675
 ] 

Vineet Garg commented on HIVE-15160:


[~pxiong] Does your patch work only for the non-CBO path?

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}
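
The pattern at issue, ordering by a grouping key that is not in the select list, can be shown on a much smaller query. The sketch below uses SQLite through Python's sqlite3 module as a stand-in comparison engine, in the same spirit as the Postgres comparison above. SQLite, the sales table, and its rows are assumptions made for illustration and are not part of the original report.

```python
import sqlite3

# In-memory SQLite database with a small hypothetical table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (i_item_id TEXT, i_item_desc TEXT, price INTEGER);
    INSERT INTO sales VALUES
        ('B', 'bolt', 3), ('A', 'nut', 5), ('A', 'nut', 2), ('C', 'axle', 7);
""")

# i_item_id is a grouping key and an ORDER BY key, but it is not selected.
rows = conn.execute("""
    SELECT i_item_desc, sum(price) AS itemrevenue
    FROM sales
    GROUP BY i_item_id, i_item_desc
    ORDER BY i_item_id
""").fetchall()

print(rows)
```

The rows come back in i_item_id order (A, B, C) rather than in i_item_desc order, which confirms the unselected key drives the sort.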



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Attachment: HIVE-15721.2.patch

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15721.1.patch, HIVE-15721.2.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were disabled 
> since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur if the subquery produces zero rows: since an aggregate 
> always produces a result, lowering such a query into a LEFT JOIN or SEMI JOIN 
> would not take this case into consideration.
> We propose to allow such queries with an added runtime check which will 
> throw an error/exception if the subquery produces zero rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Status: Open  (was: Patch Available)

> Allow  IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15721.1.patch, HIVE-15721.2.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were disabled 
> since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur if the subquery produces zero rows: since an aggregate 
> always produces a result, lowering such a query into a LEFT JOIN or SEMI JOIN 
> would not take this case into consideration.
> We propose to allow such queries with an added runtime check which will 
> throw an error/exception if the subquery produces zero rows.





[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Status: Patch Available  (was: Open)

> Allow IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
> Issue Type: Sub-task
> Components: Query Planning
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Labels: sub-query
> Attachments: HIVE-15721.1.patch, HIVE-15721.2.patch
>
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were disabled
> since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur if the subquery produces zero rows: an aggregate
> always produces a result, but lowering such a query into a LEFT JOIN or SEMI
> JOIN would not take this case into consideration.
> We propose to allow such queries with an added run-time check which will
> throw an error/exception if the subquery produces zero rows.





[jira] [Updated] (HIVE-15721) Allow IN/NOT IN correlated subquery with aggregates

2017-01-24 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15721:
---
Labels: sub-query  (was: )

> Allow IN/NOT IN correlated subquery with aggregates
> 
>
> Key: HIVE-15721
> URL: https://issues.apache.org/jira/browse/HIVE-15721
> Project: Hive
> Issue Type: Sub-task
> Components: Query Planning
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Labels: sub-query
>
> With HIVE-15544, IN/NOT IN correlated subqueries with aggregates were disabled
> since rewriting them into a JOIN could have produced wrong results.
> Wrong results would occur if the subquery produces zero rows: an aggregate
> always produces a result, but lowering such a query into a LEFT JOIN or SEMI
> JOIN would not take this case into consideration.
> We propose to allow such queries with an added run-time check which will
> throw an error/exception if the subquery produces zero rows.





[jira] [Assigned] (HIVE-16002) Correlated IN subquery with aggregate asserts in sq_count_check UDF

2017-02-21 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-16002:
--


> Correlated IN subquery with aggregate asserts in sq_count_check UDF
> ---
>
> Key: HIVE-16002
> URL: https://issues.apache.org/jira/browse/HIVE-16002
> Project: Hive
> Issue Type: Bug
> Reporter: Vineet Garg
> Assignee: Vineet Garg
>
> ==Reproducer==
> {code:SQL}
> create table t(i int, j int);
> insert into t values(0,1), (0,2);
> create table tt(i int, j int);
> insert into tt values(0,3);
> select * from t where i IN (select count(i) from tt where tt.j = t.j);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

