[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14442:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Vineet!

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Fix For: 2.2.0
>
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch, 
> HIVE-14442.3.patch
>
>
> Reproducer
> {code}
> set hive.cbo.returnpath.hiveop=true;
> set hive.map.aggr=false;
> create table abcd (a int, b int, c int, d int);
> LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
> {code}
> {code} explain select count(distinct a) from abcd group by b; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: abcd
>             Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: a (type: int)
>               outputColumnNames: a
>               Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: a (type: int), a (type: int)
>                 sort order: ++
>                 Map-reduce partition columns: a (type: int)
>                 Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(DISTINCT KEY._col1:0._col0)
>           keys: KEY._col0 (type: int)
>           mode: complete
>           outputColumnNames: b, $f1
>           Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: $f1 (type: bigint)
>             outputColumnNames: _o__c0
>             Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> {code} explain select count(distinct a) from abcd group by c; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: abcd
>             Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: a (type: int)
>               outputColumnNames: a
>               Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: a (type: int), a (type: int)
>                 sort order: ++
>                 Map-reduce partition columns: a (type: int)
>                 Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(DISTINCT KEY._col1:0._col0)
>           keys: KEY._col0 (type: int)
>           mode: complete
>           outputColumnNames: c, $f1
>           Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: $f1 (type: bigint)
>             outputColumnNames: _o__c0
>             Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> Both plans above have the wrong keys in the map-side Reduce Output Operator:
> both use a, a instead of b, a and c, a respectively.
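
To illustrate why the key mismatch described in the report produces a wrong result, here is a minimal Python sketch. It is not Hive code. It only models a Reduce Output Operator that shuffles on a key pair, followed by a Group By in complete mode that groups on the first key and counts distinct values of the second. The sample rows and the helper count_distinct_by_key are hypothetical and are not taken from in4.txt.

```python
from collections import defaultdict

# Hypothetical (a, b) rows for illustration; NOT the contents of in4.txt.
ROWS = [(1, 10), (2, 10), (2, 10), (3, 20), (1, 20), (1, 20)]


def count_distinct_by_key(rows, key_fn, distinct_fn):
    """Model a shuffle on (key, distinct value) plus a complete-mode Group By.

    key_fn picks the grouping key (KEY._col0 in the plan) and distinct_fn
    picks the value fed to count(DISTINCT ...) (KEY._col1 in the plan).
    """
    groups = defaultdict(set)
    for row in rows:
        groups[key_fn(row)].add(distinct_fn(row))
    return {key: len(values) for key, values in sorted(groups.items())}


# Intended keys (b, a): one output row per distinct b.
expected = count_distinct_by_key(ROWS, lambda r: r[1], lambda r: r[0])

# Keys from the reported plan (a, a): rows end up grouped by a, not b.
buggy = count_distinct_by_key(ROWS, lambda r: r[0], lambda r: r[0])

print("keys (b, a):", expected)
print("keys (a, a):", buggy)
```

With the intended keys the model yields one count per value of b. With keys a, a it yields one row per value of a, each with a count of 1, which matches the kind of wrong result the issue title refers to.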



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Patch Available  (was: Open)

Missed the Spark golden file update.

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch, 
> HIVE-14442.3.patch


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.3.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch, 
> HIVE-14442.3.patch


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-08 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Open  (was: Patch Available)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch, 
> HIVE-14442.3.patch


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.2.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: (was: HIVE-14442.2.patch)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.2.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-07 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Open  (was: Patch Available)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch, HIVE-14442.2.patch
>
>
> Reproducer
> {code} set hive.cbo.returnpath.hiveop=true
>  set hive.map.aggr=false
> create table abcd (a int, b int, c int, d int);
> LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
> {code}
> {code} explain select count(distinct a) from abcd group by b; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: b, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> {code} explain select count(distinct a) from abcd group by c; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: c, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> The two cases above have the wrong keys in the map-side Reduce Output Operator
> (both have a, a instead of b, a and c, a respectively).
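
To make the effect of the wrong shuffle keys concrete, here is a minimal sketch that simulates the reduce-side Group By from the plans above. It is not Hive code: the function name and the sample rows are hypothetical (they are not the contents of in4.txt), and it assumes the reducer groups on the first shuffle key (KEY._col0) and counts distinct values of the second (KEY._col1).

```python
from collections import defaultdict

def count_distinct_by_shuffle_key(rows, key_exprs):
    """Simulate a reduce-side GROUP BY with one COUNT(DISTINCT ...).

    key_exprs names the two columns emitted as the shuffle key:
    key_exprs[0] plays the role of KEY._col0 (the grouping key) and
    key_exprs[1] the role of KEY._col1 (the DISTINCT argument).
    """
    groups = defaultdict(set)
    for row in rows:
        groups[row[key_exprs[0]]].add(row[key_exprs[1]])
    return {key: len(values) for key, values in sorted(groups.items())}

# Hypothetical sample rows, chosen only for illustration.
rows = [
    {"a": 1, "b": 10},
    {"a": 2, "b": 10},
    {"a": 2, "b": 20},
    {"a": 3, "b": 20},
    {"a": 3, "b": 20},
]

# Keys the plan should use for "group by b": (b, a).
expected = count_distinct_by_shuffle_key(rows, ("b", "a"))
# Keys the reported plan actually uses: (a, a).
buggy = count_distinct_by_shuffle_key(rows, ("a", "a"))

print(expected)  # {10: 2, 20: 2}
print(buggy)     # {1: 1, 2: 1, 3: 1}
```

Under these assumptions the (a, a) keys yield one group per distinct value of a, each with a count of 1, instead of one group per value of b.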



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Status: Patch Available  (was: Open)

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch
>
>
> Reproducer
> {code} set hive.cbo.returnpath.hiveop=true
>  set hive.map.aggr=false
> create table abcd (a int, b int, c int, d int);
> LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
> {code}
> {code} explain select count(distinct a) from abcd group by b; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: b, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> {code} explain select count(distinct a) from abcd group by c; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: c, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> The two cases above have the wrong keys in the map-side Reduce Output Operator
> (both have a, a instead of b, a and c, a respectively).
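
A plan like the ones quoted above can also be checked mechanically. The following is a rough sketch, not part of Hive or its test framework: both helper names are hypothetical, and it assumes the EXPLAIN text lists shuffle keys on a "key expressions:" line in the "name (type: ...)" form shown in this report.

```python
import re

def reduce_sink_keys(explain_text):
    """Return the column names from each 'key expressions:' line."""
    keys = []
    for line in explain_text.splitlines():
        match = re.search(r"key expressions:\s*(.+)", line)
        if match:
            # Each entry looks like "a (type: int)"; keep only the name.
            keys.append(re.findall(r"(\S+) \(type: [^)]*\)", match.group(1)))
    return keys

def shuffles_on(explain_text, group_col):
    """True if every Reduce Output Operator lists group_col as its first key."""
    keys = reduce_sink_keys(explain_text)
    return bool(keys) and all(cols[:1] == [group_col] for cols in keys)

# Excerpt of the plan reported for "group by b".
reported_plan = """
Reduce Output Operator
  key expressions: a (type: int), a (type: int)
  sort order: ++
"""

print(reduce_sink_keys(reported_plan))  # [['a', 'a']]
print(shuffles_on(reported_plan, "b"))  # False
```

The check only looks at the first key expression, which is enough to flag the a, a keys reported here. It does not validate the rest of the plan.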





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Attachment: HIVE-14442.1.patch

> CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong 
> result/plan in group by with hive.map.aggr=false
> ---
>
> Key: HIVE-14442
> URL: https://issues.apache.org/jira/browse/HIVE-14442
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14442.1.patch
>
>
> Reproducer
> {code} set hive.cbo.returnpath.hiveop=true
>  set hive.map.aggr=false
> create table abcd (a int, b int, c int, d int);
> LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
> {code}
> {code} explain select count(distinct a) from abcd group by b; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: b, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> {code} explain select count(distinct a) from abcd group by c; {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: abcd
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: a (type: int)
>   outputColumnNames: a
>   Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Output Operator
> key expressions: a (type: int), a (type: int)
> sort order: ++
> Map-reduce partition columns: a (type: int)
> Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
> Column stats: NONE
>   Reduce Operator Tree:
> Group By Operator
>   aggregations: count(DISTINCT KEY._col1:0._col0)
>   keys: KEY._col0 (type: int)
>   mode: complete
>   outputColumnNames: c, $f1
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column 
> stats: NONE
>   Select Operator
> expressions: $f1 (type: bigint)
> outputColumnNames: _o__c0
> Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE 
> Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> The two cases above have the wrong keys in the map-side Reduce Output Operator
> (both have a, a instead of b, a and c, a respectively).





[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Description: 
Reproducer

{code} set hive.cbo.returnpath.hiveop=true
 set hive.map.aggr=false

create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: b, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}

{code} explain select count(distinct a) from abcd group by c; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: c, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code}

The two cases above have the wrong keys in the map-side Reduce Output Operator 
(both have a, a instead of b, a and c, a respectively).

  was:
Reproducer

{code} set hive.cbo.returnpath.hiveop=true
 set hive.map.aggr=false

create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator
  

[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Description: 
Reproducer

{code} set hive.cbo.returnpath.hiveop=true {code}
 set hive.map.aggr=false {code}

{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: b, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code} explain select count(distinct a) from abcd group by c; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: c, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code}

The two cases above have the wrong keys in the map-side Reduce Output Operator 
(both have a, a instead of b, a and c, a respectively).

  was:
Reproducer

{code} set hive.cbo.returnpath.hiveop=true {code}
{code} set hive.map.aggr=false {code}

{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:

[jira] [Updated] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false

2016-08-05 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14442:
---
Description: 
Reproducer

{code} set hive.cbo.returnpath.hiveop=true
 set hive.map.aggr=false

create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: b, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code} explain select count(distinct a) from abcd group by c; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: c, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code}

The two cases above have the wrong keys in the map-side Reduce Output Operator 
(both have a, a instead of b, a and c, a respectively).

  was:
Reproducer

{code} set hive.cbo.returnpath.hiveop=true {code}
 set hive.map.aggr=false {code}

{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}

{code} explain select count(distinct a) from abcd group by b; {code}

{code}
STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: abcd
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column 
stats: NONE
Select Operator
  expressions: a (type: int)
  outputColumnNames: a
  Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Output Operator
key expressions: a (type: int), a (type: int)
sort order: ++
Map-reduce partition columns: a (type: int)
Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE 
Column stats: NONE
  Reduce Operator Tree:
Group By Operator