[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-08 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | 

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-08 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Open  (was: Patch Available)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> |   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-08 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Attachment: (was: HIVE-21539.4.patch)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> |  

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-08 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Attachment: HIVE-21539.4.patch

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | 

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-08 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Patch Available  (was: Open)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> |   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Open  (was: Patch Available)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> |   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Patch Available  (was: Open)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> |   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Open  (was: Patch Available)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> |   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Patch Available  (was: Open)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> |   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-03 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Attachment: HIVE-21539.3.patch

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch, 
> HIVE-21539.3.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | 

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-01 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Attachment: HIVE-21539.2.patch

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | columns.comments   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-01 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Patch Available  (was: Open)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | columns.comments 

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-01 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Open  (was: Patch Available)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch, HIVE-21539.2.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | columns.comments 

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-01 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Status: Patch Available  (was: Open)

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | columns.comments   |
> | 

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-04-01 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21539:
---
Attachment: HIVE-21539.1.patch

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21539.1.patch
>
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | columns.comments   |
> |   

[jira] [Updated] (HIVE-21539) GroupBy + where clause on same column results in incorrect query rewrite

2019-03-29 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-21539:
---
Component/s: (was: HiveServer2)
 CBO

> GroupBy + where clause on same column results in incorrect query rewrite
> 
>
> Key: HIVE-21539
> URL: https://issues.apache.org/jira/browse/HIVE-21539
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: anishek
>Priority: Major
>
> {code}
> create table a (i int, j string);
> insert into a values ( 1, 'a'),(2,'b');
> explain extended select min(j) from a where j='a' group by j;
> ++
> |  Explain   |
> ++
> | OPTIMIZED SQL: SELECT MIN(TRUE) AS `_o__c0`|
> | FROM `default`.`a` |
> | WHERE `j` = 'a'|
> | GROUP BY TRUE  |
> | STAGE DEPENDENCIES:|
> |   Stage-1 is a root stage  |
> |   Stage-0 depends on stages: Stage-1   |
> ||
> | STAGE PLANS:   |
> |   Stage: Stage-1   |
> | Tez|
> |   DagId: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Edges:   |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)   |
> |   DagName: 
> anagarwal_20190318153535_25c1f460-1986-475e-9995-9f6342029dd8:11 |
> |   Vertices:|
> | Map 1  |
> | Map Operator Tree: |
> | TableScan  |
> |   alias: a |
> |   filterExpr: (j = 'a') (type: boolean) |
> |   Statistics: Num rows: 2 Data size: 170 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   GatherStats: false   |
> |   Filter Operator  |
> | isSamplingPred: false  |
> | predicate: (j = 'a') (type: boolean) |
> | Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Select Operator|
> |   Statistics: Num rows: 1 Data size: 85 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   Group By Operator|
> | aggregations: min(true)|
> | keys: true (type: boolean) |
> | mode: hash |
> | outputColumnNames: _col0, _col1 |
> | Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> | Reduce Output Operator |
> |   key expressions: _col0 (type: boolean) |
> |   null sort order: a   |
> |   sort order: +|
> |   Map-reduce partition columns: _col0 (type: 
> boolean) |
> |   Statistics: Num rows: 1 Data size: 8 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> |   tag: -1  |
> |   value expressions: _col1 (type: boolean) |
> |   auto parallelism: true   |
> | Path -> Alias: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a [a] |
> | Path -> Partition: |
> |   hdfs://localhost:9000/tmp/hive/warehouse/a  |
> | Partition  |
> |   base file name: a|
> |   input format: org.apache.hadoop.mapred.TextInputFormat |
> |   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
> |   properties:  |
> | COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"i":"true","j":"true"}} |
> | bucket_count -1|
> | bucketing_version 2|
> | column.name.delimiter ,|
> | columns i,j|
> | columns.comments   |
> | columns.types int:string   |
> |