Re: Review Request 29878: Bug with max() together with rank() and grouping sets
On Jan. 16, 2015, 6:20 p.m., Ashutosh Chauhan wrote: overall looks good. Few minor comments. Thanks for the review. Fixed BucketingSortingInferenceOptimizer to reflect grouping set pruning and changed default value of grouping set id to -1 in semijoin cases. On Jan. 16, 2015, 6:20 p.m., Ashutosh Chauhan wrote: ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out, line 147 https://reviews.apache.org/r/29878/diff/1/?file=820775#file820775line147 should this have been only key,value ? Fixed BucketingSortingInferenceOptimizer to reflect grouping set pruning On Jan. 16, 2015, 6:20 p.m., Ashutosh Chauhan wrote: ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out, line 334 https://reviews.apache.org/r/29878/diff/1/?file=820775#file820775line334 seems like agg column should not be present there? Can you check? Same with above. - Navis --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29878/#review68438 --- On Jan. 14, 2015, 9:13 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29878/ --- (Updated Jan. 14, 2015, 9:13 a.m.) Review request for hive. Bugs: HIVE-9347 https://issues.apache.org/jira/browse/HIVE-9347 Repository: hive-git Description --- It looks like the query below returns incorrect results on Hive 0.13.1, but it was working fine on Hive 0.11. I have the following table: CREATE TABLE `t`( `category` int, `live` int, `comments` int) with the following data: hive select * from t; OK 3 0 2 2 0 2 8 0 2 The query: hive select category, max(live) live, max(comments) comments, rank() OVER (PARTITION BY category ORDER BY comments) rank1 FROM t GROUP BY category GROUPING SETS ((), (category)) HAVING max(comments) 0; return the following results: NULL1 48 1 2 1 49 1 3 1 49 1 8 1 49 1 When using grouping sets with the rank() function the max() function return incorrect results. Everything works fine if I remove grouping sets clause and split the query into two independent queries or remove the rank() function. This looks like a bug to me but please review. That said, I'm not sure if it's just Amazon issue or general Hive issue. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 90b4b12 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java afd1738 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df ql/src/test/queries/clientpositive/groupby_grouping_window.q PRE-CREATION ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 89dd1de ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out c3bf0d8 ql/src/test/results/clientpositive/groupby_cube1.q.out 4f44d4f ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out 0a21dbe ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out 3597609 ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out d1be46d ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out 6d11add ql/src/test/results/clientpositive/groupby_grouping_sets6.q.out d2ff112 ql/src/test/results/clientpositive/groupby_rollup1.q.out 0108ce0 ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out 301b90c Diff: https://reviews.apache.org/r/29878/diff/ Testing --- Thanks, Navis Ryu
Re: Review Request 29878: Bug with max() together with rank() and grouping sets
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29878/ --- (Updated Jan. 19, 2015, 1:10 a.m.) Review request for hive. Changes --- Addressed comments Bugs: HIVE-9347 https://issues.apache.org/jira/browse/HIVE-9347 Repository: hive-git Description --- It looks like the query below returns incorrect results on Hive 0.13.1, but it was working fine on Hive 0.11. I have the following table: CREATE TABLE `t`( `category` int, `live` int, `comments` int) with the following data: hive select * from t; OK 3 0 2 2 0 2 8 0 2 The query: hive select category, max(live) live, max(comments) comments, rank() OVER (PARTITION BY category ORDER BY comments) rank1 FROM t GROUP BY category GROUPING SETS ((), (category)) HAVING max(comments) 0; return the following results: NULL1 48 1 2 1 49 1 3 1 49 1 8 1 49 1 When using grouping sets with the rank() function the max() function return incorrect results. Everything works fine if I remove grouping sets clause and split the query into two independent queries or remove the rank() function. This looks like a bug to me but please review. That said, I'm not sure if it's just Amazon issue or general Hive issue. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 90b4b12 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java afd1738 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java 87fba2d ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java 82f4243 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b93a293 ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da Diff: https://reviews.apache.org/r/29878/diff/ Testing --- Thanks, Navis Ryu
Re: Review Request 29878: Bug with max() together with rank() and grouping sets
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29878/#review68438 --- overall looks good. Few minor comments. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112662 can be marked as transient. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112661 this come from desc, can be marked as transient. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112658 this comes from Desc, can be marked as transient. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112659 this also comes from desc, can be marked as transient. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112663 It will be good to add comments about what these variables represents. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112660 This one too. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112647 This can be marked as transient too, since its initialized only in initializeOp() which is called on backend. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java https://reviews.apache.org/r/29878/#comment112664 Can you add a comment, about this logic. Not obvious on cursory reading. ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java https://reviews.apache.org/r/29878/#comment112666 transient ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java https://reviews.apache.org/r/29878/#comment112668 comment about this logic. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java https://reviews.apache.org/r/29878/#comment112670 nit: whitespace at EOL. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java https://reviews.apache.org/r/29878/#comment112671 nit: whitespace at EOL. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java https://reviews.apache.org/r/29878/#comment112672 nit: whitespace at EOL. ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out https://reviews.apache.org/r/29878/#comment112673 should this have been only key,value ? ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out https://reviews.apache.org/r/29878/#comment112675 seems like agg column should not be present there? Can you check? - Ashutosh Chauhan On Jan. 14, 2015, 9:13 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29878/ --- (Updated Jan. 14, 2015, 9:13 a.m.) Review request for hive. Bugs: HIVE-9347 https://issues.apache.org/jira/browse/HIVE-9347 Repository: hive-git Description --- It looks like the query below returns incorrect results on Hive 0.13.1, but it was working fine on Hive 0.11. I have the following table: CREATE TABLE `t`( `category` int, `live` int, `comments` int) with the following data: hive select * from t; OK 3 0 2 2 0 2 8 0 2 The query: hive select category, max(live) live, max(comments) comments, rank() OVER (PARTITION BY category ORDER BY comments) rank1 FROM t GROUP BY category GROUPING SETS ((), (category)) HAVING max(comments) 0; return the following results: NULL1 48 1 2 1 49 1 3 1 49 1 8 1 49 1 When using grouping sets with the rank() function the max() function return incorrect results. Everything works fine if I remove grouping sets clause and split the query into two independent queries or remove the rank() function. This looks like a bug to me but please review. That said, I'm not sure if it's just Amazon issue or general Hive issue. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 90b4b12 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java afd1738 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df ql/src/test/queries/clientpositive/groupby_grouping_window.q PRE-CREATION ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 89dd1de ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out c3bf0d8