Re: Review Request 29878: Bug with max() together with rank() and grouping sets

2015-01-18 Thread Navis Ryu


 On Jan. 16, 2015, 6:20 p.m., Ashutosh Chauhan wrote:
  overall looks good. Few minor comments.

Thanks for the review. Fixed BucketingSortingInferenceOptimizer to reflect 
grouping set pruning and changed default value of grouping set id to -1 in 
semijoin cases.


 On Jan. 16, 2015, 6:20 p.m., Ashutosh Chauhan wrote:
  ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out,
   line 147
  https://reviews.apache.org/r/29878/diff/1/?file=820775#file820775line147
 
  should this have been only key,value ?

Fixed BucketingSortingInferenceOptimizer to reflect grouping set pruning


 On Jan. 16, 2015, 6:20 p.m., Ashutosh Chauhan wrote:
  ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out,
   line 334
  https://reviews.apache.org/r/29878/diff/1/?file=820775#file820775line334
 
  seems like agg column should not be present there? Can you check?

Same with above.


- Navis


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29878/#review68438
---


On Jan. 14, 2015, 9:13 a.m., Navis Ryu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29878/
 ---
 
 (Updated Jan. 14, 2015, 9:13 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-9347
 https://issues.apache.org/jira/browse/HIVE-9347
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 It looks like the query below returns incorrect results on Hive 0.13.1, but 
 it was working fine on Hive 0.11. 
 
 I have the following table:
 CREATE  TABLE `t`(
   `category` int, 
   `live` int, 
   `comments` int)
 
 with the following data:
 hive select * from t;
 OK
 3   0   2
 2   0   2
 8   0   2
 
 The query:
 hive select category, max(live) live, max(comments) comments, rank() OVER 
 (PARTITION BY category ORDER BY comments) rank1
 FROM t
 GROUP BY category
 GROUPING SETS ((), (category))
 HAVING max(comments)  0;
 
 return the following results:
 
 NULL1   48  1
 2   1   49  1
 3   1   49  1
 8   1   49  1
 
 When using grouping sets with the rank() function the max() function return 
 incorrect results. Everything works fine if I remove grouping sets clause and 
 split the query into two independent queries or remove the rank() function.
 
 This looks like a bug to me but please review. That said, I'm not sure if 
 it's just Amazon issue or general Hive issue.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 
 90b4b12 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
 afd1738 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df 
   ql/src/test/queries/clientpositive/groupby_grouping_window.q PRE-CREATION 
   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 89dd1de 
   ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out c3bf0d8 
   ql/src/test/results/clientpositive/groupby_cube1.q.out 4f44d4f 
   ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out 0a21dbe 
   ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out 3597609 
   ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out d1be46d 
   ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out 6d11add 
   ql/src/test/results/clientpositive/groupby_grouping_sets6.q.out d2ff112 
   ql/src/test/results/clientpositive/groupby_rollup1.q.out 0108ce0 
   
 ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out 
 301b90c 
 
 Diff: https://reviews.apache.org/r/29878/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Navis Ryu
 




Re: Review Request 29878: Bug with max() together with rank() and grouping sets

2015-01-18 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29878/
---

(Updated Jan. 19, 2015, 1:10 a.m.)


Review request for hive.


Changes
---

Addressed comments


Bugs: HIVE-9347
https://issues.apache.org/jira/browse/HIVE-9347


Repository: hive-git


Description
---

It looks like the query below returns incorrect results on Hive 0.13.1, but it 
was working fine on Hive 0.11. 

I have the following table:
CREATE  TABLE `t`(
  `category` int, 
  `live` int, 
  `comments` int)

with the following data:
hive select * from t;
OK
3   0   2
2   0   2
8   0   2

The query:
hive select category, max(live) live, max(comments) comments, rank() OVER 
(PARTITION BY category ORDER BY comments) rank1
FROM t
GROUP BY category
GROUPING SETS ((), (category))
HAVING max(comments)  0;

return the following results:

NULL1   48  1
2   1   49  1
3   1   49  1
8   1   49  1

When using grouping sets with the rank() function the max() function return 
incorrect results. Everything works fine if I remove grouping sets clause and 
split the query into two independent queries or remove the rank() function.

This looks like a bug to me but please review. That said, I'm not sure if it's 
just Amazon issue or general Hive issue.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 
90b4b12 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
afd1738 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java
 87fba2d 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java
 82f4243 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b93a293 
  ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da 

Diff: https://reviews.apache.org/r/29878/diff/


Testing
---


Thanks,

Navis Ryu



Re: Review Request 29878: Bug with max() together with rank() and grouping sets

2015-01-16 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29878/#review68438
---


overall looks good. Few minor comments.


ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112662

can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112661

this come from desc, can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112658

this comes from Desc, can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112659

this also comes from desc, can be marked as transient.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112663

It will be good to add comments about what these variables represents.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112660

This one too.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112647

This can be marked as transient too, since its initialized only in 
initializeOp() which is called on backend.



ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
https://reviews.apache.org/r/29878/#comment112664

Can you add a comment, about this logic. Not obvious on cursory reading.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
https://reviews.apache.org/r/29878/#comment112666

transient



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
https://reviews.apache.org/r/29878/#comment112668

comment about this logic.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
https://reviews.apache.org/r/29878/#comment112670

nit: whitespace at EOL.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
https://reviews.apache.org/r/29878/#comment112671

nit: whitespace at EOL.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
https://reviews.apache.org/r/29878/#comment112672

nit: whitespace at EOL.



ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
https://reviews.apache.org/r/29878/#comment112673

should this have been only key,value ?



ql/src/test/results/clientpositive/infer_bucket_sort_grouping_operators.q.out
https://reviews.apache.org/r/29878/#comment112675

seems like agg column should not be present there? Can you check?


- Ashutosh Chauhan


On Jan. 14, 2015, 9:13 a.m., Navis Ryu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29878/
 ---
 
 (Updated Jan. 14, 2015, 9:13 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-9347
 https://issues.apache.org/jira/browse/HIVE-9347
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 It looks like the query below returns incorrect results on Hive 0.13.1, but 
 it was working fine on Hive 0.11. 
 
 I have the following table:
 CREATE  TABLE `t`(
   `category` int, 
   `live` int, 
   `comments` int)
 
 with the following data:
 hive select * from t;
 OK
 3   0   2
 2   0   2
 8   0   2
 
 The query:
 hive select category, max(live) live, max(comments) comments, rank() OVER 
 (PARTITION BY category ORDER BY comments) rank1
 FROM t
 GROUP BY category
 GROUPING SETS ((), (category))
 HAVING max(comments)  0;
 
 return the following results:
 
 NULL1   48  1
 2   1   49  1
 3   1   49  1
 8   1   49  1
 
 When using grouping sets with the rank() function the max() function return 
 incorrect results. Everything works fine if I remove grouping sets clause and 
 split the query into two independent queries or remove the rank() function.
 
 This looks like a bug to me but please review. That said, I'm not sure if 
 it's just Amazon issue or general Hive issue.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 4632f08 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java 
 90b4b12 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
 afd1738 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df 
   ql/src/test/queries/clientpositive/groupby_grouping_window.q PRE-CREATION 
   ql/src/test/results/clientpositive/annotate_stats_groupby.q.out 89dd1de 
   ql/src/test/results/clientpositive/annotate_stats_groupby2.q.out c3bf0d8