Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162487
---


Ship it!




+1. The patch looks good to me, though I'm a little concerned about the possible 
performance degradation. Please file follow-up JIRAs for the TODO.
It may also be good to have Rui take a look before this is committed. Thanks.

- Chao Sun


On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> ---
> 
> (Updated Jan. 20, 2017, 6:07 p.m.)
> 
> 
> Review request for hive, Chao Sun and Rui Li.
> 
> 
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA description.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
>   ql/src/test/queries/clientpositive/union_top_level.q d93fe38
>   ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
>   ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
>   ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
>   ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
>   ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
> 
> Diff: https://reviews.apache.org/r/55776/diff/
> 
> 
> Testing
> ---
> 
> All tests passed
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out
> 
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is 
> because the keys are the same for group by and order by.
> 
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile, so how come the result 
> did not change?
> 
> Xuefu Zhang wrote:
> MR group by is also sorted, so the order by is not needed and is 
> eliminated during optimization. So, you see, the test didn't fail in the 
> Jenkins result.
> 
> Chao Sun wrote:
> No, I mean the input query is changed, so the output should also be 
> changed. If you look at the MR output qfile, it still has
> ```
> PREHOOK: query: explain
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)a
> union all
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)b
> PREHOOK: type: QUERY
> POSTHOOK: query: explain
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)a
> union all
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 
> on (s1.key = s2.key) limit 10)b
> POSTHOOK: type: QUERY
> ```
> which suggests the test is not triggered on the MR path. Anyway, maybe the 
> test is turned off for MR.

Yeah, got it. Maybe so. I don't believe that's a blocker; we can file a follow-up 
JIRA for this. Do you have any other comments?


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Chao Sun


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out
> 
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is 
> because the keys are the same for group by and order by.
> 
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile, so how come the result 
> did not change?
> 
> Xuefu Zhang wrote:
> MR group by is also sorted, so the order by is not needed and is 
> eliminated during optimization. So, you see, the test didn't fail in the 
> Jenkins result.

No, I mean the input query is changed, so the output should also be changed. If 
you look at the MR output qfile, it still has
```
PREHOOK: query: explain
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on 
(s1.key = s2.key) limit 10)a
union all
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on 
(s1.key = s2.key) limit 10)b
PREHOOK: type: QUERY
POSTHOOK: query: explain
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on 
(s1.key = s2.key) limit 10)a
union all
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on 
(s1.key = s2.key) limit 10)b
POSTHOOK: type: QUERY
```
which suggests the test is not triggered on the MR path. Anyway, maybe the test 
is turned off for MR.


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out
> 
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is 
> because the keys are the same for group by and order by.
> 
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile, so how come the result 
> did not change?

MR group by is also sorted, so the order by is not needed and is 
eliminated during optimization. So, you see, the test didn't fail in the 
Jenkins result.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Chao Sun


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out
> 
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is 
> because the keys are the same for group by and order by.

Hmm.. I'm surprised. We changed the input qfile, so how come the result did not 
change?


- Chao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update 
> > ql/src/test/results/clientpositive/union_top_level.q.out

No. I verified that MR's plan and result don't change at all. This is because 
the keys are the same for group by and order by.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang


> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 
> > 31
> > 
> >
> > Is it possible that `numPartitions` equals 0?

No. If the partition number is zero, that means there is no partition, and we 
will not even get here. Nevertheless, if it's set to 0, we take 1 instead.
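
For illustration only, the fallback described above could look roughly like the sketch below. This is not the code from the patch: `PartitionCountSketch` and `effectivePartitions` are hypothetical names, and treating negative values the same as 0 is an assumption.

```java
public class PartitionCountSketch {

    // Hypothetical helper (not from the patch): fall back to a single
    // partition when the requested count is 0. Negative values are
    // treated the same way, which is an assumption of this sketch.
    public static int effectivePartitions(int numPartitions) {
        return numPartitions > 0 ? numPartitions : 1;
    }

    public static void main(String[] args) {
        System.out.println(effectivePartitions(0));
        System.out.println(effectivePartitions(8));
    }
}
```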


> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 
> > 34
> > 
> >
> > I wonder whether this also has some extra cost compared to the 
> > original `groupByKey`, since it needs to sort all records by key in a 
> > single partition, right?

Well, we don't know which one performs better yet. 
repartitionAndSortWithinPartitions() brings extra sorting, but it eliminates 
the grouping in groupByKey(). Also, groupByKey() has unbounded memory usage, 
which is the problem we are trying to solve, as described in the JIRA 
description. We will follow up with performance testing, and may provide an 
option to use either groupByKey(), which might perform better but has 
unlimited memory usage, or the new way, where memory usage is bounded.
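
To make the trade-off concrete, here is a minimal plain-Java sketch with no Spark dependency. It is not the code in the patch, and `SortedGroupingSketch`, `groupInMemory`, and `sumSortedStream` are hypothetical names. The first method mimics groupByKey()-style grouping, where all values of a key are buffered before the group can be processed. The second assumes the records already arrive sorted by key, as they would after repartitionAndSortWithinPartitions(), and detects group boundaries by watching for a key change, so only the current key and a running aggregate are held.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortedGroupingSketch {

    // groupByKey-style: every value is buffered in a per-key list, so
    // memory grows with the number of records.
    public static Map<String, List<Integer>> groupInMemory(
            Iterator<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        while (records.hasNext()) {
            Map.Entry<String, Integer> r = records.next();
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return groups;
    }

    // Sorted-stream style: the input MUST already be sorted by key.
    // A group ends when the key changes; only the current key and a
    // running sum are kept while scanning.
    public static List<String> sumSortedStream(
            Iterator<Map.Entry<String, Integer>> sortedRecords) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        while (sortedRecords.hasNext()) {
            Map.Entry<String, Integer> r = sortedRecords.next();
            if (currentKey != null && !currentKey.equals(r.getKey())) {
                out.add(currentKey + "=" + sum); // emit the finished group
                sum = 0;
            }
            currentKey = r.getKey();
            sum += r.getValue();
        }
        if (currentKey != null) {
            out.add(currentKey + "=" + sum); // emit the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
        sorted.add(new SimpleEntry<>("a", 1));
        sorted.add(new SimpleEntry<>("a", 2));
        sorted.add(new SimpleEntry<>("b", 5));
        System.out.println(groupInMemory(sorted.iterator()));
        System.out.println(sumSortedStream(sorted.iterator()));
    }
}
```

The sketch leaves out the cost of the sort itself, which is the extra work mentioned above and the reason performance testing is still needed.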


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---





Re: Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java (line 31)


Is it possible that `numPartitions` equals 0?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java (line 34)


I wonder whether this also has some extra cost compared to the original 
`groupByKey`, since it needs to sort all records by key in a single partition, 
right?


I think we also need to update 
ql/src/test/results/clientpositive/union_top_level.q.out

- Chao Sun





Review Request 55776: Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark

2017-01-20 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/
---

Review request for hive, Chao Sun and Rui Li.


Bugs: HIVE-15580
https://issues.apache.org/jira/browse/HIVE-15580


Repository: hive-git


Description
---

See JIRA description.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
  ql/src/test/queries/clientpositive/union_top_level.q d93fe38
  ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
  ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
  ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
  ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
  ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f

Diff: https://reviews.apache.org/r/55776/diff/


Testing
---

All tests passed


Thanks,

Xuefu Zhang