Re: Review Request 63427: HIVE-17396

2017-11-02 Thread Deepak Jaiswal


> On Nov. 1, 2017, 3:48 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 3017-3018 (patched)
> > 
> >
> > Please spell out the acronyms TS, DPP, and nDVs in the parameter 
> > description.
> > 
> > Also, the description isn't clear to me -- "to control if ..." what?  
> > Something about reduction, or feeding?  (Forgive my ignorance, I'm just 
> > looking at the grammatical structure of the sentence along with the 
> > parameter name.)
> 
> Deepak Jaiswal wrote:
> Thanks for the feedback. How about this below?
> 
> The factor to decide if semijoin branch feeding into a TableScan which 
> has an outgoing Dynamic Partition pruning(DPP) branch based on number of 
> distinct values.
> 
> Lefty Leverenz wrote:
> Is "feeding" what gets decided?  (In other words, should it be "feeds"?)  
> If not, what's the verb that "decide if" refers to?
> 
> Thanks for spelling out the acronyms.  Keeping DPP in parentheses is 
> good, although you need a space before the opening parenthesis.  And maybe 
> "pruning" should be "Pruning" just for consistency.

Thanks. Let me update the patch.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189778
---


On Oct. 31, 2017, 1:19 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> ---
> 
> (Updated Oct. 31, 2017, 1:19 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin 
> branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not 
> have equality as there is a chance that the values are same on both sides and 
> the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side 
> are only slightly less than that on TS side.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 1a1a4d9b2d 
> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 63427: HIVE-17396

2017-11-02 Thread Lefty Leverenz


> On Nov. 1, 2017, 3:48 a.m., Lefty Leverenz wrote:
> >

Nit:  This RB summary has the wrong JIRA number -- it should be 17936, not 
17396.

But the first line of the description has the correct JIRA text, so it's easy 
enough to find with a search.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189778
---


On Oct. 31, 2017, 1:19 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> ---
> 
> (Updated Oct. 31, 2017, 1:19 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin 
> branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not 
> have equality as there is a chance that the values are same on both sides and 
> the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side 
> are only slightly less than that on TS side.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 1a1a4d9b2d 
> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 63427: HIVE-17396

2017-11-02 Thread Lefty Leverenz


> On Nov. 1, 2017, 3:48 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 3017-3018 (patched)
> > 
> >
> > Please spell out the acronyms TS, DPP, and nDVs in the parameter 
> > description.
> > 
> > Also, the description isn't clear to me -- "to control if ..." what?  
> > Something about reduction, or feeding?  (Forgive my ignorance, I'm just 
> > looking at the grammatical structure of the sentence along with the 
> > parameter name.)
> 
> Deepak Jaiswal wrote:
> Thanks for the feedback. How about this below?
> 
> The factor to decide if semijoin branch feeding into a TableScan which 
> has an outgoing Dynamic Partition pruning(DPP) branch based on number of 
> distinct values.

Is "feeding" what gets decided?  (In other words, should it be "feeds"?)  If 
not, what's the verb that "decide if" refers to?

Thanks for spelling out the acronyms.  Keeping DPP in parentheses is good, 
although you need a space before the opening parenthesis.  And maybe "pruning" 
should be "Pruning" just for consistency.


- Lefty


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189778
---


On Oct. 31, 2017, 1:19 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> ---
> 
> (Updated Oct. 31, 2017, 1:19 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin 
> branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not 
> have equality as there is a chance that the values are same on both sides and 
> the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side 
> are only slightly less than that on TS side.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 1a1a4d9b2d 
> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 63427: HIVE-17396

2017-11-01 Thread Deepak Jaiswal


> On Nov. 1, 2017, 3:48 a.m., Lefty Leverenz wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 3017-3018 (patched)
> > 
> >
> > Please spell out the acronyms TS, DPP, and nDVs in the parameter 
> > description.
> > 
> > Also, the description isn't clear to me -- "to control if ..." what?  
> > Something about reduction, or feeding?  (Forgive my ignorance, I'm just 
> > looking at the grammatical structure of the sentence along with the 
> > parameter name.)

Thanks for the feedback. How about this below?

The factor to decide if semijoin branch feeding into a TableScan which has an 
outgoing Dynamic Partition pruning(DPP) branch based on number of distinct 
values.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189778
---


On Oct. 31, 2017, 1:19 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> ---
> 
> (Updated Oct. 31, 2017, 1:19 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin 
> branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not 
> have equality as there is a chance that the values are same on both sides and 
> the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side 
> are only slightly less than that on TS side.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 1a1a4d9b2d 
> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 63427: HIVE-17396

2017-10-31 Thread Lefty Leverenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189778
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 3017-3018 (patched)


Please spell out the acronyms TS, DPP, and nDVs in the parameter 
description.

Also, the description isn't clear to me -- "to control if ..." what?  
Something about reduction, or feeding?  (Forgive my ignorance, I'm just looking 
at the grammatical structure of the sentence along with the parameter name.)


- Lefty Leverenz


On Oct. 31, 2017, 1:19 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> ---
> 
> (Updated Oct. 31, 2017, 1:19 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin 
> branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not 
> have equality as there is a chance that the values are same on both sides and 
> the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side 
> are only slightly less than that on TS side.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 1a1a4d9b2d 
> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 63427: HIVE-17396

2017-10-30 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/
---

(Updated Oct. 31, 2017, 1:19 a.m.)


Review request for hive, Ashutosh Chauhan and Jason Dere.


Changes
---

Implemented comments from Jason.


Repository: hive-git


Description
---

Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin branches

In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not have 
equality as there is a chance that the values are same on both sides and the 
branch is still marked as good when it shouldn't be.
Add a configurable factor to see how useful this is if nDVs on smaller side are 
only slightly less than that on TS side.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
1a1a4d9b2d 


Diff: https://reviews.apache.org/r/63427/diff/2/

Changes: https://reviews.apache.org/r/63427/diff/1-2/


Testing
---


Thanks,

Deepak Jaiswal



Re: Review Request 63427: HIVE-17396

2017-10-30 Thread Deepak Jaiswal


> On Oct. 30, 2017, 11:19 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Line 1417 (original), 1417 (patched)
> > 
> >
> > Trying to think about how to think about this setting if we're going to 
> > use this for tuning. I think a better way of being able to think about this 
> > setting is, what kind of selectivity we want from the semijoin reduction 
> > before we decide it is worth keeping. For me this setting might be a bit 
> > more intuitive (basically float value between 0-1) - for example setting 
> > config to 0.5, compared to what you have now, where I think you would set 
> > it to 2.0.

Thanks. I will go with this approach.


> On Oct. 30, 2017, 11:19 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Lines 1419 (patched)
> > 
> >
> > Some concerns about long-to-float conversion going on here .. nDVs is 
> > cast to float then multiplied to 1.0, and nDVsOfTS is also converted to 
> > float during this comparison. This could affect the comparisoin results. 
> > Maybe cast nDVsFactored to long when doing the comparision to be safe?

Will cast to long.


> On Oct. 30, 2017, 11:19 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
> > Lines 1421 (patched)
> > 
> >
> > If you are logging here, mention that setShouldRemove is being set. 
> > Would be more useful for someone looking at the logs.

Thanks for the suggestion.


- Deepak


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189657
---


On Oct. 30, 2017, 7:09 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> ---
> 
> (Updated Oct. 30, 2017, 7:09 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin 
> branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not 
> have equality as there is a chance that the values are same on both sides and 
> the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side 
> are only slightly less than that on TS side.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 1a1a4d9b2d 
> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 63427: HIVE-17396

2017-10-30 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/#review189657
---




ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 1417 (original), 1417 (patched)


Trying to think about how to think about this setting if we're going to use 
this for tuning. I think a better way of being able to think about this setting 
is, what kind of selectivity we want from the semijoin reduction before we 
decide it is worth keeping. For me this setting might be a bit more intuitive 
(basically float value between 0-1) - for example setting config to 0.5, 
compared to what you have now, where I think you would set it to 2.0.



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1419 (patched)


Some concerns about long-to-float conversion going on here .. nDVs is cast 
to float then multiplied to 1.0, and nDVsOfTS is also converted to float during 
this comparison. This could affect the comparisoin results. Maybe cast 
nDVsFactored to long when doing the comparision to be safe?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1421 (patched)


If you are logging here, mention that setShouldRemove is being set. Would 
be more useful for someone looking at the logs.


- Jason Dere


On Oct. 30, 2017, 7:09 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63427/
> ---
> 
> (Updated Oct. 30, 2017, 7:09 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Jason Dere.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin 
> branches
> 
> In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not 
> have equality as there is a chance that the values are same on both sides and 
> the branch is still marked as good when it shouldn't be.
> Add a configurable factor to see how useful this is if nDVs on smaller side 
> are only slightly less than that on TS side.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
>   ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> 1a1a4d9b2d 
> 
> 
> Diff: https://reviews.apache.org/r/63427/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Review Request 63427: HIVE-17396

2017-10-30 Thread Deepak Jaiswal

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63427/
---

Review request for hive and Jason Dere.


Repository: hive-git


Description
---

Dynamic Semijoin Reduction : markSemiJoinForDPP marks unwanted semijoin branches

In method markSemiJoinForDPP (HIVE-17399), the nDVs comparison should not have 
equality as there is a chance that the values are same on both sides and the 
branch is still marked as good when it shouldn't be.
Add a configurable factor to see how useful this is if nDVs on smaller side are 
only slightly less than that on TS side.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6631a6e45d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java da30c3b642 
  ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q 6cc0a7f7a9 
  ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
1a1a4d9b2d 


Diff: https://reviews.apache.org/r/63427/diff/1/


Testing
---


Thanks,

Deepak Jaiswal