Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-14 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/
---

(Updated May 14, 2020, 7:14 a.m.)


Review request for hive, Ashutosh Chauhan and Gopal V.


Bugs: HIVE-23446
https://issues.apache.org/jira/browse/HIVE-23446


Repository: hive-git


Description
---

Currently UGI pool is maintained at QueryInfo level. However, when short 
queries and lots of AMs are there, it ends missing IPC connection cache. Too 
many connections are are also established. Patch tries to avoid that by 
maintaining this at ContainerRunner level. It retains the current behaviour of 
having multiple connection to same AM (otherwise can get bottlenecked on single 
connection)


Diffs (updated)
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
 6a13b55e69 
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
00fed15d2b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
eae8e08540 
  
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
 50dec4759e 


Diff: https://reviews.apache.org/r/72499/diff/3/

Changes: https://reviews.apache.org/r/72499/diff/2-3/


Testing
---


Thanks,

Rajesh Balamohan



Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-14 Thread Rajesh Balamohan


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 646-651 (patched)
> > 
> >
> > Is this logic needed? You already have valueloader in get() which must 
> > return a ugi, so it cant be null.
> 
> Rajesh Balamohan wrote:
> Yes, value loader is for initial miss. This is to avoid single connection 
> becoming a contention for AM communication. 
> https://issues.apache.org/jira/browse/HIVE-16634
> 
> Ashutosh Chauhan wrote:
> Not sure I follow. Can you add comments in code to explain the need for 
> this?

Addded comment in recent upload of the patch. Earlier patch wasn't uploaded 
correctly.

Value loader would be returning the queue (not ugi directly). Queue can 
maintain set of connections to same AM. Depending on query pattern, we need 
multiple connections to AM (addressed in HIVE-16634). Actually we are retaining 
the same code here as it was in QueryInfo earlier.


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 663 (patched)
> > 
> >
> > if its null, then its programming error. Better to not do this null 
> > check and offer without checking for null.
> 
> Ashutosh Chauhan wrote:
> better to throw NPE then to leak ugi failing to return to pool.

This is not a leak. It is gc-able ugi in case someone returns the ugi after 
expiry.


- Rajesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/#review220748
---


On May 14, 2020, 7:14 a.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72499/
> ---
> 
> (Updated May 14, 2020, 7:14 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Bugs: HIVE-23446
> https://issues.apache.org/jira/browse/HIVE-23446
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently UGI pool is maintained at QueryInfo level. However, when short 
> queries and lots of AMs are there, it ends missing IPC connection cache. Too 
> many connections are are also established. Patch tries to avoid that by 
> maintaining this at ContainerRunner level. It retains the current behaviour 
> of having multiple connection to same AM (otherwise can get bottlenecked on 
> single connection)
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
> 00fed15d2b 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
> 
> 
> Diff: https://reviews.apache.org/r/72499/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-14 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/
---

(Updated May 14, 2020, 7:09 a.m.)


Review request for hive, Ashutosh Chauhan and Gopal V.


Changes
---

Addressing review comments. For some reason, earlier upload wasn't successful.


Bugs: HIVE-23446
https://issues.apache.org/jira/browse/HIVE-23446


Repository: hive-git


Description
---

Currently UGI pool is maintained at QueryInfo level. However, when short 
queries and lots of AMs are there, it ends missing IPC connection cache. Too 
many connections are are also established. Patch tries to avoid that by 
maintaining this at ContainerRunner level. It retains the current behaviour of 
having multiple connection to same AM (otherwise can get bottlenecked on single 
connection)


Diffs (updated)
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
 6a13b55e69 
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
00fed15d2b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
eae8e08540 
  
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
 50dec4759e 


Diff: https://reviews.apache.org/r/72499/diff/2/

Changes: https://reviews.apache.org/r/72499/diff/1-2/


Testing
---


Thanks,

Rajesh Balamohan



Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-13 Thread Ashutosh Chauhan


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 610 (patched)
> > 
> >
> > What's the reason for time based expiry? Is it because UGI expires 
> > after 24 hrs?
> > Else, I would have expected long living cache with blocking queue of 
> > bounded size.
> > 
> > Queue should be bounded by number of executors anyways, since having 
> > more connections than executors probably won't be needed.
> 
> Rajesh Balamohan wrote:
> Since this is based on the AM. So if AM dies after sometime (due to 
> inactivity, as in no-DAG submissions), these UGIs will be auto purged after 
> 10 minutes.

In LLAP, AM doesn't die due to inactivity, its long living. It may die because 
of crash though. So, then should this expiry be longer. 3 hrs?


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 646-651 (patched)
> > 
> >
> > Is this logic needed? You already have valueloader in get() which must 
> > return a ugi, so it cant be null.
> 
> Rajesh Balamohan wrote:
> Yes, value loader is for initial miss. This is to avoid single connection 
> becoming a contention for AM communication. 
> https://issues.apache.org/jira/browse/HIVE-16634

Not sure I follow. Can you add comments in code to explain the need for this?


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 663 (patched)
> > 
> >
> > if its null, then its programming error. Better to not do this null 
> > check and offer without checking for null.

better to throw NPE then to leak ugi failing to return to pool.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/#review220748
---


On May 12, 2020, 12:06 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72499/
> ---
> 
> (Updated May 12, 2020, 12:06 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Bugs: HIVE-23446
> https://issues.apache.org/jira/browse/HIVE-23446
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently UGI pool is maintained at QueryInfo level. However, when short 
> queries and lots of AMs are there, it ends missing IPC connection cache. Too 
> many connections are are also established. Patch tries to avoid that by 
> maintaining this at ContainerRunner level. It retains the current behaviour 
> of having multiple connection to same AM (otherwise can get bottlenecked on 
> single connection)
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
> 00fed15d2b 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
> 
> 
> Diff: https://reviews.apache.org/r/72499/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-13 Thread Rajesh Balamohan


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 638 (patched)
> > 
> >
> > Bound this queue by number of executors?

Will fix this in next patch.


- Rajesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/#review220748
---


On May 12, 2020, 12:06 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72499/
> ---
> 
> (Updated May 12, 2020, 12:06 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Bugs: HIVE-23446
> https://issues.apache.org/jira/browse/HIVE-23446
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently UGI pool is maintained at QueryInfo level. However, when short 
> queries and lots of AMs are there, it ends missing IPC connection cache. Too 
> many connections are are also established. Patch tries to avoid that by 
> maintaining this at ContainerRunner level. It retains the current behaviour 
> of having multiple connection to same AM (otherwise can get bottlenecked on 
> single connection)
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
> 00fed15d2b 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
> 
> 
> Diff: https://reviews.apache.org/r/72499/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-13 Thread Rajesh Balamohan


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 610 (patched)
> > 
> >
> > What's the reason for time based expiry? Is it because UGI expires 
> > after 24 hrs?
> > Else, I would have expected long living cache with blocking queue of 
> > bounded size.
> > 
> > Queue should be bounded by number of executors anyways, since having 
> > more connections than executors probably won't be needed.

Since this is based on the AM. So if AM dies after sometime (due to inactivity, 
as in no-DAG submissions), these UGIs will be auto purged after 10 minutes.


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 646-651 (patched)
> > 
> >
> > Is this logic needed? You already have valueloader in get() which must 
> > return a ugi, so it cant be null.

Yes, value loader is for initial miss. This is to avoid single connection 
becoming a contention for AM communication. 
https://issues.apache.org/jira/browse/HIVE-16634


- Rajesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/#review220748
---


On May 12, 2020, 12:06 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72499/
> ---
> 
> (Updated May 12, 2020, 12:06 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Bugs: HIVE-23446
> https://issues.apache.org/jira/browse/HIVE-23446
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently UGI pool is maintained at QueryInfo level. However, when short 
> queries and lots of AMs are there, it ends missing IPC connection cache. Too 
> many connections are are also established. Patch tries to avoid that by 
> maintaining this at ContainerRunner level. It retains the current behaviour 
> of having multiple connection to same AM (otherwise can get bottlenecked on 
> single connection)
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
> 00fed15d2b 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
> 
> 
> Diff: https://reviews.apache.org/r/72499/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-13 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/#review220748
---




llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 610 (patched)


What's the reason for time based expiry? Is it because UGI expires after 24 
hrs?
Else, I would have expected long living cache with blocking queue of 
bounded size.

Queue should be bounded by number of executors anyways, since having more 
connections than executors probably won't be needed.



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 617 (patched)


LOG.debug



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 638 (patched)


Bound this queue by number of executors?



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 640 (patched)


LOG.debug



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 646-651 (patched)


Is this logic needed? You already have valueloader in get() which must 
return a ugi, so it cant be null.



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 663 (patched)


if its null, then its programming error. Better to not do this null check 
and offer without checking for null.


- Ashutosh Chauhan


On May 12, 2020, 12:06 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72499/
> ---
> 
> (Updated May 12, 2020, 12:06 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Bugs: HIVE-23446
> https://issues.apache.org/jira/browse/HIVE-23446
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently UGI pool is maintained at QueryInfo level. However, when short 
> queries and lots of AMs are there, it ends missing IPC connection cache. Too 
> many connections are are also established. Patch tries to avoid that by 
> maintaining this at ContainerRunner level. It retains the current behaviour 
> of having multiple connection to same AM (otherwise can get bottlenecked on 
> single connection)
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
> 00fed15d2b 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
> 
> 
> Diff: https://reviews.apache.org/r/72499/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>



Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-12 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/
---

(Updated May 12, 2020, 12:06 p.m.)


Review request for hive, Ashutosh Chauhan and Gopal V.


Bugs: HIVE-23446
https://issues.apache.org/jira/browse/HIVE-23446


Repository: hive-git


Description (updated)
---

Currently UGI pool is maintained at QueryInfo level. However, when short 
queries and lots of AMs are there, it ends missing IPC connection cache. Too 
many connections are are also established. Patch tries to avoid that by 
maintaining this at ContainerRunner level. It retains the current behaviour of 
having multiple connection to same AM (otherwise can get bottlenecked on single 
connection)


Diffs
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
 6a13b55e69 
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
00fed15d2b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
eae8e08540 
  
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
 50dec4759e 


Diff: https://reviews.apache.org/r/72499/diff/1/


Testing
---


Thanks,

Rajesh Balamohan



Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-12 Thread Rajesh Balamohan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/
---

Review request for hive, Ashutosh Chauhan and Gopal V.


Bugs: HIVE-23446
https://issues.apache.org/jira/browse/HIVE-23446


Repository: hive-git


Description
---

UGI pool was maintained at QueryInfo level. However, when short queries and 
lots of AMs are there, it ends missing IPC connection cache. Too many 
connections are are also established. Patch tries to avoid that by maintaining 
this at ContainerRunner level. It retains the current behaviour of having 
multiple connection to same AM (otherwise can get bottlenecked on single 
connection)


Diffs
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
 6a13b55e69 
  llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
00fed15d2b 
  
llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java 
eae8e08540 
  
llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
 50dec4759e 


Diff: https://reviews.apache.org/r/72499/diff/1/


Testing
---


Thanks,

Rajesh Balamohan