[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-08-23 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16949:

Attachment: HIVE-16949.1.patch

> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Birger Brunswiek
> Assignee: Sahil Takiar
> Attachments: HIVE-16949.1.patch
>
>
> The commit 
> [20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
>  which was part of HIVE-15546 [introduced a thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
>  which is not shutdown upon completion of its threads. This leads to a leak 
> of threads for each query which uses more than 1 partition. They are not 
> removed automatically. When queries spanning multiple partitions are made the 
> number of threads increases and is never reduced. On my machine hiveserver2 
> starts to get slower and slower once 10k threads are reached.
> Thread pools only shutdown automatically in special circumstances (see 
> [documentation section 
> _Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
>  This is not currently the case for the Get-Input-Paths thread pool. I would 
> add a _pool.shutdown()_ in a finally block just before returning the result 
> to make sure the threads are really shutdown.
> My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
> This prevents the thread pool from being spawned
> [\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
>  
> [\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
> The same issue probably also applies to the [Get-Input-Summary thread 
> pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].
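
The fix proposed in the description (calling _pool.shutdown()_ in a finally block) can be sketched roughly as follows. This is a minimal illustration, not the code from Utilities.java or from the attached patch: the class name, the method signature, and the "/warehouse/" path prefix are all hypothetical stand-ins. Only the try/finally pattern around the ExecutorService is the point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names are illustrative and not taken from Hive.
final class InputPathsSketch {

  // Resolves one path per partition using a short-lived thread pool.
  static List<String> getInputPaths(List<String> partitions, int numThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (final String partition : partitions) {
        futures.add(pool.submit(new Callable<String>() {
          @Override
          public String call() {
            // Stand-in for the real per-partition path lookup.
            return "/warehouse/" + partition;
          }
        }));
      }
      List<String> paths = new ArrayList<>();
      for (Future<String> future : futures) {
        // get() rethrows task failures as ExecutionException.
        paths.add(future.get());
      }
      return paths;
    } finally {
      // Runs on normal return and on exceptions alike, so the worker
      // threads are released on every exit path.
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(getInputPaths(Arrays.asList("ds=1", "ds=2"), 2));
  }
}
```

Because shutdown() sits in the finally block, the exception exit points mentioned in earlier revisions of the description are covered as well. shutdown() does not wait for the workers to finish; by the time it runs here, every submitted task has either completed or failed.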



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-08-23 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16949:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads for each query which uses more than 1 partition. They are not removed 
automatically. When queries spanning multiple partitions are made the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shutdown automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 This is not currently the case for the Get-Input-Paths thread pool. I would 
add a _pool.shutdown()_ in a finally block just before returning the result to 
make sure the threads are really shutdown.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads for each query which uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shutdown automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 This is not currently the case for the Get-Input-Paths thread pool. I would 
add a _pool.shutdown()_ in a finally block just before returning the result to 
make sure the threads are really shutdown.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].
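
Why the workaround helps can be sketched as follows, assuming the guard at the linked lines is a simple "more than one thread" check. The helper names below are hypothetical and the real logic in Utilities.java may differ: with a configured maximum of 1, no pool is created, the listing runs on the calling thread, and there is nothing left to leak.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names are illustrative and not taken from Hive.
final class ListingThreadsSketch {

  // Assumed guard: a pool is only created for more than one thread.
  static ExecutorService maybeCreatePool(int maxThreads) {
    return maxThreads > 1 ? Executors.newFixedThreadPool(maxThreads) : null;
  }

  // Stand-in for the real per-partition path lookup.
  static String resolve(String partition) {
    return "/warehouse/" + partition;
  }

  static List<String> listPaths(List<String> partitions, int maxThreads)
      throws Exception {
    ExecutorService pool = maybeCreatePool(maxThreads);
    List<String> paths = new ArrayList<>();
    if (pool == null) {
      // Workaround path: everything runs inline, no threads are spawned.
      for (String partition : partitions) {
        paths.add(resolve(partition));
      }
      return paths;
    }
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (final String partition : partitions) {
        Callable<String> task = () -> resolve(partition);
        futures.add(pool.submit(task));
      }
      for (Future<String> future : futures) {
        paths.add(future.get());
      }
      return paths;
    } finally {
      // Without this call the pool threads would outlive the query.
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(listPaths(Arrays.asList("ds=1", "ds=2"), 1));
    System.out.println(listPaths(Arrays.asList("ds=1", "ds=2"), 4));
  }
}
```

The trade-off of the workaround is that path listing becomes sequential, which may be slower for queries spanning many partitions.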



[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads for each query which uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shutdown automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 This is not currently the case for the Get-Input-Paths thread pool. I would 
add a _pool.shutdown()_ in a finally block just before returning the result to 
make sure the threads are really shutdown.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads for each query which uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shutdown automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads for each query which uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools only shutdown automatically in special circumstances (see 
[documentation section 
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads for each query which uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools should be [shutdown
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads for each query which uses more than 1 partition. They are not removed 
by the GC. When queries spanning multiple partitions are made the number of 
threads increases and is never reduced. On my machine hiveserver2 starts to get 
slower and slower once 10k threads are reached.

Thread pools should be [shutdown
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shutdown
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
 which was part of HIVE-15546 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shutdown
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 7f1c29ebe which was part of HIVE-15881 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shutdown
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].



[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 7f1c29ebe which was part of HIVE-15881 [introduced a thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
 which is not shutdown upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shutdown
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shutdown. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 7f1c29ebe which was part of HIVE-15881 introduced a thread pool
which is not shutdown upon completion of its threads. This leads to a leak of 
threads. They are not removed by the GC. When queries spanning multiple 
partitions are made the number of threads increases and is never reduced. On my 
machine hiveserver2 starts to get slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].


> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>

[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 7f1c29ebe which was part of HIVE-15881 introduced a thread pool 
which is not shut down once its tasks have completed. This leads to a leak of 
threads: the idle worker threads are not removed by the GC. Each query spanning 
multiple partitions adds more threads, and the number is never reduced. On my 
machine hiveserver2 gets slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].

The same issue probably also applies to the [Get-Input-Summary thread 
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].

  was:
The commit 7f1c29ebe which was part of HIVE-15881 introduced a thread pool 
which is not shut down once its tasks have completed. This leads to a leak of 
threads: the idle worker threads are not removed by the GC. Each query spanning 
multiple partitions adds more threads, and the number is never reduced. On my 
machine hiveserver2 gets slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].




> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>

[jira] [Updated] (HIVE-16949) Leak of threads from Get-Input-Paths thread pool when more than 1 used in query

2017-06-23 Thread Birger Brunswiek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Birger Brunswiek updated HIVE-16949:

Description: 
The commit 7f1c29ebe which was part of HIVE-15881 introduced a thread pool 
which is not shut down once its tasks have completed. This leads to a leak of 
threads: the idle worker threads are not removed by the GC. Each query spanning 
multiple partitions adds more threads, and the number is never reduced. On my 
machine hiveserver2 gets slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down. This, however, would only fix 
normal operation. There are other exit points, namely through exceptions, which 
would still lead to the same leak of threads.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].



  was:
The commit 7f1c29ebe which was part of HIVE-15881 introduced a thread pool 
which is not shut down once its tasks have completed. This leads to a leak of 
threads: the idle worker threads are not removed by the GC. Each query spanning 
multiple partitions adds more threads, and the number is never reduced. On my 
machine hiveserver2 gets slower and slower once 10k threads are 
reached.

Thread pools should be [shut down 
automatically|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html].
 I am not sure why this is not the case. I would add a _pool.shutdown()_ just 
[after the pool has completed its 
work|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3137]
 to make sure the threads are really shut down.

My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}. 
This prevents the thread pool from being spawned 
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
 
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].




> Leak of threads from Get-Input-Paths thread pool when more than 1 used in 
> query
> ---
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Birger Brunswiek
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)