[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-28 Thread Zoltan Chovan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780279#comment-16780279
 ] 

Zoltan Chovan commented on HIVE-21210:
--

[~belugabehr] sure! :)

As discussed before, my main concern is "enforceability": if the decision 
is made to merge this, and the future plan is to use your proposed central 
util class to spin up threadpools, how do we avoid scenarios where it is not 
used?

 

Also, I have a few follow-up questions:

1./ Do you know how many separate threadpools are currently spun up? I'd be 
interested in the scope of the refactoring effort needed to make this 
consistent across the current code.

2./ Do you have any metrics/measurements from using the proposed 
HiveInputPolicy? How does the number of threads scale with the number of 
partitions in the old version and in yours?

3./ As far as I understand, in this version the number of threads would max out 
at the number of cores. If the same threadpool util class is used everywhere, 
don't we end up in a situation where the overall max thread number would be 
NumOfThreadPools * NumOfCores?
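
To make question 3 concrete, here is a minimal arithmetic sketch. It assumes the current per-query thread count is min(50, ceil(files / 100)), which is what the two constants quoted in the issue description imply, and that every pool created by the proposed util class can grow to the number of cores. The class and method names are hypothetical and are not taken from the patch.

```java
// Hypothetical helper for comparing thread counts; not part of Hive or the patch.
final class ThreadCountSketch {
  // Mirrors MAX_CHECK_NONCOMBINABLE_THREAD_NUM from the issue description.
  static final int MAX_THREADS = 50;
  // Mirrors DEFAULT_NUM_PATH_PER_THREAD from the issue description.
  static final int PATHS_PER_THREAD = 100;

  // Assumed current behaviour: one thread per 100 files, capped at 50.
  static int currentThreads(int fileCount) {
    int needed = (fileCount + PATHS_PER_THREAD - 1) / PATHS_PER_THREAD; // ceiling division
    return Math.min(MAX_THREADS, needed);
  }

  // Worst case raised in question 3: every pool is fully busy at the same time.
  static long worstCaseWithCappedPools(int poolCount, int cores) {
    return (long) poolCount * cores;
  }
}
```

Under these assumptions, 250 files use 3 threads today and anything from 5000 files upward uses 50, while 10 independent capped pools on a 16-core host could still reach 160 threads in total.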

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0, 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch, 
> HIVE-21210.6.patch, HIVE-21210.7.patch, HIVE-21210.8.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used, and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases, and they surely cause contention 
> with one another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for an MR job, there are up to 50 threads running per 
> query and there is not much scaling here: it is simply a ratio of 1 thread per 
> 100 files. This implies that processing 5000 files takes 50 threads, and beyond 
> that, 50 threads are still used. Many Hive jobs these days involve more than 
> 5000 files, so it does not scale well at larger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces an {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in mind that 
> a class may submit as many work units ({{Callable}}s) as it would like, but 
> the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches).
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented.
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}.
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop them into other places. One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if a thread pool is disabled 
> by configuration, the code will still use an {{ExecutorService}}. That way the 
> project can avoid logic like "if this function is serviced by a thread pool, 
> use an {{ExecutorService}} (and remember to close it later!), otherwise create 
> a single thread", so that things like [HIVE-16949] can be avoided in the 
> future. Everything will just use an {{ExecutorService}}.
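
The three building blocks named in the description can be sketched roughly as follows. This is only an illustration built on the standard java.util.concurrent classes: the class name, the method names, and the exact batching formula (ceiling of the natural log of the collection size) are assumptions, since the description says only "based on the natural log", and none of this is taken from the attached patches.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and formulas are assumptions, not the patch's API.
final class ExecutorSketch {

  // Pool capped at the number of processors; extra work waits in the queue.
  static ExecutorService cappedPool() {
    int cores = Runtime.getRuntime().availableProcessors();
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        cores, cores, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.allowCoreThreadTimeOut(true); // let idle threads exit
    return pool;
  }

  // "Direct thread" executor: runs each task on the submitting thread, so
  // callers can use the ExecutorService API even when pooling is disabled.
  static ExecutorService directExecutor() {
    return new AbstractExecutorService() {
      private volatile boolean shutdown;

      @Override public void execute(Runnable task) {
        if (shutdown) {
          throw new RejectedExecutionException("executor is shut down");
        }
        task.run();
      }
      @Override public void shutdown() { shutdown = true; }
      @Override public List<Runnable> shutdownNow() {
        shutdown = true;
        return Collections.emptyList(); // nothing is ever queued
      }
      @Override public boolean isShutdown() { return shutdown; }
      @Override public boolean isTerminated() { return shutdown; }
      @Override public boolean awaitTermination(long timeout, TimeUnit unit) {
        return shutdown; // tasks finish inside execute(), so no waiting is needed
      }
    };
  }

  // Assumed formula: number of batches grows with ln(size), with a minimum of 1.
  static int logBatchCount(int size) {
    if (size <= 1) {
      return 1;
    }
    return (int) Math.max(1, Math.ceil(Math.log(size)));
  }
}
```

With this assumed formula, 5000 paths would be split into 9 batches rather than 50, and the calling code looks the same whether it is handed `cappedPool()` or `directExecutor()`.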



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-27 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779954#comment-16779954
 ] 

BELUGA BEHR commented on HIVE-21210:


[~zchovan] Can I get your thoughts on this one? :)



[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-07 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762504#comment-16762504
 ] 

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957853/HIVE-21210.8.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15779 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15983/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15983/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15983/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957853 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-07 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762487#comment-16762487
 ] 

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 49s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 30s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  2s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 11s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 12s{color} | {color:red} common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 38s{color} | {color:red} ql: The patch generated 1 new + 7 unchanged - 49 fixed = 8 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m  5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15983/dev-support/hive-personality.sh |
| git revision | master / 6508716 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15983/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15983/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15983/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15983/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762263#comment-16762263
 ] 

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957830/HIVE-21210.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15779 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=260)
org.apache.hive.service.auth.TestLdapAuthenticationProviderImpl.testAuthenticateWithBindInCredentialFilePasses (batchId=232)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15975/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15975/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15975/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957830 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762249#comment-16762249
 ] 

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 40s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m  5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 51s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 33s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  0s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 12s{color} | {color:red} common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 1 new + 7 unchanged - 49 fixed = 8 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m  8s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15975/dev-support/hive-personality.sh |
| git revision | master / 0e4d16b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15975/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762172#comment-16762172
 ] 

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957816/HIVE-21210.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15773 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_insert_partition_static] (batchId=185)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=336)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15971/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15971/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15971/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957816 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762147#comment-16762147 ]

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 34s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 4s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s{color} | {color:red} common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s{color} | {color:red} ql: The patch generated 1 new + 7 unchanged - 49 fixed = 8 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 28s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15971/dev-support/hive-personality.sh |
| git revision | master / fae6256 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15971/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762026#comment-16762026 ]

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957783/HIVE-21210.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15773 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15968/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15968/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15968/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957783 - PreCommit-HIVE-Build

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0, 3.2.0
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Attachments: HIVE-21210.1.patch, HIVE-21210.2.patch, 
> HIVE-21210.3.patch, HIVE-21210.4.patch, HIVE-21210.5.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used, and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads running in the HS2 as the 
> number of simultaneous connections increases and they surely cause contention 
> with one another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for a MR job, there are up to 50 threads running per 
> query and there is not much scaling here; it is simply a ratio of 1 thread 
> per 100 files.  This implies that processing 5000 files takes 50 threads, and 
> beyond that, 50 threads are still used. Many Hive jobs these days involve 
> more than 5000 files, so it does not scale well at larger sizes.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces an {{ExecutorService}} which caps the number 
> of threads it spins up at the number of processors a host has. Keep in mind 
> that a class may submit as many work units ({{Callable}}s) as it would like, 
> but the number of threads in the pool is capped.
>  * Common module for partitioning work. That is, allow for a generic 
> framework for dividing work into partitions (i.e. batches)
>  * Modify {{CombineHiveInputFormat}} to take advantage of both modules, 
> performing its same duties in a more Java OO way than is currently implemented
>  * Add a partitioning (batching) implementation that enforces partitioning of 
> a {{Collection}} based on the natural log of the {{Collection}} size so that 
> it scales more slowly than a simple 1:100 ratio.
>  * Simplify unit test code for {{CombineHiveInputFormat}}
> My hope is to introduce these tools to {{CombineHiveInputFormat}} and then to 
> drop them into other places.  One of the things I will introduce here is a 
> "direct thread" {{ExecutorService}} so that even if there is a configuration 
> for a thread pool to be disabled, it will still use an {{ExecutorService}} so 
> that the project can avoid logic like "if this function is serviced by a 
> thread pool, use an {{ExecutorService}} (and remember to close it later!); 
> otherwise, create a single thread" so that things like [HIVE-16949] can be 
> avoided in the future.  Everything will just use an {{ExecutorService}}.
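
The sizing rules described in the issue can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class `ThreadSizingSketch` and every method name in it are hypothetical, the rounding-up in `legacyThreadCount` is assumed, and `ceil(ln(n))` is only one plausible reading of "based on the natural log of the Collection size".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch only; all names here are hypothetical. */
final class ThreadSizingSketch {

  // The two constants quoted in the issue description.
  static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
  static final int DEFAULT_NUM_PATH_PER_THREAD = 100;

  /** Rule described in the issue: 1 thread per 100 paths (rounded up, assumed), capped at 50. */
  static int legacyThreadCount(int numPaths) {
    int wanted = (numPaths + DEFAULT_NUM_PATH_PER_THREAD - 1) / DEFAULT_NUM_PATH_PER_THREAD;
    return Math.min(MAX_CHECK_NONCOMBINABLE_THREAD_NUM, wanted);
  }

  /** Never ask for more threads than the host has processors, and never fewer than one. */
  static int cappedPoolSize(int requested) {
    int cores = Runtime.getRuntime().availableProcessors();
    return Math.max(1, Math.min(requested, cores));
  }

  /** Assumed formula: number of batches grows with the natural log of the collection size. */
  static int logBatchCount(int size) {
    if (size <= 1) {
      return 1;
    }
    return (int) Math.ceil(Math.log(size));
  }

  /** Splits items into at most {@code batches} contiguous sub-lists of near-equal size. */
  static <T> List<List<T>> partition(List<T> items, int batches) {
    List<List<T>> out = new ArrayList<>();
    if (items.isEmpty()) {
      return out;
    }
    int batchSize = (items.size() + batches - 1) / batches;
    for (int start = 0; start < items.size(); start += batchSize) {
      out.add(items.subList(start, Math.min(items.size(), start + batchSize)));
    }
    return out;
  }

  /** Submits one Callable per batch to a capped pool; the "work" here just counts paths. */
  static int countInParallel(List<String> paths) {
    List<List<String>> batches = partition(paths, logBatchCount(paths.size()));
    ExecutorService pool = Executors.newFixedThreadPool(cappedPoolSize(batches.size()));
    try {
      List<Callable<Integer>> tasks = new ArrayList<>();
      for (List<String> batch : batches) {
        tasks.add(batch::size); // placeholder for real per-batch work
      }
      int total = 0;
      for (Future<Integer> result : pool.invokeAll(tasks)) {
        total += result.get();
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("batch processing failed", e);
    } finally {
      pool.shutdown(); // the pool is always closed, whichever path is taken
    }
  }
}
```

Under these assumptions, 5000 paths would use 50 threads with the existing rule, while the log-based rule would split them into 9 batches served by a pool no larger than the host's processor count. More work units than threads may be submitted; they simply queue in the pool.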



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-06 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762006#comment-16762006 ]

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 50s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 32s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 8s{color} | {color:blue} ql in master has 2298 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s{color} | {color:red} common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 1 new + 7 unchanged - 49 fixed = 8 total (was 56) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s{color} | {color:red} common generated 18 new + 27 unchanged - 0 fixed = 45 total (was 27) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15968/dev-support/hive-personality.sh |
| git revision | master / fae6256 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/diff-checkstyle-ql.txt |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/whitespace-eol.txt |
| javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/diff-javadoc-javadoc-common.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15968/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-05 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761388#comment-16761388 ]

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957690/HIVE-21210.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 15731 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input42] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_3] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_1] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_3] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_cttas] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nonmr_fetch] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppr_pushdown3] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rand_partitionpruner2] (batchId=59)
org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=260)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15954/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15954/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15954/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957690 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-05 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761372#comment-16761372 ]

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 30s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 27s{color} | {color:blue} ql in master has 2307 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s{color} | {color:red} common: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 35s{color} | {color:red} ql: The patch generated 1 new + 10 unchanged - 46 fixed = 11 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 12s{color} | {color:red} common generated 18 new + 27 unchanged - 0 fixed = 45 total (was 27) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 40s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15954/dev-support/hive-personality.sh |
| git revision | master / 313e49f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15954/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15954/yetus/diff-checkstyle-ql.txt |
| javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-15954/yetus/diff-javadoc-javadoc-common.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15954/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15954/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-05 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761196#comment-16761196 ]

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957668/HIVE-21210.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 15731 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input42] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_3] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_1] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_3] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_buckets] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_cttas] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nonmr_fetch] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat15] (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppr_pushdown3] (batchId=30)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15950/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15950/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15950/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957668 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-05 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761160#comment-16761160 ]

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 45s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 29s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 35s{color} | {color:blue} ql in master has 2307 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 11s{color} | {color:red} common: The patch generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 38s{color} | {color:red} ql: The patch generated 4 new + 10 unchanged - 46 fixed = 14 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 13s{color} | {color:red} common generated 18 new + 27 unchanged - 0 fixed = 45 total (was 27) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 13s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15950/dev-support/hive-personality.sh |
| git revision | master / 313e49f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15950/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15950/yetus/diff-checkstyle-ql.txt |
| javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-15950/yetus/diff-javadoc-javadoc-common.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15950/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15950/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-04 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760467#comment-16760467 ]

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957577/HIVE-21210.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 31 failed/errored test(s), 15731 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_partition_insert] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_1] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_3] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_buckets] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_cttas] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nonmr_fetch] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat15] (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat16] (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup3] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppr_pushdown3] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] (batchId=61)
org.apache.hadoop.hive.cli.TestLocalSparkCliDriver.testCliDriver[spark_local_queries] (batchId=277)
org.apache.hadoop.hive.metastore.TestObjectStore.catalogs (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testDatabaseOps (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testDeprecatedConfigIsOverwritten (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testDirectSQLDropParitionsCleanup (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testDirectSQLDropPartitionsCacheCrossSession (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testDirectSqlErrorMetrics (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testEmptyTrustStoreProps (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testMasterKeyOps (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testMaxEventResponse (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testPartitionOps (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testQueryCloseOnError (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testRoleOps (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testTableOps (batchId=230)
org.apache.hadoop.hive.metastore.TestObjectStore.testUseSSLProperty (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15936/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15936/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15936/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 31 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957577 - PreCommit-HIVE-Build


[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-04 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760440#comment-16760440 ]

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 31s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 34s{color} | {color:blue} ql in master has 2307 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 11s{color} | {color:red} common: The patch generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 34s{color} | {color:red} ql: The patch generated 4 new + 10 unchanged - 46 fixed = 14 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 12s{color} | {color:red} common generated 18 new + 27 unchanged - 0 fixed = 45 total (was 27) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 12s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m  4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15936/dev-support/hive-personality.sh |
| git revision | master / 313e49f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15936/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15936/yetus/diff-checkstyle-ql.txt |
| javadoc | http://104.198.109.242/logs//PreCommit-HIVE-Build-15936/yetus/diff-javadoc-javadoc-common.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-15936/yetus/patch-asflicense-problems.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15936/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-04 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760245#comment-16760245 ]

Hive QA commented on HIVE-21210:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12957520/HIVE-21210.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 15726 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input42] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_3] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_1] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_disablecbo_3] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_all] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nonmr_fetch] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat12] (batchId=90)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat16] (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcs] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pointlookup3] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppr_pushdown3] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rand_partitionpruner2] (batchId=59)
org.apache.hadoop.hive.cli.TestLocalSparkCliDriver.testCliDriver[spark_local_queries] (batchId=277)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15925/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15925/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15925/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12957520 - PreCommit-HIVE-Build

> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places, and each implementation is 
> a little different and requires different configuration. I think that Hive 
> needs to rein in and standardize the way threadpools are used, and that 
> threadpools should scale automatically without manual configuration. At any 
> given time, many hundreds of threads are running in HS2 as the 
> number of simultaneous connections increases, and they surely contend 
> with one another.
> Here is an example:
> {code:java|title=CombineHiveInputFormat.java}
>   // max number of threads we can use to check non-combinable paths
>   private static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
>   private static final int DEFAULT_NUM_PATH_PER_THREAD = 100;
> {code}
> When building the splits for an MR job, up to 50 threads run per query, and 
> the scaling is a fixed ratio of 1 thread per 100 files. Processing 5000 files 
> therefore uses 50 threads, and any larger input still uses only 50. Many Hive 
> jobs these days involve more than 5000 files, so this does not scale to 
> larger inputs.
> This is not configurable (even manually), it doesn't change when the hardware 
> specs increase, and 50 threads seems like a lot when a service must support 
> up to 80 connections:
> [https://www.cloudera.com/documentation/enterprise/5/latest/topics/admin_hive_tuning.html]
> Not to mention, I have never seen a scenario where HS2 is running on a host 
> all by itself and has the entire system dedicated to it. Therefore it should 
> be more friendly and spin up fewer threads.
> I am attaching a patch here that provides a few features:
>  * Common module that produces {{ExecutorService}} which caps the number of 
> threads it spins up at the number of processors a host has. Keep in 
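The fixed ratio described in the quoted issue text can be worked through with a small sketch. The two constants are the ones quoted from `CombineHiveInputFormat.java`; the class `SplitCheckThreadSizing` and the helpers `fixedRatioThreads` and `pathsPerThread` are hypothetical names that only reproduce the arithmetic as the description states it, assuming the per-thread count is rounded up, and are not copied from Hive's source.

```java
public class SplitCheckThreadSizing {

    // Values quoted in the issue description.
    static final int MAX_CHECK_NONCOMBINABLE_THREAD_NUM = 50;
    static final int DEFAULT_NUM_PATH_PER_THREAD = 100;

    // Hypothetical helper: one thread per 100 paths (rounded up, an assumption),
    // never more than 50 threads.
    static int fixedRatioThreads(int numPaths) {
        int byRatio = (numPaths + DEFAULT_NUM_PATH_PER_THREAD - 1) / DEFAULT_NUM_PATH_PER_THREAD;
        return Math.min(MAX_CHECK_NONCOMBINABLE_THREAD_NUM, byRatio);
    }

    // Hypothetical helper: how many paths each thread ends up with once the
    // thread count has hit its ceiling.
    static int pathsPerThread(int numPaths) {
        int threads = fixedRatioThreads(numPaths);
        return threads == 0 ? 0 : (numPaths + threads - 1) / threads;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1_000, 5_000, 50_000}) {
            System.out.printf("%d paths -> %d threads, about %d paths per thread%n",
                    n, fixedRatioThreads(n), pathsPerThread(n));
        }
    }
}
```

Under these assumptions the thread count grows linearly up to 5000 paths and is flat afterwards, so a 50,000-path job uses the same 50 threads with roughly 1000 paths each, regardless of how many processors the host has.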

[jira] [Commented] (HIVE-21210) CombineHiveInputFormat Thread Pool Sizing

2019-02-04 Thread Hive QA (JIRA)


[ https://issues.apache.org/jira/browse/HIVE-21210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760214#comment-16760214 ]

Hive QA commented on HIVE-21210:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 31s{color} | {color:blue} common in master has 65 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 42s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 11s{color} | {color:red} common: The patch generated 6 new + 0 unchanged - 0 fixed = 6 total (was 0) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 34s{color} | {color:red} ql: The patch generated 4 new + 10 unchanged - 46 fixed = 14 total (was 56) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 48s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15925/dev-support/hive-personality.sh |
| git revision | master / 4a4b9ca |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15925/yetus/diff-checkstyle-common.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-15925/yetus/diff-checkstyle-ql.txt |
| modules | C: common ql U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15925/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CombineHiveInputFormat Thread Pool Sizing
> -
>
> Key: HIVE-21210
> URL: https://issues.apache.org/jira/browse/HIVE-21210
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: HIVE-21210.1.patch
>
>
> Threadpools.
> Hive uses threadpools in several different places and each implementation is 
> a little different and requires different configurations. I think that Hive 
> needs to rein in and standardize the way that threadpools are used and 
> threadpools should scale automatically without manual configuration. At any 
> given time, there are many hundreds of threads