[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-06-05 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126515#comment-17126515
 ] 

Rajesh Balamohan commented on HIVE-23521:
-

[~aasha]: Actually, it would be good to club this with HIVE-23520, which 
would also make it easier to review.

The PR covering both is [https://github.com/apache/hive/pull/1060/files].

> REPL: Optimise partition loading during bootstrap
> -
>
> Key: HIVE-23521
> URL: https://issues.apache.org/jira/browse/HIVE-23521
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Major
> Attachments: HIVE-23521.1.patch, HIVE-23521.2.patch
>
>
> When bootstrapping from a large "REPL dump" with ~10K partitions, the load 
> executes "addPartition" sequentially and takes a very long time, because every 
> call communicates with HMS, registers the partition, etc.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java#L399]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java#L165]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java#L210]
> When bootstrap loading has to deal with DDL, it would be good to collate all 
> partitions into a single call to HMS. This would reduce the overall runtime.
>  
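The difference between per-partition registration and a single collated call can be illustrated with a minimal, self-contained sketch. It does not use Hive's actual metastore client: `PartitionSink`, `CountingSink`, and the `ds=N` partition specs are hypothetical stand-ins introduced only to count round trips, and the sketch is not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

public class CollatedPartitionLoad {

    /** Hypothetical stand-in for a metastore client; NOT Hive's real interface. */
    interface PartitionSink {
        void addPartition(String partitionSpec);          // one round trip per partition
        void addPartitions(List<String> partitionSpecs);  // one round trip for the whole list
    }

    /** In-memory sink that counts round trips so the two strategies can be compared. */
    static class CountingSink implements PartitionSink {
        int roundTrips = 0;
        final List<String> registered = new ArrayList<>();

        @Override
        public void addPartition(String partitionSpec) {
            roundTrips++;
            registered.add(partitionSpec);
        }

        @Override
        public void addPartitions(List<String> partitionSpecs) {
            roundTrips++;
            registered.addAll(partitionSpecs);
        }
    }

    /** Sequential strategy described in the issue: one call per partition. */
    static void loadSequentially(PartitionSink sink, List<String> specs) {
        for (String spec : specs) {
            sink.addPartition(spec);
        }
    }

    /** Collated strategy: gather all partitions first, then issue a single call. */
    static void loadCollated(PartitionSink sink, List<String> specs) {
        if (!specs.isEmpty()) {
            sink.addPartitions(new ArrayList<>(specs));
        }
    }

    /** Builds n made-up partition specs of the form "ds=0", "ds=1", ... */
    static List<String> sampleSpecs(int n) {
        List<String> specs = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            specs.add("ds=" + i);
        }
        return specs;
    }

    public static void main(String[] args) {
        List<String> specs = sampleSpecs(10_000);

        CountingSink sequential = new CountingSink();
        loadSequentially(sequential, specs);

        CountingSink collated = new CountingSink();
        loadCollated(collated, specs);

        System.out.println("sequential round trips: " + sequential.roundTrips);
        System.out.println("collated round trips:   " + collated.roundTrips);
    }
}
```

Both strategies register the same partitions; only the number of calls differs, which is the cost the issue description attributes to the sequential path.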



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-06-02 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123740#comment-17123740
 ] 

Aasha Medhi commented on HIVE-23521:


Patch looks fine. Could you please add tests for the same?



[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-05-26 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116468#comment-17116468
 ] 

Hive QA commented on HIVE-23521:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/13003991/HIVE-23521.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17284 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestXSRFFilter.testFilterDisabledNoInjection[1: tranportMode=http] (batchId=214)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/22620/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/22620/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-22620/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 13003991 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-05-25 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116442#comment-17116442
 ] 

Hive QA commented on HIVE-23521:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 34s{color} | {color:blue} ql in master has 1524 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  6s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 48s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-22620/dev-support/hive-personality.sh |
| git revision | master / 947b7a4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-22620/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-05-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114176#comment-17114176
 ] 

Hive QA commented on HIVE-23521:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/13003713/HIVE-23521.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 17281 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.TestWarehouseExternalDir.testManagedPaths (batchId=183)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/22546/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/22546/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-22546/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 13003713 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-05-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114128#comment-17114128
 ] 

Hive QA commented on HIVE-23521:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m  0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  0s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 22s{color} | {color:blue} ql in master has 1524 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  8s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 10s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-22546/dev-support/hive-personality.sh |
| git revision | master / 716f1f9 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-22546/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-05-21 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113746#comment-17113746
 ] 

Rajesh Balamohan commented on HIVE-23521:
-

Batching is one option, but we also need to start considering data copy.

For the metadata-only case, the load ran for 3.5 hours for 10K partitions. 
With the patch, it completes in 350-380 seconds.



[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-05-20 Thread Anishek Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112807#comment-17112807
 ] 

Anishek Agarwal commented on HIVE-23521:


Rajesh, I assume you are talking about the bootstrap REPL case? We can 
certainly do something in batches there to improve performance.
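As a rough illustration of the batching idea, the sketch below splits a list of partitions into fixed-size chunks, each of which could then be sent in one call. The class name `PartitionBatcher`, the helper `toBatches`, and the batch size of 1,000 are all assumptions made for this example; nothing here is taken from the Hive code base or the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatcher {

    /**
     * Splits items into consecutive batches of at most batchSize elements.
     * The last batch may be smaller. Hypothetical helper, not a Hive API.
     */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // 10K made-up partition ids, mirroring the scale mentioned in the issue.
        List<Integer> partitionIds = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            partitionIds.add(i);
        }
        List<List<Integer>> batches = toBatches(partitionIds, 1_000); // assumed batch size
        System.out.println("number of batches: " + batches.size());
    }
}
```

A bounded batch size trades a few extra calls for a cap on the size of any single request, which may matter if one call carrying every partition becomes too large.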



[jira] [Commented] (HIVE-23521) REPL: Optimise partition loading during bootstrap

2020-05-20 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112725#comment-17112725
 ] 

Gopal Vijayaraghavan commented on HIVE-23521:
-

Is this a "REPL-during-REPL" problem?
