[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: (was: HADOOP-14445.05.patch)

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, a KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.
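To make the mismatch concrete, here is a minimal sketch (illustrative only; the endpoints 
are placeholders and this is not code from any patch) showing that the service key is 
derived from each KMS endpoint's address, so a token stored under one endpoint's key is 
never found when the client looks it up with another endpoint's key:
{code:java}
import java.net.InetSocketAddress;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.SecurityUtil;

public class KmsTokenServiceMismatch {
  public static void main(String[] args) {
    Credentials creds = new Credentials();

    // Service key for the KMS endpoint the delegation token was fetched from
    // (in a real HA setup these would be two different hosts).
    Text issuedBy = SecurityUtil.buildTokenService(
        new InetSocketAddress("localhost", 9600));
    // Service key for the other endpoint the client may talk to next.
    Text nextInstance = SecurityUtil.buildTokenService(
        new InetSocketAddress("localhost", 9601));

    // The two keys differ, so a token stored under the first key is not
    // returned when openConnection() looks it up with the second one.
    System.out.println(issuedBy + " vs " + nextInstance);
    System.out.println("token found for the second endpoint? "
        + (creds.getToken(nextInstance) != null));
  }
}
{code}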



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.05.patch







[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Status: Patch Available  (was: Open)







[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390877#comment-16390877
 ] 

Xiao Chen commented on HADOOP-14445:


Patch 5 is ready for review.

All previous comments are addressed, except for this one from [~daryn], which I'd like 
to discuss further:
{quote}The semantics for getDelegationTokenService are oddly cyclical.  I'd 
expect it, like other hadoop clients, to premeditate the service name. ...
{quote}
If I understand correctly, in this patch that would be {{KMSCP#getKMSToken}}. It 
currently uses a token selector, so it should be in line with other services. The 
other part is in {{DelegationTokenAuthenticatedURL#selectDelegationToken}} - 
it's a general-purpose class, so it is not aware of the token kind of the service 
that uses it. So I think we should be fine doing this just in KMSCP.
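For context, here is a minimal sketch of the select-by-kind idea (the constant name and 
value below are placeholders, not the exact identifiers in the patch): instead of keying 
the lookup on the URL's host:port, scan the credentials for a token of the KMS kind, so 
the token is found no matter which KMS instance the client is about to contact.
{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public class KmsTokenByKind {
  // Placeholder for the KMS delegation token kind.
  private static final Text KMS_TOKEN_KIND = new Text("kms-dt");

  /** Return the first token of the KMS kind, or null if none is present. */
  static Token<? extends TokenIdentifier> selectKmsToken(Credentials creds) {
    for (Token<? extends TokenIdentifier> token : creds.getAllTokens()) {
      if (KMS_TOKEN_KIND.equals(token.getKind())) {
        return token;  // found regardless of which KMS instance issued it
      }
    }
    return null;
  }
}
{code}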







[jira] [Commented] (HADOOP-15296) Fix a wrong link for RBF in the top page

2018-03-07 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390873#comment-16390873
 ] 

Yiqun Lin commented on HADOOP-15296:


LGTM, +1, committing...

> Fix a wrong link for RBF in the top page
> 
>
> Key: HADOOP-15296
> URL: https://issues.apache.org/jira/browse/HADOOP-15296
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Attachments: HADOOP-15296.1.patch
>
>







[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: HADOOP-14445.05.patch







[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390822#comment-16390822
 ] 

Xiao Chen commented on HADOOP-15280:


Thanks for working on this issue [~bharatviswa].

I agree we should change the test. But instead of changing it to validate the 
wrapper message, we should still validate the original message ("Forbidden"); 
otherwise it defeats the purpose of the assertion.

Maybe we can add a utility method to GenericTestUtils so this can be reused.
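One possible shape for such a helper (just a sketch; the method name and whether it 
belongs in GenericTestUtils are open questions): walk the cause chain and assert that 
the expected text appears somewhere in it.
{code:java}
/**
 * Sketch of a reusable assertion: check that the expected text appears in the
 * message of the given throwable or of any of its causes.
 */
public static void assertExceptionChainContains(String expected, Throwable t) {
  for (Throwable cause = t; cause != null; cause = cause.getCause()) {
    String msg = cause.getMessage();
    if (msg != null && msg.contains(expected)) {
      return;
    }
  }
  throw new AssertionError(
      "Did not find '" + expected + "' in the exception chain", t);
}
{code}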

> TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail 
> in trunk
> -
>
> Key: HADOOP-15280
> URL: https://issues.apache.org/jira/browse/HADOOP-15280
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Ray Chiang
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HADOOP-15280.00.patch
>
>
> I'm seeing these messages on OS X and on Linux.
> {noformat}
> [ERROR] Failures:
> [ERROR] 
> TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56112/kms/v1/keys?doAs=foo1
> [ERROR] 
> TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176
>  org.apache.hadoop.security.authentication.client.AuthenticationException: 
> Error while authenticating with endpoint: 
> http://localhost:56206/kms/v1/keys?doAs=foo1 
> {noformat}
> as well as a [recent PreCommit-HADOOP-Build 
> job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/].






[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390797#comment-16390797
 ] 

genericqa commented on HADOOP-15280:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-common-project/hadoop-kms: The patch 
generated 0 new + 97 unchanged - 1 fixed = 97 total (was 98) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
10s{color} | {color:green} hadoop-kms in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15280 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913516/HADOOP-15280.00.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2dd9e48e5433 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 583f459 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14280/testReport/ |
| Max. process+thread count | 315 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-kms U: 
hadoop-common-project/hadoop-kms |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14280/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Assigned] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-07 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham reassigned HADOOP-15280:
---

Assignee: Bharat Viswanadham







[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-07 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390754#comment-16390754
 ] 

Bharat Viswanadham commented on HADOOP-15280:
-

[~xiaochen] This change is due to a different exception message being thrown.

I fixed the test case to address this issue.







[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-07 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HADOOP-15280:

Attachment: HADOOP-15280.00.patch







[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk

2018-03-07 Thread Bharat Viswanadham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham updated HADOOP-15280:

Status: Patch Available  (was: Open)







[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures

2018-03-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390733#comment-16390733
 ] 

Xiao Chen commented on HADOOP-12897:


Hi [~ajayydv] and [~arpitagarwal], do you plan to look into the HADOOP-15280 
breakage? 

> KerberosAuthenticator.authenticate to include URL on IO failures
> 
>
> Key: HADOOP-12897
> URL: https://issues.apache.org/jira/browse/HADOOP-12897
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Ajay Kumar
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, 
> HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, 
> HADOOP-12897.006.patch, HADOOP-12897.007.patch
>
>
> If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you 
> get a stack trace, but without the URL it is trying to talk to.
> That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} 
> handler, which can't be called here as it's not in the {{hadoop-auth}} module.






[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential in URL

2018-03-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HADOOP-15158:

Fix Version/s: (was: 3.0.1)
   (was: 2.9.1)
   (was: 3.1.0)

> AliyunOSS: Supports role based credential in URL
> 
>
> Key: HADOOP-15158
> URL: https://issues.apache.org/jira/browse/HADOOP-15158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, 
> HADOOP-15158.003.patch, HADOOP-15158.004.patch, HADOOP-15158.005.patch
>
>
> Currently, AliyunCredentialsProvider supports credentials via 
> configuration (core-site.xml). Sometimes, an admin wants to create different 
> temporary credentials (key/secret/token) for different roles so that one role 
> cannot read data that belongs to another role.
> So, our code should support passing in the URI when creating an 
> XXXCredentialsProvider so that we can get user info (role) from the URI.






[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390703#comment-16390703
 ] 

Wangda Tan commented on HADOOP-15262:
-

[~wujinhu], removed the fix versions; they should be set by the committer when the 
patch is committed.

> AliyunOSS: rename() to move files in a directory in parallel
> 
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, 
> HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow if a 
> directory contains many files, so we can improve this by renaming files in 
> parallel.






[jira] [Updated] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated HADOOP-15262:

Fix Version/s: (was: 3.0.1)
   (was: 2.9.1)
   (was: 3.1.0)







[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390652#comment-16390652
 ] 

genericqa commented on HADOOP-15262:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
17s{color} | {color:green} hadoop-aliyun in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 43m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15262 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913498/HADOOP-15262.005.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3236f8039ddc 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 583f459 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14279/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14279/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.




[jira] [Comment Edited] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-07 Thread wujinhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390620#comment-16390620
 ] 

wujinhu edited comment on HADOOP-15262 at 3/8/18 2:47 AM:
--

Attached HADOOP-15262.005.patch! It sets an upper limit on the waiting list size.

With this patch, users can improve copy performance by increasing 
*fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* (the old version 
copies a directory in series). Generally, the greater 
*fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* are, the 
better (if we have enough resources).

For example, if we set *fs.oss.max.copy.threads = 5* and 
*fs.oss.max.copy.tasks.per.dir = 5*, the copy time will be reduced to 1/5 of the old 
rename().
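In configuration terms that is roughly (a minimal sketch, using the values from the 
example above):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class OssCopyTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // More copy threads in the shared pool, and more outstanding copy tasks
    // allowed per directory, as discussed above.
    conf.setInt("fs.oss.max.copy.threads", 5);
    conf.setInt("fs.oss.max.copy.tasks.per.dir", 5);
  }
}
{code}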

Here is one use case that drives this improvement.

Users use Spark/TensorFlow/etc. to train models and save the model files to OSS. 
However, the number of model files is large, so committing jobs will be slow 
because the frameworks call rename().

 


was (Author: wujinhu):
Attached HADOOP-15262.005.patch! It sets an upper limit on the waiting list size.

With this patch, users can improve copy performance by increasing 
*fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* (the old version 
copies a directory in series). Generally, the greater 
*fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* are, the 
better (if we have enough resources).

Here is one use case that drives this improvement.

Users use Spark/TensorFlow/etc. to train models and save the model files to OSS. 
However, the number of model files is large, so committing jobs will be slow 
because the frameworks call rename().

 







[jira] [Comment Edited] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-07 Thread wujinhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385641#comment-16385641
 ] 

wujinhu edited comment on HADOOP-15262 at 3/8/18 2:36 AM:
--

Thanks [~Sammi] for your review comments. I have fixed items 1 to 4.

For 5, the copy operation will be inexpensive, as OSS will support 
shallow copy soon. A user can configure a higher number of threads to copy files, so 
it is a little hard to define an upper limit on the waiting list 
size (different from the pre-read configuration, because read operations are 
expensive). However, even though the queue is defined as an unbounded queue, we 
use SemaphoredDelegatingExecutor to limit the concurrency within one 
directory.

For 6, since we read only one field of the AliyunOSSCopyFileContext class, there is 
no need to call lock() (we may copy one more file when the whole rename operation 
fails, but that's OK). Reducing the calls to lock() also improves our 
performance.
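To illustrate the bounding pattern with plain JDK classes (a sketch of the same idea 
only, not the actual SemaphoredDelegatingExecutor class): a semaphore caps how many 
copy tasks from one directory are queued or running at a time, even though the 
underlying queue is unbounded.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class BoundedCopySubmitter {
  private final ExecutorService pool = Executors.newFixedThreadPool(5);
  // At most 5 copy tasks per directory may be queued or running at once.
  private final Semaphore perDirPermits = new Semaphore(5);

  void submitCopy(Runnable copyTask) throws InterruptedException {
    perDirPermits.acquire();        // blocks once the per-directory limit is hit
    pool.execute(() -> {
      try {
        copyTask.run();
      } finally {
        perDirPermits.release();    // free the slot for the next copy task
      }
    });
  }
}
{code}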


was (Author: wujinhu):
Thanks [~Sammi] for your review comments. I have fixed items 1 to 4.

For 5, the copy operation will be inexpensive, as OSS will support 
shallow copy soon. A user can configure a higher number of threads to copy files, so 
it is a little hard to define an upper limit on the waiting list 
size (different from the pre-read configuration, because read operations are 
expensive). However, even though the queue is defined as an unbounded queue, we 
use SemaphoredDelegatingExecutor to limit the concurrency within one 
directory.

For 6, since we read only one field of the AliyunOSSCopyFileContext class, there is 
no need to call lock(). Reducing the calls to lock() also improves our 
performance.







[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-07 Thread wujinhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390620#comment-16390620
 ] 

wujinhu commented on HADOOP-15262:
--

Attached HADOOP-15262.005.patch! It sets an upper limit on the waiting list size.

With this patch, users can improve copy performance by increasing 
*fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* (the old version 
copies a directory in series). Generally, the greater 
*fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* are, the 
better (if we have enough resources).

Here is one use case that drives this improvement.

Users use Spark/TensorFlow/etc. to train models and save the model files to OSS. 
However, the number of model files is large, so committing jobs will be slow 
because the frameworks call rename().

 







[jira] [Commented] (HADOOP-12502) SetReplication OutOfMemoryError

2018-03-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390614#comment-16390614
 ] 

Aaron Fabbri commented on HADOOP-12502:
---

Thanks for the updated patch. Will try to review this week.

> SetReplication OutOfMemoryError
> ---
>
> Key: HADOOP-12502
> URL: https://issues.apache.org/jira/browse/HADOOP-12502
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Philipp Schuegerl
>Assignee: Vinayakumar B
>Priority: Major
> Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch, 
> HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch, 
> HADOOP-12502-06.patch, HADOOP-12502-07.patch, HADOOP-12502-08.patch, 
> HADOOP-12502-09.patch, HADOOP-12502-10.patch
>
>
> Setting the replication of an HDFS folder recursively can run out of memory. 
> E.g. with a large /var/log directory:
> hdfs dfs -setrep -R -w 1 /var/log
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit 
> exceeded
>   at java.util.Arrays.copyOfRange(Arrays.java:2694)
>   at java.lang.String.<init>(String.java:203)
>   at java.lang.String.substring(String.java:1913)
>   at java.net.URI$Parser.substring(URI.java:2850)
>   at java.net.URI$Parser.parse(URI.java:3046)
>   at java.net.URI.<init>(URI.java:753)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:116)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>   at 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
>   at 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
>   at 
> org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
>   at 
> org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)






[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390602#comment-16390602
 ] 

Aaron Fabbri commented on HADOOP-15273:
---

LGTM.  +1

Testing: I only ran the TestCopyMapper unit test



> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between the src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place, *and you can't disable it then*.






[jira] [Updated] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-07 Thread wujinhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15262:
-
Attachment: HADOOP-15262.005.patch







[jira] [Commented] (HADOOP-15295) Remove redundant logging related to tags from Configuration

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390444#comment-16390444
 ] 

genericqa commented on HADOOP-15295:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 45s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 1 new + 244 unchanged - 0 fixed = 245 total (was 244) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 44s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
58s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15295 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913459/HADOOP-15295.000.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 80467bdf5254 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 19ae442 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14278/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14278/testReport/ |
| Max. process+thread count | 1432 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14278/console |
| Powered by | Apache Yetus 

[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390440#comment-16390440
 ] 

Chris Douglas commented on HADOOP-15292:


+1 

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).
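For illustration, a minimal sketch of the two read styles (the path and offset are 
placeholders): a positioned read, as used today, versus seeking once and then reading 
sequentially from that offset.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadStyles {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] buf = new byte[4096];
    long offset = 1024L * 1024L;  // placeholder source offset
    try (FSDataInputStream in = fs.open(new Path("/tmp/example"))) {
      // Positioned read: independent of the stream position on every call.
      in.read(offset, buf, 0, buf.length);

      // Seek once, then use ordinary sequential reads from that offset.
      in.seek(offset);
      in.read(buf, 0, buf.length);
    }
  }
}
{code}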






[jira] [Commented] (HADOOP-15297) Make s3a etag -> checksum publishing option

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390431#comment-16390431
 ] 

genericqa commented on HADOOP-15297:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
3s{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
29s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
51s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15297 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913451/HADOOP-15297-001.patchh
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 023b658063ea 

[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-03-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390429#comment-16390429
 ] 

Aaron Fabbri commented on HADOOP-15206:
---

Just reviewing this as part of a backport. Quick question [~jlowe] and 
[~tanakahda]:

 
{noformat}
+long skipBytes = numSkipped;
+while (skipBytes > 0) {
+  long s = bufferedIn.skip(skipBytes);
+  if (s > 0) {
+skipBytes -= s;
+  } else {
+if (bufferedIn.read() == -1) {
+  break; // end of the split
+} else {
+  skipBytes--;
{noformat}

Why is {{skipBytes}} decremented here? skip() returned <= 0, so doesn't that mean 
that no bytes were skipped? I know we want this loop to terminate eventually, 
but I did not understand this part.

{noformat}
+}
+  }
{noformat}
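
For context, a hedged, self-contained sketch of the general skip idiom that loop appears to follow (plain {{java.io}} types; the names are mine, not the codec's): {{InputStream.skip()}} is allowed to return 0 even when bytes remain, so a single {{read()}} is used both to detect a genuine end of stream and, when it does return a byte, to consume exactly one byte, which is why the remaining count is decremented by one in that branch.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipFullySketch {

  /** Skip up to n bytes, stopping early only at a real end of stream. */
  static long skipFully(InputStream in, long n) throws IOException {
    long remaining = n;
    while (remaining > 0) {
      long s = in.skip(remaining);
      if (s > 0) {
        remaining -= s;
      } else {
        // skip() may return 0 even when bytes remain; probe with read().
        if (in.read() == -1) {
          break;                    // genuine end of stream
        } else {
          remaining--;              // read() consumed exactly one byte
        }
      }
    }
    return n - remaining;           // number of bytes actually skipped
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[10]);
    System.out.println(skipFully(in, 4));   // prints 4
  }
}
{code}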

> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Assignee: Aki Tanaka
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2
>
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, 
> HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, 
> HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, 
> HADOOP-15206.008.patch
>
>
> BZip2 can drop and duplicate records when the input split size is small. I 
> confirmed that this issue happens when the input split size is between 1 byte 
> and 4 bytes.
> I am seeing the following two problematic behaviors.
>  
> 1. Drop record:
> BZip2 skips the first record in the input file when the input split size is 
> small
>  
> Set the split size to 3 and tested loading 100 records (0, 1, 2..99)
> {code:java}
> 2018-02-01 10:52:33,502 INFO  [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(317)) - 
> splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3
>  count=99{code}
> > The input format read only 99 records, not 100 records
>  
> 2. Duplicate record:
> Two input splits have the same BZip2 records when the input split size is small
>  
> Set the split size to 1 and tested loading 100 records (0, 1, 2..99)
>  
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file 
> /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1
>  count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 
> at position 8
> {code}
>  
> I experienced this error when I executed a Spark (SparkSQL) job under the 
> following conditions:
> * The input files are small (around 1 KB each)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390390#comment-16390390
 ] 

genericqa commented on HADOOP-15273:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} hadoop-tools/hadoop-distcp: The patch generated 0 
new + 90 unchanged - 4 fixed = 90 total (was 94) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m  
0s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15273 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913455/HADOOP-15273-003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9950576a3130 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 46d29e3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14277/testReport/ |
| Max. process+thread count | 346 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14277/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> distcp can't handle remote stores with different checksum algorithms
> 

[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390383#comment-16390383
 ] 

Steve Loughran commented on HADOOP-15292:
-

LGTM. 

Chris, what say you?

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390359#comment-16390359
 ] 

Steve Loughran commented on HADOOP-15209:
-

The checkstyle complaints are about method length and one line being 84 characters vs the 80-character limit. Needless.

h1. Can someone start reviewing this so we can get it into Hadoop 3.1? 

thanks

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.
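
Purely as an illustration of the idea (this is not the attached patches): if the deletion listing is walked in sorted path order, remembering just the most recently deleted directory is enough to skip the redundant child deletes while keeping memory use constant. The class and method names below are made up.

{code:java}
import org.apache.hadoop.fs.Path;

public class DeletedDirTrackerSketch {

  private Path lastDeletedDir;      // most recently deleted directory, if any

  /** @return true if path sits under a directory that was already deleted. */
  boolean isUnderDeletedDir(Path path) {
    if (lastDeletedDir == null) {
      return false;
    }
    for (Path p = path.getParent(); p != null; p = p.getParent()) {
      if (p.equals(lastDeletedDir)) {
        return true;                // an ancestor was deleted; skip this one
      }
    }
    return false;
  }

  /** Record a directory delete so its descendants can be skipped later. */
  void recordDirDelete(Path dir) {
    lastDeletedDir = dir;
  }
}
{code}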



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15295) Remove redundant logging related to tags from Configuration

2018-03-07 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-15295:

Status: Patch Available  (was: Open)

> Remove redundant logging related to tags from Configuration
> ---
>
> Key: HADOOP-15295
> URL: https://issues.apache.org/jira/browse/HADOOP-15295
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HADOOP-15295.000.patch
>
>
> Remove redundant logging related to tags from Configuration.
> {code}
> 2018-03-06 18:55:46,164 INFO conf.Configuration: Removed undeclared tags:
> 2018-03-06 18:55:46,237 INFO conf.Configuration: Removed undeclared tags:
> 2018-03-06 18:55:46,249 INFO conf.Configuration: Removed undeclared tags:
> 2018-03-06 18:55:46,256 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> {code}
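
A minimal sketch of the kind of guard that removes this noise, assuming the set of removed tags is available at the log site (the real field, logger and method names inside Configuration may differ):

{code:java}
import java.util.Set;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TagLoggingSketch {

  private static final Logger LOG =
      LoggerFactory.getLogger(TagLoggingSketch.class);

  // Illustrative only: log removed tags at DEBUG, and only when something was
  // actually removed, instead of an unconditional INFO line per Configuration.
  static void logRemovedTags(Set<String> removedTags) {
    if (!removedTags.isEmpty() && LOG.isDebugEnabled()) {
      LOG.debug("Removed undeclared tags: {}", removedTags);
    }
  }
}
{code}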



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15295) Remove redundant logging related to tags from Configuration

2018-03-07 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-15295:

Attachment: HADOOP-15295.000.patch

> Remove redundant logging related to tags from Configuration
> ---
>
> Key: HADOOP-15295
> URL: https://issues.apache.org/jira/browse/HADOOP-15295
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HADOOP-15295.000.patch
>
>
> Remove redundant logging related to tags from Configuration.
> {code}
> 2018-03-06 18:55:46,164 INFO conf.Configuration: Removed undeclared tags:
> 2018-03-06 18:55:46,237 INFO conf.Configuration: Removed undeclared tags:
> 2018-03-06 18:55:46,249 INFO conf.Configuration: Removed undeclared tags:
> 2018-03-06 18:55:46,256 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390297#comment-16390297
 ] 

genericqa commented on HADOOP-15273:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 
1 new + 90 unchanged - 4 fixed = 91 total (was 94) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 
59s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15273 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913445/HADOOP-15273-002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ad3b5c0e074b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 46d29e3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/testReport/ |
| Max. process+thread count | 339 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |

[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Patch Available  (was: Open)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390291#comment-16390291
 ] 

Steve Loughran commented on HADOOP-15273:
-

Patch 003
* fixes checkstyle
* fixes tests

With HADOOP-15297 making the etags => checksum feature in s3a optional, this 
isn't quite a blocker, but it becomes one when you try to distcp between any two 
stores with different algorithms, because only -update lets you skip the checks 
right now. If any other FS offers checksums, things will break.
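
For anyone new to the issue, a rough sketch of the comparison at the heart of it (illustrative names; this is not the distcp code): checksums from two stores are only meaningfully comparable when both sides use the same algorithm, and a store that returns no checksum at all silently turns the check off.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumCompareSketch {

  static boolean sameChecksum(FileSystem srcFS, Path src,
                              FileSystem dstFS, Path dst) throws IOException {
    FileChecksum a = srcFS.getFileChecksum(src);
    FileChecksum b = dstFS.getFileChecksum(dst);
    if (a == null || b == null) {
      return true;   // one side exposes no checksum: nothing to compare
    }
    // e.g. an HDFS MD5-of-CRC checksum vs an S3A etag: different algorithms,
    // so equals() fails even when the file contents are identical.
    return a.equals(b);
  }
}
{code}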

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Open  (was: Patch Available)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Target Version/s: 3.1.0
  Status: Patch Available  (was: Open)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Attachment: HADOOP-15273-003.patch

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Priority: Critical  (was: Blocker)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Open  (was: Patch Available)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, 
> HADOOP-15273-003.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15297) Make s3a etag -> checksum publishing option

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15297:

Status: Patch Available  (was: Open)

FYI: [~devaraj] [~ehiggs] [~fabbri] [~leftnoteasy]

With this patch in, you don't need HADOOP-15273 to get hdfs <-> s3a distcp to 
work, but you will want it if you do enable this feature.

> Make s3a etag -> checksum publishing option
> ---
>
> Key: HADOOP-15297
> URL: https://issues.apache.org/jira/browse/HADOOP-15297
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15297-001.patchh
>
>
> HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with 
> checksums.
> Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to 
> s3a.
> Rather than revert, I want to make it an option, off by default. Once we are 
> happy with distcp in the future, we can turn it on.
> Why an option? Because it lines up for a successor to distcp which saves src 
> and dest checksums to a file and can then verify whether or not files have 
> really changed. Currently distcp relies on the dest checksum algorithm being 
> the same as the src for incremental updates, but if either of the stores 
> doesn't serve checksums, it silently downgrades to not checking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15297) Make s3a etag -> checksum publishing option

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390281#comment-16390281
 ] 

Steve Loughran commented on HADOOP-15297:
-

HADOOP-15297 patch 001
* make etag feature optional, disabled by default
* update tests to check both states
* update docs
* also update instrumentation to track invocations. Not directly used, but 
useful for anything downstream trying to track the number of calls made

Testing: S3 Ireland; also ran mvn site:site to verify that the updated docs were 
valid.

> Make s3a etag -> checksum publishing option
> ---
>
> Key: HADOOP-15297
> URL: https://issues.apache.org/jira/browse/HADOOP-15297
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15297-001.patchh
>
>
> HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with 
> checksums.
> Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to 
> s3a.
> Rather than revert, I want to make it an option, off by default. Once we are 
> happy with distcp in the future, we can turn it on.
> Why an option? Because it lines up for a successor to distcp which saves src 
> and dest checksums to a file and can then verify whether or not files have 
> really changed. Currently distcp relies on the dest checksum algorithm being 
> the same as the src for incremental updates, but if either of the stores 
> doesn't serve checksums, it silently downgrades to not checking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15297) Make s3a etag -> checksum publishing option

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15297:

Attachment: HADOOP-15297-001.patchh

> Make s3a etag -> checksum publishing option
> ---
>
> Key: HADOOP-15297
> URL: https://issues.apache.org/jira/browse/HADOOP-15297
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15297-001.patchh
>
>
> HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with 
> checksums.
> Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to 
> s3a.
> Rather than revert, I want to make it an option, off by default. Once we are 
> happy with distcp in the future, we can turn it on.
> Why an option? Because it lines up for a successor to distcp which saves src 
> and dest checksums to a file and can then verify whether or not files have 
> really changed. Currently distcp relies on the dest checksum algorithm being 
> the same as the src for incremental updates, but if either of the stores 
> doesn't serve checksums, it silently downgrades to not checking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15278) log s3a at info

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390262#comment-16390262
 ] 

genericqa commented on HADOOP-15278:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
28m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
58s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 43m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15278 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913440/HADOOP-15278-001.patch
 |
| Optional Tests |  asflicense  mvnsite  unit  |
| uname | Linux e4b1ac05f0d3 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 46d29e3 |
| maven | version: Apache Maven 3.3.9 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14274/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14274/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> log s3a at info
> ---
>
> Key: HADOOP-15278
> URL: https://issues.apache.org/jira/browse/HADOOP-15278
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15278-001.patch
>
>
> Since it was added, the log4j config in hadoop's conf only logs s3a at ERROR, 
> even though in our test/resources it logs at INFO. We do actually log a lot of 
> information that is useful when debugging.
> Proposed: drop the log level to INFO here



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15297) Make s3a etag -> checksum publishing option

2018-03-07 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15297:
---

 Summary: Make s3a etag -> checksum publishing option
 Key: HADOOP-15297
 URL: https://issues.apache.org/jira/browse/HADOOP-15297
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.1.0
Reporter: Steve Loughran
Assignee: Steve Loughran


HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with 
checksums.

Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to 
s3a.

Rather than revert, I want to make it an option, off by default. Once we are 
happy with distcp in the future, we can turn it on.

Why an option? Because it lines up for a successor to distcp which saves src 
and dest checksums to a file and can then verify whether or not files have 
really changed. Currently distcp relies on the dest checksum algorithm being 
the same as the src for incremental updates, but if either of the stores 
doesn't serve checksums, it silently downgrades to not checking.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Patch Available  (was: Open)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Attachment: HADOOP-15273-002.patch

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Open  (was: Patch Available)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14937:

Parent Issue: HADOOP-15220  (was: HADOOP-14831)

> initial part uploads seem to block unnecessarily in S3ABlockOutputStream
> 
>
> Key: HADOOP-14937
> URL: https://issues.apache.org/jira/browse/HADOOP-14937
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: yjp_threads.png
>
>
> From looking at a YourKit snapshot of an FsShell process running a {{hadoop 
> fs -put file:///... s3a://...}}, it seems that the first part in the 
> multipart upload doesn't begin to upload until n of the 
> {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is 
> the value of {{fs.s3a.fast.upload.active.blocks}}.
> To hopefully clarify a bit, the series of events that I expected to see with 
> {{fs.s3a.fast.upload.active.blocks}} set to 4 is:
> 1.  An amount of data equal to {{fs.s3a.multipart.size}} is buffered into 
> off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
> 2. As soon as that happens, a thread begins to upload that part. Meanwhile, 
> the main thread continues to buffer data into off-heap memory.
> 3. Once another part has been buffered into off-heap memory, a separate 
> thread uploads that part, and so on.
> Whereas what I think the YK snapshot shows happening is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into 
> off-heap memory.
> 2. Four threads start to upload one part each at the same time.
> I've attached a picture of the "Threads" tab to show what I mean. Basically 
> the times at which the first four {{s3a-transfer-shared-pool}} threads start 
> to upload are roughly the same, whereas I would've expected them to be more 
> staggered.
> I'm actually not sure whether this is the expected behavior or not, so feel 
> free to close if this doesn't come as a surprise to anyone.
> For some context, I've been trying to get a sense for roughly which values of 
> {{fs.s3a.multipart.size}} perform the best at different file sizes. One thing 
> that I found confusing is that a part size of 5 MB seems to outperform a part 
> size of 64 MB up until files that are upwards of about 500 MB in size. This 
> seems odd, since each {{uploadPart}} call is its own HTTP request, and I 
> would've expected the overhead of those to become costly at small part sizes. 
> My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have 
> to wait until 256 MB are buffered before we can start uploading, while with 5 
> MB blocks we can start uploading as soon as we buffer 20 MB, and that's what 
> gives the smaller parts the advantage for smaller files.
> I'm happy to submit a patch if this is in fact a problem, but wanted to check 
> to make sure I'm not just misunderstanding something.
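
For anyone trying to reproduce the behaviour, a minimal sketch of the configuration described above (the property names are the ones quoted in the description; the values are only examples):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class S3AFastUploadTuningSketch {

  // Sketch of the settings discussed above; values are illustrative only.
  static Configuration tuned() {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.fast.upload.buffer", "bytebuffer");       // off-heap buffers
    conf.setLong("fs.s3a.multipart.size", 64L * 1024 * 1024);  // 64 MB parts
    conf.setInt("fs.s3a.fast.upload.active.blocks", 4);        // parts in flight
    return conf;
  }
}
{code}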



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390179#comment-16390179
 ] 

Steve Loughran commented on HADOOP-15273:
-

This is what you get now
{code}
Checksum mismatch between hdfs://localhost:50883/tmp/source/5/6 and 
hdfs://localhost:50883/tmp/target/.distcp.tmp.attempt___m_00_0. Source 
and target differ in block-size.
 Use -pb to preserve block-sizes during copy. You can skip checksum-checks 
altogether  with -skipcrccheck.
 (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
during file-transfer.)
{code}

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> talks about block size, even when it's the underlying checksum protocol 
> itself which is the cause of the failure:
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Priority: Blocker  (was: Critical)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about blocksize, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Anis Elleuch (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390171#comment-16390171
 ] 

Anis Elleuch commented on HADOOP-15267:
---

Thanks for merging this patch [~ste...@apache.org].

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.
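
For reference, a hedged sketch (an assumption, not the actual S3A patch) of the AWS SDK v1 call the description points at: the SSECustomerKey used to initiate the multipart upload also has to be attached to each part upload request.
{code:java}
import com.amazonaws.services.s3.model.SSECustomerKey;
import com.amazonaws.services.s3.model.UploadPartRequest;

// Illustration only: propagate the customer-provided key to every part upload,
// matching the AWS requirement quoted in the issue description.
class SseCPartUploadSketch {
  static UploadPartRequest withSseC(UploadPartRequest part, SSECustomerKey key) {
    return key == null ? part : part.withSSECustomerKey(key);
  }
}
{code}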



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15278) log s3a at info

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15278:

Status: Patch Available  (was: Open)

Patch 001: the s3a log override is commented out & so s3a gets the global level; 
the AWS SDK is uprated to INFO, with the exception of the httpclient, which can 
log stack traces for errors it is retrying, so can be confusing.
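
A sketch of the kind of log4j.properties change described above; the logger names below are assumptions about the relevant overrides, not the lines of the attached patch:
{code}
# Let S3A inherit the global (INFO) level instead of forcing ERROR:
# log4j.logger.org.apache.hadoop.fs.s3a=ERROR   <- comment out / remove
# Raise the AWS SDK to INFO, but keep the retrying http client quieter:
log4j.logger.com.amazonaws=INFO
log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
{code}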

> log s3a at info
> ---
>
> Key: HADOOP-15278
> URL: https://issues.apache.org/jira/browse/HADOOP-15278
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15278-001.patch
>
>
> Since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in 
> our test/resources it logs at INFO. We do actually log lots of stuff that is 
> useful when debugging things.
> Proposed: drop the log level to INFO here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15278) log s3a at info

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15278:

Attachment: HADOOP-15278-001.patch

> log s3a at info
> ---
>
> Key: HADOOP-15278
> URL: https://issues.apache.org/jira/browse/HADOOP-15278
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15278-001.patch
>
>
> Since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in 
> our test/resources it logs at INFO. We do actually log lots of stuff that is 
> useful when debugging things.
> Proposed: drop the log level to INFO here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15278) log s3a at info

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390165#comment-16390165
 ] 

Steve Loughran commented on HADOOP-15278:
-

Actually, this is what you see with the default settings:
{code}
2018-03-07 20:20:48,755 INFO beanutils.FluentPropertyBeanIntrospector: Error 
when creating PropertyDescriptor for public final void 
org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
 Ignoring this property.
2018-03-07 20:20:48,780 INFO impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
2018-03-07 20:20:48,844 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
period at 10 second(s).
2018-03-07 20:20:48,844 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
system started
2018-03-07 20:20:50,108 INFO Configuration.deprecation: 
fs.s3a.server-side-encryption-key is deprecated. Instead, use 
fs.s3a.server-side-encryption.key
Found 1 items
drwxrwxrwx   - stevel stevel  0 2018-03-07 20:20 
s3a://hwdev-steve-london/Users
2018-03-07 20:20:50,274 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
metrics system...
2018-03-07 20:20:50,274 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
system stopped.
2018-03-07 20:20:50,274 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
system shutdown complete.
{code}

Something about deprecation I've never got away from, and metrics cruft. The 
fluent property one is HADOOP-15277; I think it's actually metrics related.

For reference, azure:
{code}
bin/hadoop fs -ls wasb://contr...@contender.blob.core.windows.net
2018-03-07 20:26:16,848 INFO util.log: Logging initialized @1371ms
2018-03-07 20:26:16,968 INFO beanutils.FluentPropertyBeanIntrospector: Error 
when creating PropertyDescriptor for public final void 
org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
 Ignoring this property.
2018-03-07 20:26:17,003 INFO impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
2018-03-07 20:26:17,075 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
period at 10 second(s).
2018-03-07 20:26:17,075 INFO impl.MetricsSystemImpl: azure-file-system metrics 
system started
Found 7 items
-rw-r--r--   1 stevel supergroup  0 2017-06-30 18:12 
wasb://contr...@contender.blob.core.windows.net/user/stevel/file
-rw-r--r--   1 stevel supergroup513 2017-06-30 18:10 
wasb://contr...@contender.blob.core.windows.net/user/stevel/file.dat
-rw-r--r--   1 stevel supergroup  0 2017-06-30 18:13 
wasb://contr...@contender.blob.core.windows.net/user/stevel/fileToRename
drwxr-xr-x   - stevel supergroup  0 2017-06-30 18:11 
wasb://contr...@contender.blob.core.windows.net/user/stevel/parent
drwxr-xr-x   - stevel supergroup  0 2017-06-30 18:11 
wasb://contr...@contender.blob.core.windows.net/user/stevel/renamedFolder
drwxr-xr-x   - stevel supergroup  0 2016-10-20 22:00 
wasb://contr...@contender.blob.core.windows.net/user/stevel/streaming
drwxr-xr-x   - stevel supergroup  0 2017-06-30 18:11 
wasb://contr...@contender.blob.core.windows.net/user/stevel/testDestFolder
2018-03-07 20:26:17,756 INFO impl.MetricsSystemImpl: Stopping azure-file-system 
metrics system...
2018-03-07 20:26:17,756 INFO impl.MetricsSystemImpl: azure-file-system metrics 
system stopped.
2018-03-07 20:26:17,756 INFO impl.MetricsSystemImpl: azure-file-system metrics 
system shutdown complete.
{code}

So: equivalent noisiness around metrics. Dangerous to turn that off though, as 
this is the same log4j used in the servers, which also spin up the metrics; wasb 
and s3a are just the two clients that use the metrics classes.

> log s3a at info
> ---
>
> Key: HADOOP-15278
> URL: https://issues.apache.org/jira/browse/HADOOP-15278
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in 
> our test/resources it logs at INFO. We do actually log lots of stuff that is 
> useful when debugging things.
> Proposed: drop the log level to INFO here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390143#comment-16390143
 ] 

Steve Loughran edited comment on HADOOP-15273 at 3/7/18 8:20 PM:
-

CopyMapper's test suite contains a test that looks for the string of the 
(incorrect) -skipCrc message. So it's not just wrong, there's a test to make 
sure it stays wrong :)
{code}
java.lang.AssertionError: Failure exception should have suggested the use of 
-skipCrc.
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.tools.mapred.TestCopyMapper.testCopyFailOnBlockSizeDifference(TestCopyMapper.java:949)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
checkstyle
{code}
./hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:213:
  StringBuilder errorMessage = new StringBuilder("Checksum mismatch between 
"): Line is longer than 80 characters (found 82). [LineLength]
{code}


was (Author: ste...@apache.org):
copymapper contains test to look for string of (incorrect) -skipCrc message. So 
not just wrong, tests to make sure it stays wrong :)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about blocksize, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390143#comment-16390143
 ] 

Steve Loughran commented on HADOOP-15273:
-

CopyMapper's test suite contains a test that looks for the string of the 
(incorrect) -skipCrc message. So it's not just wrong, there's a test to make 
sure it stays wrong :)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about blocksize, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390141#comment-16390141
 ] 

genericqa commented on HADOOP-15292:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 
1 new + 95 unchanged - 1 fixed = 96 total (was 96) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 
40s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913424/HADOOP-15292.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 343b69026d0a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e0307e5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14273/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14273/testReport/ |
| Max. process+thread count | 434 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14273/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


[jira] [Commented] (HADOOP-15278) log s3a at info

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390131#comment-16390131
 ] 

Steve Loughran commented on HADOOP-15278:
-

Nope, nothing by default

{code}
bin/hadoop fs -ls s3a://hwdev-steve-london/
Found 1 items
drwxrwxrwx   - stevel stevel  0 2018-03-07 20:12 
s3a://hwdev-steve-london/Users
{code}

You do get more on other ops, like committing work, which is where I'd like it.

> log s3a at info
> ---
>
> Key: HADOOP-15278
> URL: https://issues.apache.org/jira/browse/HADOOP-15278
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in 
> our test/resources it logs at INFO. We do actually log lots of stuff that is 
> useful when debugging things.
> Proposed: drop the log level to INFO here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390031#comment-16390031
 ] 

Steve Loughran edited comment on HADOOP-15267 at 3/7/18 8:10 PM:
-

+1
OK, it's in.

Anis, thank you for finding this!

I'm just seeing how easy this is to backport, as it's a major bug.


was (Author: ste...@apache.org):
OK its in.

Anis, thank you for finding this!

I'm just seeing how easy this is to backport, as its a major bug

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15267:

Affects Version/s: 2.9.0
   3.0.0

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390120#comment-16390120
 ] 

Steve Loughran commented on HADOOP-15267:
-

+[~fabbri]: FYI

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390093#comment-16390093
 ] 

genericqa commented on HADOOP-15273:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 
1 new + 8 unchanged - 2 fixed = 9 total (was 10) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 41s{color} 
| {color:red} hadoop-distcp in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 58m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.mapred.TestCopyMapper |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15273 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913420/HADOOP-15273-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 44786d213856 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d69b31f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14272/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14272/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt
 |
|  Test Results | 

[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390061#comment-16390061
 ] 

Steve Loughran commented on HADOOP-15267:
-

Backported to 3.0.x; I haven't got time right now to look at & retest 
branch-2...the 3.0 commit should be the one to pick up if anyone wants to.

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15267:

Fix Version/s: 3.0.2

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390051#comment-16390051
 ] 

Hudson commented on HADOOP-15267:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13787 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13787/])
HADOOP-15267. S3A multipart upload fails when SSE-C encryption is (stevel: rev 
e0307e53e2110cb6b418861a7471e97a013c16e2)
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/MockS3AFileSystem.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AHugeFilesSSECDiskBlocks.java


> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390047#comment-16390047
 ] 

Virajith Jalaparti commented on HADOOP-15292:
-

 [^HADOOP-15292.002.patch] fixes [~ste...@apache.org]'s and [~chris.douglas]'s 
comment of seeking when {{sourceOffset != inStream.getPos()}}. 

[~ste...@apache.org] {{ITestAzureNativeContractDistCp}} with this fix.
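
For context, a minimal sketch (an assumption about the approach, not the actual patch) of the seek-before-copy idea described above: only seek when the stream is not already at {{sourceOffset}}, then copy with plain reads instead of positioned reads.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustration only: avoid positioned reads by repositioning the stream once,
// and only when it is not already at the requested offset.
class SeekBeforeCopySketch {
  static void seekIfNeeded(FSDataInputStream in, long sourceOffset)
      throws IOException {
    if (sourceOffset != in.getPos()) {
      in.seek(sourceOffset);
    }
  }
}
{code}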

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HADOOP-15292:

Status: Patch Available  (was: Open)

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Virajith Jalaparti (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390047#comment-16390047
 ] 

Virajith Jalaparti edited comment on HADOOP-15292 at 3/7/18 7:07 PM:
-

 [^HADOOP-15292.002.patch] fixes [~ste...@apache.org]'s and [~chris.douglas]'s 
comment of seeking when {{sourceOffset != inStream.getPos()}}. 

[~ste...@apache.org] {{ITestAzureNativeContractDistCp}} passes after this fix 
as well.


was (Author: virajith):
 [^HADOOP-15292.002.patch] fixes [~ste...@apache.org]'s and [~chris.douglas]'s 
comment of seeking when {{sourceOffset != inStream.getPos()}}. 

[~ste...@apache.org] {{ITestAzureNativeContractDistCp}} with this fix.

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HADOOP-15292:

Status: Open  (was: Patch Available)

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HADOOP-15292:

Attachment: HADOOP-15292.002.patch

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15267:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

OK, it's in.

Anis, thank you for finding this!

I'm just seeing how easy this is to backport, as it's a major bug.

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Fix For: 3.1.0
>
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size 
> to 5 MB, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C 
> headers in Put Object Part requests, as required by the AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue that should clarify the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Status: Patch Available  (was: Open)

Patch 001

* allows -skipcrccheck everywhere
* when the filesystem schemes differ and are not the hdfs ones (hdfs, webhdfs, 
swebhdfs), a filesystem-specific message is printed instead of one about block 
size (a sketch of this message selection follows below)
* error message adds \n formatting
* and the correct name of the option to disable the checks

Tests: not easily. Maybe after HADOOP-15209 is in I could do it...we'd need 
something in hadoop-aws with a minihdfs cluster. This is not an easy 
undertaking.

I have manually tested it & verified that yes, the skipcrc goes down.

Even with this patch, I'm wondering whether it's best to revert the s3a etag 
feature until we have distcp better able to cope.
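
A hedged sketch (an assumption about the approach, not the attached patch) of the message selection mentioned above: if source and target report different checksum algorithms, blame the checksum type rather than the block size.
{code:java}
import org.apache.hadoop.fs.FileChecksum;

// Illustration only: pick an error message based on whether the two stores'
// checksum algorithms even match.
class ChecksumMismatchMessageSketch {
  static String describe(FileChecksum source, FileChecksum target) {
    if (source != null && target != null
        && !source.getAlgorithmName().equals(target.getAlgorithmName())) {
      return "Source and target checksum algorithms differ ("
          + source.getAlgorithmName() + " vs " + target.getAlgorithmName()
          + "); use -skipcrccheck to skip checksum verification.";
    }
    return "Source and target differ in block-size. Use -pb to preserve "
        + "block-sizes during copy.";
  }
}
{code}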

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about blocksize, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Attachment: HADOOP-15273-001.patch

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about blocksize, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update: the CRC check always takes place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389953#comment-16389953
 ] 

Steve Loughran commented on HADOOP-15273:
-

An initial distcp always does a checksum check during upload

This sequence will fail
{code}
hadoop fs -rm -R -skipTrash s3a://hwdev-steve-new/\*
hadoop distcp  /user/steve/data s3a://hwdev-steve-new/data
{code}

Here

{code}
18/03/07 17:08:03 INFO mapreduce.Job: Task Id : 
attempt_1520388269891_0019_m_04_2, Status : FAILED
Error: java.io.IOException: File copy failed: 
hdfs://mycluster/user/steve/data/example.py --> 
s3a://hwdev-steve-new/data/example.py
at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:217)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
hdfs://mycluster/user/steve/data/example.py to 
s3a://hwdev-steve-new/data/example.py
at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at 
org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:256)
... 10 more
Caused by: java.io.IOException: Check-sum mismatch between 
hdfs://mycluster/user/steve/data/example.py and 
s3a://hwdev-steve-new/data-connectors/.distcp.tmp.attempt_1520388269891_0019_m_04_2.
 Source and target differ in block-size. Use -pb to preserve block-sizes during 
copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By 
skipping checksums, one runs the risk of masking data-corruption during 
file-transfer.)
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:223)
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:133)
at 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
at 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
{code}

You cannot use -skipcrccheck as the validator forbids it, yet without it you 
can't upload from hdfs to s3a now that s3a serves up its checksums as etags.

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrccheck}}, if there's a checksum mismatch 
> between src and dest store types (e.g. hdfs to s3), then the error message 
> will talk about blocksize, even when it's the underlying checksum protocol 
> itself which is the cause of the failure.
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-15273:
---

Assignee: Steve Loughran

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15273-001.patch
>
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Description: 
When using distcp without {{-skipcrcchecks}}, if there's a checksum mismatch 
between the src and dest store types (e.g. hdfs to s3), then the error message 
talks about blocksize, even when it's the underlying checksum protocol itself 
which is the cause of the failure:

bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
(NOTE: By skipping checksums, one runs the risk of masking data-corruption 
during file-transfer.)

update: the CRC check always takes place on a distcp upload before the file is 
renamed into place, *and you can't disable it then*.

  was:
When using distcp without {{-skipCRC}} . If there's a checksum mismatch between 
src and dest store types (e.g hdfs to s3), then the error message will talk 
about blocksize, even when its the underlying checksum protocol itself which is 
the cause for failure

bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
(NOTE: By skipping checksums, one runs the risk of masking data-corruption 
during file-transfer.)

IF the checksum types are fundamentally different, the error message should say 
so


> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
>
> When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> update:  the CRC check takes always place on a distcp upload before the file 
> is renamed into place. *and you can't disable it then*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Priority: Critical  (was: Minor)

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
>
> When using distcp without {{-skipCRC}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> IF the checksum types are fundamentally different, the error message should 
> say so



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15273:

Summary: distcp can't handle remote stores with different checksum 
algorithms  (was: distcp to downgrade on checksum algorithm mismatch to "files 
unchanged")

> distcp can't handle remote stores with different checksum algorithms
> 
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When using distcp without {{-skipCRC}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> IF the checksum types are fundamentally different, the error message should 
> say so



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389675#comment-16389675
 ] 

genericqa commented on HADOOP-15267:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
41s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15267 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913197/HADOOP-15267-003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c1d3d7470c71 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 58ea2d7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14271/testReport/ |
| Max. process+thread count | 292 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14271/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: 

[jira] [Updated] (HADOOP-15277) remove .FluentPropertyBeanIntrospector from CLI operation log output

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15277:

Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-14831

> remove .FluentPropertyBeanIntrospector from CLI operation log output
> 
>
> Key: HADOOP-15277
> URL: https://issues.apache.org/jira/browse/HADOOP-15277
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15277-001.patch
>
>
> when using the default logs, I get told off by beanutils
> {code}
> 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when 
> creating PropertyDescriptor for public final void 
> org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
>  Ignoring this property.
> {code}
> This is a distraction.
> I propose to raise the log level to ERROR for that class in log4j.properties
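
For illustration, a minimal sketch of what that log4j.properties entry could look like, 
assuming the class sits in the usual {{org.apache.commons.beanutils}} package; the attached 
patch may differ:
{code}
# Hedged example: raise the logger for the noisy introspector class to ERROR
# so the INFO-level "Error when creating PropertyDescriptor" message no longer
# shows up in CLI output.
log4j.logger.org.apache.commons.beanutils.FluentPropertyBeanIntrospector=ERROR
{code}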



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15273) distcp to downgrade on checksum algorithm mismatch to "files unchanged"

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389633#comment-16389633
 ] 

Steve Loughran commented on HADOOP-15273:
-

Oh, and the text is actually wrong: the arg is "-skipcrccheck".

> distcp to downgrade on checksum algorithm mismatch to "files unchanged"
> ---
>
> Key: HADOOP-15273
> URL: https://issues.apache.org/jira/browse/HADOOP-15273
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When using distcp without {{-skipCRC}} . If there's a checksum mismatch 
> between src and dest store types (e.g hdfs to s3), then the error message 
> will talk about blocksize, even when its the underlying checksum protocol 
> itself which is the cause for failure
> bq. Source and target differ in block-size. Use -pb to preserve block-sizes 
> during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption 
> during file-transfer.)
> IF the checksum types are fundamentally different, the error message should 
> say so



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-10786) Fix UGI#reloginFromKeytab on Java 8

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389630#comment-16389630
 ] 

Steve Loughran commented on HADOOP-10786:
-

Looks like it's in branch-2.7
{code}
> git log --grep HADOOP-10786 branch-2.7
commit 8f4a09b6076de9fbd6cd8ccaddf72ba9c94429ff
Author: Vinayakumar B 
Date:   Fri Aug 14 12:23:51 2015 +0530

HADOOP-10786. Fix UGI#reloginFromKeytab on Java 8. Contributed by Stephen 
Chu.
Moved CHANGES.txt entry to 2.6.1

(cherry picked from commit e7aa81394dce61cc96d480e21204263a5f2ed153)
{code}

The code has moved on a lot since that patch went in, which is why there's no 
match. 

> Fix UGI#reloginFromKeytab on Java 8
> ---
>
> Key: HADOOP-10786
> URL: https://issues.apache.org/jira/browse/HADOOP-10786
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Tobi Vollebregt
>Assignee: Stephen Chu
>Priority: Major
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.0, 3.0.0-alpha1
>
> Attachments: HADOOP-10786.2.patch, HADOOP-10786.3.patch, 
> HADOOP-10786.3.patch, HADOOP-10786.4.patch, HADOOP-10786.5.patch, 
> HADOOP-10786.patch
>
>
> Krb5LoginModule changed subtly in java 8: in particular, if useKeyTab and 
> storeKey are specified, then only a KeyTab object is added to the Subject's 
> private credentials, whereas in java <= 7 both a KeyTab and some number of 
> KerberosKey objects were added.
> The UGI constructor checks whether or not a keytab was used to login by 
> looking if there are any KerberosKey objects in the Subject's private 
> credentials. If there are, then isKeyTab is set to true, and otherwise it's 
> set to false.
> Thus, in java 8 isKeyTab is always false given the current UGI 
> implementation, which makes UGI#reloginFromKeytab fail silently.
> Attached patch will check for a KeyTab object on the Subject, instead of a 
> KerberosKey object. This fixes relogins from kerberos keytabs on Oracle java 
> 8, and works on Oracle java 7 as well.
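
For illustration only (not the attached patch verbatim), the check described above amounts to 
looking for a {{KeyTab}} object rather than {{KerberosKey}} objects in the Subject's private 
credentials:
{code:java}
import javax.security.auth.Subject;
import javax.security.auth.kerberos.KeyTab;

class KeytabLoginCheck {
  // On Java 8 a keytab login leaves a KeyTab object in the Subject's private
  // credentials, while KerberosKey objects are no longer guaranteed to be
  // present; checking for KeyTab works on both Java 7 and Java 8.
  static boolean isLoginKeytabBased(Subject subject) {
    return !subject.getPrivateCredentials(KeyTab.class).isEmpty();
    // Old check that breaks silently on Java 8:
    //   !subject.getPrivateCredentials(KerberosKey.class).isEmpty()
  }
}
{code}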



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389616#comment-16389616
 ] 

Steve Loughran commented on HADOOP-15292:
-

# I like the extra instrumentation & probes; if it works for HDFS it'll be the 
same everywhere.
# I think Chris's comment about {{sourceOffset != inStream.getPos()}} is valid. 
If the file is newly opened, this is the same as offset != 0; otherwise it's 
relative to where you are (see the sketch below).

w.r.t. S3 testing, I can see why it wouldn't be your default, but our test 
suites are designed to be very low cost (no persistent data, bias to uploads 
and large D/Ls, all from AWS-funded buckets). It's worth getting set up for this 
to help verify consistent behaviour everywhere.

At the very least, make sure the Azure WASB store tests are happy. (You don't 
get an ADL test until HADOOP-15209.)
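
To make the pread concern concrete, a rough sketch (illustrative only, not the attached patch) 
of the two ways a copy loop can consume the source stream; the {{getPos()}} guard is the 
condition discussed above:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

class CopyReadSketch {
  // Sequential read after at most one seek: on HDFS this reuses a single
  // block reader for the whole range.
  static int readSequential(FSDataInputStream in, long sourceOffset,
                            byte[] buf) throws IOException {
    if (sourceOffset != in.getPos()) {  // reposition only when actually needed
      in.seek(sourceOffset);
    }
    return in.read(buf);
  }

  // Positioned read (pread): stateless, but on HDFS each call can create a
  // new BlockReader plus a readBlock() round trip, which is the overhead
  // this issue describes.
  static int readPositioned(FSDataInputStream in, long sourceOffset,
                            byte[] buf) throws IOException {
    return in.read(sourceOffset, buf, 0, buf.length);
  }
}
{code}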

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389605#comment-16389605
 ] 

genericqa commented on HADOOP-15293:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 59s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
28s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f |
| JIRA Issue | HADOOP-15293 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913382/HADOOP-15293.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 922c6517d5df 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 58ea2d7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14269/testReport/ |
| Max. process+thread count | 1361 (vs. ulimit of 1) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14269/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: 

[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389597#comment-16389597
 ] 

Steve Loughran commented on HADOOP-15267:
-

Now that I've remembered to hit "submit patch", if Yetus is happy it'll go in. 
After that, the best thing you can do is verify that it works for you.

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set  fs.s3a.multipart.size 
> to 5 Mb, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send SSE-C 
> headers in Put Object Part as stated in AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue for a better clarification of the 
> problem.
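
For illustration of the requirement quoted above (this is not the attached patch), with the 
AWS SDK v1 the same customer-provided key used to initiate the SSE-C multipart upload has to 
be attached to every part request:
{code:java}
import com.amazonaws.services.s3.model.SSECustomerKey;
import com.amazonaws.services.s3.model.UploadPartRequest;

class SseCPartRequestSketch {
  // Payload setters (file/stream, part size) are omitted; the point is that
  // the SSE-C key must be repeated on each UploadPartRequest, otherwise S3
  // rejects the part with the error shown above.
  static UploadPartRequest newPartRequest(String bucket, String key,
      String uploadId, int partNumber, String base64CustomerKey) {
    return new UploadPartRequest()
        .withBucketName(bucket)
        .withKey(key)
        .withUploadId(uploadId)
        .withPartNumber(partNumber)
        .withSSECustomerKey(new SSECustomerKey(base64CustomerKey));
  }
}
{code}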



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled

2018-03-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15267:

Status: Patch Available  (was: Open)

> S3A multipart upload fails when SSE-C encryption is enabled
> ---
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
>Reporter: Anis Elleuch
>Assignee: Anis Elleuch
>Priority: Critical
> Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, 
> HADOOP-15267-003.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set  fs.s3a.multipart.size 
> to 5 Mb, storing data in AWS doesn't work anymore. For example, running the 
> following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload 
> initiate requested encryption. Subsequent part requests must include the 
> appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send SSE-C 
> headers in Put Object Part as stated in AWS specification: 
> [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {code:java}
> If you requested server-side encryption using a customer-provided encryption 
> key in your initiate multipart upload request, you must provide identical 
> encryption information in each part upload using the following headers.
> {code}
>  
> You can find a patch attached to this issue for a better clarification of the 
> problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13126) Add Brotli compression codec

2018-03-07 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389593#comment-16389593
 ] 

genericqa commented on HADOOP-13126:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HADOOP-13126 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14270/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add Brotli compression codec
> 
>
> Key: HADOOP-13126
> URL: https://issues.apache.org/jira/browse/HADOOP-13126
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 2.7.2
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch, 
> HADOOP-13126.3.patch, HADOOP-13126.4.patch, HADOOP-13126.5.patch
>
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new 
> compression library based on LZ77 from Google. Google's [brotli 
> benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf]
>  look really good and we're also seeing a significant improvement in 
> compression size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet 
> --compression-codec snappy --overwrite  
> real1m17.106s
> user1m30.804s
> sys 0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet 
> --compression-codec brotli --overwrite 
> real1m16.640s
> user1m24.244s
> sys 0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet 
> --compression-codec gzip --overwrite
> real3m39.496s
> user3m48.736s
> sys 0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9. 
> Another test resulted in a slightly larger Brotli file than gzip produced, 
> but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed with the MIT 
> license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI 
> library jbrotli is 
> ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13126) Add Brotli compression codec

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389588#comment-16389588
 ] 

Steve Loughran commented on HADOOP-13126:
-

I hadn't noticed this before. It would seem OK for Hadoop 3.2/2.10+; it's too late 
for 3.1.

It would probably need some more tests, maybe even a test resource with a 
Brotli-compressed file as the reference: all a round trip verifies is that you can 
round-trip "something", not that the compressor itself is working. A sketch of such 
a test follows below.
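
A rough sketch of the kind of reference-file test meant here; {{BrotliCodec}} and the resource 
names are assumed for illustration:
{code:java}
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;
import org.junit.Assert;
import org.junit.Test;

public class TestBrotliReferenceFile {
  @Test
  public void decompressesKnownReferenceFile() throws Exception {
    Configuration conf = new Configuration();
    // BrotliCodec is the codec class assumed to come with the patch.
    CompressionCodec codec = ReflectionUtils.newInstance(BrotliCodec.class, conf);
    try (InputStream compressed =
             getClass().getResourceAsStream("/brotli/reference.txt.br");
         InputStream expected =
             getClass().getResourceAsStream("/brotli/reference.txt")) {
      // Decompressing a file produced by an independent brotli implementation
      // proves interoperability, which a round-trip test cannot.
      byte[] actual = IOUtils.toByteArray(codec.createInputStream(compressed));
      Assert.assertArrayEquals(IOUtils.toByteArray(expected), actual);
    }
  }
}
{code}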

> Add Brotli compression codec
> 
>
> Key: HADOOP-13126
> URL: https://issues.apache.org/jira/browse/HADOOP-13126
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 2.7.2
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch, 
> HADOOP-13126.3.patch, HADOOP-13126.4.patch, HADOOP-13126.5.patch
>
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new 
> compression library based on LZ77 from Google. Google's [brotli 
> benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf]
>  look really good and we're also seeing a significant improvement in 
> compression size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet 
> --compression-codec snappy --overwrite  
> real1m17.106s
> user1m30.804s
> sys 0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet 
> --compression-codec brotli --overwrite 
> real1m16.640s
> user1m24.244s
> sys 0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet 
> --compression-codec gzip --overwrite
> real3m39.496s
> user3m48.736s
> sys 0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9. 
> Another test resulted in a slightly larger Brotli file than gzip produced, 
> but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed with the MIT 
> license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI 
> library jbrotli is 
> ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-07 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389496#comment-16389496
 ] 

Takanobu Asanuma commented on HADOOP-15293:
---

Uploaded the 1st patch. The error message differs slightly between Java 8 and Java 9, 
so the patch uses one more try-catch block to handle both variants (see the sketch below).
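
Not the actual patch, just an illustration of the extra handling: the assertion has to accept 
both message variants.
{code:java}
import javax.net.ssl.SSLException;

class SslMessageAssertSketch {
  // Java 8 reports "Unrecognized SSL message", Java 9 reports
  // "Unsupported or unrecognized SSL message"; accept either one.
  static void assertPlaintextRejected(SSLException e) {
    String msg = String.valueOf(e.getMessage());
    if (!msg.contains("Unrecognized SSL message")
        && !msg.contains("Unsupported or unrecognized SSL message")) {
      throw new AssertionError("Unexpected SSL failure: " + msg, e);
    }
  }
}
{code}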

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15293.1.patch
>
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-07 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HADOOP-15293:
--
Status: Patch Available  (was: Open)

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15293.1.patch
>
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-07 Thread Takanobu Asanuma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HADOOP-15293:
--
Attachment: HADOOP-15293.1.patch

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: HADOOP-15293.1.patch
>
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9

2018-03-07 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389439#comment-16389439
 ] 

Takanobu Asanuma commented on HADOOP-15293:
---

I would like to work on this issue. Thanks.

> TestLogLevel fails on Java 9
> 
>
> Key: HADOOP-15293
> URL: https://issues.apache.org/jira/browse/HADOOP-15293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
> Environment: Applied HADOOP-12760 and HDFS-11610
>Reporter: Akira Ajisaka
>Priority: Major
>
> {noformat}
> [INFO] Running org.apache.hadoop.log.TestLogLevel
> [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 
> s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel
> [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel)  
> Time elapsed: 1.179 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'Unrecognized SSL message' but got unexpected exception: 
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
>   at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-14445:
---
Attachment: (was: HADOOP-14445.004.patch)

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do 
> not share delegation tokens. (a client uses KMS address/port as the key for 
> delegation token)
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
> InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
> url.getPort());
> Text service = SecurityUtil.buildTokenService(serviceAddr);
> dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2018-03-07 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389216#comment-16389216
 ] 

Xiao Chen commented on HADOOP-14445:


Attached a preliminary [^HADOOP-14445.004.patch] that handles this via the token 
kind. The code needs cleaning up, but it should show the general idea. [~shahrs87] 
and [~daryn], would you mind taking a quick look to see if this converges with 
what you had in mind?
 - Added a new token kind {{KMS_DELEGATION_TOKEN}}, and deprecated the old {{kms-dt}}.
 - The new client will duplicate a legacy token if the newly created token is of the 
new kind. This duplication is on by default but can be turned off by config.
 - On authentication, the token selection logic first favors the new kind, and falls 
back to the old way if none is found (see the sketch below).
 - Added a dedicated Renewer to handle the new token kind; the old renewer behavior 
is unchanged.

Verified this works via some cross-cluster bidirectional distcp runs, where one 
cluster was upgraded and one was not. I will go through the patch on Wednesday to 
polish things up and add more unit tests.
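
To make the selection logic concrete, a hedged sketch (names are illustrative, not the 
attached patch): prefer a token of the new {{KMS_DELEGATION_TOKEN}} kind for the service, and 
fall back to a legacy {{kms-dt}} token if none is found.
{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

class KmsTokenSelectionSketch {
  static final Text NEW_KIND = new Text("KMS_DELEGATION_TOKEN");  // assumed name
  static final Text LEGACY_KIND = new Text("kms-dt");

  // Prefer the new kind so upgraded clusters use it, but keep returning the
  // legacy token when that is all an old server or renewer knows about.
  static Token<?> select(Credentials creds, Text service) {
    Token<?> legacy = null;
    for (Token<?> t : creds.getAllTokens()) {
      if (!service.equals(t.getService())) {
        continue;
      }
      if (NEW_KIND.equals(t.getKind())) {
        return t;
      }
      if (LEGACY_KIND.equals(t.getKind())) {
        legacy = t;
      }
    }
    return legacy;
  }
}
{code}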

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
> Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>Reporter: Wei-Chiu Chuang
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HADOOP-14445-branch-2.8.002.patch, 
> HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, 
> HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.004.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do 
> not share delegation tokens. (a client uses KMS address/port as the key for 
> delegation token)
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
> InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
> url.getPort());
> Text service = SecurityUtil.buildTokenService(serviceAddr);
> dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org