[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Attachment: (was: HADOOP-14445.05.patch) > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Attachment: HADOOP-14445.05.patch > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Status: Patch Available (was: Open) > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 3.0.0-alpha1, 2.8.0 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390877#comment-16390877 ] Xiao Chen commented on HADOOP-14445: Patch 5 is ready for review. All previous comments have been addressed, except for this one from [~daryn] that I would like to discuss: {quote}The semantics for getDelegationTokenService are oddly cyclical. I'd expect it, like other hadoop clients, to premeditate the service name. ... {quote} If I understand correctly, in this patch that would be {{KMSCP#getKMSToken}}. It currently uses a token selector, so it should be in line with other services. The other part is {{DelegationTokenAuthenticatedURL#selectDelegationToken}} - it's a general-purpose class, so it is not aware of the token kind of the service that uses it. So I think we should be fine doing this only in KMSCP. > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
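The discussion above is about selecting the KMS delegation token by its token kind rather than by a host:port-derived service name, so any KMS instance behind LoadBalancingKMSClientProvider can reuse the same token. A minimal sketch of that idea, using only the public Hadoop {{Credentials}}/{{Token}} APIs; the class and method names and the literal kind string here are illustrative assumptions, not the actual KMSClientProvider code:
{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Illustrative only: pick a delegation token by kind, independent of the
// KMS address/port the client happens to be talking to.
public final class KmsTokenByKindSelector {
  // "kms-dt" is the KMS delegation token kind; the exact constant Hadoop
  // exposes for it may differ by version (assumption).
  private static final Text KMS_TOKEN_KIND = new Text("kms-dt");

  static Token<? extends TokenIdentifier> selectByKind(Credentials creds) {
    for (Token<? extends TokenIdentifier> t : creds.getAllTokens()) {
      if (KMS_TOKEN_KIND.equals(t.getKind())) {
        return t; // first token of the KMS kind, regardless of service name
      }
    }
    return null;  // caller falls back to Kerberos or other authentication
  }
}
{code}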
[jira] [Commented] (HADOOP-15296) Fix a wrong link for RBF in the top page
[ https://issues.apache.org/jira/browse/HADOOP-15296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390873#comment-16390873 ] Yiqun Lin commented on HADOOP-15296: LGTM, +1, committing... > Fix a wrong link for RBF in the top page > > > Key: HADOOP-15296 > URL: https://issues.apache.org/jira/browse/HADOOP-15296 > Project: Hadoop Common > Issue Type: Bug > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Minor > Attachments: HADOOP-15296.1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Attachment: HADOOP-14445.05.patch > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390822#comment-16390822 ] Xiao Chen commented on HADOOP-15280: Thanks for working on this issue [~bharatviswa]. I agree we should change the test. But instead of validating the wrapper message, the test should still validate the original message ("Forbidden"); otherwise it defeats the purpose of the assertion. Maybe we can add a utility method to GenericTestUtils so this can be reused. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HADOOP-15280.00.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
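A minimal sketch of the reusable assertion suggested above: walk the exception cause chain and pass if any throwable's message contains the original text (e.g. "Forbidden"), so wrapping the exception does not break the test. The method name and its eventual home in GenericTestUtils are assumptions:
{code:java}
import org.junit.Assert;

public final class ExceptionChainAssert {
  // Hypothetical helper: succeeds if the expected text appears anywhere in
  // the message of the throwable or one of its causes.
  public static void assertExceptionChainContains(String expected, Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur.getMessage() != null && cur.getMessage().contains(expected)) {
        return;
      }
    }
    Assert.fail("Expected message \"" + expected + "\" not found in " + t);
  }
}
{code}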
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390797#comment-16390797 ] genericqa commented on HADOOP-15280: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} hadoop-common-project/hadoop-kms: The patch generated 0 new + 97 unchanged - 1 fixed = 97 total (was 98) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 10s{color} | {color:green} hadoop-kms in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 75m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15280 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913516/HADOOP-15280.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2dd9e48e5433 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 583f459 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14280/testReport/ | | Max. process+thread count | 315 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-kms U: hadoop-common-project/hadoop-kms | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14280/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple
[jira] [Assigned] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HADOOP-15280: --- Assignee: Bharat Viswanadham > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Assignee: Bharat Viswanadham >Priority: Major > Attachments: HADOOP-15280.00.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390754#comment-16390754 ] Bharat Viswanadham commented on HADOOP-15280: - [~xiaochen] This change is due to a different exception message being thrown. I fixed the test case to address this issue. > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Priority: Major > Attachments: HADOOP-15280.00.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15280: Attachment: HADOOP-15280.00.patch > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Priority: Major > Attachments: HADOOP-15280.00.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15280) TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail in trunk
[ https://issues.apache.org/jira/browse/HADOOP-15280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15280: Status: Patch Available (was: Open) > TestKMS.testWebHDFSProxyUserKerb and TestKMS.testWebHDFSProxyUserSimple fail > in trunk > - > > Key: HADOOP-15280 > URL: https://issues.apache.org/jira/browse/HADOOP-15280 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Ray Chiang >Priority: Major > Attachments: HADOOP-15280.00.patch > > > I'm seeing these messages on OS X and on Linux. > {noformat} > [ERROR] Failures: > [ERROR] > TestKMS.testWebHDFSProxyUserKerb:2526->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56112/kms/v1/keys?doAs=foo1 > [ERROR] > TestKMS.testWebHDFSProxyUserSimple:2531->doWebHDFSProxyUserTest:2625->runServer:158->runServer:176 > org.apache.hadoop.security.authentication.client.AuthenticationException: > Error while authenticating with endpoint: > http://localhost:56206/kms/v1/keys?doAs=foo1 > {noformat} > as well as a [recent PreCommit-HADOOP-Build > job|https://builds.apache.org/job/PreCommit-HADOOP-Build/14235/]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-12897) KerberosAuthenticator.authenticate to include URL on IO failures
[ https://issues.apache.org/jira/browse/HADOOP-12897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390733#comment-16390733 ] Xiao Chen commented on HADOOP-12897: Hi [~ajayydv] and [~arpitagarwal], do you plan to look into the HADOOP-15280 breakage? > KerberosAuthenticator.authenticate to include URL on IO failures > > > Key: HADOOP-12897 > URL: https://issues.apache.org/jira/browse/HADOOP-12897 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Ajay Kumar >Priority: Minor > Fix For: 3.2.0 > > Attachments: HADOOP-12897.001.patch, HADOOP-12897.002.patch, > HADOOP-12897.003.patch, HADOOP-12897.004.patch, HADOOP-12897.005.patch, > HADOOP-12897.006.patch, HADOOP-12897.007.patch > > > If {{KerberosAuthenticator.authenticate}} can't connect to the endpoint, you > get a stack trace, but without the URL it is trying to talk to. > That is: it doesn't have any equivalent of the {{NetUtils.wrapException}} > handler —which can't be called here as its not in the {{hadoop-auth}} module -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15158) AliyunOSS: Supports role based credential in URL
[ https://issues.apache.org/jira/browse/HADOOP-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated HADOOP-15158: Fix Version/s: (was: 3.0.1) (was: 2.9.1) (was: 3.1.0) > AliyunOSS: Supports role based credential in URL > > > Key: HADOOP-15158 > URL: https://issues.apache.org/jira/browse/HADOOP-15158 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Attachments: HADOOP-15158.001.patch, HADOOP-15158.002.patch, > HADOOP-15158.003.patch, HADOOP-15158.004.patch, HADOOP-15158.005.patch > > > Currently, AliyunCredentialsProvider supports credential by > configuration(core-site.xml). Sometimes, admin wants to create different > temporary credential(key/secret/token) for different roles so that one role > cannot read data that belongs to another role. > So, our code should support pass in the URI when creates an > XXXCredentialsProvider so that we can get user info(role) from the URI -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
[ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390703#comment-16390703 ] Wangda Tan commented on HADOOP-15262: - [~wujinhu], removed fix version, it should be set by committer when the patch got committed. > AliyunOSS: rename() to move files in a directory in parallel > > > Key: HADOOP-15262 > URL: https://issues.apache.org/jira/browse/HADOOP-15262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, > HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch > > > Currently, rename() operation renames files in series. This will be slow if a > directory contains many files. So we can improve this by rename files in > parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
[ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated HADOOP-15262: Fix Version/s: (was: 3.0.1) (was: 2.9.1) (was: 3.1.0) > AliyunOSS: rename() to move files in a directory in parallel > > > Key: HADOOP-15262 > URL: https://issues.apache.org/jira/browse/HADOOP-15262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, > HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch > > > Currently, rename() operation renames files in series. This will be slow if a > directory contains many files. So we can improve this by rename files in > parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
[ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390652#comment-16390652 ] genericqa commented on HADOOP-15262: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s{color} | {color:green} hadoop-aliyun in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 43m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15262 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913498/HADOOP-15262.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3236f8039ddc 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 583f459 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14279/testReport/ | | Max. process+thread count | 303 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14279/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > AliyunOSS: rename() to move files in a directory in parallel > > > Key: HADOOP-15262 >
[jira] [Comment Edited] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
[ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390620#comment-16390620 ] wujinhu edited comment on HADOOP-15262 at 3/8/18 2:47 AM: -- Attached HADOOP-15262.005.patch! It sets an upper limit on the waiting list size. With this patch, users can improve copy performance by increasing *fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* (the old version copies a directory in series). Generally, the larger *fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* are, the better (if we have enough resources). For example, if we set *fs.oss.max.copy.threads = 5* and *fs.oss.max.copy.tasks.per.dir = 5*, the copy time drops to 1/5 of the old rename(). Here is one use case that drives this improvement: users train models with Spark/TensorFlow/etc. and save the model files to OSS. The number of model files is large, so committing jobs is slow because the frameworks call rename(). was (Author: wujinhu): Attached HADOOP-15262.005.patch! It sets an upper limit on the waiting list size. With this patch, users can improve copy performance by increasing *fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* (the old version copies a directory in series). Generally, the larger *fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* are, the better (if we have enough resources). Here is one use case that drives this improvement: users train models with Spark/TensorFlow/etc. and save the model files to OSS. The number of model files is large, so committing jobs is slow because the frameworks call rename(). > AliyunOSS: rename() to move files in a directory in parallel > > > Key: HADOOP-15262 > URL: https://issues.apache.org/jira/browse/HADOOP-15262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Fix For: 3.1.0, 2.9.1, 3.0.1 > > Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, > HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch > > > Currently, rename() operation renames files in series. This will be slow if a > directory contains many files. So we can improve this by rename files in > parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
[ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385641#comment-16385641 ] wujinhu edited comment on HADOOP-15262 at 3/8/18 2:36 AM: -- Thanks [~Sammi] for your review comments. I have fixed items 1 through 4. For 5, the copy operation will be inexpensive once OSS supports shallow copy. Users can configure more threads to copy files, so it is a little hard to define an upper limit for the waiting list size (unlike the pre-read configuration, because read operations are expensive). However, although the queue is defined as an unbounded queue, we use SemaphoredDelegatingExecutor to limit the concurrency within one directory. For 6, since we read only one field of the AliyunOSSCopyFileContext class, there is no need to call lock() (we may copy one more file when the whole rename operation fails, but that's OK). Reducing the calls to lock() can also improve performance. was (Author: wujinhu): Thanks [~Sammi] for your review comments. I have fixed items 1 through 4. For 5, the copy operation will be inexpensive once OSS supports shallow copy. Users can configure more threads to copy files, so it is a little hard to define an upper limit for the waiting list size (unlike the pre-read configuration, because read operations are expensive). However, although the queue is defined as an unbounded queue, we use SemaphoredDelegatingExecutor to limit the concurrency within one directory. For 6, since we read only one field of the AliyunOSSCopyFileContext class, there is no need to call lock(). Reducing the calls to lock() can also improve performance. > AliyunOSS: rename() to move files in a directory in parallel > > > Key: HADOOP-15262 > URL: https://issues.apache.org/jira/browse/HADOOP-15262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Fix For: 3.1.0, 2.9.1, 3.0.1 > > Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, > HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch > > > Currently, rename() operation renames files in series. This will be slow if a > directory contains many files. So we can improve this by rename files in > parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
[ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390620#comment-16390620 ] wujinhu commented on HADOOP-15262: -- Attached HADOOP-15262.005.patch! It sets an upper limit on the waiting list size. With this patch, users can improve copy performance by increasing *fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* (the old version copies a directory in series). Generally, the larger *fs.oss.max.copy.threads* and *fs.oss.max.copy.tasks.per.dir* are, the better (if we have enough resources). Here is one use case that drives this improvement: users train models with Spark/TensorFlow/etc. and save the model files to OSS. The number of model files is large, so committing jobs is slow because the frameworks call rename(). > AliyunOSS: rename() to move files in a directory in parallel > > > Key: HADOOP-15262 > URL: https://issues.apache.org/jira/browse/HADOOP-15262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Fix For: 3.1.0, 2.9.1, 3.0.1 > > Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, > HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch > > > Currently, rename() operation renames files in series. This will be slow if a > directory contains many files. So we can improve this by rename files in > parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
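A simplified sketch (not the actual AliyunOSSFileSystem code) of the approach described above: per-file copy tasks go to a shared pool sized by *fs.oss.max.copy.threads*, while a semaphore caps the in-flight tasks of a single rename, playing the role of *fs.oss.max.copy.tasks.per.dir*; {{copyFile()}} is a placeholder for the store's copy-object call:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

class ParallelRenameSketch {
  private final ExecutorService copyPool;   // sized by fs.oss.max.copy.threads
  private final int maxTasksPerDir;         // fs.oss.max.copy.tasks.per.dir

  ParallelRenameSketch(int copyThreads, int maxTasksPerDir) {
    this.copyPool = Executors.newFixedThreadPool(copyThreads);
    this.maxTasksPerDir = maxTasksPerDir;
  }

  boolean renameDirectory(List<String> srcKeys, String srcDir, String dstDir)
      throws InterruptedException {
    Semaphore inFlight = new Semaphore(maxTasksPerDir); // bounds the waiting list
    List<Future<Boolean>> results = new ArrayList<>();
    for (String srcKey : srcKeys) {
      inFlight.acquire();                    // block before queuing more work
      String dstKey = dstDir + srcKey.substring(srcDir.length());
      results.add(copyPool.submit(() -> {
        try {
          return copyFile(srcKey, dstKey);
        } finally {
          inFlight.release();
        }
      }));
    }
    boolean ok = true;
    for (Future<Boolean> f : results) {
      try {
        ok &= f.get();
      } catch (ExecutionException e) {
        ok = false;                          // surface per-file copy failures
      }
    }
    return ok;
  }

  private boolean copyFile(String srcKey, String dstKey) {
    // placeholder for the OSS copy-object request
    return true;
  }
}
{code}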
[jira] [Commented] (HADOOP-12502) SetReplication OutOfMemoryError
[ https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390614#comment-16390614 ] Aaron Fabbri commented on HADOOP-12502: --- Thanks for the updated patch.. Will try to review this week. > SetReplication OutOfMemoryError > --- > > Key: HADOOP-12502 > URL: https://issues.apache.org/jira/browse/HADOOP-12502 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Philipp Schuegerl >Assignee: Vinayakumar B >Priority: Major > Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch, > HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch, > HADOOP-12502-06.patch, HADOOP-12502-07.patch, HADOOP-12502-08.patch, > HADOOP-12502-09.patch, HADOOP-12502-10.patch > > > Setting the replication of a HDFS folder recursively can run out of memory. > E.g. with a large /var/log directory: > hdfs dfs -setrep -R -w 1 /var/log > Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit > exceeded > at java.util.Arrays.copyOfRange(Arrays.java:2694) > at java.lang.String.(String.java:203) > at java.lang.String.substring(String.java:1913) > at java.net.URI$Parser.substring(URI.java:2850) > at java.net.URI$Parser.parse(URI.java:3046) > at java.net.URI.(URI.java:753) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > at org.apache.hadoop.fs.Path.(Path.java:116) > at org.apache.hadoop.fs.Path.(Path.java:94) > at > org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222) > at > org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708) > at > org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244) > at > org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390602#comment-16390602 ] Aaron Fabbri commented on HADOOP-15273: --- LGTM. +1 Testing: I only ran the TestCopyMapper unit test > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel
[ https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15262: - Attachment: HADOOP-15262.005.patch > AliyunOSS: rename() to move files in a directory in parallel > > > Key: HADOOP-15262 > URL: https://issues.apache.org/jira/browse/HADOOP-15262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0 >Reporter: wujinhu >Assignee: wujinhu >Priority: Major > Fix For: 3.1.0, 2.9.1, 3.0.1 > > Attachments: HADOOP-15262.001.patch, HADOOP-15262.002.patch, > HADOOP-15262.003.patch, HADOOP-15262.004.patch, HADOOP-15262.005.patch > > > Currently, rename() operation renames files in series. This will be slow if a > directory contains many files. So we can improve this by rename files in > parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15295) Remove redundant logging related to tags from Configuration
[ https://issues.apache.org/jira/browse/HADOOP-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390444#comment-16390444 ] genericqa commented on HADOOP-15295: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 44s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 33s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 45s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 1 new + 244 unchanged - 0 fixed = 245 total (was 244) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 58s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15295 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913459/HADOOP-15295.000.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 80467bdf5254 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 19ae442 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14278/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14278/testReport/ | | Max. process+thread count | 1432 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14278/console | | Powered by | Apache Yetus
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390440#comment-16390440 ] Chris Douglas commented on HADOOP-15292: +1 > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
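For context, the two read styles being compared are sketched below using only the public FileSystem/FSDataInputStream API; this is an illustration of positioned reads versus seek-then-stream, not the distcp patch itself:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReadStyles {
  // Seek once, then stream sequentially: HDFS keeps reusing the same
  // client-side block reader for the rest of the copy.
  static long streamFrom(FileSystem fs, Path src, long offset, byte[] buf)
      throws IOException {
    try (FSDataInputStream in = fs.open(src)) {
      in.seek(offset);
      long total = 0;
      int n;
      while ((n = in.read(buf, 0, buf.length)) > 0) {
        total += n;          // a real copy would write buf to the target here
      }
      return total;
    }
  }

  // Positioned read: each call carries the extra per-request overhead
  // described in the issue (new BlockReader, readBlock() to the DataNode).
  static int preadOnce(FileSystem fs, Path src, long offset, byte[] buf)
      throws IOException {
    try (FSDataInputStream in = fs.open(src)) {
      return in.read(offset, buf, 0, buf.length);
    }
  }
}
{code}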
[jira] [Commented] (HADOOP-15297) Make s3a etag -> checksum publishing option
[ https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390431#comment-16390431 ] genericqa commented on HADOOP-15297: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 29s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 51s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}105m 42s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15297 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913451/HADOOP-15297-001.patchh | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle | | uname | Linux 023b658063ea
[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small
[ https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390429#comment-16390429 ] Aaron Fabbri commented on HADOOP-15206: --- Just reviewing this as part of a backport. Quick question [~jlowe] and [~tanakahda]: {noformat} +long skipBytes = numSkipped; +while (skipBytes > 0) { + long s = bufferedIn.skip(skipBytes); + if (s > 0) { +skipBytes -= s; + } else { +if (bufferedIn.read() == -1) { + break; // end of the split +} else { + skipBytes--; {noformat} Why is {{skipBytes}} decremented here? skip() returned <= 0, doesn't that mean that no bytes were skipped? I know we want this loop to terminate eventually but I did not understand this part. {noformat} +} + }{noformat} > BZip2 drops and duplicates records when input split size is small > - > > Key: HADOOP-15206 > URL: https://issues.apache.org/jira/browse/HADOOP-15206 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.3, 3.0.0 >Reporter: Aki Tanaka >Assignee: Aki Tanaka >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2 > > Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, > HADOOP-15206.002.patch, HADOOP-15206.003.patch, HADOOP-15206.004.patch, > HADOOP-15206.005.patch, HADOOP-15206.006.patch, HADOOP-15206.007.patch, > HADOOP-15206.008.patch > > > BZip2 can drop and duplicate record when input split file is small. I > confirmed that this issue happens when the input split size is between 1byte > and 4bytes. > I am seeing the following 2 problem behaviors. > > 1. Drop record: > BZip2 skips the first record in the input file when the input split size is > small > > Set the split size to 3 and tested to load 100 records (0, 1, 2..99) > {code:java} > 2018-02-01 10:52:33,502 INFO [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(317)) - > splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3 > count=99{code} > > The input format read only 99 records but not 100 records > > 2. Duplicate Record: > 2 input splits has same BZip2 records when the input split size is small > > Set the split size to 1 and tested to load 100 records (0, 1, 2..99) > > {code:java} > 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file > /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1 > count=99 > 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat > (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 > at position 8 > {code} > > I experienced this error when I execute Spark (SparkSQL) job under the > following conditions: > * The file size of the input files are small (around 1KB) > * Hadoop cluster has many slave nodes (able to launch many executor tasks) > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
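The quoted diff follows the common "skip with single-byte read fallback" idiom: InputStream.skip() may return 0 even when bytes remain, so the fallback read() both tests for end of stream and, when it does return a byte, consumes exactly one of the bytes still to be skipped, which is why the counter is decremented. A self-contained sketch of that idiom over a plain InputStream, not the actual BZip2 patch code:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipWithReadFallback {
  /** Skip up to numSkipped bytes, stopping early only at end of stream. */
  static void skipFully(InputStream in, long numSkipped) throws IOException {
    long skipBytes = numSkipped;
    while (skipBytes > 0) {
      long s = in.skip(skipBytes);
      if (s > 0) {
        skipBytes -= s;            // normal case: s bytes were skipped
      } else {
        // skip() returning 0 does not necessarily mean EOF; probe with read().
        if (in.read() == -1) {
          break;                   // genuinely at the end of the stream/split
        } else {
          // read() consumed exactly one byte, so one fewer byte remains
          // to be skipped; this also guarantees the loop makes progress.
          skipBytes--;
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[10]);
    skipFully(in, 7);
    System.out.println("bytes remaining after skipping 7 of 10: " + in.available());
  }
}
{code}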
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390390#comment-16390390 ] genericqa commented on HADOOP-15273: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} hadoop-tools/hadoop-distcp: The patch generated 0 new + 90 unchanged - 4 fixed = 90 total (was 94) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 0s{color} | {color:green} hadoop-distcp in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 59m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15273 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913455/HADOOP-15273-003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9950576a3130 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 46d29e3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14277/testReport/ | | Max. process+thread count | 346 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14277/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > distcp can't handle remote stores with different checksum algorithms >
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390383#comment-16390383 ] Steve Loughran commented on HADOOP-15292: - LGTM. Chris. what say you? > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories
[ https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390359#comment-16390359 ] Steve Loughran commented on HADOOP-15209: - The checkstyle warnings are on method length and a line being 84 vs 80. Needless. Can someone start reviewing this so we can get it into Hadoop 3.1? thanks > DistCp to eliminate needless deletion of files under already-deleted > directories > > > Key: HADOOP-15209 > URL: https://issues.apache.org/jira/browse/HADOOP-15209 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, > HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, > HADOOP-15209-006.patch > > > DistCP issues a delete(file) request even if it is underneath an already deleted > directory. This generates needless load on filesystems/object stores, and, if > the store throttles delete, can dramatically slow down the delete operation. > If the distcp delete operation can build a history of deleted directories, > then it will know when it does not need to issue those deletes. > Care is needed here to make sure that whatever structure is created does not > overload the heap of the process. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
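A minimal sketch of the bookkeeping the description suggests, assuming deletes are issued parents-before-children so that a single "last deleted directory" marker is enough to skip everything underneath it while keeping heap usage constant. The class and method names are illustrative, not the actual DistCp code.

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.List;

public class SkipChildDeletes {
  private Path lastDeletedDir;   // a single Path => negligible heap cost

  /**
   * Delete the given paths, assuming parents are listed before children.
   * Paths under a directory that has already been deleted are skipped,
   * so a throttled store sees one delete call per removed subtree.
   */
  void deleteAll(FileSystem fs, List<Path> pathsParentsFirst) throws IOException {
    for (Path p : pathsParentsFirst) {
      if (lastDeletedDir != null && isUnder(p, lastDeletedDir)) {
        continue;                              // parent already deleted
      }
      boolean isDir = fs.getFileStatus(p).isDirectory();
      fs.delete(p, true);                      // recursive delete
      if (isDir) {
        lastDeletedDir = p;                    // children of p can now be skipped
      }
    }
  }

  private static boolean isUnder(Path child, Path ancestor) {
    for (Path c = child.getParent(); c != null; c = c.getParent()) {
      if (c.equals(ancestor)) {
        return true;
      }
    }
    return false;
  }
}
{code}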
[jira] [Updated] (HADOOP-15295) Remove redundant logging related to tags from Configuration
[ https://issues.apache.org/jira/browse/HADOOP-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HADOOP-15295: Status: Patch Available (was: Open) > Remove redundant logging related to tags from Configuration > --- > > Key: HADOOP-15295 > URL: https://issues.apache.org/jira/browse/HADOOP-15295 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Attachments: HADOOP-15295.000.patch > > > Remove redundant logging related to tags from Configuration. > {code} > 2018-03-06 18:55:46,164 INFO conf.Configuration: Removed undeclared tags: > 2018-03-06 18:55:46,237 INFO conf.Configuration: Removed undeclared tags: > 2018-03-06 18:55:46,249 INFO conf.Configuration: Removed undeclared tags: > 2018-03-06 18:55:46,256 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
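The log lines quoted in the description are emitted even when the set of removed tags is empty, so they carry no information. A hedged sketch of the obvious shape of such a fix, logging at INFO only when something was actually removed; the names are illustrative, not the actual Configuration code.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Set;

public class TagCleanupLogging {
  private static final Logger LOG = LoggerFactory.getLogger(TagCleanupLogging.class);

  /** Hypothetical helper: report removed tags without spamming the log. */
  static void reportRemovedTags(Set<String> removedTags) {
    if (!removedTags.isEmpty()) {
      LOG.info("Removed undeclared tags: {}", removedTags);
    } else {
      LOG.debug("No undeclared tags to remove");
    }
  }
}
{code}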
[jira] [Updated] (HADOOP-15295) Remove redundant logging related to tags from Configuration
[ https://issues.apache.org/jira/browse/HADOOP-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HADOOP-15295: Attachment: HADOOP-15295.000.patch > Remove redundant logging related to tags from Configuration > --- > > Key: HADOOP-15295 > URL: https://issues.apache.org/jira/browse/HADOOP-15295 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Attachments: HADOOP-15295.000.patch > > > Remove redundant logging related to tags from Configuration. > {code} > 2018-03-06 18:55:46,164 INFO conf.Configuration: Removed undeclared tags: > 2018-03-06 18:55:46,237 INFO conf.Configuration: Removed undeclared tags: > 2018-03-06 18:55:46,249 INFO conf.Configuration: Removed undeclared tags: > 2018-03-06 18:55:46,256 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390297#comment-16390297 ] genericqa commented on HADOOP-15273: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 1 new + 90 unchanged - 4 fixed = 91 total (was 94) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 59s{color} | {color:green} hadoop-distcp in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 45s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15273 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913445/HADOOP-15273-002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ad3b5c0e074b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 46d29e3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/testReport/ | | Max. process+thread count | 339 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14275/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Status: Patch Available (was: Open) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390291#comment-16390291 ] Steve Loughran commented on HADOOP-15273: - Patch 003 * fixes checkstyle * fixes tests With HADOOP-15297 making the etags => checksum feature in s3a optional, this isn't quite a blocker, but it is when you try to distcp between any two stores with different algorithms, because only -update lets you skip the checks right now. If any other FS offers checksums, things will break > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Status: Open (was: Patch Available) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Target Version/s: 3.1.0 Status: Patch Available (was: Open) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Attachment: HADOOP-15273-003.patch > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Priority: Critical (was: Blocker) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Status: Open (was: Patch Available) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch, > HADOOP-15273-003.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15297) Make s3a etag -> checksum publishing option
[ https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15297: Status: Patch Available (was: Open) FYI: [~devaraj] [~ehiggs] [~fabbri][~leftnoteasy] With this patch in you don't need HADOOP-15273 to get hdfs <-> s3a distcp to work, but you will want that if you do enable this feature > Make s3a etag -> checksum publishing option > --- > > Key: HADOOP-15297 > URL: https://issues.apache.org/jira/browse/HADOOP-15297 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15297-001.patchh > > > HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with > checksums. > Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to > s3a. > Rather than revert I want to make it an option, off by default. Once we are > happy with distcp in future, we can turn it on. > Why an option? Because it lines up for a successor to distcp which saves src > and dest checksums to a file and can then verify whether or not files have > really changed. Currently distcp relies on dest checksum algorithm being the > same as the src for incremental updates, but if either of the stores don't > serve checksums, silently downgrades to not checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15297) Make s3a etag -> checksum publishing option
[ https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390281#comment-16390281 ] Steve Loughran commented on HADOOP-15297: - HADOOP-15297 patch 001 * make etag feature optional, disabled by default * update tests to check both states * update docs * also update instrumentation to track invocations. Not directly used, but useful for anything downstream trying to track the #of calls made Testing: S3 ireland, also did the mvn site:site to verify the updated docs were valid. > Make s3a etag -> checksum publishing option > --- > > Key: HADOOP-15297 > URL: https://issues.apache.org/jira/browse/HADOOP-15297 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15297-001.patchh > > > HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with > checksums. > Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to > s3a. > Rather than revert I want to make it an option, off by default. Once we are > happy with distcp in future, we can turn it on. > Why an option? Because it lines up for a successor to distcp which saves src > and dest checksums to a file and can then verify whether or not files have > really changed. Currently distcp relies on dest checksum algorithm being the > same as the src for incremental updates, but if either of the stores don't > serve checksums, silently downgrades to not checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15297) Make s3a etag -> checksum publishing option
[ https://issues.apache.org/jira/browse/HADOOP-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15297: Attachment: HADOOP-15297-001.patchh > Make s3a etag -> checksum publishing option > --- > > Key: HADOOP-15297 > URL: https://issues.apache.org/jira/browse/HADOOP-15297 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15297-001.patchh > > > HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with > checksums. > Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to > s3a. > Rather than revert I want to make it an option, off by default. Once we are > happy with distcp in future, we can turn it on. > Why an option? Because it lines up for a successor to distcp which saves src > and dest checksums to a file and can then verify whether or not files have > really changed. Currently distcp relies on dest checksum algorithm being the > same as the src for incremental updates, but if either of the stores don't > serve checksums, silently downgrades to not checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15278) log s3a at info
[ https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390262#comment-16390262 ] genericqa commented on HADOOP-15278: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 28m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 8s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 43m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15278 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913440/HADOOP-15278-001.patch | | Optional Tests | asflicense mvnsite unit | | uname | Linux e4b1ac05f0d3 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 46d29e3 | | maven | version: Apache Maven 3.3.9 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14274/testReport/ | | Max. process+thread count | 334 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14274/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> log s3a at info > --- > > Key: HADOOP-15278 > URL: https://issues.apache.org/jira/browse/HADOOP-15278 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15278-001.patch > > > since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in > our test/resources it logs at info. We do actually log lots of stuff useful > when debugging things > Proposed: drop the log level to INFO here -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15297) Make s3a etag -> checksum publishing option
Steve Loughran created HADOOP-15297: --- Summary: Make s3a etag -> checksum publishing option Key: HADOOP-15297 URL: https://issues.apache.org/jira/browse/HADOOP-15297 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.1.0 Reporter: Steve Loughran Assignee: Steve Loughran HADOOP-15273 shows how distcp doesn't handle non-HDFS filesystems with checksums. Exposing Etags as checksums, HADOOP-13282, breaks workflows which back up to s3a. Rather than revert I want to make it an option, off by default. Once we are happy with distcp in future, we can turn it on. Why an option? Because it lines up for a successor to distcp which saves src and dest checksums to a file and can then verify whether or not files have really changed. Currently distcp relies on dest checksum algorithm being the same as the src for incremental updates, but if either of the stores don't serve checksums, silently downgrades to not checking. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
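A sketch of how such an off-by-default switch could be consumed inside the filesystem; the property name fs.s3a.etag.checksum.enabled and its default are assumptions made for illustration, not something confirmed by this issue.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class EtagChecksumOption {
  // Assumed key and default; the real constant would live with the other s3a options.
  static final String ETAG_CHECKSUM_ENABLED = "fs.s3a.etag.checksum.enabled";
  static final boolean ETAG_CHECKSUM_ENABLED_DEFAULT = false;

  /** Hypothetical gate: only publish the etag as a FileChecksum when enabled. */
  static boolean publishEtagAsChecksum(Configuration conf) {
    return conf.getBoolean(ETAG_CHECKSUM_ENABLED, ETAG_CHECKSUM_ENABLED_DEFAULT);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean(ETAG_CHECKSUM_ENABLED, true);   // clusters must opt in explicitly
    System.out.println("publish etag checksums? " + publishEtagAsChecksum(conf));
  }
}
{code}

With the flag left at its default, getFileChecksum would keep returning null, which is what lets distcp fall back to not checking and keeps hdfs <-> s3a backups working as before.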
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Status: Patch Available (was: Open) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Attachment: HADOOP-15273-002.patch > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Status: Open (was: Patch Available) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15273-001.patch, HADOOP-15273-002.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream
[ https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14937: Parent Issue: HADOOP-15220 (was: HADOOP-14831) > initial part uploads seem to block unnecessarily in S3ABlockOutputStream > > > Key: HADOOP-14937 > URL: https://issues.apache.org/jira/browse/HADOOP-14937 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: yjp_threads.png > > > From looking at a YourKit snapshot of an FsShell process running a {{hadoop > fs -put file:///... s3a://...}}, it seems that the first part in the > multipart upload doesn't begin to upload until n of the > {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is > the value of {{fs.s3a.fast.upload.active.blocks}}. > To hopefully clarify a bit, the series of events that I expected to see with > {{fs.s3a.fast.upload.active.blocks}} set to 4 is: > 1. An amount of data equal to {{fs.s3a.multipart.size}} is buffered into > off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}). > 2. As soon as that happens, a thread begins to upload that part. Meanwhile, > the main thread continues to buffer data into off-heap memory. > 3. Once another part has been buffered into off-heap memory, a separate > thread uploads that part, and so on. > Whereas what I think the YK snapshot shows happening is: > 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into > off-heap memory. > 2. Four threads start to upload one part each at the same time. > I've attached a picture of the "Threads" tab to show what I mean. Basically > the times at which the first four {{s3a-transfer-shared-pool}} threads start > to upload are roughly the same, whereas I would've expected them to be more > staggered. > I'm actually not sure whether this is the expected behavior or not, so feel > free to close if this doesn't come as a surprise to anyone. > For some context, I've been trying to get a sense for roughly which values of > {{fs.s3a.multipart.size}} perform the best at different file sizes. One thing > that I found confusing is that a part size of 5 MB seems to outperform a part > size of 64 MB up until files that are upwards of about 500 MB in size. This > seems odd, since each {{uploadPart}} call is its own HTTP request, and I > would've expected the overhead of those to become costly at small part sizes. > My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have > to wait until 256 MB are buffered before we can start uploading, while with 5 > MB blocks we can start uploading as soon as we buffer 20 MB, and that's what > gives the smaller parts the advantage for smaller files. > I'm happy to submit a patch if this is in fact a problem, but wanted to check > to make sure I'm not just misunderstanding something. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
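The expected behaviour described above, starting each part upload as soon as its block is buffered while capping how many are in flight, is commonly built by guarding the upload thread pool with a semaphore sized to fs.s3a.fast.upload.active.blocks. A generic sketch of that pattern; it illustrates the expected behaviour, not the actual S3ABlockOutputStream code.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class BoundedBlockUploader {
  private final ExecutorService pool = Executors.newCachedThreadPool();
  private final Semaphore activeBlocks;

  BoundedBlockUploader(int maxActiveBlocks) {   // e.g. fs.s3a.fast.upload.active.blocks
    this.activeBlocks = new Semaphore(maxActiveBlocks);
  }

  /** Called as soon as one block buffer is full; only blocks once the limit is hit. */
  void uploadBlockAsync(byte[] block, int partNumber) throws InterruptedException {
    activeBlocks.acquire();                     // back-pressure on the writer thread
    pool.submit(() -> {
      try {
        uploadPart(block, partNumber);          // hypothetical PUT of one part
      } finally {
        activeBlocks.release();                 // let the writer buffer the next block
      }
    });
  }

  private void uploadPart(byte[] block, int partNumber) {
    System.out.printf("uploading part %d (%d bytes)%n", partNumber, block.length);
  }
}
{code}

Under this pattern the first part's upload should start after one block is buffered, not after maxActiveBlocks of them, which is the staggering the YourKit snapshot was expected to show.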
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390179#comment-16390179 ] Steve Loughran commented on HADOOP-15273: - This is what you get now {code} Checksum mismatch between hdfs://localhost:50883/tmp/source/5/6 and hdfs://localhost:50883/tmp/target/.distcp.tmp.attempt___m_00_0. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. You can skip checksum-checks altogether with -skipcrccheck. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.) {code} > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
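Distinguishing why two checksums differ, different algorithm versus genuinely different data versus one side not serving checksums at all, is what makes a message like the one above possible. A hedged sketch of such a comparison using the public FileChecksum API; it is illustrative only, not the patch itself.

{code:java}
import org.apache.hadoop.fs.FileChecksum;

public class ChecksumComparison {
  enum Result { MATCH, MISMATCH, DIFFERENT_ALGORITHM, UNAVAILABLE }

  /** Compare source and target checksums, recording why they differ. */
  static Result compare(FileChecksum source, FileChecksum target) {
    if (source == null || target == null) {
      return Result.UNAVAILABLE;          // one store does not serve checksums
    }
    if (!source.getAlgorithmName().equals(target.getAlgorithmName())) {
      return Result.DIFFERENT_ALGORITHM;  // e.g. HDFS CRC family vs an object-store etag
    }
    return source.equals(target) ? Result.MATCH : Result.MISMATCH;
  }
}
{code}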
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Priority: Blocker (was: Critical) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390171#comment-16390171 ] Anis Elleuch commented on HADOOP-15267: --- Thanks for merging this patch [~ste...@apache.org]. > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0, 3.0.0, 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0, 3.0.2 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
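To satisfy the AWS requirement quoted in the description, every UploadPartRequest must carry the same customer-provided key as the InitiateMultipartUploadRequest. A minimal AWS SDK for Java sketch of that requirement, with a made-up bucket, key material and part file; this is not the hadoop-aws patch itself.

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.SSECustomerKey;
import com.amazonaws.services.s3.model.UploadPartRequest;

import java.io.File;

public class SseCMultipartUpload {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    SSECustomerKey key = new SSECustomerKey("base64-encoded-256-bit-key"); // hypothetical key

    InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest("testbucket", "people.json")
            .withSSECustomerKey(key));                 // SSE-C on the initiate call

    // Every part must repeat the identical SSE-C parameters; otherwise S3
    // rejects it with "Subsequent part requests must include the appropriate
    // encryption parameters".
    UploadPartRequest part = new UploadPartRequest()
        .withBucketName("testbucket")
        .withKey("people.json")
        .withUploadId(init.getUploadId())
        .withPartNumber(1)
        .withFile(new File("/tmp/part-0001"))          // hypothetical local part file
        .withSSECustomerKey(key);
    s3.uploadPart(part);
  }
}
{code}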
[jira] [Updated] (HADOOP-15278) log s3a at info
[ https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15278: Status: Patch Available (was: Open) patch 001; s3a logging is commented out & so gets the global scope; aws is uprated to info, with the exception of the httpclient. This can log stack traces when it gets errors its retrying, so can be confusing > log s3a at info > --- > > Key: HADOOP-15278 > URL: https://issues.apache.org/jira/browse/HADOOP-15278 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15278-001.patch > > > since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in > our test/resources it logs at info. We do actually log lots of stuff useful > when debugging things > Proposed: drop the log level to INFO here -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
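For concreteness, this is roughly what the description implies for etc/hadoop/log4j.properties: the s3a logger stays commented out so it inherits the root level, the AWS SDK is raised to INFO, and its http client is kept quieter because it logs stack traces for errors it is retrying. The exact logger names below are assumptions, not taken from the patch.

{code:title=log4j.properties}
# S3A: no explicit level, so it inherits the root logger (INFO)
#log4j.logger.org.apache.hadoop.fs.s3a=INFO

# AWS SDK at INFO...
log4j.logger.com.amazonaws=INFO
# ...except the http client, whose retry stack traces are confusing at INFO
log4j.logger.com.amazonaws.http=ERROR
log4j.logger.org.apache.http=ERROR
{code}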
[jira] [Updated] (HADOOP-15278) log s3a at info
[ https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15278: Attachment: HADOOP-15278-001.patch > log s3a at info > --- > > Key: HADOOP-15278 > URL: https://issues.apache.org/jira/browse/HADOOP-15278 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15278-001.patch > > > since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in > our test/resources it logs at info. We do actually log lots of stuff useful > when debugging things > Proposed: drop the log level to INFO here -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15278) log s3a at info
[ https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390165#comment-16390165 ] Steve Loughran commented on HADOOP-15278: - Actually, this is what you see in the default {code} 2018-03-07 20:20:48,755 INFO beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property. 2018-03-07 20:20:48,780 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2018-03-07 20:20:48,844 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 2018-03-07 20:20:48,844 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started 2018-03-07 20:20:50,108 INFO Configuration.deprecation: fs.s3a.server-side-encryption-key is deprecated. Instead, use fs.s3a.server-side-encryption.key Found 1 items drwxrwxrwx - stevel stevel 0 2018-03-07 20:20 s3a://hwdev-steve-london/Users 2018-03-07 20:20:50,274 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system... 2018-03-07 20:20:50,274 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped. 2018-03-07 20:20:50,274 INFO impl.MetricsSystemImpl: s3a-file-system metrics system shutdown complete. {code} Something about deprecation I've never got away, and metrics cruft. The fluent property one is HADOOP-15277; think its actually metrics related. for reference, azure {code} bin/hadoop fs -ls wasb://contr...@contender.blob.core.windows.net 2018-03-07 20:26:16,848 INFO util.log: Logging initialized @1371ms 2018-03-07 20:26:16,968 INFO beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property. 2018-03-07 20:26:17,003 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2018-03-07 20:26:17,075 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 2018-03-07 20:26:17,075 INFO impl.MetricsSystemImpl: azure-file-system metrics system started Found 7 items -rw-r--r-- 1 stevel supergroup 0 2017-06-30 18:12 wasb://contr...@contender.blob.core.windows.net/user/stevel/file -rw-r--r-- 1 stevel supergroup513 2017-06-30 18:10 wasb://contr...@contender.blob.core.windows.net/user/stevel/file.dat -rw-r--r-- 1 stevel supergroup 0 2017-06-30 18:13 wasb://contr...@contender.blob.core.windows.net/user/stevel/fileToRename drwxr-xr-x - stevel supergroup 0 2017-06-30 18:11 wasb://contr...@contender.blob.core.windows.net/user/stevel/parent drwxr-xr-x - stevel supergroup 0 2017-06-30 18:11 wasb://contr...@contender.blob.core.windows.net/user/stevel/renamedFolder drwxr-xr-x - stevel supergroup 0 2016-10-20 22:00 wasb://contr...@contender.blob.core.windows.net/user/stevel/streaming drwxr-xr-x - stevel supergroup 0 2017-06-30 18:11 wasb://contr...@contender.blob.core.windows.net/user/stevel/testDestFolder 2018-03-07 20:26:17,756 INFO impl.MetricsSystemImpl: Stopping azure-file-system metrics system... 2018-03-07 20:26:17,756 INFO impl.MetricsSystemImpl: azure-file-system metrics system stopped. 2018-03-07 20:26:17,756 INFO impl.MetricsSystemImpl: azure-file-system metrics system shutdown complete. {code} so: equivalent noisyness around metrics. Dangerous to turn that off though, as this is the same log4j used in the servers, which also spin the metrics. 
wasb and s3a are just the two clients that use the metrics classes > log s3a at info > --- > > Key: HADOOP-15278 > URL: https://issues.apache.org/jira/browse/HADOOP-15278 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > > since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in > our test/resources it logs at info. We do actually log lots of stuff useful > when debugging things > Proposed: drop the log level to INFO here -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390143#comment-16390143 ] Steve Loughran edited comment on HADOOP-15273 at 3/7/18 8:20 PM: - copymapper contains test to look for string of (incorrect) -skipCrc message. So not just wrong, tests to make sure it stays wrong :) {code} java.lang.AssertionError: Failure exception should have suggested the use of -skipCrc. at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.tools.mapred.TestCopyMapper.testCopyFailOnBlockSizeDifference(TestCopyMapper.java:949) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} checkstyle {code} ./hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:213: StringBuilder errorMessage = new StringBuilder("Checksum mismatch between "): Line is longer than 80 characters (found 82). [LineLength] {code} was (Author: ste...@apache.org): copymapper contains test to look for string of (incorrect) -skipCrc message. So not just wrong, tests to make sure it stays wrong :) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390143#comment-16390143 ] Steve Loughran commented on HADOOP-15273: - copymapper contains test to look for string of (incorrect) -skipCrc message. So not just wrong, tests to make sure it stays wrong :) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390141#comment-16390141 ] genericqa commented on HADOOP-15292: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 11m 53s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 1 new + 95 unchanged - 1 fixed = 96 total (was 96) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 40s{color} | {color:green} hadoop-distcp in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 12s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15292 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913424/HADOOP-15292.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 343b69026d0a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e0307e5 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14273/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14273/testReport/ | | Max. process+thread count | 434 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14273/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |
[jira] [Commented] (HADOOP-15278) log s3a at info
[ https://issues.apache.org/jira/browse/HADOOP-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390131#comment-16390131 ] Steve Loughran commented on HADOOP-15278: - Nope, nothing by default {code} bin/hadoop fs -ls s3a://hwdev-steve-london/ Found 1 items drwxrwxrwx - stevel stevel 0 2018-03-07 20:12 s3a://hwdev-steve-london/Users {code} you do get more on other ops, like committing work, which is where I'd like it > log s3a at info > --- > > Key: HADOOP-15278 > URL: https://issues.apache.org/jira/browse/HADOOP-15278 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > > since it was added, hadoop conf/log4j only logs s3a at ERROR, even though in > our test/resources it logs at info. We do actually log lots of stuff useful > when debugging things > Proposed: drop the log level to INFO here -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390031#comment-16390031 ] Steve Loughran edited comment on HADOOP-15267 at 3/7/18 8:10 PM: - +1 OK its in. Anis, thank you for finding this! I'm just seeing how easy this is to backport, as its a major bug was (Author: ste...@apache.org): OK its in. Anis, thank you for finding this! I'm just seeing how easy this is to backport, as its a major bug > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0, 3.0.0, 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0, 3.0.2 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15267: Affects Version/s: 2.9.0 3.0.0 > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0, 3.0.0, 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0, 3.0.2 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390120#comment-16390120 ] Steve Loughran commented on HADOOP-15267: - +[~fabbri]: FYI > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0, 3.0.2 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390093#comment-16390093 ] genericqa commented on HADOOP-15273: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 12s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 1 new + 8 unchanged - 2 fixed = 9 total (was 10) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 41s{color} | {color:red} hadoop-distcp in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.tools.mapred.TestCopyMapper | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15273 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913420/HADOOP-15273-001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 44786d213856 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d69b31f | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14272/artifact/out/diff-checkstyle-hadoop-tools_hadoop-distcp.txt | | unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/14272/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt | | Test Results |
[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390061#comment-16390061 ] Steve Loughran commented on HADOOP-15267: - backported to 3.0.x; not got time right now to look at & retest branch-2...the 3.0 commit should be the one to pick up if anyone wants to > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0, 3.0.2 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15267: Fix Version/s: 3.0.2 > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0, 3.0.2 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390051#comment-16390051 ] Hudson commented on HADOOP-15267: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13787 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13787/]) HADOOP-15267. S3A multipart upload fails when SSE-C encryption is (stevel: rev e0307e53e2110cb6b418861a7471e97a013c16e2) * (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/MockS3AFileSystem.java * (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java * (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AHugeFilesSSECDiskBlocks.java > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390047#comment-16390047 ] Virajith Jalaparti commented on HADOOP-15292: - [^HADOOP-15292.002.patch] fixes [~ste...@apache.org]'s and [~chris.douglas]'s comment of seeking when {{sourceOffset != inStream.getPos()}}. [~ste...@apache.org] {{ITestAzureNativeContractDistCp}} with this fix. > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HADOOP-15292: Status: Patch Available (was: Open) > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390047#comment-16390047 ] Virajith Jalaparti edited comment on HADOOP-15292 at 3/7/18 7:07 PM: - [^HADOOP-15292.002.patch] fixes [~ste...@apache.org]'s and [~chris.douglas]'s comment of seeking when {{sourceOffset != inStream.getPos()}}. [~ste...@apache.org] {{ITestAzureNativeContractDistCp}} passes after this fix as well. was (Author: virajith): [^HADOOP-15292.002.patch] fixes [~ste...@apache.org]'s and [~chris.douglas]'s comment of seeking when {{sourceOffset != inStream.getPos()}}. [~ste...@apache.org] {{ITestAzureNativeContractDistCp}} with this fix. > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
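The shape of the fix described above is to reuse the stream's current position and only seek when a retry has left it elsewhere, rather than issuing positioned reads. A rough sketch of that idea (the helper class below is hypothetical, not the contents of HADOOP-15292.002.patch):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

/** Illustrative only: seek + sequential read instead of positioned reads. */
class SeekingCopyHelper {
  static int readBytes(FSDataInputStream inStream, long sourceOffset,
      byte[] buf) throws IOException {
    // only reposition when a retry (or restart) has left the stream elsewhere
    if (sourceOffset != inStream.getPos()) {
      inStream.seek(sourceOffset);
    }
    // a plain sequential read reuses the already-open block reader on HDFS
    return inStream.read(buf);
  }
}
{code}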
[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HADOOP-15292: Status: Open (was: Patch Available) > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HADOOP-15292: Attachment: HADOOP-15292.002.patch > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, > HADOOP-15292.002.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15267: Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) OK its in. Anis, thank you for finding this! I'm just seeing how easy this is to backport, as its a major bug > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Fix For: 3.1.0 > > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Status: Patch Available (was: Open) Patch 001 * allows -skipcrccheck everywhere * when the filesystem schemas differ and neither is an HDFS one (hdfs, webhdfs, swebhdfs), a filesystem-mismatch message is printed instead of one about block size * error message adds \n formatting * and the correct name of the option to disable the checks Tests: not easily. Maybe after HADOOP-15209 is in I could do it...we'd need something in hadoop-aws with a minihdfs cluster. This is not an easy undertaking. I have manually tested it & verified that yes, the skipcrc goes down. Even with this patch, I'm wondering whether it's best to revert the s3a etag feature until distcp is better able to cope > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
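The filesystem-mismatch message described in the patch notes amounts to comparing the checksum algorithm names reported by the two stores before blaming block size. A hedged sketch of that kind of check (class name, method and message wording here are illustrative, not the patch's actual code):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative only: report algorithm mismatches instead of block-size advice. */
class ChecksumMismatchExplainer {
  static void explain(FileSystem srcFS, Path src, FileSystem dstFS, Path dst)
      throws IOException {
    FileChecksum srcSum = srcFS.getFileChecksum(src);
    FileChecksum dstSum = dstFS.getFileChecksum(dst);
    if (srcSum != null && dstSum != null
        && !srcSum.getAlgorithmName().equals(dstSum.getAlgorithmName())) {
      throw new IOException("Checksum mismatch between " + src + " and " + dst
          + ": the stores use different checksum algorithms ("
          + srcSum.getAlgorithmName() + " vs " + dstSum.getAlgorithmName()
          + "). Use -skipcrccheck to disable the comparison.");
    }
    // otherwise fall through to the existing block-size guidance
  }
}
{code}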
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Attachment: HADOOP-15273-001.patch > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389953#comment-16389953 ] Steve Loughran commented on HADOOP-15273: - An initial distcp always does a checksum check during upload This sequence will fail {code} hadoop fs -rm -R -skipTrash s3a://hwdev-steve-new/\* hadoop distcp /user/steve/data s3a://hwdev-steve-new/data {code} Here {code} 18/03/07 17:08:03 INFO mapreduce.Job: Task Id : attempt_1520388269891_0019_m_04_2, Status : FAILED Error: java.io.IOException: File copy failed: hdfs://mycluster/user/steve/data/example.py --> s3a://hwdev-steve-new/data/example.py at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:217) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mycluster/user/steve/data/example.py to s3a://hwdev-steve-new/data/example.py at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101) at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:256) ... 10 more Caused by: java.io.IOException: Check-sum mismatch between hdfs://mycluster/user/steve/data/example.py and s3a://hwdev-steve-new/data-connectors/.distcp.tmp.attempt_1520388269891_0019_m_04_2. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:223) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:133) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99) at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) ... 11 more {code} You cannot use -skipcrccheck as the validator forbids it, yet without it you can't upload to hdfs to s3a now that it serves up its checksums as etags. > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. 
> (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned HADOOP-15273: --- Assignee: Steve Loughran > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15273-001.patch > > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Description: When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch between src and dest store types (e.g hdfs to s3), then the error message will talk about blocksize, even when its the underlying checksum protocol itself which is the cause for failure bq. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.) update: the CRC check takes always place on a distcp upload before the file is renamed into place. *and you can't disable it then* was: When using distcp without {{-skipCRC}} . If there's a checksum mismatch between src and dest store types (e.g hdfs to s3), then the error message will talk about blocksize, even when its the underlying checksum protocol itself which is the cause for failure bq. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.) IF the checksum types are fundamentally different, the error message should say so > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Critical > > When using distcp without {{-skipcrcchecks}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > update: the CRC check takes always place on a distcp upload before the file > is renamed into place. *and you can't disable it then* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Priority: Critical (was: Minor) > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Critical > > When using distcp without {{-skipCRC}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > IF the checksum types are fundamentally different, the error message should > say so -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15273) distcp can't handle remote stores with different checksum algorithms
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15273: Summary: distcp can't handle remote stores with different checksum algorithms (was: distcp to downgrade on checksum algorithm mismatch to "files unchanged") > distcp can't handle remote stores with different checksum algorithms > > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Minor > > When using distcp without {{-skipCRC}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > IF the checksum types are fundamentally different, the error message should > say so -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389675#comment-16389675 ] genericqa commented on HADOOP-15267: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 41s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15267 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913197/HADOOP-15267-003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c1d3d7470c71 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 58ea2d7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14271/testReport/ | | Max. process+thread count | 292 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14271/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL:
[jira] [Updated] (HADOOP-15277) remove .FluentPropertyBeanIntrospector from CLI operation log output
[ https://issues.apache.org/jira/browse/HADOOP-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15277: Issue Type: Sub-task (was: Improvement) Parent: HADOOP-14831 > remove .FluentPropertyBeanIntrospector from CLI operation log output > > > Key: HADOOP-15277 > URL: https://issues.apache.org/jira/browse/HADOOP-15277 > Project: Hadoop Common > Issue Type: Sub-task > Components: conf >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15277-001.patch > > > when using the default logs, I get told off by beanutils > {code} > 18/03/01 18:43:54 INFO beanutils.FluentPropertyBeanIntrospector: Error when > creating PropertyDescriptor for public final void > org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! > Ignoring this property. > {code} > This is a distraction. > I propose to raise the log level to ERROR for that class in log4j.properties -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
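A sketch of the change proposed in the last line above, assuming the class lives in {{org.apache.commons.beanutils}}; the property key is inferred from the logger name in the message, not taken from the attached patch:
{code}
# log4j.properties: silence the INFO noise from FluentPropertyBeanIntrospector
log4j.logger.org.apache.commons.beanutils.FluentPropertyBeanIntrospector=ERROR
{code}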
[jira] [Commented] (HADOOP-15273) distcp to downgrade on checksum algorithm mismatch to "files unchanged"
[ https://issues.apache.org/jira/browse/HADOOP-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389633#comment-16389633 ] Steve Loughran commented on HADOOP-15273: - oh, and the text is actually wrong, as the arg is "-skipcrccheck" > distcp to downgrade on checksum algorithm mismatch to "files unchanged" > --- > > Key: HADOOP-15273 > URL: https://issues.apache.org/jira/browse/HADOOP-15273 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 3.1.0 >Reporter: Steve Loughran >Priority: Minor > > When using distcp without {{-skipCRC}} . If there's a checksum mismatch > between src and dest store types (e.g hdfs to s3), then the error message > will talk about blocksize, even when its the underlying checksum protocol > itself which is the cause for failure > bq. Source and target differ in block-size. Use -pb to preserve block-sizes > during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. > (NOTE: By skipping checksums, one runs the risk of masking data-corruption > during file-transfer.) > IF the checksum types are fundamentally different, the error message should > say so -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-10786) Fix UGI#reloginFromKeytab on Java 8
[ https://issues.apache.org/jira/browse/HADOOP-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389630#comment-16389630 ] Steve Loughran commented on HADOOP-10786: - Looks like it's in branch-2.7 {code} > git log --grep HADOOP-10786 branch-2.7 commit 8f4a09b6076de9fbd6cd8ccaddf72ba9c94429ff Author: Vinayakumar B Date: Fri Aug 14 12:23:51 2015 +0530 HADOOP-10786. Fix UGI#reloginFromKeytab on Java 8. Contributed by Stephen Chu. Moved CHANGES.txt entry to 2.6.1 (cherry picked from commit e7aa81394dce61cc96d480e21204263a5f2ed153) {code} The code has moved on a lot since that patch went in, which is why there's no match. > Fix UGI#reloginFromKeytab on Java 8 > --- > > Key: HADOOP-10786 > URL: https://issues.apache.org/jira/browse/HADOOP-10786 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 2.6.0 >Reporter: Tobi Vollebregt >Assignee: Stephen Chu >Priority: Major > Labels: 2.6.1-candidate > Fix For: 2.6.1, 2.7.0, 3.0.0-alpha1 > > Attachments: HADOOP-10786.2.patch, HADOOP-10786.3.patch, > HADOOP-10786.3.patch, HADOOP-10786.4.patch, HADOOP-10786.5.patch, > HADOOP-10786.patch > > > Krb5LoginModule changed subtly in java 8: in particular, if useKeyTab and > storeKey are specified, then only a KeyTab object is added to the Subject's > private credentials, whereas in java <= 7 both a KeyTab and some number of > KerberosKey objects were added. > The UGI constructor checks whether or not a keytab was used to login by > looking if there are any KerberosKey objects in the Subject's private > credentials. If there are, then isKeyTab is set to true, and otherwise it's > set to false. > Thus, in java 8 isKeyTab is always false given the current UGI > implementation, which makes UGI#reloginFromKeytab fail silently. > Attached patch will check for a KeyTab object on the Subject, instead of a > KerberosKey object. This fixes relogins from kerberos keytabs on Oracle java > 8, and works on Oracle java 7 as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
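The quoted description explains the fix; a minimal sketch of the idea using plain JAAS types, not the attached patch:
{code:java}
import javax.security.auth.Subject;
import javax.security.auth.kerberos.KerberosKey;
import javax.security.auth.kerberos.KeyTab;

final class KeytabLoginCheck {
  // Treat the login as keytab-based if the Subject carries a KeyTab object
  // (Java 8 behaviour) or KerberosKey objects (Java 7 and earlier behaviour).
  static boolean isKeytabLogin(Subject subject) {
    return !subject.getPrivateCredentials(KeyTab.class).isEmpty()
        || !subject.getPrivateCredentials(KerberosKey.class).isEmpty();
  }
}
{code}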
[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
[ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389616#comment-16389616 ] Steve Loughran commented on HADOOP-15292: - # I like the extra instrumentation & probes; if it works for HDFS it'll be the same everywhere # I think chris's comment about {{sourceOffset != inStream.getPos()}} seems valid. If the file is newly opened, this is the same as offset!=0, otherwise its relative to where you are. w.r.t S3 testing, I can see why it wouldn't be your default, but our test suites are designed to be very low cost (no persistent data, bias to uploads and large D/Ls all from AWS funded buckets). It's worth getting set up for this to help verify consistent behaviour everywhere. At the very least, make sure the Azure WASB store tests are happy. (you don't get an ADL test until HADOOP-15209). > Distcp's use of pread is slowing it down. > - > > Key: HADOOP-15292 > URL: https://issues.apache.org/jira/browse/HADOOP-15292 > Project: Hadoop Common > Issue Type: Bug > Components: tools/distcp >Affects Versions: 2.5.0 >Reporter: Virajith Jalaparti >Priority: Minor > Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch > > > Distcp currently uses positioned-reads (in > RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This > results in unnecessary overheads (new BlockReader being created on the > client-side, multiple readBlock() calls to the Datanodes, each of which > requires the creation of a BlockSender and an inputstream to the ReplicaInfo). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
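For context on the {{sourceOffset != inStream.getPos()}} point, a rough sketch of the behaviour under discussion, assuming a helper of this shape; it is not the actual RetriableFileCopyCommand code:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

final class ReadStrategySketch {
  // Use a positioned read only when the stream is not already at the wanted
  // offset; a plain sequential read avoids the per-call block reader setup
  // that the description blames for the slowdown.
  static int readAt(FSDataInputStream in, long sourceOffset, byte[] buf)
      throws IOException {
    if (sourceOffset != in.getPos()) {
      return in.read(sourceOffset, buf, 0, buf.length); // pread, position unchanged
    }
    return in.read(buf, 0, buf.length); // sequential read from current position
  }
}
{code}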
[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389605#comment-16389605 ] genericqa commented on HADOOP-15293: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 28s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 48s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d4cc50f | | JIRA Issue | HADOOP-15293 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913382/HADOOP-15293.1.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 922c6517d5df 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 58ea2d7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14269/testReport/ | | Max. process+thread count | 1361 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14269/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL:
[jira] [Commented] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389597#comment-16389597 ] Steve Loughran commented on HADOOP-15267: - now that I've remembered to hit "submit patch", if yetus is happy it'll go in. After that, the best thing you can do is verify that it works for you > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
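As a rough illustration of what the AWS requirement quoted above means for part uploads, a sketch using the AWS SDK for Java v1; the request methods are from the SDK, but the helper itself is illustrative and not the attached patch:
{code:java}
import com.amazonaws.services.s3.model.SSECustomerKey;
import com.amazonaws.services.s3.model.UploadPartRequest;

final class SseCPartRequestSketch {
  // The same customer-provided key used on the initiate-multipart request
  // must be attached to every part request, otherwise S3 rejects the part
  // with the error quoted in the description.
  static UploadPartRequest newPartRequest(String bucket, String key,
      String uploadId, int partNumber, SSECustomerKey sseKey) {
    return new UploadPartRequest()
        .withBucketName(bucket)
        .withKey(key)
        .withUploadId(uploadId)
        .withPartNumber(partNumber)
        .withSSECustomerKey(sseKey);
  }
}
{code}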
[jira] [Updated] (HADOOP-15267) S3A multipart upload fails when SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15267: Status: Patch Available (was: Open) > S3A multipart upload fails when SSE-C encryption is enabled > --- > > Key: HADOOP-15267 > URL: https://issues.apache.org/jira/browse/HADOOP-15267 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: Hadoop 3.1 Snapshot >Reporter: Anis Elleuch >Assignee: Anis Elleuch >Priority: Critical > Attachments: HADOOP-15267-001.patch, HADOOP-15267-002.patch, > HADOOP-15267-003.patch > > > When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size > to 5 Mb, storing data in AWS doesn't work anymore. For example, running the > following code: > {code} > >>> df1 = spark.read.json('/home/user/people.json') > >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json") > {code} > shows the following exception: > {code:java} > com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload > initiate requested encryption. Subsequent part requests must include the > appropriate encryption parameters. > {code} > After some investigation, I discovered that hadoop-aws doesn't send SSE-C > headers in Put Object Part as stated in AWS specification: > [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html] > {code:java} > If you requested server-side encryption using a customer-provided encryption > key in your initiate multipart upload request, you must provide identical > encryption information in each part upload using the following headers. > {code} > > You can find a patch attached to this issue for a better clarification of the > problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389593#comment-16389593 ] genericqa commented on HADOOP-13126: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HADOOP-13126 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-13126 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14270/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add Brotli compression codec > > > Key: HADOOP-13126 > URL: https://issues.apache.org/jira/browse/HADOOP-13126 > Project: Hadoop Common > Issue Type: Improvement > Components: io >Affects Versions: 2.7.2 >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Major > Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch, > HADOOP-13126.3.patch, HADOOP-13126.4.patch, HADOOP-13126.5.patch > > > I've been testing [Brotli|https://github.com/google/brotli/], a new > compression library based on LZ77 from Google. Google's [brotli > benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf] > look really good and we're also seeing a significant improvement in > compression size, compression speed, or both. > {code:title=Brotli preliminary test results} > [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet > --compression-codec snappy --overwrite > real1m17.106s > user1m30.804s > sys 0m4.404s > [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet > --compression-codec brotli --overwrite > real1m16.640s > user1m24.244s > sys 0m6.412s > [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet > --compression-codec gzip --overwrite > real3m39.496s > user3m48.736s > sys 0m3.880s > [blue@work Downloads]$ ls -l > -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet > -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet > -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet > {code} > Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9. > Another test resulted in a slightly larger Brotli file than gzip produced, > but Brotli was 4x faster. I'd like to get this compression codec into Hadoop. > [Brotli is licensed with the MIT > license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI > library jbrotli is > ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389588#comment-16389588 ] Steve Loughran commented on HADOOP-13126: - Not noticed this before. It would seem OK for Hadoop 3.2/2.10+; too late for the 3.1. Would probably need some more tests. Maybe even adding a test resource with a brotli compressed file as the reference; all a round trip does is verify that you can round trip "something", not that the compressor is working > Add Brotli compression codec > > > Key: HADOOP-13126 > URL: https://issues.apache.org/jira/browse/HADOOP-13126 > Project: Hadoop Common > Issue Type: Improvement > Components: io >Affects Versions: 2.7.2 >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Major > Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch, > HADOOP-13126.3.patch, HADOOP-13126.4.patch, HADOOP-13126.5.patch > > > I've been testing [Brotli|https://github.com/google/brotli/], a new > compression library based on LZ77 from Google. Google's [brotli > benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf] > look really good and we're also seeing a significant improvement in > compression size, compression speed, or both. > {code:title=Brotli preliminary test results} > [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet > --compression-codec snappy --overwrite > real1m17.106s > user1m30.804s > sys 0m4.404s > [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet > --compression-codec brotli --overwrite > real1m16.640s > user1m24.244s > sys 0m6.412s > [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet > --compression-codec gzip --overwrite > real3m39.496s > user3m48.736s > sys 0m3.880s > [blue@work Downloads]$ ls -l > -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet > -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet > -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet > {code} > Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9. > Another test resulted in a slightly larger Brotli file than gzip produced, > but Brotli was 4x faster. I'd like to get this compression codec into Hadoop. > [Brotli is licensed with the MIT > license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI > library jbrotli is > ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
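A sketch of the reference-file test suggested in the comment above; the resource name, expected text, and codec wiring are placeholders, since the point is only that decoding a file produced by an independent Brotli implementation proves more than a round trip does:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.compress.CompressionCodec;
import static org.junit.Assert.assertEquals;

final class BrotliReferenceTestSketch {
  // Decompress a checked-in reference file with the codec under test and
  // compare the result with the known plain text.
  static void assertDecodesReference(CompressionCodec codec) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in = codec.createInputStream(
        BrotliReferenceTestSketch.class.getResourceAsStream("/reference.br"))) {
      byte[] buf = new byte[4096];
      for (int n = in.read(buf); n != -1; n = in.read(buf)) {
        out.write(buf, 0, n);
      }
    }
    assertEquals("expected plain text",
        new String(out.toByteArray(), StandardCharsets.UTF_8));
  }
}
{code}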
[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389496#comment-16389496 ] Takanobu Asanuma commented on HADOOP-15293: --- Uploaded the 1st patch. The error message differs slightly between Java 8 and Java 9. The patch uses one more try-catch block to handle both of them. > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HADOOP-15293.1.patch > > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
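A minimal sketch of the kind of handling the comment describes, accepting either JDK's wording; the helper name and structure are illustrative rather than the attached patch:
{code:java}
import javax.net.ssl.SSLException;

final class SslMessageCheckSketch {
  // Swallow the exception only when it carries the expected wording for
  // "plain HTTP sent to an HTTPS port"; rethrow anything else.
  static void assertExpectedSslFailure(SSLException e) throws SSLException {
    String msg = String.valueOf(e.getMessage());
    if (!msg.contains("Unrecognized SSL message")                      // Java 8
        && !msg.contains("Unsupported or unrecognized SSL message")) { // Java 9
      throw e;
    }
  }
}
{code}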
[jira] [Updated] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HADOOP-15293: -- Status: Patch Available (was: Open) > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HADOOP-15293.1.patch > > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HADOOP-15293: -- Attachment: HADOOP-15293.1.patch > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HADOOP-15293.1.patch > > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15293) TestLogLevel fails on Java 9
[ https://issues.apache.org/jira/browse/HADOOP-15293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389439#comment-16389439 ] Takanobu Asanuma commented on HADOOP-15293: --- I would like to work on this issue. Thanks. > TestLogLevel fails on Java 9 > > > Key: HADOOP-15293 > URL: https://issues.apache.org/jira/browse/HADOOP-15293 > Project: Hadoop Common > Issue Type: Sub-task > Components: test > Environment: Applied HADOOP-12760 and HDFS-11610 >Reporter: Akira Ajisaka >Priority: Major > > {noformat} > [INFO] Running org.apache.hadoop.log.TestLogLevel > [ERROR] Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.805 > s <<< FAILURE! - in org.apache.hadoop.log.TestLogLevel > [ERROR] testLogLevelByHttpWithSpnego(org.apache.hadoop.log.TestLogLevel) > Time elapsed: 1.179 s <<< FAILURE! > java.lang.AssertionError: > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > at > java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:416) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Attachment: (was: HADOOP-14445.004.patch) > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389216#comment-16389216 ] Xiao Chen commented on HADOOP-14445: Attached a preliminary [^HADOOP-14445.004.patch] that handles this via token kind. Code needs cleaning up but it should show the general idea. [~shahrs87] and [~daryn], would you mind taking a quick look to see if this converges with what you had in mind? - Added a new token kind {{KMS_DELEGATION_TOKEN}}, and deprecated the old {{kms-dt}} kind. - The new client will also create a duplicate legacy token when the newly created token is of the new kind. This duplication is on by default but can be turned off by config. - On authentication, the token selection logic will first favor the new kind, and fall back to the old way if no such token is found. - Added a dedicated Renewer to handle the new token kind. Old renewer behavior is unchanged. Tested that this works via some cross-cluster bidirectional distcp runs, where one cluster was upgraded and one was not. Will go through the patch on Wednesday to polish things up and add more unit tests. > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.004.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
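To make the selection behaviour in the comment concrete, a rough sketch of the fallback logic, assuming token kinds named as in the comment; this is an illustration, not the code from the attached patch:
{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

final class KmsTokenSelectionSketch {
  private static final Text NEW_KIND = new Text("KMS_DELEGATION_TOKEN");

  // Prefer a token of the new kind for this service; otherwise fall back to
  // the legacy lookup by service alone, which finds the old kms-dt tokens.
  static Token<?> select(Credentials creds, Text service) {
    for (Token<?> t : creds.getAllTokens()) {
      if (NEW_KIND.equals(t.getKind()) && service.equals(t.getService())) {
        return t;
      }
    }
    return creds.getToken(service);
  }
}
{code}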