[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466862#comment-16466862 ]

genericqa commented on HADOOP-15446:
------------------------------------

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
|| Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| branch-2 Compile Tests ||
| +1 | mvninstall | 16m 15s | branch-2 passed |
| +1 | compile | 0m 23s | branch-2 passed |
| +1 | checkstyle | 0m 17s | branch-2 passed |
| +1 | mvnsite | 0m 31s | branch-2 passed |
| +1 | javadoc | 0m 21s | branch-2 passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 24s | the patch passed |
| +1 | compile | 0m 21s | the patch passed |
| +1 | javac | 0m 21s | the patch passed |
| -0 | checkstyle | 0m 12s | hadoop-tools/hadoop-azure: The patch generated 20 new + 3 unchanged - 0 fixed = 23 total (was 3) |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 16s | the patch passed |
|| Other Tests ||
| +1 | unit | 1m 3s | hadoop-azure in the patch passed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 22m 12s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:f667ef1 |
| JIRA Issue | HADOOP-15446 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922393/HADOOP-15446-branch-2.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0b0a6a021c83 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 5679920 |
| maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_171 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14600/artifact/out/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14600/testReport/ |
| Max. process+thread count | 190 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14600/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 2.9.0, 3.0.2
>
[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466846#comment-16466846 ] Thomas Marquardt commented on HADOOP-15446: --- FYI, the same changes apply to both branch-2 and trunk. > WASB: PageBlobInputStream.skip breaks HBASE replication > --- > > Key: HADOOP-15446 > URL: https://issues.apache.org/jira/browse/HADOOP-15446 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 2.9.0, 3.0.2 >Reporter: Thomas Marquardt >Assignee: Thomas Marquardt >Priority: Major > Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, > HADOOP-15446-003.patch, HADOOP-15446-branch-2.001.patch > > > Page Blobs are primarily used by HBASE. HBASE replication, which apparently > has not been used with WASB until recently, performs non-sequential reads on > log files using PageBlobInputStream. There are bugs in this stream > implementation which prevent skip and seek from working properly, and > eventually the stream state becomes corrupt and unusable. > I believe this bug affects all releases of WASB/HADOOP. It appears to be a > day-0 bug in PageBlobInputStream. There were similar bugs opened in the past > (HADOOP-15042) but the issue was not properly fixed, and no test coverage was > added. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
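The bug described above is a stream whose skip/seek bookkeeping drifts out of sync with the data it actually returns. As a hedged illustration of the invariant involved (hypothetical code, not the actual PageBlobInputStream): skip() must clamp to the bytes remaining and advance the reported position by exactly the amount skipped, so that getPos() and subsequent read() results stay consistent.

```java
/**
 * Hypothetical, minimal seekable stream used only to illustrate the
 * invariant a stream like PageBlobInputStream must maintain: after
 * skip(n), getPos() and later read() results must agree. This is NOT
 * the WASB implementation.
 */
public class PositionTrackingStream {
  private final byte[] data;
  private long pos;

  public PositionTrackingStream(byte[] data) {
    this.data = data;
  }

  public long getPos() {
    return pos;
  }

  /** Skip at most n bytes; return the number actually skipped. */
  public long skip(long n) {
    if (n <= 0) {
      return 0;
    }
    long remaining = data.length - pos;
    long skipped = Math.min(n, remaining);
    pos += skipped; // position advances by exactly what was skipped
    return skipped;
  }

  /** Read one byte, or -1 at end of stream. */
  public int read() {
    if (pos >= data.length) {
      return -1;
    }
    return data[(int) pos++] & 0xff;
  }
}
```

Non-sequential readers such as HBASE replication exercise exactly this invariant; any drift between the reported position and the bytes actually consumed corrupts every subsequent read.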
[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466843#comment-16466843 ]

Thomas Marquardt commented on HADOOP-15446:
-------------------------------------------

For branch-2, I've attached HADOOP-15446-branch-2.001.patch. All tests are passing against my storage account:

*$ mvn test -Dtest=ITestPageBlobInputStream#**
[INFO] Running org.apache.hadoop.fs.azure.ITestPageBlobInputStream
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.212 s - in org.apache.hadoop.fs.azure.ITestPageBlobInputStream

*$ mvn -T 1C clean verify*
[INFO] Results:
[INFO]
[WARNING] Tests run: 233, Failures: 0, Errors: 0, Skipped: 4
[INFO] Results:
[INFO]
[WARNING] Tests run: 570, Failures: 0, Errors: 0, Skipped: 12

> WASB: PageBlobInputStream.skip breaks HBASE replication
> ---
>
> Key: HADOOP-15446
> URL: https://issues.apache.org/jira/browse/HADOOP-15446
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 2.9.0, 3.0.2
> Reporter: Thomas Marquardt
> Assignee: Thomas Marquardt
> Priority: Major
> Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, HADOOP-15446-003.patch, HADOOP-15446-branch-2.001.patch
>
> Page Blobs are primarily used by HBASE. HBASE replication, which apparently has not been used with WASB until recently, performs non-sequential reads on log files using PageBlobInputStream. There are bugs in this stream implementation which prevent skip and seek from working properly, and eventually the stream state becomes corrupt and unusable.
> I believe this bug affects all releases of WASB/HADOOP. It appears to be a day-0 bug in PageBlobInputStream. There were similar bugs opened in the past (HADOOP-15042) but the issue was not properly fixed, and no test coverage was added.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Marquardt updated HADOOP-15446: -- Attachment: HADOOP-15446-branch-2.001.patch > WASB: PageBlobInputStream.skip breaks HBASE replication > --- > > Key: HADOOP-15446 > URL: https://issues.apache.org/jira/browse/HADOOP-15446 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 2.9.0, 3.0.2 >Reporter: Thomas Marquardt >Assignee: Thomas Marquardt >Priority: Major > Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, > HADOOP-15446-003.patch, HADOOP-15446-branch-2.001.patch > > > Page Blobs are primarily used by HBASE. HBASE replication, which apparently > has not been used with WASB until recently, performs non-sequential reads on > log files using PageBlobInputStream. There are bugs in this stream > implementation which prevent skip and seek from working properly, and > eventually the stream state becomes corrupt and unusable. > I believe this bug affects all releases of WASB/HADOOP. It appears to be a > day-0 bug in PageBlobInputStream. There were similar bugs opened in the past > (HADOOP-15042) but the issue was not properly fixed, and no test coverage was > added. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466836#comment-16466836 ]

genericqa commented on HADOOP-13649:
------------------------------------

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 25m 25s | trunk passed |
| +1 | compile | 0m 29s | trunk passed |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 33s | trunk passed |
| +1 | shadedclient | 11m 2s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 37s | trunk passed |
| +1 | javadoc | 0m 21s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 33s | the patch passed |
| +1 | compile | 0m 25s | the patch passed |
| +1 | javac | 0m 25s | the patch passed |
| +1 | checkstyle | 0m 15s | the patch passed |
| +1 | mvnsite | 0m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 27s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 43s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed |
|| Other Tests ||
| +1 | unit | 4m 32s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 58m 19s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-13649 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922382/HADOOP-13649.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 48d7bc95e671 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 08ea90e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14599/testReport/ |
| Max. process+thread count | 352 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14599/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> s3guard: implement time-based (TTL) expiry for LocalMetadataStore
> -
>
> Key: HADOOP-13649
>
[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466798#comment-16466798 ] Aaron Fabbri edited comment on HADOOP-13649 at 5/8/18 3:07 AM: --- Nice work [~gabor.bota], this looks good. I hope you don't mind, I've attached a v3 of the patch with a couple tweaks: - Fix the checkstyle issue. - TTL is optional (if ttl config is zero, it is disabled) - TTL units seconds -> milliseconds (in case anyone wants to play with very short TTLs, which i think are interesting) - Minor test changes to make sure they still work with zero TTL - Added "Evolving" API annotations to the "undocumented" configs used for LocalMetadataStore. Please review and give a "+1 (nonbinding)" if you like these changes. I tested in US West 2. was (Author: fabbri): Nice work [~gabor.bota], this looks good. I hope you don't mind, I've attached a v3 of the patch with a couple tweaks: - Fix the checkstyle issue. - TTL is optional (if ttl config is zero, it is disabled) - TTL units seconds -> milliseconds (in case anyone wants to play with very short TTLs, which i think are interesting) - Minor test changes to make sure they still work with zero TTL Please review and give a "+1 (nonbinding)" if you like these changes. I tested in US West 2. > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch, > HADOOP-13649.003.patch > > > LocalMetadataStore is primarily a reference implementation for testing. 
It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Fabbri updated HADOOP-13649: -- Attachment: HADOOP-13649.003.patch > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch, > HADOOP-13649.003.patch > > > LocalMetadataStore is primarily a reference implementation for testing. It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466798#comment-16466798 ] Aaron Fabbri commented on HADOOP-13649: --- Nice work [~gabor.bota], this looks good. I hope you don't mind, I've attached a v3 of the patch with a couple tweaks: - Fix the checkstyle issue. - TTL is optional (if ttl config is zero, it is disabled) - TTL units seconds -> milliseconds (in case anyone wants to play with very short TTLs, which i think are interesting) - Minor test changes to make sure they still work with zero TTL Please review and give a "+1 (nonbinding)" if you like these changes. I tested in US West 2. > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch, > HADOOP-13649.003.patch > > > LocalMetadataStore is primarily a reference implementation for testing. It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
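The TTL scheme described in the comment above (per-entry insertion time, a configurable TTL in milliseconds, zero meaning disabled) can be sketched as follows. This is a hedged illustration of the general technique; the class and method names are hypothetical, not the actual LocalMetadataStore code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory cache with time-based (TTL) expiry: each
 * entry records its insertion time, and lookups treat entries older
 * than ttlMillis as absent. A TTL of zero disables expiry entirely.
 * Not the actual LocalMetadataStore implementation.
 */
public class TtlCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final long insertTimeMillis;

    Entry(V value, long insertTimeMillis) {
      this.value = value;
      this.insertTimeMillis = insertTimeMillis;
    }
  }

  private final Map<K, Entry<V>> map = new HashMap<>();
  private final long ttlMillis; // 0 means "never expire"

  public TtlCache(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  public void put(K key, V value, long nowMillis) {
    map.put(key, new Entry<>(value, nowMillis));
  }

  /** Return the cached value, or null if absent or expired at nowMillis. */
  public V get(K key, long nowMillis) {
    Entry<V> e = map.get(key);
    if (e == null) {
      return null;
    }
    if (ttlMillis > 0 && nowMillis - e.insertTimeMillis >= ttlMillis) {
      map.remove(key); // lazily evict the expired entry
      return null;
    }
    return e.value;
  }
}
```

Passing the clock in as a parameter keeps very short TTLs testable without sleeping, which suits the comment's interest in experimenting with short TTL values.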
[jira] [Comment Edited] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests
[ https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466764#comment-16466764 ]

Aaron Fabbri edited comment on HADOOP-15420 at 5/8/18 2:44 AM:
---

Thank you for working on this issue [~gabor.bota]. Good work identifying the bug. A couple of comments on the v1 patch:

{noformat}
private boolean expired(FileStatus status, long expiry, String keyPrefix) {
+// remove the protocol from path string to be able to compare
+String bucket = status.getPath().toUri().getHost();
+ statusTranslatedPath = status.getPath().toUri().getPath();
+}
+
{noformat}

Can you use the helper func {{standardize(Path)}} here instead?

Thanks for moving {{testDiffCommand()}} to the base class.

Did you test this with Dynamo? (`mvn clean test -Ds3guard -Ddynamo`) Unfortunately, the DynamoDB metadata store tests still run against the local (in-memory) test Dynamo implementation (until we finish HADOOP-14918).

was (Author: fabbri):
Thank you for working on this issue [~gabor.bota]. Good work identifying the bug. A couple of comments on the v1 patch:

{noformat}
private boolean expired(FileStatus status, long expiry, String keyPrefix) {
+// remove the protocol from path string to be able to compare
+String bucket = status.getPath().toUri().getHost();
+ statusTranslatedPath = status.getPath().toUri().getPath();
+}
+
{noformat}

Can you use the helper func {{standardize(Path)}} here instead?

Thanks for moving {{testDiffCommand()}} to the base class.

Did you test this with Dynamo? (`mvn clean test -Ds3guard -Ddynamo`) Unfortunately, the DynamoDB metadata store tests still run against the local (in-memory) test Dynamo implementation (until we finish HADOOP-14918).

Also, a reminder: please declare which AWS region you ran integration tests in.
> s3guard ITestS3GuardToolLocal failures in diff tests > > > Key: HADOOP-15420 > URL: https://issues.apache.org/jira/browse/HADOOP-15420 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch > > > Noticed this when testing the patch for HADOOP-13756. > > {code:java} > [ERROR] Failures: > [ERROR] > ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88 > Pruned children count > [PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}, > PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}] expected:<1> but was:<2>{code} > > Looking through the code, I'm noticing a couple of issues. > > 1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should > really be running for all MetadataStore implementations. Seems like it > should live in {{AbstractS3GuardToolTestBase}}. > 2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. 
When > {{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, > but the fs is initialized with a MetadataStore present, so seem like the fs > will still put the file in the MetadataStore? > There are other tests which explicitly go around the MetadataStore by using > {{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do > something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding > any issues with parallel test runs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
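The review above asks for the shared {{standardize(Path)}} helper in place of ad-hoc URI surgery. As a hedged sketch of the underlying idea (a hypothetical helper, not the actual S3Guard code), reducing a fully-qualified URI to its bare path component lets entries from the filesystem and the metadata store compare equal regardless of how the URI was qualified:

```java
import java.net.URI;

/**
 * Hypothetical illustration of the kind of helper the review asks
 * for: reduce a fully-qualified URI such as s3a://bucket/test/file
 * to its bare path so entries from different sources can be
 * compared. Not the actual standardize(Path) from S3Guard.
 */
public final class PathNormalizer {
  private PathNormalizer() {
  }

  /** Drop scheme and authority, keeping only the path component. */
  public static String standardize(String fullPath) {
    return URI.create(fullPath).getPath();
  }
}
```

Centralizing this in one helper avoids each call site re-deriving the comparison rule, which is the design point the reviewer is making.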
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466769#comment-16466769 ]

genericqa commented on HADOOP-14444:
------------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 35 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 0m 22s | Maven dependency ordering for branch |
| +1 | mvninstall | 25m 58s | trunk passed |
| +1 | compile | 28m 17s | trunk passed |
| +1 | checkstyle | 3m 13s | trunk passed |
| +1 | mvnsite | 3m 43s | trunk passed |
| +1 | shadedclient | 10m 53s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-tools |
| +1 | findbugs | 1m 32s | trunk passed |
| +1 | javadoc | 3m 36s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 36s | Maven dependency ordering for patch |
| -1 | mvninstall | 1m 27s | hadoop-tools in the patch failed. |
| +1 | compile | 29m 26s | the patch passed |
| +1 | javac | 29m 26s | the patch passed |
| -0 | checkstyle | 3m 20s | root: The patch generated 4 new + 7 unchanged - 0 fixed = 11 total (was 7) |
| +1 | mvnsite | 4m 57s | the patch passed |
| +1 | shellcheck | 0m 0s | There were no new shellcheck issues. |
| +1 | shelldocs | 0m 28s | There were no new shelldocs issues. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 6s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 37s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-tools |
| +1 | findbugs | 2m 37s | the patch passed |
| +1 | javadoc | 3m 14s | the patch passed |
|| Other Tests ||
| +1 | unit | 0m 25s | hadoop-project in the patch passed. |
| +1 | unit | 10m 24s | hadoop-common in the patch passed. |
| -1 | unit | 10m 56s | hadoop-ftp in the patch failed. |
| -1 | unit | 62m 9s | hadoop-tools in the patch failed. |
[jira] [Updated] (HADOOP-15399) KMSAcls should read kms-site.xml file.
[ https://issues.apache.org/jira/browse/HADOOP-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HADOOP-15399: Fix Version/s: (was: 2.8.4) > KMSAcls should read kms-site.xml file. > -- > > Key: HADOOP-15399 > URL: https://issues.apache.org/jira/browse/HADOOP-15399 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Major > > KMSACLs uses {{AccessControlList}} for authorization. > For creating groups membership, the group implementation class that will be > instantiated is configured by {{hadoop.security.group.mapping}}. > Today {{KMSACLs}} class reads only {{kms-acls.xml}} file to create > {{AccessControlList}}. > {{kms-acls.xml}} doesn't look the right place add the above config. > So KMSAcls should read either kms-site. > [~xiaochen]: Any preference which file should acls load ? > IMO it should be kms-site because that file is mandatory. But all the > properties in kms-site.xml starts with {{hadoop.kms}}, I am little bit > inclined towards core-site.xml. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests
[ https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466764#comment-16466764 ]

Aaron Fabbri commented on HADOOP-15420:
---

Thank you for working on this issue [~gabor.bota]. Good work identifying the bug. A couple of comments on the v1 patch:

{noformat}
private boolean expired(FileStatus status, long expiry, String keyPrefix) {
+// remove the protocol from path string to be able to compare
+String bucket = status.getPath().toUri().getHost();
+ statusTranslatedPath = status.getPath().toUri().getPath();
+}
+
{noformat}

Can you use the helper func {{standardize(Path)}} here instead?

Thanks for moving {{testDiffCommand()}} to the base class.

Did you test this with Dynamo? (`mvn clean test -Ds3guard -Ddynamo`) Unfortunately, the DynamoDB metadata store tests still run against the local (in-memory) test Dynamo implementation (until we finish HADOOP-14918).

Also, a reminder: please declare which AWS region you ran integration tests in.

> s3guard ITestS3GuardToolLocal failures in diff tests
>
>
> Key: HADOOP-15420
> URL: https://issues.apache.org/jira/browse/HADOOP-15420
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Aaron Fabbri
> Assignee: Gabor Bota
> Priority: Minor
> Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch
>
> Noticed this when testing the patch for HADOOP-13756.
> > {code:java} > [ERROR] Failures: > [ERROR] > ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88 > Pruned children count > [PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}, > PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}] expected:<1> but was:<2>{code} > > Looking through the code, I'm noticing a couple of issues. > > 1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should > really be running for all MetadataStore implementations. Seems like it > should live in {{AbstractS3GuardToolTestBase}}. > 2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. When > {{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, > but the fs is initialized with a MetadataStore present, so seem like the fs > will still put the file in the MetadataStore? > There are other tests which explicitly go around the MetadataStore by using > {{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do > something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding > any issues with parallel test runs. 
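The comparison Aaron suggests above (stripping the scheme and bucket before comparing paths) can be sketched with plain {{java.net.URI}}. This is an illustrative stand-in, not the hadoop-aws {{standardize(Path)}} helper; the class and method names here are made up:

```java
import java.net.URI;

// Illustrative only: mimics what a standardize(Path)-style helper does --
// strips the scheme and bucket (URI authority) so paths from different
// sources can be compared by their object key alone.
public class PathStandardize {

    /** Return the bucket (URI host) of an s3a:// style path string. */
    static String bucketOf(String pathString) {
        return URI.create(pathString).getHost();
    }

    /** Return the scheme-and-authority-free path component. */
    static String standardize(String pathString) {
        return URI.create(pathString).getPath();
    }

    public static void main(String[] args) {
        String p = "s3a://bucket-new/test/testPruneCommandCLI/stale";
        System.out.println(bucketOf(p));     // bucket-new
        System.out.println(standardize(p));  // /test/testPruneCommandCLI/stale
    }
}
```

With both sides reduced to the bare path, entries written with and without the protocol compare equal.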
[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation
[ https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HADOOP-15450: Target Version/s: 3.1.1, 2.9.2, 3.0.3, 2.8.5 (was: 2.8.4, 3.1.1, 2.9.2, 3.0.3)
> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -
>
> Key: HADOOP-15450
> URL: https://issues.apache.org/jira/browse/HADOOP-15450
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Kihwal Lee
> Assignee: Arpit Agarwal
> Priority: Blocker
>
> Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
> # When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the drive is marked bad and replication happens. This makes a cluster-wide space problem worse. If the number of "failed" drives exceeds the DFIP limit, the datanode shuts down.
> # There are non-HDFS users of DiskChecker, who use it proactively, not just on failures. This was fine before, but now it incurs heavy I/O due to the introduction of fsync() in the code.
[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO
[ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466748#comment-16466748 ] Junping Du commented on HADOOP-13738:
-
Reverted it from branch-2.8.4 but kept it on branch-2.8, as I plan to kick off the 2.8.4 RC0 today. We can leave this work to 2.8.5.
> DiskChecker should perform some disk IO
> ---
>
> Key: HADOOP-13738
> URL: https://issues.apache.org/jira/browse/HADOOP-13738
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.5
>
> Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, HADOOP-13738.05.patch
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. We have seen this in real clusters. DiskChecker performs simple permissions-based checks on directories, which do not guarantee that any disk IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.
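The improvement described above ("write some data and flush it to the disk") can be sketched with standard java.nio. This is not Hadoop's actual DiskChecker code, only an assumed minimal shape of a write-plus-fsync probe, and it is exactly this fsync that HADOOP-15450 later wants to throttle:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: a permissions check never touches the platters, so write a
// small buffer and fsync it, forcing real I/O through the disk controller.
public class DiskIoCheck {

    static final byte[] PROBE = "disk-check-probe".getBytes();

    /** Throws IOException if the directory cannot service a real write+sync. */
    static void checkDirWithDiskIo(Path dir) throws IOException {
        Path probe = Files.createTempFile(dir, "disk-check", ".tmp");
        try (FileChannel ch = FileChannel.open(probe, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(PROBE));
            ch.force(true); // fsync data and metadata: the expensive call
        } finally {
            Files.deleteIfExists(probe);
        }
    }

    public static void main(String[] args) throws IOException {
        checkDirWithDiskIo(Files.createTempDirectory("dc"));
        System.out.println("disk check passed");
    }
}
```

On a dead disk or controller the write or the force() fails, which the old permissions-only check could miss indefinitely.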
[jira] [Updated] (HADOOP-13738) DiskChecker should perform some disk IO
[ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HADOOP-13738: Fix Version/s: (was: 2.8.4) 2.8.5 > DiskChecker should perform some disk IO > --- > > Key: HADOOP-13738 > URL: https://issues.apache.org/jira/browse/HADOOP-13738 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Major > Fix For: 2.9.0, 3.0.0-alpha2, 2.8.5 > > Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, > HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, > HADOOP-13738.05.patch > > > DiskChecker can fail to detect total disk/controller failures indefinitely. > We have seen this in real clusters. DiskChecker performs simple > permissions-based checks on directories which do not guarantee that any disk > IO will be attempted. > A simple improvement is to write some data and flush it to the disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15449) ZK performance issues causing frequent Namenode failover
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466685#comment-16466685 ] genericqa commented on HADOOP-15449: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 67m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 14s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}119m 31s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | HADOOP-15449 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/1293/HADOOP-15449.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml | | uname | Linux e2eea1bc6f6f 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 696a4be | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14597/testReport/ | | Max. process+thread count | 1513 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14597/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > ZK performance issues causing frequent Namenode failover > - > > Key: HADOOP-15449 > URL: https://issues.apache.org/jira/browse/HADOOP-15449 > Project: Hadoop Common > Issue Type: Wish > Components: common >Affects Versions: 2.7.4 >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >
[jira] [Commented] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests
[ https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466624#comment-16466624 ] genericqa commented on HADOOP-15420: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 40s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 60m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | HADOOP-15420 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922327/HADOOP-15420.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f57ac7a5353d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 696a4be | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14598/testReport/ | | Max. process+thread count | 345 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14598/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > s3guard ITestS3GuardToolLocal failures in diff tests > > > Key: HADOOP-15420 > URL:
[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466557#comment-16466557 ] Thomas Marquardt commented on HADOOP-15446: --- Yes, I will submit a branch-2 patch later today. > WASB: PageBlobInputStream.skip breaks HBASE replication > --- > > Key: HADOOP-15446 > URL: https://issues.apache.org/jira/browse/HADOOP-15446 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 2.9.0, 3.0.2 >Reporter: Thomas Marquardt >Assignee: Thomas Marquardt >Priority: Major > Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, > HADOOP-15446-003.patch > > > Page Blobs are primarily used by HBASE. HBASE replication, which apparently > has not been used with WASB until recently, performs non-sequential reads on > log files using PageBlobInputStream. There are bugs in this stream > implementation which prevent skip and seek from working properly, and > eventually the stream state becomes corrupt and unusable. > I believe this bug affects all releases of WASB/HADOOP. It appears to be a > day-0 bug in PageBlobInputStream. There were similar bugs opened in the past > (HADOOP-15042) but the issue was not properly fixed, and no test coverage was > added. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
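The skip/seek contract that the report says PageBlobInputStream violates can be illustrated with a toy stream. This is not the WASB code; it only shows the invariant a correct skip() must keep, namely reporting exactly how far the position actually moved so later reads see the right data:

```java
import java.io.InputStream;

// Illustrative stand-in for any seekable stream. The key property: skip()
// must never claim more bytes than the position advanced, and the position
// must stay consistent with what read() returns afterwards -- the invariant
// whose violation corrupts stream state in bugs like this one.
public class SkippableStream extends InputStream {
    private final byte[] data;
    private int pos;

    SkippableStream(byte[] data) { this.data = data; }

    @Override public int read() {
        return pos < data.length ? (data[pos++] & 0xff) : -1;
    }

    @Override public long skip(long n) {
        if (n <= 0) return 0;
        // Clamp to what is actually available, then advance by that amount.
        int skipped = (int) Math.min(n, (long) (data.length - pos));
        pos += skipped;
        return skipped;
    }
}
```

A test for such a stream should interleave skips and reads and check both the returned skip counts and the bytes subsequently read, which is the coverage the last paragraph says was missing.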
[jira] [Commented] (HADOOP-15441) After HADOOP-14445, encryption zone operations print unnecessary INFO logs
[ https://issues.apache.org/jira/browse/HADOOP-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466486#comment-16466486 ] Xiao Chen commented on HADOOP-15441: Changed links based on Wei-Chiu's comment > After HADOOP-14445, encryption zone operations print unnecessary INFO logs > -- > > Key: HADOOP-15441 > URL: https://issues.apache.org/jira/browse/HADOOP-15441 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-15441.001.patch, HADOOP-15441.002.patch > > > It looks like after HADOOP-14445, any encryption zone operations prints extra > INFO log messages as follows: > {code:java} > $ hdfs dfs -copyFromLocal /etc/krb5.conf /scale/ > 18/05/02 11:54:55 INFO kms.KMSClientProvider: KMSClientProvider for KMS url: > https://hadoop3-1.example.com:16000/kms/v1/ delegation token service: > kms://ht...@hadoop3-1.example.com:16000/kms created. > {code} > It might make sense to make it a DEBUG message instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15408) HADOOP-14445 broke Spark.
[ https://issues.apache.org/jira/browse/HADOOP-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15408: --- Resolution: Invalid Status: Resolved (was: Patch Available) With HADOOP-14445 reverted (see [discussion|https://issues.apache.org/jira/browse/HADOOP-14445?focusedCommentId=16464600=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16464600]), this is no longer an issue. Thanks all for the report and investigation. > HADOOP-14445 broke Spark. > - > > Key: HADOOP-15408 > URL: https://issues.apache.org/jira/browse/HADOOP-15408 > Project: Hadoop Common > Issue Type: Bug >Reporter: Rushabh S Shah >Priority: Blocker > Attachments: HADOOP-15408-trunk.001.patch, > HADOOP-15408.trunk.poc.patch, split.patch, split.prelim.patch > > > Spark bundles hadoop related jars in their package. > Spark expects backwards compatibility between minor versions. > Their job failed after we deployed HADOOP-14445 in our test cluster. > {noformat} > 2018-04-20 21:09:53,245 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens: > 2018-04-20 21:09:53,273 ERROR [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.util.ServiceConfigurationError: > org.apache.hadoop.security.token.TokenIdentifier: Provider > org.apache.hadoop.crypto.key.kms.KMSDelegationToken$ > KMSLegacyDelegationTokenIdentifier could not be instantiated > at java.util.ServiceLoader.fail(ServiceLoader.java:232) > at java.util.ServiceLoader.access$100(ServiceLoader.java:185) > at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384) > at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) > at java.util.ServiceLoader$1.next(ServiceLoader.java:480) > at > org.apache.hadoop.security.token.Token.getClassForIdentifier(Token.java:117) > at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:138) > at 
org.apache.hadoop.security.token.Token.identifierToString(Token.java:393) > at org.apache.hadoop.security.token.Token.toString(Token.java:413) > at java.lang.String.valueOf(String.java:2994) > at > org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:155) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1634) > at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1583) > Caused by: java.lang.NoSuchFieldError: TOKEN_LEGACY_KIND > at > org.apache.hadoop.crypto.key.kms.KMSDelegationToken$KMSLegacyDelegationTokenIdentifier.(KMSDelegationToken.java:64) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at java.lang.Class.newInstance(Class.java:442) > at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380) > ... 10 more > 2018-04-20 21:09:53,278 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1 > {noformat} > Their classpath looks like > {{\{...:hadoop-common-pre-HADOOP-14445.jar:.:hadoop-common-with-HADOOP-14445.jar:\}}} > This is because the container loaded {{KMSDelegationToken}} class from an > older jar and {{KMSLegacyDelegationTokenIdentifier}} from new jar and it > fails when {{KMSLegacyDelegationTokenIdentifier}} wants to read > {{TOKEN_LEGACY_KIND}} from {{KMSDelegationToken}} which doesn't exist before. > Cc [~xiaochen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15431) KMSTokenRenewer should work with KMS_DELEGATION_TOKEN which has ip:port as service
[ https://issues.apache.org/jira/browse/HADOOP-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15431: --- Resolution: Invalid Status: Resolved (was: Patch Available) With HADOOP-14445 reverted (see [discussion|https://issues.apache.org/jira/browse/HADOOP-14445?focusedCommentId=16464600=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16464600]), this is no longer an issue. Thanks all for the report and investigation. > KMSTokenRenewer should work with KMS_DELEGATION_TOKEN which has ip:port as > service > -- > > Key: HADOOP-15431 > URL: https://issues.apache.org/jira/browse/HADOOP-15431 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Blocker > Attachments: HADOOP-15431.01.patch, HADOOP-15431.02.patch > > > Seen a test failure where a MR job failed to submit. > RM log has: > {noformat} > 2018-04-30 15:00:17,864 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.lang.IllegalArgumentException: Invalid token service IP_ADDR:16000 > at > org.apache.hadoop.util.KMSUtil.createKeyProviderFromTokenService(KMSUtil.java:237) > at > org.apache.hadoop.crypto.key.kms.KMSTokenRenewer.createKeyProvider(KMSTokenRenewer.java:100) > at > org.apache.hadoop.crypto.key.kms.KMSTokenRenewer.renew(KMSTokenRenewer.java:57) > at org.apache.hadoop.security.token.Token.renew(Token.java:414) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:590) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:587) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:585) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:463) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:79) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:894) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:871) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > while client log has > {noformat} > 18/04/30 15:53:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1525128478242_0001 > 18/04/30 15:53:28 INFO 
mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:ns1, Ident: (token for systest: HDFS_DELEGATION_TOKEN > owner=syst...@example.com, renewer=yarn, realUser=, issueDate=1525128807236, > maxDate=1525733607236, sequenceNumber=1038, masterKeyId=20) > 18/04/30 15:53:28 INFO mapreduce.JobSubmitter: Kind: HBASE_AUTH_TOKEN, > Service: 621a942b-292f-493d-ba50-f9b783704359, Ident: > (org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@0) > 18/04/30 15:53:28 INFO mapreduce.JobSubmitter: Kind: KMS_DELEGATION_TOKEN, > Service: IP_ADDR:16000, Ident: 00 07 73 79 73 74 65 73 74 04 79 61 72 6e 00 > 8a 01 63 18 c2 c3 d5 8a 01 63 3c cf 47 d5 8e 01 ec 10 > 18/04/30 15:53:29 INFO mapreduce.JobSubmitter: Cleaning up the staging area > /user/systest/.staging/job_1525128478242_0001 > 18/04/30 15:53:29 WARN security.UserGroupInformation: > PriviledgedActionException as:syst...@example.com (auth:KERBEROS) > cause:java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: > Failed to submit application_1525128478242_0001 to YARN : Invalid token > service IP_ADDR:16000 > 18/04/30 15:53:29 INFO client.ConnectionManager$HConnectionImplementation: > Closing master protocol: MasterService > 18/04/30 15:53:29 INFO client.ConnectionManager$HConnectionImplementation: > Closing zookeeper sessionid=0x1630ba2d0001cb5 > 18/04/30 15:53:29 INFO zookeeper.ZooKeeper: Session:
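The failure above comes from a token whose service is a bare "IP_ADDR:16000" while the renewer expects a URI-shaped service such as "kms://https@host:16000/kms". A hypothetical compatibility shim (assumed names, not the actual KMSUtil fix) could normalize the legacy form before building the key provider:

```java
import java.net.URI;

// Hypothetical illustration of the mismatch in this report: older clients
// write a bare "host:port" service, newer renewers expect a kms:// URI.
// A tolerant renewer would accept both by upgrading the legacy form.
public class TokenServiceCompat {

    /** True if the service string already looks like a kms:// URI. */
    static boolean isUriForm(String service) {
        return service.startsWith("kms://");
    }

    /** Upgrade a legacy "host:port" service to URI form (assumes https). */
    static String toUriForm(String service) {
        if (isUriForm(service)) {
            return service;
        }
        // Validate by round-tripping through java.net.URI.
        URI uri = URI.create("kms://https@" + service + "/kms");
        return uri.toString();
    }
}
```

Whether to assume https, and which path suffix to append, would have to match the cluster's actual KMS provider URI; both are assumptions here.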
[jira] [Assigned] (HADOOP-15408) HADOOP-14445 broke Spark.
[ https://issues.apache.org/jira/browse/HADOOP-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen reassigned HADOOP-15408: -- Assignee: Rushabh S Shah > HADOOP-14445 broke Spark. > - > > Key: HADOOP-15408 > URL: https://issues.apache.org/jira/browse/HADOOP-15408 > Project: Hadoop Common > Issue Type: Bug >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Blocker > Attachments: HADOOP-15408-trunk.001.patch, > HADOOP-15408.trunk.poc.patch, split.patch, split.prelim.patch > > > Spark bundles hadoop related jars in their package. > Spark expects backwards compatibility between minor versions. > Their job failed after we deployed HADOOP-14445 in our test cluster. > {noformat} > 2018-04-20 21:09:53,245 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens: > 2018-04-20 21:09:53,273 ERROR [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.util.ServiceConfigurationError: > org.apache.hadoop.security.token.TokenIdentifier: Provider > org.apache.hadoop.crypto.key.kms.KMSDelegationToken$ > KMSLegacyDelegationTokenIdentifier could not be instantiated > at java.util.ServiceLoader.fail(ServiceLoader.java:232) > at java.util.ServiceLoader.access$100(ServiceLoader.java:185) > at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384) > at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) > at java.util.ServiceLoader$1.next(ServiceLoader.java:480) > at > org.apache.hadoop.security.token.Token.getClassForIdentifier(Token.java:117) > at org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:138) > at org.apache.hadoop.security.token.Token.identifierToString(Token.java:393) > at org.apache.hadoop.security.token.Token.toString(Token.java:413) > at java.lang.String.valueOf(String.java:2994) > at > org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:155) > at > 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1634) > at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1583) > Caused by: java.lang.NoSuchFieldError: TOKEN_LEGACY_KIND > at > org.apache.hadoop.crypto.key.kms.KMSDelegationToken$KMSLegacyDelegationTokenIdentifier.(KMSDelegationToken.java:64) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at java.lang.Class.newInstance(Class.java:442) > at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380) > ... 10 more > 2018-04-20 21:09:53,278 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1 > {noformat} > Their classpath looks like > {{\{...:hadoop-common-pre-HADOOP-14445.jar:.:hadoop-common-with-HADOOP-14445.jar:\}}} > This is because the container loaded {{KMSDelegationToken}} class from an > older jar and {{KMSLegacyDelegationTokenIdentifier}} from new jar and it > fails when {{KMSLegacyDelegationTokenIdentifier}} wants to read > {{TOKEN_LEGACY_KIND}} from {{KMSDelegationToken}} which doesn't exist before. > Cc [~xiaochen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Fix Version/s: (was: 3.0.3) (was: 2.9.2) (was: 3.1.1) (was: 2.8.4) (was: 2.10.0) Status: Open (was: Patch Available) > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 3.0.0-alpha1, 2.8.0 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, > HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, > HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, > HADOOP-14445.12.patch, HADOOP-14445.13.patch, > HADOOP-14445.branch-2.000.precommit.patch, > HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, > HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, > HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, > HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, > HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, > HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, > HADOOP-14445.revert.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. 
(a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
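The snippet quoted above keys the credentials lookup on the URL's host and port. A minimal sketch of that behavior (using a simplified stand-in for {{SecurityUtil.buildTokenService}} and hypothetical KMS hostnames) shows why a delegation token obtained from one KMS instance is never found when the client talks to another:

```java
import java.net.InetSocketAddress;

public class TokenServiceKeyDemo {
    // Simplified stand-in for SecurityUtil.buildTokenService: the real method
    // returns a Text derived from the address and port; "host:port" is enough
    // to illustrate the lookup-key problem.
    static String buildTokenService(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // Two KMS instances behind LoadBalancingKMSClientProvider
        // (hostnames are hypothetical).
        String key1 = buildTokenService(
                InetSocketAddress.createUnresolved("kms1.example.com", 16000));
        String key2 = buildTokenService(
                InetSocketAddress.createUnresolved("kms2.example.com", 16000));
        // Each instance yields a different credentials key, so a token issued
        // by kms1 is never looked up successfully for a request sent to kms2.
        System.out.println(key1.equals(key2)); // false
    }
}
```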
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466466#comment-16466466 ] Hudson commented on HADOOP-14445: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14133 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14133/]) Revert "HADOOP-14445. Delegation tokens are not shared between KMS (xiao: rev a3a1552c33d5650fbd0a702369fccd21b8c9d3e2) * (delete) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSTokenRenewer.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSDelegationToken.java * (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * (delete) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestKMSUtil.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java * (delete) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/package-info.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticatedURL.java * (edit) hadoop-common-project/hadoop-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationHandler.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/KMSUtil.java * (delete) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestKMSClientProvider.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java * (edit) 
hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java * (delete) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSLegacyTokenRenewer.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticator.java * (edit) hadoop-common-project/hadoop-common/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenRenewer * (delete) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/KMSUtilFaultInjector.java > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Fix For: 2.10.0, 2.8.4, 3.1.1, 2.9.2, 3.0.3 > > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, > HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, > HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, > HADOOP-14445.12.patch, HADOOP-14445.13.patch, > HADOOP-14445.branch-2.000.precommit.patch, > HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, > HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, > HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, > HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, > HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, > HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, > HADOOP-14445.revert.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation 
tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to
[jira] [Assigned] (HADOOP-15416) s3guard diff assert failure if source path not found
[ https://issues.apache.org/jira/browse/HADOOP-15416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota reassigned HADOOP-15416: --- Assignee: Gabor Bota > s3guard diff assert failure if source path not found > > > Key: HADOOP-15416 > URL: https://issues.apache.org/jira/browse/HADOOP-15416 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.1.0 > Environment: s3a with fault injection turned on >Reporter: Steve Loughran >Assignee: Gabor Bota >Priority: Minor > > Got an illegal argument exception trying to do a s3guard diff in a test run. > Underlying cause: directory in supplied s3a path didn't exist -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Fix Version/s: (was: 3.2.0) > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Fix For: 2.10.0, 2.8.4, 3.1.1, 2.9.2, 3.0.3 > > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, > HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, > HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, > HADOOP-14445.12.patch, HADOOP-14445.13.patch, > HADOOP-14445.branch-2.000.precommit.patch, > HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, > HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, > HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, > HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, > HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, > HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, > HADOOP-14445.revert.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. 
(a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466221#comment-16466221 ] Xiao Chen edited comment on HADOOP-14445 at 5/7/18 7:54 PM: Reopening Jira as I'm reverting those changes. Will remove fix versions as I proceed. Some minor conflicts due to HADOOP-14188, HADOOP-15390 and HADOOP-15313. Ran {{mvn clean test -DskipShade -Dmaven.javadoc.skip=true -Dtest=TestKMS*,TestDelegationTokenRenewer}} before pushing. Attached a trunk and a branch-2.8 version for reference - 3.x lines are similar to trunk, and 2.x lines similar to 2.8. (HDFS-13430 will also be reverted to accommodate this) was (Author: xiaochen): Reopening Jira as I'm reverting those changes. Will remove fix versions as I proceed. Some minor conflicts due to HADOOP-14188 and HADOOP-15313, so building repo and running touched tests before I push. > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 2.8.0, 3.0.0-alpha1 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3 > > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, > HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, > HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, > HADOOP-14445.12.patch, HADOOP-14445.13.patch, > HADOOP-14445.branch-2.000.precommit.patch, > HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, > HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, > HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, > HADOOP-14445.branch-2.06.patch, 
HADOOP-14445.branch-2.8.003.patch, > HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, > HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, > HADOOP-14445.revert.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests
[ https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HADOOP-15420: Attachment: HADOOP-15420.002.patch > s3guard ITestS3GuardToolLocal failures in diff tests > > > Key: HADOOP-15420 > URL: https://issues.apache.org/jira/browse/HADOOP-15420 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch > > > Noticed this when testing the patch for HADOOP-13756. > > {code:java} > [ERROR] Failures: > [ERROR] > ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88 > Pruned children count > [PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}, > PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}] expected:<1> but was:<2>{code} > > Looking through the code, I'm noticing a couple of issues. > > 1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should > really be running for all MetadataStore implementations. Seems like it > should live in {{AbstractS3GuardToolTestBase}}. 
> 2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. When > {{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, > but the fs is initialized with a MetadataStore present, so it seems like the > fs will still put the file in the MetadataStore? > There are other tests which explicitly bypass the MetadataStore by using > {{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do > something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding > any issues with parallel test runs.
[jira] [Commented] (HADOOP-15420) s3guard ITestS3GuardToolLocal failures in diff tests
[ https://issues.apache.org/jira/browse/HADOOP-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466387#comment-16466387 ] Gabor Bota commented on HADOOP-15420: - Fixed checkstyle issues > s3guard ITestS3GuardToolLocal failures in diff tests > > > Key: HADOOP-15420 > URL: https://issues.apache.org/jira/browse/HADOOP-15420 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-15420.001.patch, HADOOP-15420.002.patch > > > Noticed this when testing the patch for HADOOP-13756. > > {code:java} > [ERROR] Failures: > [ERROR] > ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testPruneCommandCLI:221->AbstractS3GuardToolTestBase.testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88 > Pruned children count > [PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/stale; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798258286; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}, > PathMetadata{fileStatus=S3AFileStatus{path=s3a://bucket-new/test/testPruneCommandCLI/fresh; > isDirectory=false; length=100; replication=1; blocksize=512; > modification_time=1524798262583; access_time=0; owner=hdfs; group=hdfs; > permission=rw-rw-rw-; isSymlink=false; hasAcl=false; isEncrypted=false; > isErasureCoded=false} isEmptyDirectory=FALSE; isEmptyDirectory=UNKNOWN; > isDeleted=false}] expected:<1> but was:<2>{code} > > Looking through the code, I'm noticing a couple of issues. > > 1. {{testDiffCommand()}} is in {{ITestS3GuardToolLocal}}, but it should > really be running for all MetadataStore implementations. 
Seems like it > should live in {{AbstractS3GuardToolTestBase}}. > 2. {{AbstractS3GuardToolTestBase#createFile()}} seems wrong. When > {{onMetadataStore}} is false, it does a {{ContractTestUtils.touch(file)}}, > but the fs is initialized with a MetadataStore present, so it seems like the > fs will still put the file in the MetadataStore? > There are other tests which explicitly bypass the MetadataStore by using > {{fs.setMetadataStore(nullMS)}}, e.g. ITestS3AInconsistency. We should do > something similar in {{AbstractS3GuardToolTestBase#createFile()}}, minding > any issues with parallel test runs.
[jira] [Updated] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
[ https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-15390: --- Fix Version/s: 2.10.0 > Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens > - > > Key: HADOOP-15390 > URL: https://issues.apache.org/jira/browse/HADOOP-15390 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3 > > Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch > > > When looking at a recent issue with [~rkanter] and [~yufeigu], we found that > the RM log in a cluster was flooded by KMS token renewal errors below: > {noformat} > $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log > 2018-04-11 11:34:09,367 WARN > org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: > keyProvider null cannot renew dt. > 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: > (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, > maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; > apps=[]], for [] > 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, > renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, > sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 > ms, appId = [] > ... > 2018-04-11 11:34:09,367 WARN > org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: > keyProvider null cannot renew dt. 
> 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: > (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, > maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; > apps=[]], for [] > 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, > renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, > sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 > ms, appId = [] > {noformat} > Further inspection shows the KMS IP is from another cluster. The RM is before > HADOOP-14445, so needs to read from config. The config rightfully doesn't > have the other cluster's KMS configured. > Although HADOOP-14445 will make this a non-issue by creating the provider > from token service, we should fix 2 things here: > - KMS token renewer should throw instead of return 0. Returning 0 when not > able to renew shall be considered a bug in the renewer. > - Yarn RM's {{DelegationTokenRenewer}} service should validate the return and > not go into this busy loop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
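The busy loop described above follows directly from the scheduling arithmetic: a renewer that returns 0 instead of throwing yields a hugely negative renewal delay (the {{-1523446449367 ms}} visible in the log), so the next renewal fires immediately, forever. A hedged sketch of that arithmetic, using a hypothetical {{renewalDelayMs}} helper rather than the RM's actual code:

```java
public class RenewDelayDemo {
    // Hypothetical sketch of the delay computation described above: the RM
    // schedules the next renewal (new expiration - now) milliseconds ahead.
    static long renewalDelayMs(long newExpirationMs, long nowMs) {
        return newExpirationMs - nowMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long delay = renewalDelayMs(0L, now); // broken renewer returned 0
        System.out.println(delay < 0);        // true: reschedules instantly
        // The defensive fix suggested in the issue: validate the returned
        // expiration instead of blindly scheduling with a negative delay.
        if (delay <= 0) {
            System.out.println("invalid expiration from renewer; not rescheduling");
        }
    }
}
```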
[jira] [Commented] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
[ https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466369#comment-16466369 ] Xiao Chen commented on HADOOP-15390: Just found out this was missing from branch-2, cherry-picked there. > Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens > - > > Key: HADOOP-15390 > URL: https://issues.apache.org/jira/browse/HADOOP-15390 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Critical > Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3 > > Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch > > > When looking at a recent issue with [~rkanter] and [~yufeigu], we found that > the RM log in a cluster was flooded by KMS token renewal errors below: > {noformat} > $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log > 2018-04-11 11:34:09,367 WARN > org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: > keyProvider null cannot renew dt. > 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: > (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, > maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; > apps=[]], for [] > 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, > renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, > sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 > ms, appId = [] > ... > 2018-04-11 11:34:09,367 WARN > org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: > keyProvider null cannot renew dt. 
> 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: > (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, > maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; > apps=[]], for [] > 2018-04-11 11:34:09,367 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, > renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, > sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 > ms, appId = [] > {noformat} > Further inspection shows the KMS IP is from another cluster. The RM is before > HADOOP-14445, so needs to read from config. The config rightfully doesn't > have the other cluster's KMS configured. > Although HADOOP-14445 will make this a non-issue by creating the provider > from token service, we should fix 2 things here: > - KMS token renewer should throw instead of return 0. Returning 0 when not > able to renew shall be considered a bug in the renewer. > - Yarn RM's {{DelegationTokenRenewer}} service should validate the return and > not go into this busy loop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation
[ https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15450: Target Version/s: 2.8.4, 3.1.1, 2.9.2, 3.0.3 Fix Version/s: (was: 2.8.4) > Avoid fsync storm triggered by DiskChecker and handle disk full situation > - > > Key: HADOOP-15450 > URL: https://issues.apache.org/jira/browse/HADOOP-15450 > Project: Hadoop Common > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Arpit Agarwal >Priority: Blocker > > Fix disk checker issues reported by [~kihwal] in HADOOP-13738: > # When space is low, the OS returns ENOSPC. Instead of simply stopping > writes, the drive is marked bad and replication happens. This makes the > cluster-wide space problem worse. If the number of "failed" drives exceeds > the DFIP limit, the datanode shuts down. > # There are non-HDFS users of DiskChecker, who use it proactively, not just > on failures. This was fine before, but now it incurs heavy I/O due to the > introduction of fsync() in the code.
[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation
[ https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HADOOP-15450: Fix Version/s: 2.8.4 > Avoid fsync storm triggered by DiskChecker and handle disk full situation > - > > Key: HADOOP-15450 > URL: https://issues.apache.org/jira/browse/HADOOP-15450 > Project: Hadoop Common > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Arpit Agarwal >Priority: Blocker > Fix For: 2.8.4 > > > Fix disk checker issues reported by [~kihwal] in HADOOP-13738: > # When space is low, the OS returns ENOSPC. Instead of simply stopping > writes, the drive is marked bad and replication happens. This makes the > cluster-wide space problem worse. If the number of "failed" drives exceeds > the DFIP limit, the datanode shuts down. > # There are non-HDFS users of DiskChecker, who use it proactively, not just > on failures. This was fine before, but now it incurs heavy I/O due to the > introduction of fsync() in the code.
[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Waldmann updated HADOOP-14444: Attachment: (was: HADOOP-14444.15.patch) > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann >Priority: Major > Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, > HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, > HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, > HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, > HADOOP-14444.patch > > > The current implementation of the FTP and SFTP filesystems has severe > limitations and performance issues when dealing with a high number of files. > My patch solves those issues and integrates both filesystems in such a way > that most of the core functionality is common to both, simplifying > maintenance. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for explicit FTPS (SSL/TLS) > * Support for connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files it shows an order of magnitude performance > improvement over non-pooled connections. > * Caching of directory trees. For FTP you always need to list the whole > directory whenever you ask for information about a particular file. > Again, for a huge number of files it shows an order of magnitude performance > improvement over non-cached connections. > * Support for keep-alive (NOOP) messages to avoid connection drops > * Support for Unix-style or regexp wildcard globs - useful for listing > particular files across the whole directory tree > * Support for re-establishing broken FTP data transfers - which can happen > surprisingly often > * Support for SFTP private keys (including passphrase) > * Support for keeping passwords, private keys and passphrases in JCEKS key > stores
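The connection-pooling feature in the list above can be sketched as a minimal borrow/release pool: commands reuse an idle connection instead of opening a new one each time. This is an illustrative stand-in, not the patch's actual implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FtpConnectionPoolSketch {
    // Stand-in for a live FTP/SFTP client connection.
    static class Connection {
        boolean open = true;
    }

    private final BlockingQueue<Connection> idle;

    FtpConnectionPoolSketch(int capacity) {
        idle = new ArrayBlockingQueue<>(capacity);
    }

    // Reuse an idle connection when one exists; otherwise open a new one.
    Connection borrow() {
        Connection c = idle.poll();
        return (c != null && c.open) ? c : new Connection();
    }

    // Return the connection to the pool instead of closing it; drop it
    // when the pool is already full.
    void release(Connection c) {
        if (!idle.offer(c)) {
            c.open = false; // pool full: close the surplus connection
        }
    }

    public static void main(String[] args) {
        FtpConnectionPoolSketch pool = new FtpConnectionPoolSketch(2);
        Connection a = pool.borrow();
        pool.release(a);
        Connection b = pool.borrow();
        System.out.println(a == b); // true: the connection was reused
    }
}
```

Avoiding a fresh connect/login handshake per command is what yields the order-of-magnitude improvement the description claims for large file counts.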
[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Waldmann updated HADOOP-1: Attachment: HADOOP-1.15.patch > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-1 > URL: https://issues.apache.org/jira/browse/HADOOP-1 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann >Priority: Major > Attachments: HADOOP-1.10.patch, HADOOP-1.11.patch, > HADOOP-1.12.patch, HADOOP-1.13.patch, HADOOP-1.14.patch, > HADOOP-1.15.patch, HADOOP-1.2.patch, HADOOP-1.3.patch, > HADOOP-1.4.patch, HADOOP-1.5.patch, HADOOP-1.6.patch, > HADOOP-1.7.patch, HADOOP-1.8.patch, HADOOP-1.9.patch, > HADOOP-1.patch > > > Current implementation of FTP and SFTP filesystems have severe limitations > and performance issues when dealing with high number of files. Mine patch > solve those issues and integrate both filesystems such a way that most of the > core functionality is common for both and therefore simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for explicit FTPS (SSL/TLS) > * Support of connection pooling - new connection is not created for every > single command but reused from the pool. > For huge number of files it shows order of magnitude performance improvement > over not pooled connections. > * Caching of directory trees. For ftp you always need to list whole > directory whenever you ask information about particular file. > Again for huge number of files it shows order of magnitude performance > improvement over not cached connections. 
> * Support for keep-alive (NOOP) messages to avoid connection drops > * Support for Unix-style or regexp wildcard globs - useful for listing > particular files across the whole directory tree > * Support for reestablishing broken FTP data transfers - can happen > surprisingly often > * Support for SFTP private keys (including passphrase) > * Support for keeping passwords, private keys and passphrases in JCEKS > keystores -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
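[editor's note] To make the connection-pooling feature above concrete, here is a minimal sketch of the idea: commands borrow an existing connection instead of opening a new one each time. This is an illustration only; `FtpConnectionPool` and `FtpConnection` are hypothetical stand-ins, not classes from the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal illustration of connection pooling: each "command" borrows an idle
// connection from the pool and returns it afterwards, so a new connection is
// only opened when the pool is empty. Names here are hypothetical.
public class FtpConnectionPool {
    static class FtpConnection {
        static int opened = 0;          // counts real connects, for the demo
        FtpConnection() { opened++; }
        void sendNoop() { /* a keep-alive NOOP would go here */ }
    }

    private final Deque<FtpConnection> idle = new ArrayDeque<>();

    // Reuse an idle connection when available; connect only when none exist.
    public FtpConnection borrow() {
        FtpConnection c = idle.poll();
        return (c != null) ? c : new FtpConnection();
    }

    // Return the connection for later reuse instead of closing it.
    public void release(FtpConnection c) {
        idle.push(c);
    }

    public static void main(String[] args) {
        FtpConnectionPool pool = new FtpConnectionPool();
        for (int i = 0; i < 100; i++) {     // 100 sequential "commands"
            FtpConnection c = pool.borrow();
            pool.release(c);
        }
        // All 100 commands shared a single real connection.
        System.out.println("connections opened: " + FtpConnection.opened);
    }
}
```

With unpooled connections the loop above would open 100 connections; with the pool it opens one, which is where the order-of-magnitude improvement for large file counts comes from.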
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Status: Patch Available (was: Reopened) > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: kms >Affects Versions: 3.0.0-alpha1, 2.8.0 > Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption >Reporter: Wei-Chiu Chuang >Assignee: Xiao Chen >Priority: Major > Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3 > > Attachments: HADOOP-14445-branch-2.8.002.patch, > HADOOP-14445-branch-2.8.patch, HADOOP-14445.002.patch, > HADOOP-14445.003.patch, HADOOP-14445.004.patch, HADOOP-14445.05.patch, > HADOOP-14445.06.patch, HADOOP-14445.07.patch, HADOOP-14445.08.patch, > HADOOP-14445.09.patch, HADOOP-14445.10.patch, HADOOP-14445.11.patch, > HADOOP-14445.12.patch, HADOOP-14445.13.patch, > HADOOP-14445.branch-2.000.precommit.patch, > HADOOP-14445.branch-2.001.precommit.patch, HADOOP-14445.branch-2.01.patch, > HADOOP-14445.branch-2.02.patch, HADOOP-14445.branch-2.03.patch, > HADOOP-14445.branch-2.04.patch, HADOOP-14445.branch-2.05.patch, > HADOOP-14445.branch-2.06.patch, HADOOP-14445.branch-2.8.003.patch, > HADOOP-14445.branch-2.8.004.patch, HADOOP-14445.branch-2.8.005.patch, > HADOOP-14445.branch-2.8.006.patch, HADOOP-14445.branch-2.8.revert.patch, > HADOOP-14445.revert.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider > does not share delegation tokens. 
(a client uses the KMS address/port as the key for > the delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But the KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
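[editor's note] A self-contained sketch of why the address/port-based token service key breaks sharing. This is not Hadoop code: `buildTokenService` below merely imitates what `SecurityUtil.buildTokenService` produces (a host:port string), and the KMS host names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a token cached under one KMS instance's host:port key
// is not found when the load balancer routes the client to another instance.
public class TokenServiceKeyDemo {
    // Stand-in for SecurityUtil.buildTokenService(InetSocketAddress),
    // which derives the credentials-map key from the address and port.
    static String buildTokenService(String host, int port) {
        return host + ":" + port;
    }

    public static void main(String[] args) {
        Map<String, String> credentials = new HashMap<>();
        // A delegation token obtained from the first KMS instance is stored
        // under that instance's service key.
        credentials.put(buildTokenService("kms1.example.com", 9600), "kms-dt");

        // The load-balanced client later contacts the second instance; the
        // lookup key differs, so the existing token is never found.
        String key = buildTokenService("kms2.example.com", 9600);
        System.out.println("token for kms2: " + credentials.get(key));
    }
}
```

The lookup for `kms2.example.com:9600` misses, which is exactly the behavior the snippet from `DelegationTokenAuthenticatedURL#openConnection` exhibits under LoadBalancingKMSClientProvider.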
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Attachment: HADOOP-14445.branch-2.8.revert.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HADOOP-14445: --- Attachment: HADOOP-14445.revert.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation
[ https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HADOOP-15450: Priority: Blocker (was: Major) > Avoid fsync storm triggered by DiskChecker and handle disk full situation > - > > Key: HADOOP-15450 > URL: https://issues.apache.org/jira/browse/HADOOP-15450 > Project: Hadoop Common > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Arpit Agarwal >Priority: Blocker > > Fix disk checker issues reported by [~kihwal] in HADOOP-13738: > # When space is low, the OS returns ENOSPC. Instead of simply stopping > writes, the drive is marked bad and replication happens. This makes the > cluster-wide space problem worse. If the number of "failed" drives exceeds > the DFIP limit, the datanode shuts down. > # There are non-HDFS users of DiskChecker, who use it proactively, not just > on failures. This was fine before, but now it incurs heavy I/O due to the > introduction of fsync() in the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466250#comment-16466250 ] Duo Xu commented on HADOOP-15446: - [~tmarquardt] & [~ste...@apache.org] Could we backport this to branch-2? Thanks! > WASB: PageBlobInputStream.skip breaks HBASE replication > --- > > Key: HADOOP-15446 > URL: https://issues.apache.org/jira/browse/HADOOP-15446 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 2.9.0, 3.0.2 >Reporter: Thomas Marquardt >Assignee: Thomas Marquardt >Priority: Major > Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, > HADOOP-15446-003.patch > > > Page Blobs are primarily used by HBASE. HBASE replication, which apparently > has not been used with WASB until recently, performs non-sequential reads on > log files using PageBlobInputStream. There are bugs in this stream > implementation which prevent skip and seek from working properly, and > eventually the stream state becomes corrupt and unusable. > I believe this bug affects all releases of WASB/HADOOP. It appears to be a > day-0 bug in PageBlobInputStream. There were similar bugs opened in the past > (HADOOP-15042) but the issue was not properly fixed, and no test coverage was > added. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Reopened] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen reopened HADOOP-14445: Reopening Jira as I'm reverting those changes. Will remove fix versions as I proceed. Some minor conflicts due to HADOOP-14188 and HADOOP-15313, so building the repo and running touched tests before I push. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation
[ https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-15450: --- Reporter: Kihwal Lee (was: Arpit Agarwal) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO
[ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466171#comment-16466171 ] Arpit Agarwal commented on HADOOP-13738: Filed HADOOP-15450. > DiskChecker should perform some disk IO > --- > > Key: HADOOP-13738 > URL: https://issues.apache.org/jira/browse/HADOOP-13738 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Major > Fix For: 2.9.0, 3.0.0-alpha2, 2.8.4 > > Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, > HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, > HADOOP-13738.05.patch > > > DiskChecker can fail to detect total disk/controller failures indefinitely. > We have seen this in real clusters. DiskChecker performs simple > permissions-based checks on directories which do not guarantee that any disk > IO will be attempted. > A simple improvement is to write some data and flush it to the disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation
Arpit Agarwal created HADOOP-15450: -- Summary: Avoid fsync storm triggered by DiskChecker and handle disk full situation Key: HADOOP-15450 URL: https://issues.apache.org/jira/browse/HADOOP-15450 Project: Hadoop Common Issue Type: Bug Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix disk checker issues reported by [~kihwal] in HADOOP-13738: 1. When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the drive is marked bad and replication happens. This makes the cluster-wide space problem worse. If the number of "failed" drives exceeds the DFIP limit, the datanode shuts down. 1. There are non-HDFS users of DiskChecker, who use it proactively, not just on failures. This was fine before, but now it incurs heavy I/O due to the introduction of fsync() in the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15450) Avoid fsync storm triggered by DiskChecker and handle disk full situation
[ https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-15450: --- Description: Fix disk checker issues reported by [~kihwal] in HADOOP-13738: # When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the drive is marked bad and replication happens. This makes the cluster-wide space problem worse. If the number of "failed" drives exceeds the DFIP limit, the datanode shuts down. # There are non-HDFS users of DiskChecker, who use it proactively, not just on failures. This was fine before, but now it incurs heavy I/O due to the introduction of fsync() in the code. was: Fix disk checker issues reported by [~kihwal] in HADOOP-13738: 1. When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the drive is marked bad and replication happens. This makes the cluster-wide space problem worse. If the number of "failed" drives exceeds the DFIP limit, the datanode shuts down. 1. There are non-HDFS users of DiskChecker, who use it proactively, not just on failures. This was fine before, but now it incurs heavy I/O due to the introduction of fsync() in the code. > Avoid fsync storm triggered by DiskChecker and handle disk full situation > - > > Key: HADOOP-15450 > URL: https://issues.apache.org/jira/browse/HADOOP-15450 > Project: Hadoop Common > Issue Type: Bug >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Major > > Fix disk checker issues reported by [~kihwal] in HADOOP-13738: > # When space is low, the OS returns ENOSPC. Instead of simply stopping > writes, the drive is marked bad and replication happens. This makes the > cluster-wide space problem worse. If the number of "failed" drives exceeds > the DFIP limit, the datanode shuts down. > # There are non-HDFS users of DiskChecker, who use it proactively, not just > on failures. This was fine before, but now it incurs heavy I/O due to the > introduction of fsync() in the code. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO
[ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466159#comment-16466159 ] Arpit Agarwal commented on HADOOP-13738: [~daryn], [~kihwal], we can avoid the fsync storm by using disk IO only for HDFS-triggered disk checks. These are already throttled to at most once per 15 minutes. The other issue you reported - disk full - can also be handled separately. I'll file a follow up Jira and post a patch this week. > DiskChecker should perform some disk IO > --- > > Key: HADOOP-13738 > URL: https://issues.apache.org/jira/browse/HADOOP-13738 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Major > Fix For: 2.9.0, 3.0.0-alpha2, 2.8.4 > > Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, > HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, > HADOOP-13738.05.patch > > > DiskChecker can fail to detect total disk/controller failures indefinitely. > We have seen this in real clusters. DiskChecker performs simple > permissions-based checks on directories which do not guarantee that any disk > IO will be attempted. > A simple improvement is to write some data and flush it to the disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
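[editor's note] A rough sketch of the throttling idea described in the comment above: run the expensive IO-based check at most once per interval and skip it otherwise. The class and method names are illustrative, not the actual DiskChecker API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: the expensive check (create file, write, fsync) runs at
// most once per MIN_GAP_MS; intervening callers reuse the cached verdict.
public class ThrottledDiskChecker {
    private static final long MIN_GAP_MS = TimeUnit.MINUTES.toMillis(15);
    private final AtomicLong lastCheckMs = new AtomicLong(0);
    private volatile boolean lastResult = true;   // cached verdict

    // Returns true iff the expensive check actually ran on this call.
    public boolean checkDir() {
        long now = System.currentTimeMillis();
        long last = lastCheckMs.get();
        // Skip if within the throttle window, or if another thread won the CAS.
        if (now - last < MIN_GAP_MS || !lastCheckMs.compareAndSet(last, now)) {
            return false;                         // throttled: lastResult stands
        }
        lastResult = doExpensiveIoCheck();
        return true;
    }

    private boolean doExpensiveIoCheck() {
        // Real code would write a small file and fsync it here.
        return true;
    }

    public static void main(String[] args) {
        ThrottledDiskChecker checker = new ThrottledDiskChecker();
        System.out.println("first call ran check: " + checker.checkDir());
        System.out.println("second call ran check: " + checker.checkDir());
    }
}
```

Only the first call pays the fsync cost; back-to-back calls (as in a sort-phase sync storm) are no-ops until the interval elapses.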
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466155#comment-16466155 ] genericqa commented on HADOOP-14444: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 35 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 33m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 36m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 36m 17s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 45s{color} | {color:orange} root: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 29s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. 
Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 7s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 13s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 48s{color} | {color:red} hadoop-ftp in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} |
[jira] [Commented] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466119#comment-16466119 ] Karthik Palanisamy commented on HADOOP-15449: - [~arpitagarwal] Yes, it should re-connect. But ZooKeeper already expires the session because of the timeout (no heartbeat has been received from the ZK client within the session timeout). In this case, the znode lock could have been acquired by another ZKFC controller, which eventually triggers a failover. > Frequent Namenode Flipover affecting user Jobs. > --- > > Key: HADOOP-15449 > URL: https://issues.apache.org/jira/browse/HADOOP-15449 > Project: Hadoop Common > Issue Type: Wish > Components: common >Affects Versions: 2.7.4 >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Critical > Attachments: HADOOP-15449.patch > > > We have observed from several users that Namenode flip-over is due to either > zookeeper disk slowness (higher fsync cost) or a network issue. We can avoid > the flip-over issue to some extent by increasing the HA session timeout, > ha.zookeeper.session-timeout.ms. > The default value is 5000 ms, which seems very low for any production > environment. I would suggest 10000 ms as the default session timeout. > > {code} > .. > 2018-05-04 03:54:36,848 INFO zookeeper.ClientCnxn > (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from > server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection > and attempting reconnect > 2018-05-04 03:56:49,088 INFO zookeeper.ClientCnxn > (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from > server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection > and attempting reconnect > .. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15449) ZK performance issues causing frequent Namenode failover
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-15449: --- Summary: ZK performance issues causing frequent Namenode failover (was: Frequent Namenode Flipover affecting user Jobs.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO
[ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466106#comment-16466106 ] Daryn Sharp commented on HADOOP-13738: -- This jira was not thought out. It's causing problems on the clusters where it's deployed. # We had a cluster lose 10% of nodes due to this patch. A few nodes filled up, they went dead, and created a domino effect that caused nodes to go dead until intervention. # Jobs may cause severe performance degradation from sync storms during a sort phase because local dir allocator calls checkDisk. The biggest risk is a runaway job may fill disks and cause a cluster to implode. [~arpitagarwal], do you want me to file another jira for immediate revert? Or do you want to reopen this one? > DiskChecker should perform some disk IO > --- > > Key: HADOOP-13738 > URL: https://issues.apache.org/jira/browse/HADOOP-13738 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Major > Fix For: 2.9.0, 3.0.0-alpha2, 2.8.4 > > Attachments: HADOOP-13738-branch-2.8-06.patch, HADOOP-13738.01.patch, > HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch, > HADOOP-13738.05.patch > > > DiskChecker can fail to detect total disk/controller failures indefinitely. > We have seen this in real clusters. DiskChecker performs simple > permissions-based checks on directories which do not guarantee that any disk > IO will be attempted. > A simple improvement is to write some data and flush it to the disk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465968#comment-16465968 ] Arpit Agarwal commented on HADOOP-15449: Thanks for reporting this [~kpalanisamy]. 5 seconds is rather aggressive. +1 for increasing it to 10. Another potential issue is why the ZKFCs are triggering failover when they reconnect to ZooKeeper. That should also be addressed separately.
[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-15449: --- Status: Patch Available (was: Open)
[jira] [Assigned] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HADOOP-15449: -- Assignee: Karthik Palanisamy
[jira] [Comment Edited] (HADOOP-12896) kdiag to add a --DEFAULTREALM option
[ https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465877#comment-16465877 ] SammiChen edited comment on HADOOP-12896 at 5/7/18 12:56 PM: - Remove the fix version field since it's not fixed actually. was (Author: sammi): Remov the fix version field since it's not fixed actually. > kdiag to add a --DEFAULTREALM option > - > > Key: HADOOP-12896 > URL: https://issues.apache.org/jira/browse/HADOOP-12896 > Project: Hadoop Common > Issue Type: Sub-task > Components: security >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Priority: Minor > > * kdiag to add a --DEFAULTREALM option to say not having a default realm is > an error. > * if this flag is unset, when dumping the credential cache, if there is any > entry without a realm, *and there is no default realm*, diagnostics to fail > with an error. Hadoop will fail in this situation; kdiag should detect and > report -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-12896) kdiag to add a --DEFAULTREALM option
[ https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465877#comment-16465877 ] SammiChen commented on HADOOP-12896: Remove the fix version field since it's not actually fixed.
[jira] [Updated] (HADOOP-12896) kdiag to add a --DEFAULTREALM option
[ https://issues.apache.org/jira/browse/HADOOP-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HADOOP-12896: --- Fix Version/s: (was: 2.9.1)
[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Waldmann updated HADOOP-14444: Attachment: HADOOP-14444.15.patch > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann >Priority: Major > Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, > HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, > HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, > HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, > HADOOP-14444.patch > > > The current implementations of the FTP and SFTP filesystems have severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, thereby simplifying > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for explicit FTPS (SSL/TLS) > * Support for connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files this shows an order of magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For ftp you always need to list the whole > directory whenever you ask for information about a particular file. > Again, for a huge number of files this shows an order of magnitude performance > improvement over non-cached connections. 
> * Support for keep alive (NOOP) messages to avoid connection drops > * Support for Unix style or regexp wildcard globs - useful for listing > particular files across the whole directory tree > * Support for reestablishing broken ftp data transfers - which can happen > surprisingly often > * Support for sftp private keys (including pass phrase) > * Support for keeping passwords, private keys and pass phrases in jceks > key stores -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
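The connection-pooling feature described above - reusing an idle connection from a pool instead of opening a fresh one for every command - can be sketched with a bounded queue. This is a hypothetical stand-in for illustration, not the class from the patch:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal connection-pool sketch: acquire() hands back an idle
// connection if one exists, release() returns it for reuse. The
// element type C stands in for an FTP/SFTP connection object.
public class ConnectionPool<C> {
    private final BlockingQueue<C> idle;

    public ConnectionPool(int capacity) {
        this.idle = new ArrayBlockingQueue<>(capacity);
    }

    // Take a pooled connection if one is idle; returns null to signal
    // the caller to open a new connection.
    public C acquire() {
        return idle.poll();
    }

    // Return a connection to the pool. If the pool is already full the
    // caller should close the connection instead of leaking it.
    public boolean release(C conn) {
        return idle.offer(conn);
    }

    public static void main(String[] args) {
        ConnectionPool<String> pool = new ConnectionPool<>(2);
        pool.release("conn-1");
        System.out.println(pool.acquire());
    }
}
```

The performance win quoted in the description comes from skipping the connect/login handshake on every command, which dominates when listing huge directory trees.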
[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Waldmann updated HADOOP-14444: Attachment: (was: HADOOP-14444.15.patch)
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465825#comment-16465825 ] genericqa commented on HADOOP-14444: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 34 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 38m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 31m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 31m 18s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 24s{color} | {color:orange} root: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 31s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-tools {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 40s{color} | {color:red} hadoop-ftp in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 47s{color} | {color:red} hadoop-tools in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License
[jira] [Commented] (HADOOP-15448) Swift auth fails "Expecting to find auth in request body"
[ https://issues.apache.org/jira/browse/HADOOP-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465812#comment-16465812 ] Steve Loughran commented on HADOOP-15448: - * Well, this isn't a support channel, more for filing bugs and fixes. So I'm afraid you are going to have to start creating that bug report (with stacks in here, versions defined, etc). And you are going to have to turn up the logging in the swift code, the httpclient code, etc, to see what's going on at all. I don't expect anyone else to put their hand up here, sorry. Auth and openstack are a real source of pain with that swift module. Every endpoint has its own variants of the auth mech, and, for security reasons, nothing provides meaningful information. If it's your own openstack instance, see what gets received and how it matches the expectations. That's what I would probably start with. > Swift auth fails "Expecting to find auth in request body" > - > > Key: HADOOP-15448 > URL: https://issues.apache.org/jira/browse/HADOOP-15448 > Project: Hadoop Common > Issue Type: Bug > Components: fs/swift >Reporter: Bhujay Kumar Bhatta >Priority: Major > > Tried with the hadoop upstream repo as per this document > https://hadoop.apache.org/docs/stable/hadoop-openstack/index.html . > Connection fails with a malformed request; here is the log > http://paste.openstack.org/show/720417/ . I am out of options now. Kindly > help -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15448) Swift auth fails "Expecting to find auth in request body"
[ https://issues.apache.org/jira/browse/HADOOP-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15448: Summary: Swift auth fails "Expecting to find auth in request body" (was: Hadoop-8545 Swift Integration) Component/s: fs/swift
[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465802#comment-16465802 ] Gabor Bota edited comment on HADOOP-13649 at 5/7/18 11:23 AM: -- Mvn test and verify were successful on eu-west-1 with fs.s3a.s3guard.test.enabled (_-Ds3guard)._ was (Author: gabor.bota): Mvn test and verify were successful on eu-west-1 with fs.s3a.s3guard.test.enabled _-Ds3guard._ > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch > > > LocalMetadataStore is primarily a reference implementation for testing. It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
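The time-based expiry proposed here can be sketched with a timestamped map. This `TtlCache` is an illustrative stand-in only - the actual patch builds on a Guava cache (per the later discussion of `com.google.common.cache.LocalCache`) - showing expire-after-access semantics, where a hit refreshes the entry's timestamp:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a TTL (expire-after-access) cache. Time is passed in
// explicitly to keep the example deterministic and testable.
public class TtlCache<K, V> {
    private static final class Entry<V> {
        V value;
        long lastAccessMs;
        Entry(V value, long now) { this.value = value; this.lastAccessMs = now; }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long ttlMs;

    public TtlCache(long ttlMs) { this.ttlMs = ttlMs; }

    public synchronized void put(K key, V value, long nowMs) {
        map.put(key, new Entry<>(value, nowMs));
    }

    // Expire-after-access: a read within the TTL refreshes the
    // timestamp; a read past the TTL evicts the entry and misses.
    public synchronized V get(K key, long nowMs) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (nowMs - e.lastAccessMs > ttlMs) {
            map.remove(key);
            return null;
        }
        e.lastAccessMs = nowMs;
        return e.value;
    }
}
```

This bounds the window of the inter-node inconsistency described in the issue: a stale in-memory entry can survive at most one TTL past its last access.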
[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465802#comment-16465802 ] Gabor Bota commented on HADOOP-13649: - Mvn test and verify were successful on eu-west-1 with fs.s3a.s3guard.test.enabled _-Ds3guard._
[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465799#comment-16465799 ] Hudson commented on HADOOP-15446: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14132 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14132/]) HADOOP-15446. WASB: PageBlobInputStream.skip breaks HBASE replication. (stevel: rev 5b11b9fd413470e134ecdc7c50468f8c7b39fa50) * (edit) hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/PageBlobInputStream.java * (add) hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/ITestPageBlobInputStream.java > WASB: PageBlobInputStream.skip breaks HBASE replication > --- > > Key: HADOOP-15446 > URL: https://issues.apache.org/jira/browse/HADOOP-15446 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 2.9.0, 3.0.2 >Reporter: Thomas Marquardt >Assignee: Thomas Marquardt >Priority: Major > Attachments: HADOOP-15446-001.patch, HADOOP-15446-002.patch, > HADOOP-15446-003.patch > > > Page Blobs are primarily used by HBASE. HBASE replication, which apparently > has not been used with WASB until recently, performs non-sequential reads on > log files using PageBlobInputStream. There are bugs in this stream > implementation which prevent skip and seek from working properly, and > eventually the stream state becomes corrupt and unusable. > I believe this bug affects all releases of WASB/HADOOP. It appears to be a > day-0 bug in PageBlobInputStream. There were similar bugs opened in the past > (HADOOP-15042) but the issue was not properly fixed, and no test coverage was > added. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
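For context on the contract involved: `InputStream.skip` may legitimately skip fewer bytes than requested, even before end of stream, so robust non-sequential readers loop until the full count is skipped. The sketch below illustrates that general contract; it is not the WASB fix itself:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipFully {
    // InputStream.skip may return fewer bytes than requested, so
    // callers loop. When skip makes no progress, fall back to a
    // single-byte read; a failed read means EOF arrived too early.
    public static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() >= 0) {
                n -= 1;  // made one byte of progress by reading
            } else {
                throw new IOException("EOF before skipping " + n + " more bytes");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[]{0, 1, 2, 3, 4, 5});
        skipFully(in, 4);
        System.out.println(in.read());  // next byte after the skipped region
    }
}
```

A stream implementation whose skip silently desynchronizes its internal position (the bug described above) breaks exactly this kind of caller, since every subsequent read lands at the wrong offset.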
[jira] [Commented] (HADOOP-15446) WASB: PageBlobInputStream.skip breaks HBASE replication
[ https://issues.apache.org/jira/browse/HADOOP-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465783#comment-16465783 ] Steve Loughran commented on HADOOP-15446: - +1, committed to branch 3.1 & trunk. If you want backporting to branch-2, run the tests, tell me how it went, & I'll backport. I did see one failure in my own test run, I'm assuming unrelated and just a function of network distance. {code} [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 26.213 s <<< FAILURE! - in org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer [ERROR] testManySuccessAndErrorsAndWaiting(org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer) Time elapsed: 1.123 s <<< FAILURE! java.lang.AssertionError: The actual value 9 is not within the expected range: [5.60, 8.40]. at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer.fuzzyValidate(TestClientThrottlingAnalyzer.java:46) at org.apache.hadoop.fs.azure.TestClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting(TestClientThrottlingAnalyzer.java:168) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code}
[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465674#comment-16465674 ] genericqa commented on HADOOP-13649: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 35s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | HADOOP-13649 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12922237/HADOOP-13649.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ab0ad586323f 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 67f239c | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/14593/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14593/testReport/ | | Max. process+thread count | 356 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14593/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message
[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465636#comment-16465636 ] Gabor Bota edited comment on HADOOP-13649 at 5/7/18 8:53 AM: - Thanks for the review. # I've created HADOOP-15423 to merge the two caches into one. # .expireAfterWrite() vs .expireAfterAccess() ** I think that expiring on access could be better in this situation, as long as there's no modification in the underlying bucket from another client - i.e. no one else is modifying the S3 bucket (e.g. deleting files) while the cache is in use - that way we can say that the cache is up to date. ** This store is only used for testing right now, so I think it's right to choose expireAfterAccess. # Locking ** The com.google.common.cache.LocalCache has locking for writes (e.g. put, replace, remove) but not for simple reads (getIfPresent). ** LocalMetadataStore has a lock for reads too: synchronized (this) in get(). ** As the merge of the two caches will happen in HADOOP-15423, I think that's a topic to discuss further on that issue. # Performance testing ** I've done some performance testing to compare the cache vs. hash performance. ** I hope I used sane parameters during the tests. ** Based on this, there will be some performance decrease with this implementation, but nothing significant with the basic test settings - in my tests I've modified (increased) the settings a little. Move() performance should improve when merging the caches - it will be interesting to compare what happens after that change. ** Test results are in the following gist: [https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] was (Author: gabor.bota): Thanks for the review. # I've created HADOOP-15423 to merge the two caches into one. 
# .expireAfterWrite() vs .expireAfterAccess() ** I think that expiring on access could be better in this situation, as long as there's no modification in the underlying bucket from another client - i.e. no one else is modifying the S3 bucket (e.g. deleting files) while the cache is in use - that way we can say that the cache is up to date. ** This store is only used for testing right now, so I think it's right to choose expireAfterAccess. # Locking ** The com.google.common.cache.LocalCache has locking for writes (e.g. put, replace, remove) but not for simple reads (getIfPresent). ** LocalMetadataStore has a lock for reads too: synchronized (this) in get(). ** As the merge of the two caches will happen in HADOOP-15423, I think that's a topic to discuss further on that issue. # Performance testing ** I've done some performance testing to compare the cache vs. hash performance. ** I hope I used sane parameters during the tests. ** Based on this, there will be some performance decrease with this implementation, but nothing significant with the basic test settings - in my tests I've modified the settings a little bit. Move() performance should improve when merging the caches - it will be interesting to compare what happens after that change. ** Test results are in the following gist: [https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch > > > LocalMetadataStore is primarily a reference implementation for testing. 
It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465636#comment-16465636 ] Gabor Bota edited comment on HADOOP-13649 at 5/7/18 8:43 AM: - Thanks for the review. # I've created HADOOP-15423 to merge the two caches into one. # .expireAfterWrite() vs .expireAfterAccess() ** I think that expiring on access could be better in this situation, as long as there's no modification in the underlying bucket from another client - i.e. no one else is modifying the S3 bucket (e.g. deleting files) while the cache is in use - that way we can say that the cache is up to date. ** This store is only used for testing right now, so I think it's right to choose expireAfterAccess. # Locking ** The com.google.common.cache.LocalCache has locking for writes (e.g. put, replace, remove) but not for simple reads (getIfPresent). ** LocalMetadataStore has a lock for reads too: synchronized (this) in get(). ** As the merge of the two caches will happen in HADOOP-15423, I think that's a topic to discuss further on that issue. # Performance testing ** I've done some performance testing to compare the cache vs. hash performance. ** I hope I used sane parameters during the tests. ** Based on this, there will be some performance decrease with this implementation, but nothing significant with the basic test settings - in my tests I've modified the settings a little bit. Move() performance should improve when merging the caches - it will be interesting to compare what happens after that change. ** Test results are in the following gist: [https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] was (Author: gabor.bota): Thanks for the review. # I've created HADOOP-15423 to merge the two caches into one. 
# .expireAfterWrite() vs .expireAfterAccess() ** I think that expiring on access could be better in this situation, as long as there's no modification in the underlying bucket from another client - i.e. no one else is modifying the S3 bucket (e.g. deleting files) while the cache is in use - that way we can say that the cache is up to date. This store is only used for testing right now, so I think it's right to choose expireAfterAccess. # Locking ** The com.google.common.cache.LocalCache has locking for writes (e.g. put, replace, remove) but not for simple reads (getIfPresent). ** LocalMetadataStore has a lock for reads too: synchronized (this) in get(). ** As the merge of the two caches will happen in HADOOP-15423, I think that's a topic to discuss further on that issue. # Performance testing ** I've done some performance testing to compare the cache vs. hash performance. ** I hope I used sane parameters during the tests. ** Based on this, there will be some performance decrease with this implementation, but nothing significant with the basic test settings - in my tests I've modified the settings a little bit. Move() performance should improve when merging the caches - it will be interesting to compare what happens after that change. ** Test results are in the following gist: [https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch > > > LocalMetadataStore is primarily a reference implementation for testing. 
It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465636#comment-16465636 ] Gabor Bota commented on HADOOP-13649: - Thanks for the review. # I've created HADOOP-15423 to merge the two caches into one. # .expireAfterWrite() vs .expireAfterAccess() ** I think that expiring on access could be better in this situation, as long as there's no modification in the underlying bucket from another client - i.e. no one else is modifying the S3 bucket (e.g. deleting files) while the cache is in use - that way we can say that the cache is up to date. This store is only used for testing right now, so I think it's right to choose expireAfterAccess. # Locking ** The com.google.common.cache.LocalCache has locking for writes (e.g. put, replace, remove) but not for simple reads (getIfPresent). ** LocalMetadataStore has a lock for reads too: synchronized (this) in get(). ** As the merge of the two caches will happen in HADOOP-15423, I think that's a topic to discuss further on that issue. # Performance testing ** I've done some performance testing to compare the cache vs. hash performance. ** I hope I used sane parameters during the tests. ** Based on this, there will be some performance decrease with this implementation, but nothing significant with the basic test settings - in my tests I've modified the settings a little bit. Move() performance should improve when merging the caches - it will be interesting to compare what happens after that change. 
** Test results are in the following gist: [https://gist.github.com/bgaborg/2220fd53e553ec971c8edd1adf2493cd] > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch > > > LocalMetadataStore is primarily a reference implementation for testing. It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
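The expireAfterAccess semantics and the read-locking discussed in the comment above can be sketched in plain Java. This is a hypothetical, minimal TtlCache with an injectable clock, not LocalMetadataStore's actual code (the real implementation uses Guava's com.google.common.cache.CacheBuilder); it only illustrates access-based TTL expiry: a read refreshes an entry's timestamp, so an entry expires only after a full TTL with no access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Minimal sketch of a TTL cache with expire-after-access semantics.
 * Hypothetical class; time units are arbitrary, and the clock is
 * injected so expiry can be demonstrated deterministically.
 */
class TtlCache<K, V> {
    private static final class Entry<V> {
        V value;
        long lastAccess;
        Entry(V value, long t) { this.value = value; this.lastAccess = t; }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long ttl;
    private final LongSupplier clock;

    TtlCache(long ttl, LongSupplier clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    // Reads are locked too (like LocalMetadataStore's synchronized get()).
    synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        long now = clock.getAsLong();
        if (now - e.lastAccess > ttl) {   // no access within TTL: expired
            map.remove(key);
            return null;
        }
        e.lastAccess = now;               // expireAfterAccess: refresh on read
        return e.value;
    }

    synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong()));
    }
}
```

With expireAfterWrite semantics, a read at time 180 of an entry written at time 0 under a TTL of 100 would already miss; under expireAfterAccess it can still hit if an earlier read refreshed the timestamp, which is why the choice matters for how stale the cache may get.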
[jira] [Updated] (HADOOP-13649) s3guard: implement time-based (TTL) expiry for LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Bota updated HADOOP-13649: Attachment: HADOOP-13649.002.patch > s3guard: implement time-based (TTL) expiry for LocalMetadataStore > - > > Key: HADOOP-13649 > URL: https://issues.apache.org/jira/browse/HADOOP-13649 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Aaron Fabbri >Assignee: Gabor Bota >Priority: Minor > Attachments: HADOOP-13649.001.patch, HADOOP-13649.002.patch > > > LocalMetadataStore is primarily a reference implementation for testing. It > may be useful in narrow circumstances where the workload can tolerate > short-term lack of inter-node consistency: Being in-memory, one JVM/node's > LocalMetadataStore will not see another node's changes to the underlying > filesystem. > To put a bound on the time during which this inconsistency may occur, we > should implement time-based (a.k.a. Time To Live / TTL) expiration for > LocalMetadataStore -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Waldmann updated HADOOP-14444: Attachment: (was: HADOOP-14444.15.patch) > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann >Priority: Major > Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, > HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, > HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, > HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, > HADOOP-14444.patch > > > The current implementations of the FTP and SFTP filesystems have severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, thereby simplifying > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for explicit FTPS (SSL/TLS) > * Support for connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files it shows an order-of-magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For FTP you always need to list the whole > directory whenever you ask for information about a particular file. > Again, for a huge number of files it shows an order-of-magnitude performance > improvement over non-cached connections. 
> * Support for keep-alive (NOOP) messages to avoid connection drops > * Support for Unix-style or regexp wildcard globs - useful for listing > particular files across the whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often > * Support for sftp private keys (including passphrase) > * Support for keeping passwords, private keys and passphrases in JCEKS > key stores -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
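The connection-pooling feature listed above - reusing a connection from a pool instead of opening a new one for every FTP command - can be sketched generically. The class and method names here are hypothetical illustrations, not the patch's actual classes:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Minimal sketch of a connection pool: borrow() hands out an idle
 * connection when one exists and only creates a new one otherwise,
 * so a burst of commands does not open a connection per command.
 */
class ConnectionPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;
    private int created = 0;

    ConnectionPool(Supplier<C> factory) { this.factory = factory; }

    synchronized C borrow() {
        C c = idle.pollFirst();
        if (c != null) return c;   // reuse an idle connection from the pool
        created++;
        return factory.get();      // pool empty: open a new connection
    }

    synchronized void release(C c) {
        idle.addFirst(c);          // return the connection for reuse
    }

    synchronized int connectionsCreated() { return created; }
}
```

For workloads touching a huge number of files, the savings come from amortizing connection setup (TCP handshake, login) across many commands, which is the order-of-magnitude improvement the description claims.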
[jira] [Updated] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Waldmann updated HADOOP-14444: Attachment: HADOOP-14444.15.patch > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann >Priority: Major > Attachments: HADOOP-14444.10.patch, HADOOP-14444.11.patch, > HADOOP-14444.12.patch, HADOOP-14444.13.patch, HADOOP-14444.14.patch, > HADOOP-14444.15.patch, HADOOP-14444.2.patch, HADOOP-14444.3.patch, > HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.6.patch, > HADOOP-14444.7.patch, HADOOP-14444.8.patch, HADOOP-14444.9.patch, > HADOOP-14444.patch > > > The current implementations of the FTP and SFTP filesystems have severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, thereby simplifying > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for explicit FTPS (SSL/TLS) > * Support for connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files it shows an order-of-magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For FTP you always need to list the whole > directory whenever you ask for information about a particular file. > Again, for a huge number of files it shows an order-of-magnitude performance > improvement over non-cached connections. 
> * Support for keep-alive (NOOP) messages to avoid connection drops > * Support for Unix-style or regexp wildcard globs - useful for listing > particular files across the whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often > * Support for sftp private keys (including passphrase) > * Support for keeping passwords, private keys and passphrases in JCEKS > key stores -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Palanisamy updated HADOOP-15449: Description: We have observed from several users that Namenode flip-over is due to either ZooKeeper disk slowness (higher fsync cost) or a network issue. We could avoid the flip-over issue to some extent by increasing the HA session timeout, ha.zookeeper.session-timeout.ms. The default value is 5000 ms, which seems very low for any production environment. I would suggest 1 ms as the default session timeout. {code} .. 2018-05-04 03:54:36,848 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection and attempting reconnect 2018-05-04 03:56:49,088 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection and attempting reconnect .. {code} was: We have observed from several users that Namenode flip-over is due to either ZooKeeper disk slowness (higher fsync cost) or a network issue. We could avoid the flip-over issue to some extent by increasing the HA session timeout, ha.zookeeper.session-timeout.ms. The default value is 5000 ms, which seems very low for any production environment. I would suggest 1 ms as the default session timeout. > Frequent Namenode Flipover affecting user Jobs. > --- > > Key: HADOOP-15449 > URL: https://issues.apache.org/jira/browse/HADOOP-15449 > Project: Hadoop Common > Issue Type: Wish > Components: common >Affects Versions: 2.7.4 >Reporter: Karthik Palanisamy >Priority: Critical > Attachments: HADOOP-15449.patch > > > We have observed from several users that Namenode flip-over is due to either > ZooKeeper disk slowness (higher fsync cost) or a network issue. We could > avoid the flip-over issue to some extent by increasing the HA session timeout, > ha.zookeeper.session-timeout.ms. 
> The default value is 5000 ms, which seems very low for any production environment. I > would suggest 1 ms as the default session timeout. > > {code} > .. > 2018-05-04 03:54:36,848 INFO zookeeper.ClientCnxn > (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from > server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection > and attempting reconnect > 2018-05-04 03:56:49,088 INFO zookeeper.ClientCnxn > (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from > server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection > and attempting reconnect > .. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
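For reference, the HA session timeout discussed above is set in core-site.xml. This fragment is only a sketch: the property name and its 5000 ms default come from the discussion, while the 10000 ms value shown here is purely illustrative, not the patch's proposed default.

```xml
<!-- core-site.xml: raise the ZKFC session timeout above the 5000 ms default.
     10000 is an illustrative value only; pick one that tolerates your
     ZooKeeper fsync latency and network hiccups without masking real failures. -->
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>10000</value>
</property>
```

A longer timeout means the ZKFC tolerates transient ZooKeeper slowness before triggering a failover, at the cost of slower detection of a genuinely dead Namenode.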
[jira] [Commented] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465524#comment-16465524 ] Karthik Palanisamy commented on HADOOP-15449: - Cc: [~arpitagarwal] [~anu] > Frequent Namenode Flipover affecting user Jobs. > --- > > Key: HADOOP-15449 > URL: https://issues.apache.org/jira/browse/HADOOP-15449 > Project: Hadoop Common > Issue Type: Wish > Components: common >Affects Versions: 2.7.4 >Reporter: Karthik Palanisamy >Priority: Critical > Attachments: HADOOP-15449.patch > > > We have observed from several users that Namenode flip-over is due to either > ZooKeeper disk slowness (higher fsync cost) or a network issue. We could > avoid the flip-over issue to some extent by increasing the HA session timeout, > ha.zookeeper.session-timeout.ms. > The default value is 5000 ms, which seems very low for any production environment. I would > suggest 1 ms as the default session timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Palanisamy updated HADOOP-15449: Description: We have observed from several users that Namenode flip-over is due to either ZooKeeper disk slowness (higher fsync cost) or a network issue. We could avoid the flip-over issue to some extent by increasing the HA session timeout, ha.zookeeper.session-timeout.ms. The default value is 5000 ms, which seems very low for any production environment. I would suggest 1 ms as the default session timeout. was: We have observed from several users that Namenode flip-over is due to either ZooKeeper disk slowness (higher fsync cost) or a network issue. We could avoid the flip-over issue to some extent by increasing the HA session timeout, ha.zookeeper.session-timeout.ms. The default value is 5000 ms, which seems very low for any production environment. I would suggest 1 ms as the default session timeout. > Frequent Namenode Flipover affecting user Jobs. > --- > > Key: HADOOP-15449 > URL: https://issues.apache.org/jira/browse/HADOOP-15449 > Project: Hadoop Common > Issue Type: Wish > Components: common >Affects Versions: 2.7.4 >Reporter: Karthik Palanisamy >Priority: Critical > Attachments: HADOOP-15449.patch > > > We have observed from several users that Namenode flip-over is due to either > ZooKeeper disk slowness (higher fsync cost) or a network issue. We could > avoid the flip-over issue to some extent by increasing the HA session timeout, > ha.zookeeper.session-timeout.ms. > The default value is 5000 ms, which seems very low for any production environment. I > would suggest 1 ms as the default session timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
[ https://issues.apache.org/jira/browse/HADOOP-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Palanisamy updated HADOOP-15449: Attachment: HADOOP-15449.patch > Frequent Namenode Flipover affecting user Jobs. > --- > > Key: HADOOP-15449 > URL: https://issues.apache.org/jira/browse/HADOOP-15449 > Project: Hadoop Common > Issue Type: Wish > Components: common >Affects Versions: 2.7.4 >Reporter: Karthik Palanisamy >Priority: Critical > Attachments: HADOOP-15449.patch > > > We have observed from several users that Namenode flip-over is due to either > ZooKeeper disk slowness (higher fsync cost) or a network issue. We could > avoid the flip-over issue to some extent by increasing the HA session timeout, > ha.zookeeper.session-timeout.ms. > The default value is 5000 ms, which seems very low for any production environment. I would > suggest 1 ms as the default session timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15449) Frequent Namenode Flipover affecting user Jobs.
Karthik Palanisamy created HADOOP-15449: --- Summary: Frequent Namenode Flipover affecting user Jobs. Key: HADOOP-15449 URL: https://issues.apache.org/jira/browse/HADOOP-15449 Project: Hadoop Common Issue Type: Wish Components: common Affects Versions: 2.7.4 Reporter: Karthik Palanisamy We have observed from several users that Namenode flip-over is due to either ZooKeeper disk slowness (higher fsync cost) or a network issue. We could avoid the flip-over issue to some extent by increasing the HA session timeout, ha.zookeeper.session-timeout.ms. The default value is 5000 ms, which seems very low for any production environment. I would suggest 1 ms as the default session timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org