[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey
[ https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403408#comment-15403408 ] Xiao Chen commented on HDFS-10643: -- Thanks [~xyao] for revving! The change LGTM too, but the test is passing even without the fix. I think (not debugged, sorry if not correct) this is because the NN will warm up the cache after HDFS-9405, so the test didn't trigger the KMS ACL check. Why is {{createFile}} done 3 times in the test? Is it for cache draining? If so, I think we could set the cache size to 1 to make it fail. Also a nit: in the test, can we remove this? {code} try { ... } catch (IOException e) { throw new IOException(e); } {code} > HDFS namenode should always use service user (hdfs) to generateEncryptedKey > --- > > Key: HDFS-10643 > URL: https://issues.apache.org/jira/browse/HDFS-10643 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, namenode >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, > HDFS-10643.02.patch, HDFS-10643.03.patch, HDFS-10643.04.patch > > > KMSClientProvider is designed to be shared by different KMS clients. When > the HDFS Namenode, as a KMS client, talks to the KMS to generateEncryptedKey for new file > creation on behalf of a proxy user (hive, oozie), the proxy-user handling in > KMSClientProvider is unnecessary, which causes 1) an extra proxy > user configuration allowing the hdfs user to proxy its clients and 2) KMS ACLs that > allow non-hdfs users for the GENERATE_EEK operation. > This ticket is opened to always use the HDFS namenode login user (hdfs) when > talking to the KMS to generateEncryptedKey for new file creation. This way, we > have a more secure KMS-based HDFS encryption (we can set kms-acls to allow > only the hdfs user for GENERATE_EEK) with less configuration hassle for the KMS to > allow hdfs to proxy other users. 
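The cache-warming behaviour discussed above can be illustrated with a toy model (this is not Hadoop's actual KMSClientProvider; names and sizes are invented for illustration). EDEKs are pre-fetched in batches, so a warmed cache serves requests without contacting the KMS, and the server-side ACL check never fires; with a cache size of 1, every request goes to the provider:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of an EDEK value queue: keys are pre-fetched in batches,
// so a warmed cache serves generateEncryptedKey without a KMS round trip,
// which means the KMS-side ACL check is never exercised.
public class EdekCacheModel {
    private final int cacheSize;
    private final Queue<String> cache = new ArrayDeque<>();
    private int providerCalls = 0; // each call would hit the KMS ACL check

    public EdekCacheModel(int cacheSize) {
        this.cacheSize = cacheSize;
    }

    public String generateEncryptedKey() {
        if (cache.isEmpty()) {
            providerCalls++; // remote KMS round trip, subject to kms-acls
            for (int i = 0; i < cacheSize; i++) {
                cache.add("edek-" + providerCalls + "-" + i);
            }
        }
        return cache.remove();
    }

    public int getProviderCalls() {
        return providerCalls;
    }
}
```

Under this model, three {{createFile}}-style calls against a warmed cache never reach the provider, which is consistent with the test passing even without the fix; shrinking the cache to 1 forces every call through the ACL check.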
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403399#comment-15403399 ] Fenghua Hu commented on HDFS-10682: --- Arpit/Liang, looks like there is one JIRA (https://issues.apache.org/jira/browse/HDFS-9668) to address the big-lock issue; maybe we should relate them? > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > This Jira proposes to replace the FsDatasetImpl object lock with a separate > lock object. Doing so will make it easier to measure lock statistics like > lock held time and to warn about potential lock contention due to slow disk > operations. > In the future we can also consider replacing the lock with a read-write lock.
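The motivation above, measuring lock held time once the dataset lock is a separate object, can be sketched as follows (a hypothetical wrapper, not the patch's actual code):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of an instrumented lock: once FsDatasetImpl synchronizes on a
// separate lock object instead of "this", that object can record how long
// the lock was held and flag holds that exceed a threshold (e.g. caused by
// slow disk operations done under the lock).
public class InstrumentedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long warnThresholdNanos;
    private long acquiredAtNanos;
    private long warningCount = 0;

    public InstrumentedLock(long warnThresholdMs) {
        this.warnThresholdNanos = TimeUnit.MILLISECONDS.toNanos(warnThresholdMs);
    }

    public void lock() {
        lock.lock();
        acquiredAtNanos = System.nanoTime();
    }

    public void unlock() {
        long heldNanos = System.nanoTime() - acquiredAtNanos;
        if (heldNanos > warnThresholdNanos) {
            warningCount++; // real code would log the held time here
        }
        lock.unlock();
    }

    public long getWarningCount() {
        return warningCount;
    }
}
```

This is also the shape that makes a later swap to a read-write lock possible: the callers depend on the wrapper, not on object monitors.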
[jira] [Commented] (HDFS-10706) Add tool generating FSImage from external store
[ https://issues.apache.org/jira/browse/HDFS-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403393#comment-15403393 ] Hadoop QA commented on HDFS-10706: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-tools {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 21s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | 
{color:blue} 0m 6s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 5s{color} | {color:red} hadoop-tools in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 7s{color} | {color:red} root in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 7s{color} | {color:red} root in the patch failed. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 6s{color} | {color:green} root: The patch generated 0 new + 0 unchanged - 6 fixed = 0 total (was 6) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 6s{color} | {color:red} hadoop-tools in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red} 0m 6s{color} | {color:red} hadoop-tools in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red} 0m 4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-tools {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 7s{color} | {color:red} hadoop-tools in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 4s{color} | {color:red} hadoop-fs2img in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 14s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 9s{color} | {color:red} hadoop-tools in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 8s{color} | {color:red} hadoop-fs2img in the
[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool
[ https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403341#comment-15403341 ] Hadoop QA commented on HDFS-10678: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 38s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | 
{color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 32s{color} | {color:green} root: The patch generated 0 new + 140 unchanged - 1 fixed = 140 total (was 141) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 9s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 41s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m 47s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}118m 0s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821489/HDFS-10678.002.patch | | JIRA Issue | HDFS-10678 | | Optional Tests | asflicense mvnsite compile javac javadoc mvninstall unit findbugs checkstyle xml | | uname | Linux 83279432c1c9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403325#comment-15403325 ] Konstantin Shvachko commented on HDFS-10301: Unfortunately, there seems to be a problem with the patch. The storage report is not recognized in certain cases. I will revert the commits. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report, so it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing of storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
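The interleaving failure described in the issue can be demonstrated with a toy model (a deliberate simplification, not the NameNode's real code): each storage records the id of the last block report that touched it, and when a report finishes, storages carrying a different id are declared zombie. Interleaving a report with its own retransmission then falsely condemns a live storage:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the zombie-storage heuristic keyed on blockReportId.
public class ZombieStorageModel {
    private final Map<String, Long> lastReportId = new HashMap<>();

    /** Process one per-storage part of a block report. */
    public void processStorageReport(long reportId, String storage) {
        lastReportId.put(storage, reportId);
    }

    /**
     * Called after the last storage of a report is processed; returns the
     * storages this report would declare zombie (their recorded id differs).
     */
    public List<String> finishReport(long reportId) {
        List<String> zombies = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
            if (e.getValue() != reportId) {
                zombies.add(e.getKey());
            }
        }
        return zombies;
    }
}
```

With reports 1 (original) and 2 (retransmission) interleaved per storage, the late-arriving tail of report 1 overwrites a storage's id, and finishing report 1 last marks a perfectly healthy storage as zombie, matching the "out of order" condition in the title.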
[jira] [Commented] (HDFS-10714) Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE
[ https://issues.apache.org/jira/browse/HDFS-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403322#comment-15403322 ] Brahma Reddy Battula commented on HDFS-10714: - I am thinking of solutions like these: 1) Remove both DNs in the checksum error case, i.e. DN2 and DN3. 2) Remove DN3 first and record DN2 as a suspect node. If the pipeline still fails with a checksum error, then DN2 can be removed, as it was suspected during the previous pipeline. I think the 2nd solution would be safer. Any thoughts on this? cc [~kanaka]/[~vinayrpet] > Issue in handling checksum errors in write pipeline when fault DN is > LAST_IN_PIPELINE > - > > Key: HDFS-10714 > URL: https://issues.apache.org/jira/browse/HDFS-10714 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > > We came across an issue where a write failed even though 7 DNs were available, > due to a network fault at one datanode which is LAST_IN_PIPELINE. It is > similar to HDFS-6937. > Scenario: (DN3 has a N/W fault and min repl=2). > Write pipeline: > DN1->DN2->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN2 is marked as bad > DN1->DN4->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN4 is marked as bad > …. > And so on (DN3 is LAST_IN_PIPELINE every time), continued till there are no more > datanodes to construct the pipeline.
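Option 2 above can be sketched as follows (a hypothetical illustration with invented names, not the real DataStreamer recovery code): on a checksum error reported by the last node, drop only that node but remember its upstream neighbour as a suspect; if the rebuilt pipeline fails with a checksum error again, the suspect is dropped too.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "suspect node" strategy: the reporter of ERROR_CHECKSUM is
// removed first; its upstream neighbour is only removed on a second strike.
public class SuspectNodeRecovery {
    private String suspect = null;

    /**
     * @param pipeline current pipeline, e.g. [DN1, DN2, DN3]
     * @return nodes to exclude from the next pipeline
     */
    public List<String> onChecksumError(List<String> pipeline) {
        List<String> exclude = new ArrayList<>();
        String last = pipeline.get(pipeline.size() - 1);
        String upstream = pipeline.get(pipeline.size() - 2);
        exclude.add(last); // the node that reported the checksum error
        if (upstream.equals(suspect)) {
            exclude.add(upstream); // second strike: suspect confirmed bad
        } else {
            suspect = upstream;    // first strike: only remember it
        }
        return exclude;
    }
}
```

In the reported scenario this removes the faulty last node (DN3) on the first error instead of repeatedly sacrificing its healthy upstream neighbours, which is why option 2 looks safer than removing both nodes unconditionally.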
[jira] [Assigned] (HDFS-10714) Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE
[ https://issues.apache.org/jira/browse/HDFS-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-10714: --- Assignee: Brahma Reddy Battula > Issue in handling checksum errors in write pipeline when fault DN is > LAST_IN_PIPELINE > - > > Key: HDFS-10714 > URL: https://issues.apache.org/jira/browse/HDFS-10714 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > > We came across an issue where a write failed even though 7 DNs were available, > due to a network fault at one datanode which is LAST_IN_PIPELINE. It is > similar to HDFS-6937. > Scenario: (DN3 has a N/W fault and min repl=2). > Write pipeline: > DN1->DN2->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN2 is marked as bad > DN1->DN4->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN4 is marked as bad > …. > And so on (DN3 is LAST_IN_PIPELINE every time), continued till there are no more > datanodes to construct the pipeline.
[jira] [Commented] (HDFS-6937) Another issue in handling checksum errors in write pipeline
[ https://issues.apache.org/jira/browse/HDFS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403313#comment-15403313 ] Brahma Reddy Battula commented on HDFS-6937: bq. If your problem is really a network issue, then your proposed solution sounds reasonable to me. However, it seems different than what HDFS-6937 intends to solve, and I think we can create a new jira for your issue. Here is why: Initially I thought of handling it under this issue only. Thanks for the correction. Raised HDFS-10714 to handle it separately. > Another issue in handling checksum errors in write pipeline > --- > > Key: HDFS-6937 > URL: https://issues.apache.org/jira/browse/HDFS-6937 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs-client >Affects Versions: 2.5.0 >Reporter: Yongjun Zhang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-6937.001.patch, HDFS-6937.002.patch > > > Given a write pipeline: > DN1 -> DN2 -> DN3 > DN3 detects a checksum error and terminates, and DN2 truncates its replica to the > ACKed size. Then a new pipeline is attempted as > DN1 -> DN2 -> DN4 > DN4 detects a checksum error again. Later, when DN4 was replaced with DN5 (and so > on), it failed for the same reason. This led to the observation that DN2's > data is corrupted. > Found that the software currently truncates DN2's replica to the ACKed size > after DN3 terminates, but it doesn't check the correctness of the data > already written to disk. > So intuitively, a solution would be: when the downstream DN (DN3 here) finds a > checksum error, propagate this info back to the upstream DN (DN2 here); DN2 > checks the correctness of the data already written to disk, and truncates the > replica to MIN(correctDataSize, ACKedSize). > Found this issue is similar to what was reported by HDFS-3875, and the > truncation at DN2 was actually introduced as part of the HDFS-3875 solution. > Filing this jira for the issue reported here. 
HDFS-3875 was filed by > [~tlipcon], > and I found he proposed something similar there. > {quote} > if the tail node in the pipeline detects a checksum error, then it returns a > special error code back up the pipeline indicating this (rather than just > disconnecting) > if a non-tail node receives this error code, then it immediately scans its > own block on disk (from the beginning up through the last acked length). If > it detects a corruption on its local copy, then it should assume that it is > the faulty one, rather than the downstream neighbor. If it detects no > corruption, then the faulty node is either the downstream mirror or the > network link between the two, and the current behavior is reasonable. > {quote} > Thanks.
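The recovery rule proposed in the description, truncate to MIN(correctDataSize, ACKedSize) after verifying the local copy, reduces to a small decision function (a simplified sketch, not the actual datanode code):

```java
// Sketch of the proposed upstream-DN recovery step: after the downstream
// node reports a checksum error, the upstream node verifies its on-disk
// replica and keeps only the prefix that is both acked and checksum-clean.
public class ReplicaTruncation {
    /**
     * @param lastCorrectOffset length of the verified-clean prefix on disk
     * @param ackedLength       length acknowledged by the pipeline
     * @return the new replica length after truncation
     */
    public static long truncateTo(long lastCorrectOffset, long ackedLength) {
        return Math.min(lastCorrectOffset, ackedLength);
    }

    /**
     * Per the quoted HDFS-3875 proposal: if corruption lies inside the acked
     * prefix, the local copy (not the downstream neighbor) is the faulty one.
     */
    public static boolean localCopyFaulty(long lastCorrectOffset, long ackedLength) {
        return lastCorrectOffset < ackedLength;
    }
}
```

The point of the MIN is that blind truncation to the ACKed size (the current behavior) can preserve corrupt bytes; verifying first ensures the kept prefix is clean.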
[jira] [Created] (HDFS-10714) Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE
Brahma Reddy Battula created HDFS-10714: --- Summary: Issue in handling checksum errors in write pipeline when fault DN is LAST_IN_PIPELINE Key: HDFS-10714 URL: https://issues.apache.org/jira/browse/HDFS-10714 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula We came across an issue where a write failed even though 7 DNs were available, due to a network fault at one datanode which is LAST_IN_PIPELINE. It is similar to HDFS-6937. Scenario: (DN3 has a N/W fault and min repl=2). Write pipeline: DN1->DN2->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN2 is marked as bad DN1->DN4->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN4 is marked as bad …. And so on (DN3 is LAST_IN_PIPELINE every time), continued till there are no more datanodes to construct the pipeline.
[jira] [Updated] (HDFS-10706) Add tool generating FSImage from external store
[ https://issues.apache.org/jira/browse/HDFS-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HDFS-10706: - Status: Patch Available (was: Open) > Add tool generating FSImage from external store > --- > > Key: HDFS-10706 > URL: https://issues.apache.org/jira/browse/HDFS-10706 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode, tools >Reporter: Chris Douglas > Attachments: HDFS-10706.001.patch > > > To experiment with provided storage, this provides a tool to map an external > namespace to an FSImage/NN storage. By loading it in a NN, one can access the > remote FS using HDFS.
[jira] [Commented] (HDFS-10586) Erasure Code misfunctions when 3 DataNode down
[ https://issues.apache.org/jira/browse/HDFS-10586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403291#comment-15403291 ] gao shan commented on HDFS-10586: - Thanks, but I don't think this is caused by the network. The cluster consists of 10 nodes (1 namenode and 9 datanodes), which are all virtual machines (15G memory per VM) created on the same physical server. The IPs of these 10 nodes are assigned in the same internal network segment (192.168.X.X). > Erasure Code misfunctions when 3 DataNode down > -- > > Key: HDFS-10586 > URL: https://issues.apache.org/jira/browse/HDFS-10586 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 > Environment: 9 DataNodes and 1 NameNode; the erasure code policy is > set as "6-3". When 3 DataNodes go down, the erasure coded read fails and an exception > is thrown >Reporter: gao shan > > The following are the steps to reproduce: > 1) hadoop fs -mkdir /ec > 2) set the erasure code policy as "6-3" > 3) "write" data by: > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -write -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > 4) Manually take down 3 nodes: kill the "datanode" and "nodemanager" threads > on 3 DataNodes. 
> 5) By using erasured code to "read" data by: > time hadoop jar > /opt/hadoop/hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar > TestDFSIO -D test.build.data=/ec -read -nrFiles 30 -fileSize 12288 > -bufferSize 1073741824 > then the failure occurs and the exception is thrown as: > INFO mapreduce.Job: Task Id : attempt_1465445965249_0008_m_34_2, Status : > FAILED > Error: java.io.IOException: 4 missing blocks, the stripe is: Offset=0, > length=8388608, fetchedChunksNum=0, missingChunksNum=4 > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.checkMissingBlocks(DFSStripedInputStream.java:614) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readParityChunks(DFSStripedInputStream.java:647) > at > org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:762) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:316) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:450) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:941) > at java.io.DataInputStream.read(DataInputStream.java:149) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:531) > at org.apache.hadoop.fs.TestDFSIO$ReadMapper.doIO(TestDFSIO.java:508) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:134) > at org.apache.hadoop.fs.IOMapperBase.map(IOMapperBase.java:37) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) > at 
org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
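The arithmetic behind the report is worth stating explicitly (a back-of-the-envelope sketch, not HDFS code): with the RS-6-3 policy, a stripe of 6 data + 3 parity chunks can be reconstructed from any 6 of its 9 chunks, so it tolerates at most 3 missing chunks. The exception's {{missingChunksNum=4}} is one more than parity can cover, which is exactly why the read fails:

```java
// Failure-tolerance check for a Reed-Solomon (6,3) erasure coding policy:
// a stripe is readable iff at least DATA_UNITS of its chunks survive,
// i.e. iff at most PARITY_UNITS chunks are missing.
public class EcTolerance {
    static final int DATA_UNITS = 6;
    static final int PARITY_UNITS = 3;

    /** A stripe can be reconstructed from any DATA_UNITS of its 9 chunks. */
    public static boolean readable(int missingChunks) {
        return missingChunks <= PARITY_UNITS;
    }
}
```

On a 9-datanode cluster with 3 nodes down, every stripe sits exactly at the tolerance limit (6 surviving chunks, zero slack), so any additional unavailable chunk pushes a stripe to 4 missing and produces the exception above.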
[jira] [Updated] (HDFS-10678) Documenting NNThroughputBenchmark tool
[ https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10678: - Attachment: HDFS-10678.002.patch The v2 patch addresses [~iwasakims]'s last comment. Thanks for the suggestion. {quote} I think Benchmarking.md should be just a toc, and the doc of NNThroughputBenchmark should be under hadoop-hdfs-project/hadoop-hdfs/src/site as an independent page. {quote} Adding a toc file seems good for the long term, but it is a bit heavy for now because it would contain only one link: the only benchmarking-related material at the moment is {{NNThroughputBenchmark}}. Does it make sense to add a new *menu* to hadoop-project/src/site/site.xml instead? In this way, we organize all benchmarking-related pages in one section, while keeping the pages themselves independent and well placed. > Documenting NNThroughputBenchmark tool > -- > > Key: HDFS-10678 > URL: https://issues.apache.org/jira/browse/HDFS-10678 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: benchmarks, test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Labels: documentation > Attachments: HDFS-10678.000.patch, HDFS-10678.001.patch, > HDFS-10678.002.patch > > > The best (only) documentation for the NNThroughputBenchmark currently exists > as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, > especially since we no longer generate javadocs for HDFS as part of the build > process. I suggest we extract it into a separate markdown doc, or merge it > with other benchmarking materials (if any?) about HDFS.
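The proposed site.xml menu could look something like this (a hypothetical sketch; the item name, href, and placement are illustrative, not the committed change):

```xml
<!-- Hypothetical "Benchmarking" menu for hadoop-project/src/site/site.xml,
     following the Maven site descriptor's menu/item convention. -->
<menu name="Benchmarking" inherit="top">
  <item name="NNThroughputBenchmark"
        href="hadoop-project-dist/hadoop-hdfs/NNThroughputBenchmark.html"/>
</menu>
```

Additional benchmarking docs would later become sibling `<item>` entries under the same menu, which keeps each page independent while grouping them in one section.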
[jira] [Commented] (HDFS-10467) Router-based HDFS federation
[ https://issues.apache.org/jira/browse/HDFS-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403257#comment-15403257 ] Inigo Goiri commented on HDFS-10467: Regarding the rebalancing operations, we are currently proposing to disallow write accesses from the Routers. The problem is that we would then also have to disallow direct accesses to the Namenodes to prevent writes at that level. For this reason, we could leverage the concept of immutable folders/files from HDFS-3154 and, more recently, HDFS-7568. Not sure how likely those efforts are to move forward, though. > Router-based HDFS federation > > > Key: HDFS-10467 > URL: https://issues.apache.org/jira/browse/HDFS-10467 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Affects Versions: 2.7.2 >Reporter: Inigo Goiri >Assignee: Inigo Goiri > Attachments: HDFS Router Federation.pdf, HDFS-10467.PoC.001.patch, > HDFS-10467.PoC.patch, HDFS-Router-Federation-Prototype.patch > > > Add a Router to provide a federated view of multiple HDFS clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
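The "disallow writes from the Routers" idea reduces to a per-mount read-only flag (a toy illustration with invented names, not the Router prototype's code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of rejecting writes at the federation layer: a mount
// entry flagged read-only (e.g. while its subtree is being rebalanced)
// rejects mutations, while reads still pass through to the backing
// namespace. This does not cover direct writes to the Namenodes, which is
// the gap the immutable-folder proposals would close.
public class MountTableModel {
    private final Map<String, Boolean> readOnly = new HashMap<>();

    public void addMount(String path, boolean isReadOnly) {
        readOnly.put(path, isReadOnly);
    }

    /** Returns true if a write under this mount point may proceed. */
    public boolean writeAllowed(String mountPath) {
        return !readOnly.getOrDefault(mountPath, false);
    }
}
```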
[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403191#comment-15403191 ] Hadoop QA commented on HDFS-10682: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 16 new + 109 unchanged - 11 fixed = 125 total (was 120) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 10s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 86m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock | | | hadoop.hdfs.server.datanode.TestBlockRecovery | | Timed out junit tests | org.apache.hadoop.hdfs.TestReadWhileWriting | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821459/HDFS-10682.006.patch | | JIRA Issue | HDFS-10682 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 81bdc8af99e1 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9f473cf | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16282/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16282/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16282/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16282/console | | Powered
[jira] [Commented] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403169#comment-15403169 ] Hadoop QA commented on HDFS-10702: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 43s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 57s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 49s{color} | {color:orange} root: The patch generated 9 new + 1030 unchanged - 2 fixed = 1039 total (was 1032) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 1 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 21s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 12s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 75m 6s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821448/HDFS-10702.003.patch | | JIRA Issue | HDFS-10702 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 222ef60ae1e7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9f473cf | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16281/artifact/patchprocess/diff-checkstyle-root.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/16281/artifact/patchprocess/whitespace-tabs.txt | |
[jira] [Updated] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-10682: - Description: This Jira proposes to replace the FsDatasetImpl object lock with a separate lock object. Doing so will make it easier to measure lock statistics like lock held time and warn about potential lock contention due to slow disk operations. In the future we can also consider replacing the lock with a read-write lock. was: This Jira proposes to replace the FsDatasetImpl object lock with a separate lock object. Doing so will allow us to measure lock statistics. In the future we can also consider replacing the lock with a read-write lock. > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > This Jira proposes to replace the FsDatasetImpl object lock with a separate > lock object. Doing so will make it easier to measure lock statistics like > lock held time and warn about potential lock contention due to slow disk > operations. > In the future we can also consider replacing the lock with a read-write lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403126#comment-15403126 ] Hadoop QA commented on HDFS-10712: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 51s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} branch-2 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s{color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 50m 30s{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}131m 40s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:b59b8b7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821441/HDFS-10712.branch-2.patch | | JIRA Issue | HDFS-10712 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ed9b865e076a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / 4ad2a73 | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | findbugs | v3.0.0 | | JDK v1.7.0_101 Test Results |
[jira] [Updated] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-10682: - Description: This Jira proposes to replace the FsDatasetImpl object lock with a separate lock object. Doing so will allow us to measure lock statistics. In the future we can also consider replacing the lock with a read-write lock. was:Add a metric to measure the time the lock of FSDataSetImpl is held by a thread. The goal is to expose this for users to identify operations that locks dataset for long time ("long" in some sense) and be able to understand/reason/track the operation based on logs. > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > This Jira proposes to replace the FsDatasetImpl object lock with a separate > lock object. Doing so will allow us to measure lock statistics. > In the future we can also consider replacing the lock with a read-write lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403118#comment-15403118 ] Arpit Agarwal commented on HDFS-10682: -- bq. And I think we need dataset to expose lock acquire and release to those callers. What do you think? Hi [~vagarychen], yes that's correct. We'd need FsDatasetImpl (and perhaps FsDatasetSpi) to expose locking routines. bq. I was trying to take into account lock acquire time and lock release time there, which requires recording time before acquiring lock and after releasing lock, and this is what ThreadLocal is all about. Yes, I agree with the idea: if we want to incorporate lock-held/release time we'd need either thread locals or the map you talked about. But we can skip measuring those values for simplicity and to avoid questions like thread local overhead. We can consider adding more measurements later. > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > Add a metric to measure the time the lock of FSDataSetImpl is held by a > thread. The goal is to expose this for users to identify operations that > locks dataset for long time ("long" in some sense) and be able to > understand/reason/track the operation based on logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403093#comment-15403093 ] Chen Liang commented on HDFS-10682: --- Thank you [~arpitagarwal] very much for the review! A few things I would like to clarify though: 1. by "fix the other locations", did you mean fixing places like this?: synchronized(dataset) { ... } As dataset itself has been refactored with a separate lock, having this synchronized call means we would have two locks here. I have been thinking about situations like this. And I think we need dataset to expose lock acquire and release to those callers. What do you think? 2. Regarding the ThreadLocal variables, you have a good point. But I was trying to take into account lock acquire time and lock release time there, which requires recording time before acquiring lock and after releasing lock, and this is what ThreadLocal is all about. Do you have any comments on this? e.g. are these values worth recording? 3. And you are totally right that I don't need the equals zero checks, thanks for pointing it out! > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > Add a metric to measure the time the lock of FSDataSetImpl is held by a > thread. The goal is to expose this for users to identify operations that > locks dataset for long time ("long" in some sense) and be able to > understand/reason/track the operation based on logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
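The refactor discussed in point 1 above — replacing external {{synchronized(dataset)}} blocks with lock methods exposed by the dataset itself — could be sketched roughly as follows. This is an illustrative sketch only; the class and method names here are hypothetical and are not taken from the actual patch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a dataset class exposing acquire/release so that
// callers which previously did synchronized(dataset) use the dataset's
// internal lock object rather than the object monitor (which would
// otherwise give two independent locks).
class Dataset {
    private final ReentrantLock datasetLock = new ReentrantLock();

    public void acquireLock() { datasetLock.lock(); }

    public void releaseLock() { datasetLock.unlock(); }

    public boolean isHeldByCurrentThread() {
        return datasetLock.isHeldByCurrentThread();
    }
}

class Caller {
    // Before: synchronized (dataset) { ... }
    // After: explicit acquire/release in try/finally, so both the dataset's
    // own methods and external callers contend on the same lock object.
    static boolean touchDataset(Dataset dataset) {
        dataset.acquireLock();
        try {
            return dataset.isHeldByCurrentThread(); // critical section
        } finally {
            dataset.releaseLock();
        }
    }
}
```

The try/finally is what makes this safe where {{synchronized}} was implicit: the lock is released even if the critical section throws.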
[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey
[ https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403083#comment-15403083 ] Hadoop QA commented on HDFS-10643: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 59m 58s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821442/HDFS-10643.04.patch | | JIRA Issue | HDFS-10643 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 6201e31dde22 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9f473cf | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16280/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16280/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > HDFS namenode should always use service user (hdfs) to generateEncryptedKey > --- > > Key: HDFS-10643 > URL: https://issues.apache.org/jira/browse/HDFS-10643 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, namenode >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, > HDFS-10643.02.patch, HDFS-10643.03.patch, HDFS-10643.04.patch > > > KMSClientProvider is designed to be
[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403061#comment-15403061 ] Arpit Agarwal commented on HDFS-10682: -- Thanks for the updated patch [~vagarychen]! This is looking good. A couple of comments: # We also need to fix other locations that are synchronizing on the FSDatasetImpl object e.g. [FsVolumeImpl|https://github.com/apache/hadoop/blob/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java#L307], [DirectoryScanner|https://github.com/apache/hadoop/blob/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java#L586]. # Let's move the instrumentation changes to a separate Jira. We can repurpose this just for splitting out the lock. Comments on the instrumentation changes: ## We don't need ThreadLocal or a threadID -> timestamp map. We are measuring the lock held time, so we can save a timestamp just after getting the lock and another timestamp just before releasing the lock, then diff them to get the lock held time and log it after releasing the lock. We may need to use a thread local approach later if we have a read-write lock, in which case there can be multiple concurrent lock holders. ## You don't need the {{if (start == 0 || start2 == 0)}} checks. These values can be assumed to be correct now that they are initialized in the lock class. > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > Add a metric to measure the time the lock of FSDataSetImpl is held by a > thread. 
The goal is to expose this for users to identify operations that > locks dataset for long time ("long" in some sense) and be able to > understand/reason/track the operation based on logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
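The simpler measurement described in the comment above — one timestamp just after acquiring, another just before releasing, diffed and reported after the lock is released — could be sketched as below. The names here are illustrative assumptions, not the actual patch; a single field suffices only while there is one exclusive lock holder, which is exactly the point about not needing ThreadLocal yet.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock-held-time instrumentation for an exclusive
// lock: record a timestamp after acquiring and compute the difference just
// before releasing, then report it after the unlock, off the critical path.
class InstrumentedLock implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();
    private long acquiredAtNanos;              // written only while holding the lock
    private volatile long lastHeldNanos = -1;  // last measured held time, for reporting

    InstrumentedLock acquire() {
        lock.lock();
        acquiredAtNanos = System.nanoTime(); // just after getting the lock
        return this;
    }

    @Override
    public void close() {
        long heldNanos = System.nanoTime() - acquiredAtNanos; // just before releasing
        lock.unlock();
        lastHeldNanos = heldNanos; // diff/log after releasing the lock
    }

    long lastHeldNanos() { return lastHeldNanos; }
}
```

With try-with-resources a caller gets the measurement without any explicit bookkeeping: {{try (InstrumentedLock held = instrumentedLock.acquire()) { /* critical section */ }}}.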
[jira] [Updated] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-10682: - Summary: Replace FsDatasetImpl object lock with a separate lock object (was: Add metric to measure lock held time in FSDataSetImpl) > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > Add a metric to measure the time the lock of FSDataSetImpl is held by a > thread. The goal is to expose this for users to identify operations that > locks dataset for long time ("long" in some sense) and be able to > understand/reason/track the operation based on logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403045#comment-15403045 ] Konstantin Shvachko commented on HDFS-10301: We are actively looking into the possible problem with this change. LMK if the revert fixes the problem. Just to clarify: are you using per-storage reports on your cluster? In the meantime, answering your questions, Daryn. ??Why is this patch changing per-storage reports when it's the single-rpc report that is the problem??? The problem is with both single-rpc and per-storage reports. In the multi-rpc case DNs can send repeated RPCs for each storage, and this will cause incorrect zombie detection if the RPCs are processed out of order. ??Is this change compatible??? Yes. The compatibility issues were discussed here above. ??What does an old NN do if it gets this pseudo-report??? According to the [Rolling upgrade documentation|https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html] we first upgrade NameNodes, then DataNodes. So in practice new DNs don't talk to old NNs. ??What does a new NN do when it gets old style reports? Will it remove all but the last storage??? As mentioned in [this comment|https://issues.apache.org/jira/browse/HDFS-10301?focusedCommentId=15271737=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15271737] old DataNode reports will be processed as regular reports; only zombie storages will not be removed until the DNs are upgraded. During upgrade no storages are removed. 
> BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-10682: -- Status: In Progress (was: Patch Available) > Add metric to measure lock held time in FSDataSetImpl > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch > > > Add a metric to measure the time the lock of FSDataSetImpl is held by a > thread. The goal is to expose this for users to identify operations that > locks dataset for long time ("long" in some sense) and be able to > understand/reason/track the operation based on logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-10682: -- Status: Patch Available (was: In Progress)
[jira] [Updated] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-10682: -- Attachment: HDFS-10682.006.patch
[jira] [Updated] (HDFS-10713) Throttle FsNameSystem lock warnings
[ https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-10713: - Assignee: Hanisha Koneru > Throttle FsNameSystem lock warnings > --- > > Key: HDFS-10713 > URL: https://issues.apache.org/jira/browse/HDFS-10713 > Project: Hadoop HDFS > Issue Type: Bug > Components: logging, namenode >Reporter: Arpit Agarwal >Assignee: Hanisha Koneru > > The NameNode logs a message if the FSNamesystem write lock is held by a > thread for over 1 second. These messages can be throttled to at most one > per x minutes to avoid potentially filling up NN logs. We can also log the > number of suppressed notices since the last log message.
[jira] [Updated] (HDFS-10713) Throttle FsNameSystem lock warnings
[ https://issues.apache.org/jira/browse/HDFS-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-10713: - Target Version/s: 2.8.0
[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey
[ https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402994#comment-15402994 ] Jitendra Nath Pandey commented on HDFS-10643: - +1 > HDFS namenode should always use service user (hdfs) to generateEncryptedKey > --- > > Key: HDFS-10643 > URL: https://issues.apache.org/jira/browse/HDFS-10643 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, namenode >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, > HDFS-10643.02.patch, HDFS-10643.03.patch, HDFS-10643.04.patch > > > KMSClientProvider is designed to be shared by different KMS clients. When > HDFS Namenode as KMS client talks to KMS to generateEncryptedKey for new file > creation from proxy user (hive, oozie), the proxyuser handling for > KMSClientProvider in this case is unnecessary, which causes 1) an extra proxy > user configuration allowing hdfs user to proxy its clients and 2) KMS acls to > allow non-hdfs user for GENERATE_EEK operation. > This ticket is opened to always use HDFS namenode login user (hdfs) when > talking to KMS to generateEncryptedKey for new file creation. This way, we > have a more secure KMS based HDFS encryption (we can set kms-acls to allow > only hdfs user for GENERATE_EEK) with less configuration hassle for KMS to > allow hdfs to proxy other users.
[jira] [Created] (HDFS-10713) Throttle FsNameSystem lock warnings
Arpit Agarwal created HDFS-10713: Summary: Throttle FsNameSystem lock warnings Key: HDFS-10713 URL: https://issues.apache.org/jira/browse/HDFS-10713 Project: Hadoop HDFS Issue Type: Bug Components: logging, namenode Reporter: Arpit Agarwal The NameNode logs a message if the FSNamesystem write lock is held by a thread for over 1 second. These messages can be throttled to at most one per x minutes to avoid potentially filling up NN logs. We can also log the number of suppressed notices since the last log message.
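The throttling described in HDFS-10713 can be sketched as a small helper that logs at most one warning per interval and, when a warning finally gets through, reports how many were suppressed since the previous one. Class and method names here are illustrative assumptions, not from the eventual patch.

```java
// Hypothetical sketch of per-interval warning suppression with a
// suppressed-count appended to the next emitted warning.
public class LogThrottler {
    private final long minIntervalMs;
    private long lastLoggedAtMs;
    private long suppressed;

    public LogThrottler(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
        this.lastLoggedAtMs = -minIntervalMs; // so the very first warning is logged
    }

    /** Returns the message to log, or null if the warning should be suppressed. */
    public String record(String message, long nowMs) {
        if (nowMs - lastLoggedAtMs < minIntervalMs) {
            suppressed++;
            return null;
        }
        String out = suppressed > 0
            ? message + " [" + suppressed + " similar warnings suppressed]"
            : message;
        lastLoggedAtMs = nowMs;
        suppressed = 0;
        return out;
    }

    public static void main(String[] args) {
        LogThrottler t = new LogThrottler(5 * 60 * 1000); // at most one per 5 minutes
        System.out.println(t.record("write lock held for 1200 ms", 0));
        System.out.println(t.record("write lock held for 1500 ms", 60_000));  // suppressed -> null
        System.out.println(t.record("write lock held for 1100 ms", 400_000)); // logged with count
    }
}
```

Passing the clock in (`nowMs`) rather than calling `System.currentTimeMillis()` inside keeps the throttle easy to unit test.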
[jira] [Updated] (HDFS-10702) Add a Client API and Proxy Provider to enable stale read from Standby
[ https://issues.apache.org/jira/browse/HDFS-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayi Zhou updated HDFS-10702: -- Attachment: HDFS-10702.003.patch Fix checkstyle. Some of the style problems are intentionally preserved. > Add a Client API and Proxy Provider to enable stale read from Standby > - > > Key: HDFS-10702 > URL: https://issues.apache.org/jira/browse/HDFS-10702 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jiayi Zhou >Assignee: Jiayi Zhou >Priority: Minor > Attachments: HDFS-10702.001.patch, HDFS-10702.002.patch, > HDFS-10702.003.patch, StaleReadfromStandbyNN.pdf > > > Currently, clients must always talk to the active NameNode when performing > any metadata operation, which means active NameNode could be a bottleneck for > scalability. One way to solve this problem is to send read-only operations to > Standby NameNode. The disadvantage is that it might be a stale read. > Here, I'm thinking of adding a Client API to enable/disable stale read from > Standby which gives Client the power to set the staleness restriction. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402951#comment-15402951 ] Daryn Sharp commented on HDFS-10301: I've read this jira as I said I would, and I've looked at the patch. Our nightly build & deploy for 2.7 is broken. DNs claim to report thousands of blocks, NN says nope, -1. This should be reason enough to revert until we get to the bottom of it. We're reverting internally. If that fixes it, I will have someone help me revert tomorrow morning if not already. Why is this patch changing per-storage reports when it's the single-rpc report that is the problem? Is this change compatible? # What does an old NN do if it gets this pseudo-report? Will it forget about all the blocks on the non-last storage? # What does a new NN do when it gets old-style reports? Will it remove all but the last storage? This zombie detection, report context, etc. is getting out of hand. I don't understand why the zombie detection isn't based on the healthy storages in the heartbeat. Anything else gets flagged as failed and the heartbeat monitor disposes of them.
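The alternative suggested in the comment above can be sketched as a simple set difference: rather than inferring zombie storages from block-report IDs, diff the storages the NameNode tracks for a DataNode against the storages listed in its latest heartbeat, and treat anything missing from the heartbeat as failed. This illustrates the idea only; it is not code from any HDFS patch, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: heartbeat-driven detection of failed/zombie storages.
public class ZombieStorageCheck {
    /** Storage IDs the NN tracks that no longer appear in the heartbeat. */
    public static Set<String> findStaleStorages(Set<String> trackedStorages,
                                                Set<String> heartbeatStorages) {
        Set<String> stale = new HashSet<>(trackedStorages);
        stale.removeAll(heartbeatStorages);
        return stale;
    }

    public static void main(String[] args) {
        Set<String> tracked = new HashSet<>(Arrays.asList("DS-1", "DS-2", "DS-3"));
        Set<String> heartbeat = new HashSet<>(Arrays.asList("DS-1", "DS-3"));
        System.out.println(findStaleStorages(tracked, heartbeat)); // prints [DS-2]
    }
}
```

Because the check depends only on the most recent heartbeat, it is insensitive to the report-interleaving race this jira describes.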
[jira] [Updated] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey
[ https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10643: -- Attachment: HDFS-10643.04.patch Thanks [~jnp] for the review. Attached a patch that addresses the comments.
[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402938#comment-15402938 ] Vinitha Reddy Gankidi commented on HDFS-10712: -- Done. > Fix TestDataNodeVolumeFailure on 2.* branches. > -- > > Key: HDFS-10712 > URL: https://issues.apache.org/jira/browse/HDFS-10712 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.4 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi > Attachments: HDFS-10712.branch-2.7.patch, HDFS-10712.branch-2.patch > > > {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null > {{BlockReportContext}}. > This has been fixed on trunk.
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi updated HDFS-10712: - Attachment: HDFS-10712.branch-2.patch
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-10712: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi updated HDFS-10712: - Attachment: HDFS-10712.branch-2.7.patch
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi updated HDFS-10712: - Attachment: (was: HDFS-10712.001.patch)
[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402893#comment-15402893 ] Konstantin Shvachko commented on HDFS-10712: The patch looks good. I understand this is for branch-2.7; could you please attach one for branch-2 as well?
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi updated HDFS-10712: - Attachment: HDFS-10712.001.patch
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi updated HDFS-10712: - Attachment: (was: HDFS-10712.001.patch)
[jira] [Commented] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402881#comment-15402881 ] Vinitha Reddy Gankidi commented on HDFS-10712: -- [~shv] I have attached a patch. Can you please take a look? Thanks.
[jira] [Assigned] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi reassigned HDFS-10712: Assignee: Vinitha Reddy Gankidi
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinitha Reddy Gankidi updated HDFS-10712: - Attachment: HDFS-10712.001.patch
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402826#comment-15402826 ] Konstantin Shvachko commented on HDFS-10301: Daryn, I do not understand what you disagree with. And what is the problem with the implementation, which you object to? Nobody is taking away per-storage block reports. If you don't have time to understand the jira and don't have time to look at your own sandbox cluster, then how can I help you?
[jira] [Updated] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
[ https://issues.apache.org/jira/browse/HDFS-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-10712: --- Affects Version/s: 2.7.4 Target Version/s: 2.7.4
[jira] [Created] (HDFS-10712) Fix TestDataNodeVolumeFailure on 2.* branches.
Konstantin Shvachko created HDFS-10712: -- Summary: Fix TestDataNodeVolumeFailure on 2.* branches. Key: HDFS-10712 URL: https://issues.apache.org/jira/browse/HDFS-10712 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko {{TestDataNodeVolumeFailure.testVolumeFailure()}} should pass a non-null {{BlockReportContext}}. This has been fixed on trunk.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402798#comment-15402798 ] Daryn Sharp commented on HDFS-10301: bq. If NN doesn't come out of safe mode, then wouldn't that be caught by unit tests? You have more faith in the unit tests than I do. :) I do not have time to fully debug why sandbox clusters are DOA when I object to the implementation anyway.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402795#comment-15402795 ] Daryn Sharp commented on HDFS-10301: Block report processing does not need to be so complicated. Just ban single-rpc reports and the problem goes away. At most the DN is retransmitting the same storage report. Reprocessing it should not be a problem. If the only objection is that multiple RPCs are a scalability issue, I completely disagree. # A single RPC is not scalable. It will not work on clusters with many hundreds of millions of blocks. # The size of the RPC quickly becomes an issue. The memory pressure and premature promotion rate - even with a huge young gen (8-16G) - is not sustainable. # The time to process the RPC becomes an issue. The DN timing out and retransmitting (and causing this jira's bug) becomes an issue. Per-storage block reports eliminated multiple full GCs (2-3 for 5-10mins each) during startup on large clusters. Please revert or I'll grab someone here to help me do it.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402784#comment-15402784 ] Konstantin Shvachko commented on HDFS-10301: Looks like we need to fix {{TestDataNodeVolumeFailure}} for all 2.* branches. Will open a jira for that promptly. Sorry guys for breaking your build. [~daryn], it seems that you are overreacting a bit. Only one test is broken. I reran the other tests reported by Jenkins; they all pass. Could you please elaborate on the problem with the sandbox cluster? If NN doesn't come out of safe mode, then wouldn't that be caught by unit tests?
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402747#comment-15402747 ] Konstantin Shvachko commented on HDFS-10301: And the rest of the tests are passing locally. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402734#comment-15402734 ] Vinitha Reddy Gankidi commented on HDFS-10301: -- [~ebadger] Thanks for reporting this. TestDataNodeVolumeFailure does not call blockReport() with context=null on trunk. This was fixed as a part of HDFS-9260. We need to modify TestDataNodeVolumeFailure.testVolumeFailure() for branch-2.7 as well: {code} -cluster.getNameNodeRpc().blockReport(dnR, bpid, reports, null); +cluster.getNameNodeRpc().blockReport(dnR, bpid, reports, +new BlockReportContext(1, 0, System.nanoTime())); {code} > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. 
[jira] [Reopened] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reopened HDFS-10301: > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402697#comment-15402697 ] Daryn Sharp commented on HDFS-10301: -1 This needs to be reverted and I'm too git-ignorant to do it. Our sandbox clusters won't come out of safemode because the NN thinks the DNs are reporting -1 blocks. I see this patch is returning -1 blocks for a "storage report". I need to catch up on this jira but in the meantime it must be reverted. I find it odd this patch was committed with so many failed tests. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8780) Fetching live/dead datanode list with arg true for removeDecommissionNode,returns list with decom node.
[ https://issues.apache.org/jira/browse/HDFS-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402686#comment-15402686 ] Eric Badger commented on HDFS-8780: --- [~shv], the 2.7 patch that you committed here breaks TestHostsFiles.testHostsExcludeInUI. The failure is consistently reproducible and the associated stack trace is shown below. {noformat} java.lang.AssertionError: Live nodes should contain the decommissioned node at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.server.namenode.TestHostsFiles.testHostsExcludeInUI(TestHostsFiles.java:126) {noformat} > Fetching live/dead datanode list with arg true for > removeDecommissionNode,returns list with decom node. > --- > > Key: HDFS-8780 > URL: https://issues.apache.org/jira/browse/HDFS-8780 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: J.Andreina >Assignee: J.Andreina > Fix For: 2.7.4 > > Attachments: HDFS-8780-branch-2.7.patch, HDFS-8780.1.patch, > HDFS-8780.2.patch, HDFS-8780.3.patch > > > Current implementation: > == > DatanodeManager#removeDecomNodeFromList() , Decommissioned node will be > removed from dead/live node list only if below conditions are met > I . If the Include list is not empty. > II. If include and exclude list does not have decommissioned node and node > state is decommissioned. > {code} > if (!hostFileManager.hasIncludes()) { > return; >} > if ((!hostFileManager.isIncluded(node)) && > (!hostFileManager.isExcluded(node)) > && node.isDecommissioned()) { > // Include list is not empty, an existing datanode does not appear > // in both include or exclude lists and it has been decommissioned. > // Remove it from the node list. > it.remove(); > } > {code} > As mentioned in javadoc a datanode cannot be in "already decommissioned > datanode state". > Following the steps mentioned in javadoc datanode state is "dead" and not > decommissioned. 
> *Can we avoid the unnecessary checks and have check for the node is in > decommissioned state then remove from node list. ?* > Please provide your feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
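The simplification the reporter proposes above can be sketched as follows. This is a hypothetical illustration, not the committed patch: the datanode is modeled by a minimal stand-in class, and the include/exclude-list checks are dropped in favor of the single decommissioned-state check.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DecomPrune {
  // Minimal stand-in for DatanodeDescriptor; only the state we need here.
  public static class Node {
    public final String name;
    public final boolean decommissioned;
    public Node(String name, boolean decommissioned) {
      this.name = name;
      this.decommissioned = decommissioned;
    }
  }

  // The proposed simplification: remove a node from the live/dead list
  // purely on its decommissioned state, with no host-file checks.
  public static List<Node> pruneDecommissioned(List<Node> nodes) {
    for (Iterator<Node> it = nodes.iterator(); it.hasNext(); ) {
      if (it.next().decommissioned) {
        it.remove();
      }
    }
    return nodes;
  }
}
```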
[jira] [Commented] (HDFS-10643) HDFS namenode should always use service user (hdfs) to generateEncryptedKey
[ https://issues.apache.org/jira/browse/HDFS-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402547#comment-15402547 ] Jitendra Nath Pandey commented on HDFS-10643: - Minor comment: The {{edek}} declaration and assignment could be done on the same line i.e. {code} EncryptedKeyVersion edek = SecurityUtil.doAs {code} > HDFS namenode should always use service user (hdfs) to generateEncryptedKey > --- > > Key: HDFS-10643 > URL: https://issues.apache.org/jira/browse/HDFS-10643 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, namenode >Affects Versions: 2.6.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-10643.00.patch, HDFS-10643.01.patch, > HDFS-10643.02.patch, HDFS-10643.03.patch > > > KMSClientProvider is designed to be shared by different KMS clients. When > HDFS Namenode as KMS client talks to KMS to generateEncryptedKey for new file > creation from proxy user (hive, oozie), the proxyuser handling for > KMSClientProvider in this case is unnecessary, which cause 1) an extra proxy > user configuration allowing hdfs user to proxy its clients and 2) KMS acls to > allow non-hdfs user for GENERATE_EEK operation. > This ticket is opened to always use HDFS namenode login user (hdfs) when > talking to KMS to generateEncryptedKey for new file creation. This way, we > have a more secure KMS based HDFS encryption (we can set kms-acls to allow > only hdfs user for GENERATE_EEK) with less configuration hassle for KMS to > allow hdfs to proxy other users. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402481#comment-15402481 ] Chen Liang commented on HDFS-10682: --- Thanks [~arpitagarwal] for the comments! Will upload another patch fixing this soon. > Add metric to measure lock held time in FSDataSetImpl > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch > > > Add a metric to measure the time the lock of FSDataSetImpl is held by a > thread. The goal is to expose this for users to identify operations that > locks dataset for long time ("long" in some sense) and be able to > understand/reason/track the operation based on logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10682) Add metric to measure lock held time in FSDataSetImpl
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402478#comment-15402478 ] Arpit Agarwal commented on HDFS-10682: -- Hi [~vagarychen], thanks for taking this up. I recommend splitting the work into two parts: # Refactor the code to synchronize on a new Reentrant lock instead of the FsDatasetImpl object. (create a separate Jira for this). The advantage of a wrapper object for the lock is callers won't need to add boilerplate code for instrumentation. Also we can use try-with-resources instead of having to release the lock manually. # In the second patch we can add instrumentation in just the acquire/close methods and expose it as a metric. > Add metric to measure lock held time in FSDataSetImpl > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch > > > Add a metric to measure the time the lock of FSDataSetImpl is held by a > thread. The goal is to expose this for users to identify operations that > locks dataset for long time ("long" in some sense) and be able to > understand/reason/track the operation based on logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
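The wrapper-lock idea above can be sketched like this. Note this is an illustrative sketch, not the eventual patch: the class name {{AutoCloseableLock}}, the {{acquire()}} method, and the metrics hook are all assumptions. The point is that {{acquire()}}/{{close()}} become the single place to hang instrumentation, and try-with-resources releases the lock without manual boilerplate.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a ReentrantLock behind an AutoCloseable facade, so callers can
// write: try (AutoCloseableLock l = datasetLock.acquire()) { ... }
// and held time is measured automatically between acquire() and close().
public class AutoCloseableLock implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  private long acquiredAtNanos;

  // Acquire the lock and start the held-time clock.
  public AutoCloseableLock acquire() {
    lock.lock();
    acquiredAtNanos = System.nanoTime();
    return this;
  }

  // Release the lock and report how long it was held.
  @Override
  public void close() {
    long heldNanos = System.nanoTime() - acquiredAtNanos;
    lock.unlock();
    recordHeldTime(heldNanos);
  }

  public boolean isHeldByCurrentThread() {
    return lock.isHeldByCurrentThread();
  }

  private void recordHeldTime(long nanos) {
    // Hypothetical metrics hook: a real patch would feed a metrics
    // registry here; this sketch deliberately does nothing.
  }
}
```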
[jira] [Commented] (HDFS-10674) Optimize creating a full path from an inode
[ https://issues.apache.org/jira/browse/HDFS-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402443#comment-15402443 ] Hadoop QA commented on HDFS-10674: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 293 unchanged - 4 fixed = 293 total (was 297) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 97m 26s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12819495/HDFS-10674.patch | | JIRA Issue | HDFS-10674 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux efda9c68b7fa 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9f473cf | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16277/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16277/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16277/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Optimize creating a full path from an inode > --- > > Key: HDFS-10674 > URL: https://issues.apache.org/jira/browse/HDFS-10674 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >
[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool
[ https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402427#comment-15402427 ] Masatake Iwasaki commented on HDFS-10678: - {noformat} +| OPERATION\_OPTION| Commands | {noformat} "Commands" should be "operation-specific parameters"? > Documenting NNThroughputBenchmark tool > -- > > Key: HDFS-10678 > URL: https://issues.apache.org/jira/browse/HDFS-10678 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: benchmarks, test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Labels: documentation > Attachments: HDFS-10678.000.patch, HDFS-10678.001.patch > > > The best (only) documentation for the NNThroughputBenchmark currently exists > as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, > especially since we no longer generate javadocs for HDFS as part of the build > process. I suggest we extract it into a separate markdown doc, or merge it > with other benchmarking materials (if any?) about HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10656) Optimize conversion of byte arrays back to path string
[ https://issues.apache.org/jira/browse/HDFS-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402419#comment-15402419 ] Hadoop QA commented on HDFS-10656: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 62m 39s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 85m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12819146/HDFS-10656.patch | | JIRA Issue | HDFS-10656 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ccd3bc7fa258 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9f473cf | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16278/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16278/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Optimize conversion of byte arrays back to path string > -- > > Key: HDFS-10656 > URL: https://issues.apache.org/jira/browse/HDFS-10656 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10656.patch > > > {{DFSUtil.byteArray2PathString}} generates excessive object allocation. > # each
[jira] [Commented] (HDFS-10678) Documenting NNThroughputBenchmark tool
[ https://issues.apache.org/jira/browse/HDFS-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402397#comment-15402397 ] Masatake Iwasaki commented on HDFS-10678: - Thanks for working on this, [~liuml07]. I think Benchmarking.md should be just a table of contents and the doc of NNThroughputBenchmark should be under hadoop-hdfs-project/hadoop-hdfs/src/site as an independent page. > Documenting NNThroughputBenchmark tool > -- > > Key: HDFS-10678 > URL: https://issues.apache.org/jira/browse/HDFS-10678 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: benchmarks, test >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Labels: documentation > Attachments: HDFS-10678.000.patch, HDFS-10678.001.patch > > > The best (only) documentation for the NNThroughputBenchmark currently exists > as a JavaDoc on the NNThroughputBenchmark class. This is less than useful, > especially since we no longer generate javadocs for HDFS as part of the build > process. I suggest we extract it into a separate markdown doc, or merge it > with other benchmarking materials (if any?) about HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402341#comment-15402341 ] Eric Badger commented on HDFS-10301: [~shv], this breaks TestDataNodeVolumeFailure.testVolumeFailure(). blockReport() is called with context = null. Then inside of blockReport we try to call methods on context with it still set to null {noformat} java.lang.NullPointerException: null at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1342) at org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testVolumeFailure(TestDataNodeVolumeFailure.java:189) {noformat} > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. 
[jira] [Commented] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402328#comment-15402328 ] Kai Zheng commented on HDFS-8901: - Hi Youwei, Thanks for your update. Please note we should support both direct ByteBuffer and on-heap ByteBuffer, so calling aBuffer.array() isn't appropriate. I would suggest you resume this effort based on the previous patch, instead of reworking it from scratch. > Use ByteBuffer in striping positional read > -- > > Key: HDFS-8901 > URL: https://issues.apache.org/jira/browse/HDFS-8901 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Youwei Wang > Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, > HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, > HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, > HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, > HDFS-8901.v13.patch, initial-poc.patch > > > Native erasure coder prefers to direct ByteBuffer for performance > consideration. To prepare for it, this change uses ByteBuffer through the > codes in implementing striping position read. It will also fix avoiding > unnecessary data copying between striping read chunk buffers and decode input > buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
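The constraint Kai raises can be sketched as follows. A direct ByteBuffer has no backing array, so {{array()}} throws UnsupportedOperationException; code that works for both kinds must branch on {{hasArray()}}. The helper name {{copyTo}} is illustrative, not from any attached patch.

```java
import java.nio.ByteBuffer;

// Sketch: copy len bytes out of a ByteBuffer into a byte[], handling both
// on-heap and direct buffers, and advancing the buffer position either way.
public class BufferCopy {
  public static void copyTo(ByteBuffer src, byte[] dst, int dstOff, int len) {
    if (src.hasArray()) {
      // On-heap: bulk copy straight out of the backing array.
      System.arraycopy(src.array(), src.arrayOffset() + src.position(),
          dst, dstOff, len);
      src.position(src.position() + len);
    } else {
      // Direct: no backing array; a relative bulk get copies and advances.
      src.get(dst, dstOff, len);
    }
  }
}
```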
[jira] [Commented] (HDFS-10655) Fix path related byte array conversion bugs
[ https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402266#comment-15402266 ] Hudson commented on HDFS-10655: --- SUCCESS: Integrated in Hadoop-trunk-Commit #10188 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10188/]) HDFS-10655. Fix path related byte array conversion bugs. (daryn) (daryn: rev 9f473cf903e586c556154abd56b3a3d820c6b028) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestPathComponents.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java > Fix path related byte array conversion bugs > --- > > Key: HDFS-10655 > URL: https://issues.apache.org/jira/browse/HDFS-10655 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0 > > Attachments: HDFS-10655.patch, HDFS-10655.patch > > > {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple > separators, nor does it handle relative paths correctly. > {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless > the specified range is the entire component array. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10674) Optimize creating a full path from an inode
[ https://issues.apache.org/jira/browse/HDFS-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-10674: --- Status: Patch Available (was: Open) > Optimize creating a full path from an inode > --- > > Key: HDFS-10674 > URL: https://issues.apache.org/jira/browse/HDFS-10674 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10674.patch > > > {{INode#getFullPathName}} walks up the inode tree, creates an INode[], and > converts each component's byte[] name to a String while building the path. > This involves many allocations, copies, and char conversions. > The path should be built with a single byte[] allocation.
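[Editor's note] The single-allocation idea can be sketched with two passes up the tree: one to size the buffer, one to fill it from the end. {{FullPath}} and {{Node}} are hypothetical stand-ins for illustration, not the actual HDFS inode classes or the committed patch.

```java
import java.nio.charset.StandardCharsets;

public class FullPath {
  // Minimal stand-in for an HDFS inode: a byte[] name and a parent link.
  static class Node {
    final byte[] name;
    final Node parent;
    Node(Node parent, String name) {
      this.parent = parent;
      this.name = name.getBytes(StandardCharsets.UTF_8);
    }
  }

  // Build "/a/b/c" with one byte[] allocation: first walk up to compute
  // the total length, then walk up again filling backwards from the end.
  static String fullPath(Node inode) {
    int len = 0;
    for (Node n = inode; n.parent != null; n = n.parent) {
      len += n.name.length + 1;  // +1 for each leading '/'
    }
    if (len == 0) {
      return "/";  // the root inode itself
    }
    byte[] buf = new byte[len];
    int pos = len;
    for (Node n = inode; n.parent != null; n = n.parent) {
      pos -= n.name.length;
      System.arraycopy(n.name, 0, buf, pos, n.name.length);
      buf[--pos] = '/';
    }
    return new String(buf, StandardCharsets.UTF_8);
  }
}
```

No intermediate INode[] or per-component String is created; the only per-call allocations are the byte[] and the final String.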
[jira] [Updated] (HDFS-10656) Optimize conversion of byte arrays back to path string
[ https://issues.apache.org/jira/browse/HDFS-10656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-10656: --- Status: Patch Available (was: Open) > Optimize conversion of byte arrays back to path string > -- > > Key: HDFS-10656 > URL: https://issues.apache.org/jira/browse/HDFS-10656 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10656.patch > > > {{DFSUtil.byteArray2PathString}} generates excessive object allocation. > # each byte array is encoded to a string (copy) > # string appended to a builder which extracts the chars from the intermediate > string (copy) and adds to its own char array > # builder's char array is re-alloced if over 16 chars (copy) > # builder's toString creates another string (copy) > Instead of allocating all these objects and performing multiple byte/char > encoding/decoding conversions, the byte array can be built in-place with a > single final conversion to a string. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
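[Editor's note] The in-place rebuild described above can be sketched as below. {{PathJoin}} and {{join}} are hypothetical names, and the subrange (offset/length) handling of the real {{DFSUtil.byteArray2PathString}} is omitted; only the one-buffer, one-final-conversion idea is shown.

```java
import java.nio.charset.StandardCharsets;

public class PathJoin {
  // Join byte[] components back into a path using a single output byte[]
  // and one final byte->String conversion, instead of one String per
  // component plus StringBuilder re-allocations and copies.
  static String join(byte[][] comps) {
    int size = 0;
    for (byte[] c : comps) {
      size += c.length + 1;  // component plus a '/' separator slot
    }
    byte[] buf = new byte[Math.max(0, size - 1)];  // n components, n-1 separators
    int pos = 0;
    for (int i = 0; i < comps.length; i++) {
      if (i > 0) {
        buf[pos++] = '/';
      }
      System.arraycopy(comps[i], 0, buf, pos, comps[i].length);
      pos += comps[i].length;
    }
    return new String(buf, 0, pos, StandardCharsets.UTF_8);
  }
}
```

With an empty root component first, ["", "a", "b"] joins to "/a/b"; a relative ["a", "b"] joins to "a/b".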
[jira] [Updated] (HDFS-10655) Fix path related byte array conversion bugs
[ https://issues.apache.org/jira/browse/HDFS-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-10655: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks Jing! > Fix path related byte array conversion bugs > --- > > Key: HDFS-10655 > URL: https://issues.apache.org/jira/browse/HDFS-10655 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0 > > Attachments: HDFS-10655.patch, HDFS-10655.patch > > > {{DFSUtil.bytes2ByteArray}} does not always properly handle runs of multiple > separators, nor does it handle relative paths correctly. > {{DFSUtil.byteArray2PathString}} does not rebuild the path correctly unless > the specified range is the entire component array. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10711) Optimize FSPermissionChecker group membership check
[ https://issues.apache.org/jira/browse/HDFS-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-10711: --- Attachment: (was: HDFS-10711.patch) > Optimize FSPermissionChecker group membership check > --- > > Key: HDFS-10711 > URL: https://issues.apache.org/jira/browse/HDFS-10711 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10711.patch > > > HADOOP-13442 obviates the need for multiple group related object allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10711) Optimize FSPermissionChecker group membership check
[ https://issues.apache.org/jira/browse/HDFS-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-10711: --- Attachment: HDFS-10711.patch > Optimize FSPermissionChecker group membership check > --- > > Key: HDFS-10711 > URL: https://issues.apache.org/jira/browse/HDFS-10711 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10711.patch > > > HADOOP-13442 obviates the need for multiple group related object allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10711) Optimize FSPermissionChecker group membership check
[ https://issues.apache.org/jira/browse/HDFS-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-10711: --- Attachment: HDFS-10711.patch > Optimize FSPermissionChecker group membership check > --- > > Key: HDFS-10711 > URL: https://issues.apache.org/jira/browse/HDFS-10711 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10711.patch > > > HADOOP-13442 obviates the need for multiple group related object allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10711) Optimize FSPermissionChecker group membership check
Daryn Sharp created HDFS-10711: -- Summary: Optimize FSPermissionChecker group membership check Key: HDFS-10711 URL: https://issues.apache.org/jira/browse/HDFS-10711 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs Reporter: Daryn Sharp Assignee: Daryn Sharp HADOOP-13442 obviates the need for multiple group related object allocations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10673) Optimize FSPermissionChecker's internal path usage
[ https://issues.apache.org/jira/browse/HDFS-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402090#comment-15402090 ] Daryn Sharp commented on HDFS-10673: [~jingzhao], please let me know if the latest patch is ok, or if I should revert checking subdir access back to calling the inode attr provider with just the components of the original subdir. > Optimize FSPermissionChecker's internal path usage > -- > > Key: HDFS-10673 > URL: https://issues.apache.org/jira/browse/HDFS-10673 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10673.1.patch, HDFS-10673.patch > > > The INodeAttributeProvider and AccessControlEnforcer features degrade > performance and generate excessive garbage even when neither is used. Main > issues: > # A byte[][] of components is unnecessarily created. Each path component > lookup converts a subrange of the byte[][] to a new String[], which the > default attribute provider then never uses. > # Subaccess checks are insanely expensive. The full path of every subdir is > created by walking up the inode tree, creating an INode[], building a string > by converting each inode's byte[] name to a string, etc., all of which is > only used if there's an exception. > The expense of #1 should only be incurred when using the provider/enforcer > feature. For #2, paths should be created on demand for exceptions.
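[Editor's note] "Paths should be created on-demand for exceptions" can be sketched by deferring the expensive path construction behind a Supplier, so the walk up the inode tree only happens on the failure branch. {{AccessDenied}} and {{check}} are hypothetical names for illustration, not the committed HDFS-10673 code.

```java
import java.util.function.Supplier;

public class AccessDenied {
  // Build the (expensive) full path string only when a permission check
  // actually fails, by passing a lazy Supplier instead of an eager String.
  static void check(boolean allowed, Supplier<String> path) {
    if (!allowed) {
      // Only this branch pays for walking the inode tree to build the path.
      throw new SecurityException("Permission denied: " + path.get());
    }
  }
}
```

On the common success path the Supplier is never invoked, so no INode[], String, or StringBuilder garbage is generated.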
[jira] [Commented] (HDFS-10700) I increased the value of GC_OPTS on the namenode. After I modified the value, the namenode failed to start.
[ https://issues.apache.org/jira/browse/HDFS-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402007#comment-15402007 ] Weiwei Yang commented on HDFS-10700: Hi [~biggersell] Can you upload the related namenode log to help with diagnosis? Thanks! > I increased the value of GC_OPTS on the namenode. After I modified the > value, the namenode failed to start. > --- > > Key: HDFS-10700 > URL: https://issues.apache.org/jira/browse/HDFS-10700 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 > Environment: Linux Suse 11 SP3 >Reporter: Liu Guannan > > I increased the value of GC_OPTS on the namenode. After I modified the > value, the namenode failed to start. The reason is that datanodes reported > block status to the namenode, causing the namenode to update block status > slowly, and the namenode then failed to start.
[jira] [Commented] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read under the lock
[ https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401907#comment-15401907 ] Hadoop QA commented on HDFS-10710: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 6s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 77m 51s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics | | | hadoop.hdfs.TestDFSShell | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821300/HDFS-10710.1.patch | | JIRA Issue | HDFS-10710 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 04b97c4c9717 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 770b5eb | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16276/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16276/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16276/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block > counts should be get with the protect with lock >
[jira] [Commented] (HDFS-10602) TestBalancer runs timeout intermittently
[ https://issues.apache.org/jira/browse/HDFS-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401798#comment-15401798 ] Hadoop QA commented on HDFS-10602: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 43s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 75m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestFileChecksum | | | hadoop.hdfs.TestDFSShell | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821292/HDFS-10602.002.patch | | JIRA Issue | HDFS-10602 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux da9ab7982318 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 770b5eb | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16275/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16275/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16275/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestBalancer runs timeout intermittently > > > Key: HDFS-10602 > URL: https://issues.apache.org/jira/browse/HDFS-10602 > Project: Hadoop HDFS > Issue Type: Bug >
[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read under the lock
[ https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-10710: --- Status: Patch Available (was: Open) > In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block > counts should be read under the lock > - > > Key: HDFS-10710 > URL: https://issues.apache.org/jira/browse/HDFS-10710 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: GAO Rui >Assignee: GAO Rui > Attachments: HDFS-10710.1.patch > > > In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end > block counts should be read under the lock. Otherwise, log records like "-1 > blocks are removed", indicating that a negative number of blocks was > removed, can be generated. > For example, consider the following scenario: > 1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = > getPostponedMisreplicatedBlocksCount();}}; > startPostponedMisReplicatedBlocksCount gets the value 20. > 2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments > postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount > is now 21. > 3. thread1 ends the iteration, but no postponed block is removed, so after > running {{long endPostponedMisReplicatedBlocksCount = > getPostponedMisreplicatedBlocksCount();}}, > endPostponedMisReplicatedBlocksCount gets the value 21. > 4. thread1 generates the log: > {noformat} > LOG.info("Rescan of postponedMisreplicatedBlocks completed in " + > (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) > + > " msecs. " + endPostponedMisReplicatedBlocksCount + > " blocks are left. " + (startPostponedMisReplicatedBlocksCount - > endPostponedMisReplicatedBlocksCount) + " blocks are removed."); > {noformat} > Then we get a log record like "-1 blocks are removed."
[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read under the lock
[ https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-10710: --- Attachment: HDFS-10710.1.patch > In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block > counts should be get with the protect with lock > - > > Key: HDFS-10710 > URL: https://issues.apache.org/jira/browse/HDFS-10710 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: GAO Rui >Assignee: GAO Rui > Attachments: HDFS-10710.1.patch > > > In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block > counts should be get with the protect with lock. Or, log records like "-1 > blocks are removed" which indicate minus blocks are removed could be > generated. > For example, following scenario: > 1. thread1 run {{long startPostponedMisReplicatedBlocksCount = > getPostponedMisreplicatedBlocksCount();}} currently > startPostponedMisReplicatedBlocksCount get the value 20. > 2. before thread1 run {{namesystem.writeLock();}} , thread2 increment > postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount > is 21 now. > 3. thread1 end the iteration, but no postponed block is removed, so after run > {{long endPostponedMisReplicatedBlocksCount = > getPostponedMisreplicatedBlocksCount();}}, > endPostponedMisReplicatedBlocksCount get the value of 21. > 4. thread 1 generate the log: > {noformat} > LOG.info("Rescan of postponedMisreplicatedBlocks completed in " + > (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) > + > " msecs. " + endPostponedMisReplicatedBlocksCount + > " blocks are left. " + (startPostponedMisReplicatedBlocksCount - > endPostponedMisReplicatedBlocksCount) + " blocks are removed."); > {noformat} > Then, we'll get the log record like "-1 blocks are removed." 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read under the lock
[ https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-10710: --- Description: In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be get with the protect with lock. Or, log records like "-1 blocks are removed" which indicate minus blocks are removed could be generated. For example, following scenario: 1. thread1 run {{long startPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}} currently startPostponedMisReplicatedBlocksCount get the value 20. 2. before thread1 run {{namesystem.writeLock();}} , thread2 increment postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is 21 now. 3. thread1 end the iteration, but no postponed block is removed, so after run {{long endPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount get the value of 21. 4. thread 1 generate the log: {noformat} LOG.info("Rescan of postponedMisreplicatedBlocks completed in " + (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) + " msecs. " + endPostponedMisReplicatedBlocksCount + " blocks are left. " + (startPostponedMisReplicatedBlocksCount - endPostponedMisReplicatedBlocksCount) + " blocks are removed."); {noformat} Then, we'll get the log record like "-1 blocks are removed." was: In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be get with the protect with lock. Or, log records like "-1 blocks are removed" which indicate minus blocks are removed could be generated. For example, following scenario: 1. thread1 run {{long startPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}} currently startPostponedMisReplicatedBlocksCount get the value 20. 2. 
before thread1 run {{namesystem.writeLock();}} , thread2 increment postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is 21 now. 3. thread1 end the iteration, but no postponed block is removed, so after run {{long endPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount get the value of 21. 4. thread 1 generate the log: {code} LOG.info("Rescan of postponedMisreplicatedBlocks completed in " + (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) + " msecs. " + endPostponedMisReplicatedBlocksCount + " blocks are left. " + (startPostponedMisReplicatedBlocksCount - endPostponedMisReplicatedBlocksCount) + " blocks are removed."); {code} Then, we'll get the log record like "-1 blocks are removed." > In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block > counts should be get with the protect with lock > - > > Key: HDFS-10710 > URL: https://issues.apache.org/jira/browse/HDFS-10710 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: GAO Rui >Assignee: GAO Rui > > In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block > counts should be get with the protect with lock. Or, log records like "-1 > blocks are removed" which indicate minus blocks are removed could be > generated. > For example, following scenario: > 1. thread1 run {{long startPostponedMisReplicatedBlocksCount = > getPostponedMisreplicatedBlocksCount();}} currently > startPostponedMisReplicatedBlocksCount get the value 20. > 2. before thread1 run {{namesystem.writeLock();}} , thread2 increment > postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount > is 21 now. > 3. thread1 end the iteration, but no postponed block is removed, so after run > {{long endPostponedMisReplicatedBlocksCount = > getPostponedMisreplicatedBlocksCount();}}, > endPostponedMisReplicatedBlocksCount get the value of 21. > 4. 
thread 1 generate the log: > {noformat} > LOG.info("Rescan of postponedMisreplicatedBlocks completed in " + > (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) > + > " msecs. " + endPostponedMisReplicatedBlocksCount + > " blocks are left. " + (startPostponedMisReplicatedBlocksCount - > endPostponedMisReplicatedBlocksCount) + " blocks are removed."); > {noformat} > Then, we'll get the log record like "-1 blocks are removed." -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be read under the lock
[ https://issues.apache.org/jira/browse/HDFS-10710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-10710: --- Description:
In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block counts should be fetched under the protection of the lock. Otherwise, log records like "-1 blocks are removed", which report a negative number of removed blocks, can be generated. For example, consider the following scenario:
1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}} and startPostponedMisReplicatedBlocksCount gets the value 20.
2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is now 21.
3. thread1 finishes the iteration without removing any postponed block, so after running {{long endPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount gets the value 21.
4. thread1 generates the log:
{code}
LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
    (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
    " msecs. " + endPostponedMisReplicatedBlocksCount + " blocks are left. " +
    (startPostponedMisReplicatedBlocksCount - endPostponedMisReplicatedBlocksCount) +
    " blocks are removed.");
{code}
Then we get a log record like "-1 blocks are removed."
> In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be fetched under the lock
> ---
>
> Key: HDFS-10710
> URL: https://issues.apache.org/jira/browse/HDFS-10710
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: GAO Rui
> Assignee: GAO Rui
>
> In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block counts should be fetched under the protection of the lock. Otherwise, log records like "-1 blocks are removed", which report a negative number of removed blocks, can be generated.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10710) In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be fetched under the lock
GAO Rui created HDFS-10710: -- Summary: In BlockManager#rescanPostponedMisreplicatedBlocks(), start and end block counts should be fetched under the lock
Key: HDFS-10710
URL: https://issues.apache.org/jira/browse/HDFS-10710
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Reporter: GAO Rui
Assignee: GAO Rui

In BlockManager#rescanPostponedMisreplicatedBlocks(), the start and end block counts should be fetched under the protection of the lock. Otherwise, log records like "-1 blocks are removed", which report a negative number of removed blocks, can be generated. For example, consider the following scenario:
1. thread1 runs {{long startPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}} and startPostponedMisReplicatedBlocksCount gets the value 20.
2. Before thread1 runs {{namesystem.writeLock();}}, thread2 increments postponedMisreplicatedBlocksCount by 1, so postponedMisreplicatedBlocksCount is now 21.
3. thread1 finishes the iteration without removing any postponed block, so after running {{long endPostponedMisReplicatedBlocksCount = getPostponedMisreplicatedBlocksCount();}}, endPostponedMisReplicatedBlocksCount gets the value 21.
4. thread1 generates the log:
{code}
LOG.info("Rescan of postponedMisreplicatedBlocks completed in " +
    (Time.monotonicNow() - startTimeRescanPostponedMisReplicatedBlocks) +
    " msecs. " + endPostponedMisReplicatedBlocksCount + " blocks are left. " +
    (startPostponedMisReplicatedBlocksCount - endPostponedMisReplicatedBlocksCount) +
    " blocks are removed.");
{code}
Then we get a log record like "-1 blocks are removed."
[jira] [Commented] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401762#comment-15401762 ] Hadoop QA commented on HDFS-8901: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-hdfs-project: The patch generated 4 new + 89 unchanged - 0 fixed = 93 total (was 89) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 53s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 34s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}102m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.server.namenode.TestEditLog | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12821280/HDFS-8901.v14.patch | | JIRA Issue | HDFS-8901 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ad8d02b861b3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 34ccaa8 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16274/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16274/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16274/testReport/ | |
[jira] [Created] (HDFS-10709) hdfs shell du command does not match OS du command
Jiahongchao created HDFS-10709: -- Summary: hdfs shell du command does not match OS du command
Key: HDFS-10709
URL: https://issues.apache.org/jira/browse/HDFS-10709
Project: Hadoop HDFS
Issue Type: Bug
Components: fs
Affects Versions: 2.6.0
Environment: centos 6.7, jdk 1.7
Reporter: Jiahongchao
Priority: Minor

I have files created by Solr on HDFS, but the size reported is different between HDFS du and CentOS du.
{code}
[apd@dev186 ~]$ hdfs dfs -du /solr/fileSizeTest/core_node1/data/tlog
46 402653184 /solr/fileSizeTest/core_node1/data/tlog/tlog.002
[apd@dev186 ~]$ hdfs dfs -ls /solr/fileSizeTest/core_node1/data/tlog
Found 1 items
-rw-r--r-- 3 solr solr 46 2016-08-01 13:18 /solr/fileSizeTest/core_node1/data/tlog/tlog.002
{code}
After downloading this file using get:
{code}
[apd@dev186 ~]$ ll -h tlog.002
-rw-r--r-- 1 apd apd 8.5M Aug 1 15:48 tlog.002
{code}
So what does hdfs dfs -du actually report? And why are the two values so different: 46 vs 402653184?
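For context: the first column of {{hdfs dfs -du}} is the file length recorded by the NameNode, and the second is the space consumed across all replicas. One plausible reading of the numbers above (an assumption, not confirmed on this ticket): the tlog file's last block is still open for write by Solr, so the NameNode still reports the length as of the last completed sync (46 bytes) while {{get}} fetches the actual bytes from the DataNode (8.5 MB), and the space consumed is accounted as one full default-sized block per replica. A plain-Java sanity check of that arithmetic, assuming the default 128 MB {{dfs.blocksize}}:

```java
public class DuArithmetic {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed default dfs.blocksize = 134217728 bytes
        int replication = 3;                 // replication factor shown by "hdfs dfs -ls" above
        long spaceConsumed = 402653184L;     // second column of "hdfs dfs -du" above
        // An under-construction block can be accounted as a full block per
        // replica; that matches the reported space-consumed value exactly.
        System.out.println(spaceConsumed == (long) replication * blockSize);
    }
}
```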
[jira] [Updated] (HDFS-10602) TestBalancer times out intermittently
[ https://issues.apache.org/jira/browse/HDFS-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10602: - Attachment: HDFS-10602.002.patch
> TestBalancer times out intermittently
> ---
>
> Key: HDFS-10602
> URL: https://issues.apache.org/jira/browse/HDFS-10602
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha1
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
> Attachments: HDFS-10602.001.patch, HDFS-10602.002.patch, fail.log, pass.log
>
> As the jira HDFS-10336 has mentioned, the unit test {{TestBalancer#testBalancerWithKeytabs}} sometimes runs too slowly, which leads to the timeout. The test {{TestBalancer#testUnknownDatanodeSimple}} also has this problem. These two tests both use the method {{testUnknownDatanode}}, so we can do some optimization of that method.
[jira] [Commented] (HDFS-10602) TestBalancer times out intermittently
[ https://issues.apache.org/jira/browse/HDFS-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401742#comment-15401742 ] Yiqun Lin commented on HDFS-10602: -- Thanks [~xiaochen] for the comments and thanks [~liuml07] for providing the logs.
{quote} The 3rd DN is excluded for the test case. {quote}
Hi Xiao, it seems that the 3rd DN is not excluded in the test case. The balancer always moves data between 2 specific DNs, so the 3rd DN never receives any data.
{quote} It seems the code has been refactored so no getBlockList exists in trunk now {quote}
Yes, the getBlockList method in {{Balancer.java}} has been changed; it now lives in {{Dispatcher#Source}}. I have tested this case, and getBlockList does return the srcBlocks in the failing case. So the problem comes down to why the balancer always moves data between the same 2 DNs. I added some code to print the details:
{code}
private PendingMove chooseNextMove() {
  for (Iterator<Task> i = tasks.iterator(); i.hasNext();) {
    final Task task = i.next();
    final DDatanode target = task.target.getDDatanode();
    final PendingMove pendingBlock = new PendingMove(this, task.target);
    if (target.addPendingBlock(pendingBlock)) {
      // target is not busy, so do a tentative block allocation
      if (pendingBlock.chooseBlockAndProxy()) {
        long blockSize = pendingBlock.reportedBlock.getNumBytes(this);
        incScheduledSize(-blockSize);
        task.size -= blockSize;
        // Print the scheduled size for test
        LOG.info("TargetNode: " + target.getDatanodeInfo().getXferPort()
            + ", bytes scheduled to move, after: " + task.size
            + ", before: " + (task.size + blockSize));
        if (task.size == 0) {
          LOG.info("TargetNode removed.");
          i.remove();
        }
        LOG.info("Return pendingBlock for target node "
            + target.getDatanodeInfo().getXferPort());
        return pendingBlock;
        ...
{code}
Here {{task.size}}, the number of bytes scheduled to move, does not always reduce to 0.
chooseNextMove then returns this pendingBlock, and the task for the next target node is ignored. In the test, I saw that the 3rd DN is always the second targetNode here, and the method returns after dealing with only the first target node. These are my local logs:
{code}
2016-08-01 16:51:53,466 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to move, after: -1067, before: -967
2016-08-01 16:51:53,466 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 58798
2016-08-01 16:51:53,466 [pool-50-thread-10] INFO balancer.Dispatcher (Dispatcher.java:dispatch(322)) - Start moving blk_1073741833_1009 with size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 127.0.0.1:58794
2016-08-01 16:51:53,467 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to move, after: -1167, before: -1067
2016-08-01 16:51:53,467 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 58798
2016-08-01 16:51:53,467 [pool-50-thread-11] INFO balancer.Dispatcher (Dispatcher.java:dispatch(322)) - Start moving blk_1073741834_1010 with size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 127.0.0.1:58794
2016-08-01 16:51:53,468 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to move, after: -1267, before: -1167
2016-08-01 16:51:53,468 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 58798
2016-08-01 16:51:53,468 [pool-50-thread-12] INFO balancer.Dispatcher (Dispatcher.java:dispatch(322)) - Start moving blk_1073741835_1011 with size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 127.0.0.1:58794
2016-08-01 16:51:53,468 [pool-49-thread-1] INFO
balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to move, after: -1367, before: -1267 2016-08-01 16:51:53,468 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(806)) - Return pendingBlock for target node 58798 2016-08-01 16:51:53,469 [pool-50-thread-13] INFO balancer.Dispatcher (Dispatcher.java:dispatch(322)) - Start moving blk_1073741836_1012 with size=100 from 127.0.0.1:58794:DISK to 127.0.0.1:58798:DISK through 127.0.0.1:58794 2016-08-01 16:51:53,469 [pool-49-thread-1] INFO balancer.Dispatcher (Dispatcher.java:chooseNextMove(799)) - TargetNode: 58798, bytes scheduled to move, after: -1467, before: -1367 2016-08-01 16:51:53,469 [pool-49-thread-1] INFO
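The log excerpt shows {{task.size}} stepping by -100 through values that are never exactly 0, so the {{task.size == 0}} check never fires and the target is never removed from {{tasks}}. A self-contained model of that failure mode, with a {{task.size <= 0}} guard as one possible mitigation (illustrative only; the names are simplified and this is not necessarily the committed fix):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SchedulerSketch {
    static class Task {
        long size; // bytes still scheduled to move to this target
        Task(long size) { this.size = size; }
    }

    /**
     * Mimics the removal check in chooseNextMove: schedule one blockSize-sized
     * move per iteration until every task is removed or maxMoves is reached.
     * With the strict == 0 test, a task whose remaining size is not an exact
     * multiple of the block size skips past zero and is never removed.
     */
    static int drain(List<Task> tasks, long blockSize, boolean strictEquality, int maxMoves) {
        int moves = 0;
        while (!tasks.isEmpty() && moves < maxMoves) {
            Iterator<Task> i = tasks.iterator();
            Task task = i.next();
            task.size -= blockSize;
            moves++;
            boolean done = strictEquality ? task.size == 0 : task.size <= 0;
            if (done) {
                i.remove(); // target fully scheduled; stop picking it
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        List<Task> stuck = new ArrayList<>();
        stuck.add(new Task(150)); // 150 is not a multiple of the 100-byte blocks
        System.out.println("== 0 check: " + drain(stuck, 100, true, 10)
            + " moves, removed: " + stuck.isEmpty());

        List<Task> fixed = new ArrayList<>();
        fixed.add(new Task(150));
        System.out.println("<= 0 check: " + drain(fixed, 100, false, 10)
            + " moves, removed: " + fixed.isEmpty());
    }
}
```

With the strict check the task burns through all 10 allowed moves and is still present; with the {{<= 0}} guard it is removed after 2 moves.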
[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youwei Wang updated HDFS-8901: -- Attachment: (was: HDFS-8901.v14.patch)
> Use ByteBuffer in striping positional read
> ---
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, HDFS-8901.v13.patch, initial-poc.patch
>
> The native erasure coder prefers direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striped positional read. It also avoids unnecessary data copying between striping read chunk buffers and decode input buffers.
[jira] [Issue Comment Deleted] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youwei Wang updated HDFS-8901: -- Comment: was deleted (was: New patch submitted. File name: HDFS-8901.v14.patch Based on commit id: 34ccaa8367f048ed9f56038efe7b3202c436b6e6 Comment: A small revision for the test class TestDFSStripedInputStream.java)
> Use ByteBuffer in striping positional read
> ---
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, HDFS-8901.v13.patch, initial-poc.patch
>
> The native erasure coder prefers direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striped positional read. It also avoids unnecessary data copying between striping read chunk buffers and decode input buffers.
[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Youwei Wang updated HDFS-8901: -- Attachment: HDFS-8901.v14.patch
New patch submitted. File name: HDFS-8901.v14.patch Based on commit id: 34ccaa8367f048ed9f56038efe7b3202c436b6e6 Comment: A small revision for the test class TestDFSStripedInputStream.java
> Use ByteBuffer in striping positional read
> ---
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Youwei Wang
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, HDFS-8901.v13.patch, HDFS-8901.v14.patch, initial-poc.patch
>
> The native erasure coder prefers direct ByteBuffers for performance reasons. To prepare for that, this change uses ByteBuffer throughout the code implementing striped positional read. It also avoids unnecessary data copying between striping read chunk buffers and decode input buffers.
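The rationale in the issue body — a native (JNI) erasure coder can read a direct ByteBuffer in place, while a heap buffer forces a copy out of the Java heap — can be illustrated with plain JDK calls. This is illustrative only; the actual patch changes the striped-read path (e.g. TestDFSStripedInputStream), not this toy class.

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    public static void main(String[] args) {
        // Direct buffers live outside the Java heap; JNI code can obtain a
        // stable native address for them (GetDirectBufferAddress) and read
        // the striped cells in place, with no intermediate byte[] copy.
        ByteBuffer direct = ByteBuffer.allocateDirect(64);
        ByteBuffer heap = ByteBuffer.allocate(64);
        System.out.println("direct: " + direct.isDirect() + ", heap: " + heap.isDirect());

        // Positional-read style: fill a cell, then flip to hand exactly the
        // written bytes to the decoder without copying into a new array.
        direct.put(new byte[] {1, 2, 3});
        direct.flip();
        System.out.println("bytes ready for decode: " + direct.remaining());
    }
}
```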