[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971490#comment-14971490 ] Haohui Mai commented on HDFS-9129: --
{code}
+ * The state machine is briefly elaborated in the following diagram. Specifically,
+ * the start status is always INITIALIZED and the end status is always OFF.
+ * There is no transition to INITIALIZED and no transition from OFF. Once
+ * entered, it will not leave THRESHOLD status until the block and datanode
+ * thresholds are met. Similarly, it will not leave EXTENSION status until the
+ * thresholds are met and the extension period is reached.
+ *
+ *                    thresholds not met
+ *   INITIALIZED ----------> THRESHOLD <------.
+ *        |                      |            |
+ *        | thresholds met       | thresholds met
+ *        | & no need extension  | & need extension
+ *        |                      |
+ *        V                      V
+ *       OFF <-------------- EXTENSION
+ *              thresholds met
+ *            & extension reached
{code}
It does not give much more information than reading the code directly. What do "thresholds met" / "extension reached" mean? It causes more confusion than it provides explanation.
{code}
LOG.error("Non-recognized block manager safe mode status: {}", status);
{code}
Should be an assert.
{code}
/**
 * If the NN is in safemode, and not due to manual / low resources, we
 * assume it must be because of startup. If the NN had low resources during
 * startup, we assume it came out of startup safemode and it is now in low
 * resources safemode.
 */
private volatile boolean isInManualSafeMode = false;
private volatile boolean isInResourceLowSafeMode = false;
...
isInManualSafeMode = !resourcesLow;
isInResourceLowSafeMode = resourcesLow;
{code}
How do these two variables synchronize? Is the system in a consistent state in the middle of the execution?
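The synchronization question at the end can be made concrete with a small sketch: collapsing the two volatile booleans into a single volatile field makes each update atomic, so no reader can observe a half-updated state. This is only an illustration of one possible fix, not code from the patch; the class and method names are hypothetical.

```java
// Hypothetical sketch: a single volatile reference is written once per
// transition, so readers never see the window where isInManualSafeMode has
// been updated but isInResourceLowSafeMode has not (or vice versa).
public class SafeModeReason {
    public enum Reason { NONE, MANUAL, RESOURCES_LOW }

    // One volatile write publishes the complete state atomically.
    private volatile Reason reason = Reason.NONE;

    public void enterSafeMode(boolean resourcesLow) {
        reason = resourcesLow ? Reason.RESOURCES_LOW : Reason.MANUAL;
    }

    public void leaveSafeMode() {
        reason = Reason.NONE;
    }

    public boolean isInManualSafeMode()      { return reason == Reason.MANUAL; }
    public boolean isInResourceLowSafeMode() { return reason == Reason.RESOURCES_LOW; }
}
```

With the two separate booleans, a reader between the two assignments could see both flags true (or both false); with the single field there is no such intermediate state.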
{code}
+bmSafeMode = new BlockManagerSafeMode(bm, fsn, conf);
+assertEquals(BMSafeModeStatus.INITIALIZED, getSafeModeStatus());
+assertFalse(bmSafeMode.isInSafeMode());
+// INITIALIZED -> THRESHOLD
+bmSafeMode.setBlockTotal(BLOCK_TOTAL);
+assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+assertTrue(bmSafeMode.isInSafeMode());
{code}
It makes sense to put it in a test instead of in the {{@Before}} method.
{code}
+// INITIALIZED -> OFF
+Whitebox.setInternalState(bmSafeMode, "status",
+    BMSafeModeStatus.INITIALIZED);
+reachBlockThreshold();
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.OFF, getSafeModeStatus());
+
+// INITIALIZED -> THRESHOLD
+Whitebox.setInternalState(bmSafeMode, "status",
+    BMSafeModeStatus.INITIALIZED);
+Whitebox.setInternalState(bmSafeMode, "blockSafe", 0);
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+
+// stays in THRESHOLD: pending block threshold
+Whitebox.setInternalState(bmSafeMode, "status", BMSafeModeStatus.THRESHOLD);
+for (long i = 0; i < BLOCK_THRESHOLD; i++) {
+  Whitebox.setInternalState(bmSafeMode, "blockSafe", i);
+  bmSafeMode.checkSafeMode();
+  assertEquals(BMSafeModeStatus.THRESHOLD, getSafeModeStatus());
+}
+
+// THRESHOLD -> EXTENSION
+Whitebox.setInternalState(bmSafeMode, "status", BMSafeModeStatus.THRESHOLD);
+reachBlockThreshold();
+bmSafeMode.checkSafeMode();
+assertEquals(BMSafeModeStatus.EXTENSION, getSafeModeStatus());
+Whitebox.setInternalState(bmSafeMode, "smmthread", null);
+
+// THRESHOLD -> OFF
+Whitebox.setInternalState(bmSafeMode, "status", BMSafeModeStatus.THRESHOLD);
+reachBlockThreshold();
+Whitebox.setInternalState(bmSafeMode, "needExtension", false);
+
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971537#comment-14971537 ] Hadoop QA commented on HDFS-9297: -
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 8m 44s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 54s | There were no new javac warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 37s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 52s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 45s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 1m 5s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 67m 11s | Tests failed in hadoop-hdfs. |
| | | | 93m 12s | |
\\ \\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestRecoverStripedFile |
| | hadoop.hdfs.server.namenode.TestFileTruncate |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768333/HDFS-9297.001.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 934d96a |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13157/console |
This message was automatically generated. > Update TestBlockMissingException to use > corruptBlockOnDataNodesByDeletingBlockFile() > > > Key: HDFS-9297 > URL: https://issues.apache.org/jira/browse/HDFS-9297 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Trivial > Attachments: HDFS-9297.001.patch > > > TestBlockMissingException uses its own function to corrupt a block by > deleting all its block files. HDFS-7235 introduced a helper function > {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same > thing. We can update this test to use the helper function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971586#comment-14971586 ] Yongjun Zhang commented on HDFS-7284: - Hi [~jojochuang], Thanks for the new rev. I noticed that you changed the default logger setting from info to debug; we definitely need to change it back: log4j.rootLogger=debug,stdout +1 after that, pending the Jenkins test. > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, > HDFS-7284.003.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. > So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
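The kind of enriched log message HDFS-7284 asks for can be sketched as follows. The method and parameter names here are illustrative stand-ins, not the patch's actual code.

```java
// Hypothetical sketch of the enriched message: include the block id and
// generation stamp alongside the location, so a reader of the log can tell
// exactly which block the stale replica belonged to.
public class StaleReplicaLog {
    public static String staleReplicaMessage(long blockId, long genStamp,
                                             String location) {
        return "BLOCK* Removing stale replica blk_" + blockId
            + " with genStamp " + genStamp
            + " from location: " + location;
    }

    public static void main(String[] args) {
        // Before the change the message only named the location ("x.x.x.x").
        System.out.println(staleReplicaMessage(1073741825L, 1001L, "x.x.x.x:50010"));
    }
}
```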
[jira] [Updated] (HDFS-9279) Decommissioned capacity should not be considered for configured/used capacity
[ https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-9279: -- Attachment: HDFS-9279-v1.patch In addition to dfsUsed, the patch also updates XceiverCount and blockPoolUsed only when a node is neither decommissioning nor decommissioned. cacheCapacity and cacheUsed are updated for all nodes that are not decommissioned. > Decommissioned capacity should not be considered for configured/used capacity > > > Key: HDFS-9279 > URL: https://issues.apache.org/jira/browse/HDFS-9279 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9279-v1.patch > > > Capacity of a decommissioned node is being accounted as configured and used > capacity metrics. This gives an incorrect perception of cluster usage. > Once a node is decommissioned, its capacity should be considered similar to a > dead node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
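The accounting rule described above can be illustrated with a minimal sketch: decommissioned nodes stop contributing to configured/used capacity, the same way dead nodes do. `DatanodeLike` is a hypothetical stand-in, not HDFS's DatanodeDescriptor.

```java
// Hedged sketch of the capacity accounting change: skip decommissioned nodes
// when summing configured and used capacity across the cluster.
import java.util.List;

public class CapacityAccounting {
    public static class DatanodeLike {
        final long capacity;
        final long dfsUsed;
        final boolean decommissioned;
        public DatanodeLike(long capacity, long dfsUsed, boolean decommissioned) {
            this.capacity = capacity;
            this.dfsUsed = dfsUsed;
            this.decommissioned = decommissioned;
        }
    }

    public static long configuredCapacity(List<DatanodeLike> nodes) {
        return nodes.stream()
            .filter(n -> !n.decommissioned)   // decommissioned nodes no longer count
            .mapToLong(n -> n.capacity)
            .sum();
    }

    public static long usedCapacity(List<DatanodeLike> nodes) {
        return nodes.stream()
            .filter(n -> !n.decommissioned)
            .mapToLong(n -> n.dfsUsed)
            .sum();
    }
}
```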
[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-7284: -- Attachment: HDFS-7284.004.patch Thanks [~yzhangal] for the code review. I am attaching a new version with no log4j change. > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, > HDFS-7284.003.patch, HDFS-7284.004.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. > So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9299) Give ReplicationMonitor a readable thread name
[ https://issues.apache.org/jira/browse/HDFS-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9299: -- Attachment: HDFS-9299.001.patch > Give ReplicationMonitor a readable thread name > -- > > Key: HDFS-9299 > URL: https://issues.apache.org/jira/browse/HDFS-9299 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Priority: Trivial > Attachments: HDFS-9299.001.patch > > > Currently the log output from the Replication Monitor is the class name, by > setting the name on the thread the output will be easier to read. > Current > 2015-10-23 11:07:53,344 > [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd] > INFO blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping > ReplicationMonitor. > After > 2015-10-23 11:07:53,344 [ReplicationMonitor] INFO > blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping > ReplicationMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-7284: -- Attachment: HDFS-7284.005.patch Thanks for catching my bad English :/ > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, > HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. > So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9299) Give ReplicationMonitor a readable thread name
Staffan Friberg created HDFS-9299:
-
Summary: Give ReplicationMonitor a readable thread name
Key: HDFS-9299
URL: https://issues.apache.org/jira/browse/HDFS-9299
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.7.1
Reporter: Staffan Friberg
Priority: Trivial

Currently the log output from the Replication Monitor uses the class name; by setting the name on the thread, the output will be easier to read.

Current
2015-10-23 11:07:53,344 [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd] INFO blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping ReplicationMonitor.

After
2015-10-23 11:07:53,344 [ReplicationMonitor] INFO blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping ReplicationMonitor.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
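The proposed fix can be sketched in a few lines: construct the monitor thread with an explicit name so the `[%t]` thread-name field in the log pattern prints "ReplicationMonitor" rather than a name derived from the Runnable's toString(). The helper below is illustrative, not the patch itself.

```java
// Minimal sketch: giving the thread an explicit name is what makes the
// "[ReplicationMonitor]" appear in log output instead of the class@hash form.
public class NamedMonitor {
    public static Thread newMonitorThread(Runnable work) {
        Thread t = new Thread(work, "ReplicationMonitor");
        t.setDaemon(true); // background monitor should not block JVM shutdown
        return t;
    }

    public static void main(String[] args) {
        Thread t = newMonitorThread(() -> { });
        System.out.println(t.getName()); // the name that appears in the log
    }
}
```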
[jira] [Commented] (HDFS-9268) fuse_dfs chown crashes when uid is passed as -1
[ https://issues.apache.org/jira/browse/HDFS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971633#comment-14971633 ] Zhe Zhang commented on HDFS-9268: - The patch LGTM. One minor suggestion is maybe we can fold {{fuseConnect}} into {{fuseConnectAsThreadUid}} to avoid bugs of this kind in the future? Seems we should always call {{fuseConnect}} with the thread UID anyway. > fuse_dfs chown crashes when uid is passed as -1 > --- > > Key: HDFS-9268 > URL: https://issues.apache.org/jira/browse/HDFS-9268 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Colin Patrick McCabe >Priority: Minor > Attachments: HDFS-9268.001.patch, HDFS-9268.002.patch > > > JVM crashes when users attempt to use vi to update a file on fuse file system > with insufficient permission. (I use CDH's hadoop-fuse-dfs wrapper script to > generate the bug, but the same bug is reproducible in trunk) > The root cause is a segfault in a dfs-fuse method > To reproduce it do as follows: > mkdir /mnt/fuse > chmod 777 /mnt/fuse > ulimit -c unlimited# to enable coredump > hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse > touch /mnt/fuse/y > chmod 600 /mnt/fuse/y > vim /mnt/fuse/y > (in vim, :w to save the file) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600 > # > # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build > 1.7.0_79-b15) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode > linux-amd64 compressed oops) > # Problematic frame: > # C [libc.so.6+0x127ad6] __tls_get_addr@@GLIBC_2.3+0x127ad6 > # > # Core dump written. 
Default location: /home/weichiu/core or core.26606 > # > # An error report file with more information is saved as: > # /home/weichiu/hs_err_pid26606.log > # > # If you would like to submit a bug report, please visit: > # http://bugreport.java.com/bugreport/crash.jsp > # The crash happened outside the Java Virtual Machine in native code. > # See problematic frame for where to report the bug. > # > /usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core > dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@ > === > The coredump shows the segfault comes from > (gdb) bt > #0 0x003b82e328e5 in raise () from /lib64/libc.so.6 > #1 0x003b82e340c5 in abort () from /lib64/libc.so.6 > #2 0x7f66fc924d75 in os::abort(bool) () from > /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so > #3 0x7f66fcaa76d7 in VMError::report_and_die() () from > /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so > #4 0x7f66fc929c8f in JVM_handle_linux_signal () from > /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so > #5 > #6 0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6 > #7 0x004039a0 in hdfsConnTree_RB_FIND () > #8 0x00403e8f in fuseConnect () > #9 0x004046db in dfs_chown () > #10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2 > #11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2 > #12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2 > #13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0 > #14 0x003b82ee894d in clone () from /lib64/libc.so.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals
[ https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9266: Summary: Avoid unsafe split and append on fields that might be IPv6 literals (was: hadoop-hdfs - Avoid unsafe split and append on fields that might be IPv6 literals) > Avoid unsafe split and append on fields that might be IPv6 literals > --- > > Key: HDFS-9266 > URL: https://issues.apache.org/jira/browse/HDFS-9266 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Nemanja Matkovic >Assignee: Nemanja Matkovic > Labels: ipv6 > Attachments: HDFS-9266-HADOOP-11890.1.patch, > HDFS-9266-HADOOP-11890.2.patch > > Original Estimate: 48h > Remaining Estimate: 48h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9299) Give ReplicationMonitor a readable thread name
[ https://issues.apache.org/jira/browse/HDFS-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg reassigned HDFS-9299: - Assignee: Staffan Friberg > Give ReplicationMonitor a readable thread name > -- > > Key: HDFS-9299 > URL: https://issues.apache.org/jira/browse/HDFS-9299 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg >Priority: Trivial > Attachments: HDFS-9299.001.patch > > > Currently the log output from the Replication Monitor is the class name, by > setting the name on the thread the output will be easier to read. > Current > 2015-10-23 11:07:53,344 > [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd] > INFO blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping > ReplicationMonitor. > After > 2015-10-23 11:07:53,344 [ReplicationMonitor] INFO > blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping > ReplicationMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9298) remove replica and not add replica with wrong genStamp
Chang Li created HDFS-9298:
--
Summary: remove replica and not add replica with wrong genStamp
Key: HDFS-9298
URL: https://issues.apache.org/jira/browse/HDFS-9298
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li

Currently, in setGenerationStampAndVerifyReplicas, a replica with a wrong genStamp is not really removed; only the StorageLocation of that replica is removed. Moreover, we should check the genStamp before calling addReplicaIfNotPresent.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
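The proposed genStamp check can be sketched as below. `Replica` and the method names are hypothetical stand-ins for HDFS's real classes; the point is only that a replica whose generation stamp does not match the block's is rejected rather than added.

```java
// Hedged sketch of the check: reject a reported replica whose generation
// stamp differs from the block's current one, instead of silently adding it.
import java.util.ArrayList;
import java.util.List;

public class GenStampCheck {
    public static class Replica {
        final long genStamp;
        public Replica(long genStamp) { this.genStamp = genStamp; }
    }

    private final long blockGenStamp;
    private final List<Replica> replicas = new ArrayList<>();

    public GenStampCheck(long blockGenStamp) { this.blockGenStamp = blockGenStamp; }

    /** Returns true if the replica was added; false if rejected as stale. */
    public boolean addReplicaIfGenStampMatches(Replica r) {
        if (r.genStamp != blockGenStamp) {
            return false; // stale generation stamp: do not add the replica
        }
        replicas.add(r);
        return true;
    }

    public int replicaCount() { return replicas.size(); }
}
```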
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: HDFS-7435.005.patch Fixed the last TODOs:
- Handles a new NN with old DNs (unsorted entries); this is inefficient since the NN needs to sort the entries, but it should only be a problem during the upgrade cycle, and it is avoidable if the DNs are updated first.
- Added a StorageInfoMonitor thread that can compact the TreeSet if the fill ratio gets too low.
- Added a test to check that unsorted entries are handled correctly.
> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
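Why sorted replica structures help full block reports can be illustrated with a small sketch: two sorted id arrays can be diffed in one O(n) merge pass, with no per-block hash lookup or allocation. This shows the general technique only, not the patch's actual data structures.

```java
// Illustrative O(n) merge over a sorted stored-block list and a sorted full
// block report: one pass finds the blocks that are new in the report.
import java.util.ArrayList;
import java.util.List;

public class SortedReportDiff {
    /** Returns block ids present in the sorted report but not in sorted storage. */
    public static List<Long> newlyReported(long[] stored, long[] report) {
        List<Long> added = new ArrayList<>();
        int i = 0, j = 0;
        while (j < report.length) {
            if (i < stored.length && stored[i] < report[j]) {
                i++;                     // stored block absent from this report
            } else if (i < stored.length && stored[i] == report[j]) {
                i++; j++;                // block unchanged
            } else {
                added.add(report[j++]);  // block new in this report
            }
        }
        return added;
    }
}
```

The same single pass can also collect the blocks that disappeared (the first branch), which is where the GC friendliness comes from: no temporary per-report hash set is built.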
[jira] [Commented] (HDFS-9255) Consolidate block recovery related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971591#comment-14971591 ] Jing Zhao commented on HDFS-9255: - Thanks for working on this, Walter! The patch looks pretty good to me. One question is: since {{DataNode#blockRecoveryWorker}} is not declared as final, can we make sure the BPServiceActor thread can always see its non-null value when calling {{getBlockRecoveryWorker}}? > Consolidate block recovery related implementation into a single class > - > > Key: HDFS-9255 > URL: https://issues.apache.org/jira/browse/HDFS-9255 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, > HDFS-9255.03.patch, HDFS-9255.04.patch, HDFS-9255.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
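Jing's question is about safe publication under the Java Memory Model: a plain non-final field assigned in the constructor carries no visibility guarantee for other threads, which may legally observe null. A minimal sketch of the `final` fix (illustrative names, not the patch's code):

```java
// Sketch of the visibility concern: declaring the field final gives the JMM
// guarantee that any thread which obtains a reference to the constructed
// object also sees the field's value, provided 'this' does not escape the
// constructor. A plain field has no such guarantee.
public class SafePublication {
    public static class BlockRecoveryWorkerLike { }

    public static class DataNodeLike {
        private final BlockRecoveryWorkerLike blockRecoveryWorker; // final => safely published

        public DataNodeLike() {
            this.blockRecoveryWorker = new BlockRecoveryWorkerLike();
        }

        public BlockRecoveryWorkerLike getBlockRecoveryWorker() {
            return blockRecoveryWorker;
        }
    }
}
```

Declaring the field `volatile` would also work; `final` is preferable when the value never changes after construction.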
[jira] [Commented] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971590#comment-14971590 ] Elliott Clark commented on HDFS-9289: - It had all of the data and the same md5sums when I checked, so the only thing different was the genstamps. Not really sure why that happened. But I didn't mean to sidetrack this jira. Test looks nice. > check genStamp when complete file > - > > Key: HDFS-9289 > URL: https://issues.apache.org/jira/browse/HDFS-9289 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Critical > Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch > > > We have seen a case of a corrupt block caused by a file completing after a > pipelineUpdate, but with the old block genStamp. This caused the replicas on > two datanodes in the updated pipeline to be viewed as corrupt. Propose to > check the genStamp when committing the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971600#comment-14971600 ] Yongjun Zhang commented on HDFS-7284: - Sorry, one more thing. I suggest changing: {code} * A helper method to output the string representation of a derived class, {code} to {code} * A helper method to output the string representation of the Block portion of * a derived class' instance. {code} > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, > HDFS-7284.003.patch, HDFS-7284.004.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. > So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 邓飞 updated HDFS-9293: - Description: In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found that the standby NN tails the editlog slowly and holds the FSNamesystem write lock while doing so, which blocks the DNs' heartbeat/blockreport IPC requests. This leads the active NN to remove stale DNs that cannot send heartbeats because they are blocked processing the standby NN's register command (fixed in 2.7.1). Below is the standby NN stack:
"Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable [0x7f0dd1d76000]
   java.lang.Thread.State: RUNNABLE
	at java.util.PriorityQueue.remove(PriorityQueue.java:360)
	at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
	at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
	- locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
When applying an editLogOp, if an entry for the same rpcId is found in the IPC retryCache, the previous entry needs to be removed from the PriorityQueue, which is O(N). The updateBlocks op does not need to record an rpcId in the editlog except for a client-requested updatePipeline, yet we found many 'UpdateBlocksOp' entries with repeated rpcIds in the editlog. > FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty > 'rpcId',which may cause standby NN too busy to communicate > --
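The O(N) PriorityQueue.remove in the stack trace can be illustrated with a toy retry cache. This is not Hadoop's LightWeightCache, just a sketch of the asymptotics: every duplicated rpcId forces a linear scan of the expiry queue, which is what makes editlog tailing slow on a large cache.

```java
// Illustrative toy retry cache: PriorityQueue.remove(Object) is a linear scan,
// so re-adding an entry for a duplicated rpcId costs O(n) in the queue size.
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

public class RetryCacheSketch {
    private final PriorityQueue<Long> expiryQueue = new PriorityQueue<>();
    private final Set<Long> entries = new HashSet<>();

    /** Returns true if the rpcId was a duplicate (triggering the O(n) remove). */
    public boolean addCacheEntry(long rpcId) {
        boolean duplicate = !entries.add(rpcId);
        if (duplicate) {
            // Linear scan over the queue: the O(N) the reporter observed.
            expiryQueue.remove(rpcId);
        }
        expiryQueue.add(rpcId);
        return duplicate;
    }

    public int size() { return expiryQueue.size(); }
}
```

Not writing rpcIds into the editlog for ops that don't need them (the reporter's suggestion) avoids the duplicate branch entirely on the standby.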
[jira] [Updated] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 邓飞 updated HDFS-9293: - Description: In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found that the standby NN tails the editlog slowly and holds the FSNamesystem write lock while doing so, which blocks the DNs' heartbeat/blockreport IPC requests. This leads the active NN to remove stale DNs that cannot send heartbeats because they are blocked processing the standby NN's register command (fixed in 2.7.1). Below is the standby NN stack:
"Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable [0x7f0dd1d76000]
   java.lang.Thread.State: RUNNABLE
	at java.util.PriorityQueue.remove(PriorityQueue.java:360)
	at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
	at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
	- locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
When applying an editLogOp, if an entry for the same rpcId is found in the IPC retryCache, the previous entry needs to be removed from the PriorityQueue, which is O(N). The updateBlocks op does not need to record an rpcId in the editlog except for a client-requested updatePipeline, yet we found many 'UpdateBlocksOp' entries with repeated rpcIds in the editlog.
> FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty
> 'rpcId',which may cause standby NN too busy to communicate
> --
>
> Key: HDFS-9293
> URL: https://issues.apache.org/jira/browse/HDFS-9293
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.2.0, 2.7.1
> Reporter: 邓飞
> Assignee: 邓飞
>
> In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found that the standby NN tails the editlog slowly and holds the FSNamesystem write lock while doing so, which blocks the DNs' heartbeat/blockreport IPC requests. This leads the active NN to remove stale DNs that cannot send heartbeats because they are blocked processing the standby NN's register command (fixed in 2.7.1).
> Below is the standby NN stack:
> "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable [0x7f0dd1d76000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.PriorityQueue.remove(PriorityQueue.java:360)
>         at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
>         at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
>         - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>         at >
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456) > at >
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971546#comment-14971546 ] Tony Wu commented on HDFS-9297: --- The failed tests are not related to this change. > Update TestBlockMissingException to use > corruptBlockOnDataNodesByDeletingBlockFile() > > > Key: HDFS-9297 > URL: https://issues.apache.org/jira/browse/HDFS-9297 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Trivial > Attachments: HDFS-9297.001.patch > > > TestBlockMissingException uses its own function to corrupt a block by > deleting all its block files. HDFS-7235 introduced a helper function > {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same > thing. We can update this test to use the helper function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9298) remove replica and not add replica with wrong genStamp
[ https://issues.apache.org/jira/browse/HDFS-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-9298: --- Attachment: HDFS-9298.1.patch > remove replica and not add replica with wrong genStamp > -- > > Key: HDFS-9298 > URL: https://issues.apache.org/jira/browse/HDFS-9298 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: HDFS-9298.1.patch > > > Currently, in setGenerationStampAndVerifyReplicas, a replica with the wrong gen > stamp is not actually removed; only that replica's StorageLocation is removed. > Moreover, we should check the genStamp before calling addReplicaIfNotPresent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9298) remove replica and not add replica with wrong genStamp
[ https://issues.apache.org/jira/browse/HDFS-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-9298: --- Status: Patch Available (was: Open) > remove replica and not add replica with wrong genStamp > -- > > Key: HDFS-9298 > URL: https://issues.apache.org/jira/browse/HDFS-9298 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: HDFS-9298.1.patch > > > Currently, in setGenerationStampAndVerifyReplicas, a replica with the wrong gen > stamp is not actually removed; only that replica's StorageLocation is removed. > Moreover, we should check the genStamp before calling addReplicaIfNotPresent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
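The check proposed for HDFS-9298 can be pictured with a small sketch. This is not the actual FsDatasetImpl/ReplicaInfo code; the types and names below are simplified stand-ins for illustration only: accept a reported replica only when its generation stamp matches the block's expected one, so a stale replica is dropped rather than silently kept.

```java
// Hypothetical sketch of "check genStamp before addReplicaIfNotPresent".
// ReportedReplica is a stand-in type, not Hadoop's ReplicaInfo.
public class GenStampCheck {
    public static final class ReportedReplica {
        public final long blockId;
        public final long genStamp;
        public ReportedReplica(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    // Returns true only when the reported replica's generation stamp matches
    // the expected one; a mismatch means the replica is stale and should be
    // removed instead of added.
    public static boolean mayAddReplica(long expectedGenStamp, ReportedReplica r) {
        return r.genStamp == expectedGenStamp;
    }

    public static void main(String[] args) {
        assert mayAddReplica(1002L, new ReportedReplica(1L, 1002L));
        assert !mayAddReplica(1002L, new ReportedReplica(1L, 1001L)); // stale replica rejected
        System.out.println("ok");
    }
}
```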
[jira] [Updated] (HDFS-9266) Avoid unsafe split and append on fields that might be IPv6 literals
[ https://issues.apache.org/jira/browse/HDFS-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-9266: Resolution: Fixed Status: Resolved (was: Patch Available) > Avoid unsafe split and append on fields that might be IPv6 literals > --- > > Key: HDFS-9266 > URL: https://issues.apache.org/jira/browse/HDFS-9266 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Nemanja Matkovic >Assignee: Nemanja Matkovic > Labels: ipv6 > Attachments: HDFS-9266-HADOOP-11890.1.patch, > HDFS-9266-HADOOP-11890.2.patch > > Original Estimate: 48h > Remaining Estimate: 48h > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9295) Add a thorough test of the full KMS code path
[ https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971634#comment-14971634 ] Hadoop QA commented on HDFS-9295: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 8m 14s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 25s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 2s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 52m 13s | Tests failed in hadoop-hdfs. 
| | | | 75m 47s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.datanode.TestFsDatasetCache | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768338/HDFS-9295.002.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / eb6379c | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13160/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13160/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13160/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13160/console | This message was automatically generated. > Add a thorough test of the full KMS code path > - > > Key: HDFS-9295 > URL: https://issues.apache.org/jira/browse/HDFS-9295 > Project: Hadoop HDFS > Issue Type: Test > Components: security, test >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: HDFS-9295.001.patch, HDFS-9295.002.patch > > > TestKMS does a good job of testing the ACLs directly, but they are tested out > of context. Additional tests are needed that test how the ACL impact key > creation, EZ creation, file creation in an EZ, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease
邓飞 created HDFS-9294: Summary: DFSClient deadlock when close file and failed to renew lease Key: HDFS-9294 URL: https://issues.apache.org/jira/browse/HDFS-9294 Project: Hadoop HDFS Issue Type: Bug Components: HDFS, hdfs-client Affects Versions: 2.7.1, 2.2.0 Environment: Hadoop 2.2.0 Reporter: 邓飞 We found a deadlock in our HBase (0.98) cluster (Hadoop version 2.2.0); it appears to be an HDFS bug. At the time, our network was unstable. Below is the stack: * Found one Java-level deadlock: = "MemStoreFlusher.1": waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer), which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel" "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel": waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a org.apache.hadoop.hdfs.DFSOutputStream), which is held by "MemStoreFlusher.0" "MemStoreFlusher.0": waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a org.apache.hadoop.hdfs.LeaseRenewer), which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel" Java stack information for the threads listed above: === "MemStoreFlusher.1": at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216) - waiting to lock <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer) at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81) at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648) at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882) - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104) at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250) at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402) 
at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974) at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78) at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) - locked <0x00059869eed8> (a java.lang.Object) at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812) at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238) at java.lang.Thread.run(Thread.java:744) "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel": at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822) - waiting to lock <0x000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream) at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780) at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753) at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453) - locked <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer) at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71) at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298) at java.lang.Thread.run(Thread.java:744) "MemStoreFlusher.0": at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216) - waiting to lock <0x0002fae5ebe0> (a 
org.apache.hadoop.hdfs.LeaseRenewer) at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81) at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648) at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659) at
[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970720#comment-14970720 ] 邓飞 commented on HDFS-9293: -- Thanks Walter, it's my mistake; that was fixed in 2.7.1. > FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty > 'rpcId',which may cause standby NN too busy to communicate > -- > > Key: HDFS-9293 > URL: https://issues.apache.org/jira/browse/HDFS-9293 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 2.7.1 >Reporter: 邓飞 >Assignee: 邓飞 > > In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found that the standby NN tails the edit log > slowly and holds the FSNamesystem write lock while doing so, which blocks the DNs' > heartbeat/blockreport IPC requests. This leads the active NN to remove stale DNs, > which cannot send heartbeats because they are blocked processing the standby NN register > command (fixed in 2.7.1). > Below is the standby NN stack: > "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable > [0x7f0dd1d76000] >java.lang.Thread.State: RUNNABLE > at java.util.PriorityQueue.remove(PriorityQueue.java:360) > at > org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217) > at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270) > - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292) > > When applying an editLogOp, if a matching entry is found in the IPC retry cache, the previous > entry must be removed from the priority queue, which is O(N). An update-block op does not need to record an > rpcId in the edit log except for a client-requested updatePipeline, yet we found many > 'UpdateBlocksOp' entries with repeated rpcIds. > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
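The retry-cache point in this thread (java.util.PriorityQueue.remove(Object) scans linearly, so evicting a duplicate entry while replaying edits is O(N) under the FSNamesystem write lock) suggests skipping retry-cache bookkeeping for replayed ops that never carried a real RPC id. Below is a minimal, hypothetical sketch of such a guard; the DUMMY_CLIENT_ID and INVALID_CALL_ID constants are illustrative stand-ins, not Hadoop's actual RpcConstants, and this is not the patch applied to the issue.

```java
import java.util.Arrays;

// Hypothetical guard: only ops issued by a real client RPC should touch the
// retry cache. Replayed ops carrying a dummy client id / invalid call id are
// skipped, which avoids the O(N) PriorityQueue.remove() on duplicate entries.
public class RetryCacheGuard {
    public static final byte[] DUMMY_CLIENT_ID = new byte[0]; // stand-in constant
    public static final int INVALID_CALL_ID = -3;             // stand-in constant

    public static boolean shouldAddCacheEntry(byte[] clientId, int callId) {
        return clientId != null
            && !Arrays.equals(clientId, DUMMY_CLIENT_ID)
            && callId != INVALID_CALL_ID;
    }

    public static void main(String[] args) {
        // Replayed op with dummy ids: no retry-cache entry, no O(N) eviction.
        assert !shouldAddCacheEntry(DUMMY_CLIENT_ID, INVALID_CALL_ID);
        // Genuine client RPC: the entry is recorded as usual.
        assert shouldAddCacheEntry(new byte[]{1, 2}, 7);
        System.out.println("ok");
    }
}
```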
[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode
[ https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970837#comment-14970837 ] Hadoop QA commented on HDFS-9276: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 13s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 23s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 8m 56s | Tests passed in hadoop-common. 
| | | | 54m 52s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-common | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768270/HDFS-9276.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13151/artifact/patchprocess/newPatchFindbugsWarningshadoop-common.html | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13151/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13151/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13151/console | This message was automatically generated. > Failed to Update HDFS Delegation Token for long running application in HA mode > -- > > Key: HDFS-9276 > URL: https://issues.apache.org/jira/browse/HDFS-9276 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, ha, security >Affects Versions: 2.7.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu > Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, > HDFS-9276.03.patch, debug1.PNG, debug2.PNG > > > The Scenario is as follows: > 1. NameNode HA is enabled. > 2. Kerberos is enabled. > 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with > NameNode. > 4. We want to update the HDFS Delegation Token for long running applications. > The HDFS Client will generate private tokens for each NameNode. When we update > the HDFS Delegation Token, these private tokens will not be updated, which > will cause the tokens to expire. 
> This bug can be reproduced by the following program: > {code} > import java.security.PrivilegedExceptionAction > import org.apache.hadoop.conf.Configuration > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.hadoop.security.UserGroupInformation > object HadoopKerberosTest { > def main(args: Array[String]): Unit = { > val keytab = "/path/to/keytab/xxx.keytab" > val principal = "x...@abc.com" > val creds1 = new org.apache.hadoop.security.Credentials() > val ugi1 = > UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab) > ugi1.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > val fs = FileSystem.get(new Configuration()) > fs.addDelegationTokens("test", creds1) > null > } > }) > val ugi = UserGroupInformation.createRemoteUser("test") > ugi.addCredentials(creds1) > ugi.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > var i = 0 > while (true) { > val creds1 = new org.apache.hadoop.security.Credentials() > val ugi1 = > UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab) > ugi1.doAs(new
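The failure mode HDFS-9276 describes (per-NameNode "private" tokens cloned from the logical HA token go stale when only the logical token is refreshed) can be simulated without Hadoop's Credentials/Token classes. The sketch below uses plain maps and an assumed naming convention (private aliases prefixed by the logical service name); every name here is illustrative, not the actual Hadoop API. The fix direction it illustrates: propagate a refresh to every alias derived from the logical service.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of HA delegation-token aliasing with a plain map.
// Keys: token alias (logical service or assumed per-NN private alias).
// Values: token payload (here just a version string).
public class HaTokenRefresh {
    // Refresh the logical token AND every private alias derived from it.
    // Refreshing only the logical alias would leave the private copies stale,
    // which is exactly the bug described in the issue.
    public static void refresh(Map<String, String> creds, String logicalService, String newToken) {
        for (Map.Entry<String, String> e : creds.entrySet()) {
            if (e.getKey().equals(logicalService)
                    || e.getKey().startsWith(logicalService + "/")) { // assumed alias scheme
                e.setValue(newToken);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> creds = new HashMap<>();
        creds.put("ha-hdfs:ns1", "token-v1");          // logical token
        creds.put("ha-hdfs:ns1/nn1:8020", "token-v1"); // private copy for NN1
        creds.put("ha-hdfs:ns1/nn2:8020", "token-v1"); // private copy for NN2
        refresh(creds, "ha-hdfs:ns1", "token-v2");
        // All aliases now carry the refreshed token; none is left stale.
        assert creds.values().stream().allMatch("token-v2"::equals);
        System.out.println("ok");
    }
}
```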
[jira] [Assigned] (HDFS-9294) DFSClient deadlock when close file and failed to renew lease
[ https://issues.apache.org/jira/browse/HDFS-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 邓飞 reassigned HDFS-9294: Assignee: 邓飞 > DFSClient deadlock when close file and failed to renew lease > - > > Key: HDFS-9294 > URL: https://issues.apache.org/jira/browse/HDFS-9294 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS, hdfs-client >Affects Versions: 2.2.0, 2.7.1 > Environment: Hadoop 2.2.0 >Reporter: 邓飞 >Assignee: 邓飞 > > We found a deadlock in our HBase (0.98) cluster (Hadoop version 2.2.0); it appears to be > an HDFS bug. At the time, our network was unstable. > Below is the stack: > * > Found one Java-level deadlock: > = > "MemStoreFlusher.1": > waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a > org.apache.hadoop.hdfs.LeaseRenewer), > which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel" > "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel": > waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a > org.apache.hadoop.hdfs.DFSOutputStream), > which is held by "MemStoreFlusher.0" > "MemStoreFlusher.0": > waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a > org.apache.hadoop.hdfs.LeaseRenewer), > which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel" > Java stack information for the threads listed above: > === > "MemStoreFlusher.1": > at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216) > - waiting to lock <0x0002fae5ebe0> (a > org.apache.hadoop.hdfs.LeaseRenewer) > at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81) > at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648) > at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659) > at > org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882) > - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71) > at > 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104) > at > org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402) > at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974) > at > org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) > - locked <0x00059869eed8> (a java.lang.Object) > at > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812) > at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238) > at java.lang.Thread.run(Thread.java:744) > "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel": > at > org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822) > - waiting to lock <0x000486ce6620> (a > org.apache.hadoop.hdfs.DFSOutputStream) > at > org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780) > at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753) > at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453) > - locked <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer) > at 
org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71) > at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298) > at java.lang.Thread.run(Thread.java:744) > "MemStoreFlusher.0": > at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216) > - waiting to lock <0x0002fae5ebe0> (a
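The deadlock reported in HDFS-9294 is the classic two-lock inversion: close() holds the DFSOutputStream monitor and asks for the LeaseRenewer monitor, while the renewer thread holds its own monitor and asks for the stream's. The usual remedy is to impose one global acquisition order so no cycle can form. Below is a minimal illustration of that idea only; the lock names are illustrative stand-ins, not DFSClient's real fields, and this is not the fix applied to the issue.

```java
// Minimal lock-ordering illustration: both code paths acquire RENEWER_LOCK
// before STREAM_LOCK, so the wait-for graph can never contain a cycle.
public class LockOrdering {
    public static final Object RENEWER_LOCK = new Object(); // stand-in for LeaseRenewer monitor
    public static final Object STREAM_LOCK = new Object();  // stand-in for DFSOutputStream monitor
    public static int completed = 0;

    // "Client closes a file": same order as the renewer path below.
    public static void closeStream() {
        synchronized (RENEWER_LOCK) {
            synchronized (STREAM_LOCK) { completed++; }
        }
    }

    // "Renewer aborts streams on failed renewal": same global order.
    public static void abortFromRenewer() {
        synchronized (RENEWER_LOCK) {
            synchronized (STREAM_LOCK) { completed++; }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(LockOrdering::closeStream);
        Thread t2 = new Thread(LockOrdering::abortFromRenewer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        assert completed == 2; // both threads finish: consistent ordering, no deadlock
        System.out.println("ok");
    }
}
```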
[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970711#comment-14970711 ] Walter Su commented on HDFS-9293: - I checked HDFS-7398. I think ClientId/CallId will be reset after logEdit(..). > FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty > 'rpcId',which may cause standby NN too busy to communicate > -- > > Key: HDFS-9293 > URL: https://issues.apache.org/jira/browse/HDFS-9293 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 2.7.1 >Reporter: 邓飞 >Assignee: 邓飞 > > In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found that the standby NN tails the edit log > slowly and holds the FSNamesystem write lock while doing so, which blocks the DNs' > heartbeat/blockreport IPC requests. This leads the active NN to remove stale DNs, > which cannot send heartbeats because they are blocked processing the standby NN register > command (fixed in 2.7.1). > Below is the standby NN stack: > "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable > [0x7f0dd1d76000] >java.lang.Thread.State: RUNNABLE > at java.util.PriorityQueue.remove(PriorityQueue.java:360) > at > org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217) > at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270) > - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292) > > When applying an editLogOp, if a matching entry is found in the IPC retry cache, the previous > entry must be removed from the priority queue, which is O(N). An update-block op does not need to record an > rpcId in the edit log except for a client-requested updatePipeline, yet we found many > 'UpdateBlocksOp' entries with repeated rpcIds. > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9255) Consolidate block recovery related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970735#comment-14970735 ] Hadoop QA commented on HDFS-9255: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 17s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 51s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 23s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 28s | The applied patch generated 1 new checkstyle issues (total was 286, now 264). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 33s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 12s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 51m 35s | Tests failed in hadoop-hdfs. 
| | | | 97m 58s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768244/HDFS-9255.05.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13149/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13149/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13149/console | This message was automatically generated. > Consolidate block recovery related implementation into a single class > - > > Key: HDFS-9255 > URL: https://issues.apache.org/jira/browse/HDFS-9255 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, > HDFS-9255.03.patch, HDFS-9255.04.patch, HDFS-9255.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes
[ https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971060#comment-14971060 ] Kihwal Lee commented on HDFS-9290: -- The fix looks good. One minor nit is that logging at {{INFO}} can sometimes be noisy. I think end-users rarely care about the fact that it is talking to an older namenode. Let's make it {{DEBUG}}. > DFSClient#callAppend() is not backward compatible for slightly older NameNodes > -- > > Key: HDFS-9290 > URL: https://issues.apache.org/jira/browse/HDFS-9290 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Blocker > Attachments: HDFS-9290.001.patch > > > HDFS-7210 combined 2 RPC calls used at file append into a single one. > Specifically {{getFileInfo()}} is combined with {{append()}}. While backward > compatibility for older client is handled by the new NameNode (protobuf). > Newer client's {{append()}} call does not work with older NameNodes. 
One will > run into an exception like the following: > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741) > at > org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550) > at > org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1560) > at > org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1670) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717) > at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861) > at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922) > at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336) > at > org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318) > at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164) > {code} > The cause is that the new client code is expecting both the last block and > file info in the same RPC but the old NameNode only replied with the first. > The exception itself does not reflect this and one will have to look at the > HDFS source code to really understand what happened. > We can have the client detect it's talking to an old NameNode and send an > extra {{getFileInfo()}} RPC. Or we should improve the exception being thrown > to accurately reflect the cause of failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
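The fallback proposed in this report (detect that an older NameNode did not piggyback the file status on the append response, and issue the extra {{getFileInfo()}} RPC) could look roughly like the sketch below. All types here are simplified stand-ins, not the real DFSClient/ClientProtocol API:

```java
// Sketch of the proposed client-side fallback (simplified, hypothetical types;
// not the actual DFSClient API). If the append response from an older NameNode
// lacks the file status, fall back to a separate getFileInfo() call instead of
// hitting a NullPointerException deep inside DFSOutputStream.
import java.util.Objects;

public class AppendFallbackSketch {
  // Stand-ins for LastBlockWithStatus / HdfsFileStatus.
  static class FileStatus { final long len; FileStatus(long len) { this.len = len; } }
  static class AppendResult {
    final String lastBlock;   // always present
    final FileStatus stat;    // null when talking to a pre-HDFS-7210 NameNode
    AppendResult(String lastBlock, FileStatus stat) { this.lastBlock = lastBlock; this.stat = stat; }
  }

  interface NameNodeProxy {
    AppendResult append(String src);
    FileStatus getFileInfo(String src);
  }

  // Detect the missing status and issue the extra RPC up front.
  static FileStatus appendWithFallback(NameNodeProxy nn, String src) {
    AppendResult r = nn.append(src);
    FileStatus stat = r.stat;
    if (stat == null) {           // older NameNode: status not piggybacked
      stat = nn.getFileInfo(src); // one extra round trip, as before HDFS-7210
    }
    return Objects.requireNonNull(stat, "file deleted concurrently?");
  }

  public static void main(String[] args) {
    NameNodeProxy oldNn = new NameNodeProxy() {
      public AppendResult append(String src) { return new AppendResult("blk_1", null); }
      public FileStatus getFileInfo(String src) { return new FileStatus(42L); }
    };
    // The status is recovered via the extra getFileInfo() RPC.
    System.out.println(appendWithFallback(oldNn, "/f").len);
  }
}
```

Either way, a clearer exception message at the point where the status is found missing would make the incompatibility obvious without reading HDFS source.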
[jira] [Commented] (HDFS-9295) Add a thorough test of the full KMS code path
[ https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971106#comment-14971106 ] Hadoop QA commented on HDFS-9295: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 8m 31s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 8m 32s | The applied patch generated 22 additional warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 33s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 9s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 66m 51s | Tests failed in hadoop-hdfs. 
| | | | 91m 52s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestSafeMode | | | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768294/HDFS-9295.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13152/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | javac | https://builds.apache.org/job/PreCommit-HDFS-Build/13152/artifact/patchprocess/diffJavacWarnings.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13152/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13152/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13152/console | This message was automatically generated. > Add a thorough test of the full KMS code path > - > > Key: HDFS-9295 > URL: https://issues.apache.org/jira/browse/HDFS-9295 > Project: Hadoop HDFS > Issue Type: Test > Components: security, test >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: HDFS-9295.001.patch > > > TestKMS does a good job of testing the ACLs directly, but they are tested out > of context. Additional tests are needed that test how the ACL impact key > creation, EZ creation, file creation in an EZ, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9117) Config file reader / options classes for libhdfs++
[ https://issues.apache.org/jira/browse/HDFS-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971142#comment-14971142 ] James Clampffer commented on HDFS-9117: --- Thanks for the update Bob! The last patch covered all of my concerns, the comments made it much easier to understand. I have one tiny issue and one nit: Issue: In configuration.cc the #include can just be #include Nit: A comment about why 20 is the recursion depth limit for Configuration::SubstituteVars could be handy even if it just says "this is how the java client does it". Certainly not a blocker. Once the include is fixed up I'll +1. > Config file reader / options classes for libhdfs++ > -- > > Key: HDFS-9117 > URL: https://issues.apache.org/jira/browse/HDFS-9117 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: HDFS-8707 >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9117.HDFS-8707.001.patch, > HDFS-9117.HDFS-8707.002.patch, HDFS-9117.HDFS-8707.003.patch, > HDFS-9117.HDFS-8707.004.patch, HDFS-9117.HDFS-8707.005.patch, > HDFS-9117.HDFS-8707.006.patch > > > For environmental compatibility with HDFS installations, libhdfs++ should be > able to read the configurations from Hadoop XML files and behave in line with > the Java implementation. > Most notably, machine names and ports should be readable from Hadoop XML > configuration files. > Similarly, an internal Options architecture for libhdfs++ should be developed > to efficiently transport the configuration information within the system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971179#comment-14971179 ] Yongjun Zhang commented on HDFS-7284: - Hi [~jojochuang], It's important to have a consistent block name appear in the log, so people can analyze the actions that happened to a given block across the board by searching for the "blk_id_timestamp" or "blk_id". I'd suggest adding the following code to Block.java: {code} /** */ public static String toString(final Block b) { return b.getBlockName() + "_" + b.getGenerationStamp(); } /** */ @Override public String toString() { return toString(this); } {code} and change the message you are working on to {code} NameNode.blockStateChangeLog.debug("BLOCK* Removing stale replica {}" + " of {}", r, Block.toString(r)); {code} Hi [~andrew.wang], does this sound good to you? I think the replica state that comes with {{ReplicaUnderConstruction#toString}} would help debugging. Thanks. > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. 
> So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
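The consistent-naming pattern suggested in the comment above can be illustrated with a self-contained sketch; the Block class here is a stand-in, not the real org.apache.hadoop.hdfs.protocol.Block:

```java
// Minimal sketch of the suggested naming pattern: one canonical
// "blk_<id>_<genstamp>" string, produced by both the static helper and the
// instance toString(), so every log line carries the same searchable token.
public class BlockNameSketch {
  static class Block {
    final long id;
    final long generationStamp;
    Block(long id, long generationStamp) { this.id = id; this.generationStamp = generationStamp; }
    String getBlockName() { return "blk_" + id; }
    long getGenerationStamp() { return generationStamp; }

    // Static helper so callers holding any Block reference get the same format.
    static String toString(Block b) {
      return b.getBlockName() + "_" + b.getGenerationStamp();
    }
    @Override
    public String toString() { return toString(this); }
  }

  public static void main(String[] args) {
    Block b = new Block(1073741825L, 1001L);
    // Every call site now emits the same grep-able token.
    System.out.println("BLOCK* Removing stale replica of " + b);
  }
}
```

With the override in place, concatenating a Block into a log message and calling the static helper are interchangeable, so operators can correlate all events for one block by grepping a single string.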
[jira] [Updated] (HDFS-9269) Need to update the documentation and wrapper for fuse-dfs
[ https://issues.apache.org/jira/browse/HDFS-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9269: -- Attachment: HDFS-9269.001.patch rev1: work in progress. updated doc/README > Need to update the documentation and wrapper for fuse-dfs > - > > Key: HDFS-9269 > URL: https://issues.apache.org/jira/browse/HDFS-9269 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Attachments: HDFS-9269.001.patch > > > To reproduce the bug in HDFS-9268, I followed the wiki, the doc and read the > wrapper script of fuse-dfs, but found them super outdated. (the wrapper was > last updated four years ago, and the hadoop project layout has dramatically > changed since then). I am creating this JIRA to track the status of the > update. > There are quite a few external blogs/discussion threads floating around the > internet which talked about how to update the scripts, but no one took the > time to update them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes
[ https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971255#comment-14971255 ] Hadoop QA commented on HDFS-9290: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 26s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 9m 19s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 53s | The applied patch generated 1 new checkstyle issues (total was 55, now 55). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 34s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 0m 31s | Tests passed in hadoop-hdfs-client. 
| | | | 51m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768311/HDFS-9290.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 35a303d | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13154/artifact/patchprocess/diffcheckstylehadoop-hdfs-client.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13154/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13154/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13154/console | This message was automatically generated. > DFSClient#callAppend() is not backward compatible for slightly older NameNodes > -- > > Key: HDFS-9290 > URL: https://issues.apache.org/jira/browse/HDFS-9290 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Blocker > Attachments: HDFS-9290.001.patch, HDFS-9290.002.patch > > > HDFS-7210 combined 2 RPC calls used at file append into a single one. > Specifically {{getFileInfo()}} is combined with {{append()}}. While backward > compatibility for older client is handled by the new NameNode (protobuf). > Newer client's {{append()}} call does not work with older NameNodes. 
One will > run into an exception like the following: > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741) > at > org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550) > at > org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1560) > at > org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1670) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717) > at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861) > at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922) > at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336) > at > org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318) > at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164) > {code} > The cause is that the new client code is expecting both the last block and > file info in the same
[jira] [Commented] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space
[ https://issues.apache.org/jira/browse/HDFS-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971252#comment-14971252 ] Allen Wittenauer commented on HDFS-9296: bq. AD permits group names with space (e.g. "Domain Users"). Yes, but that doesn't mean they are POSIX compliant, which must match this regex: [_a-z][-0-9_a-z]*\$? . So a definite -1 on this. > ShellBasedUnixGroupMapping should support group names with space > > > Key: HDFS-9296 > URL: https://issues.apache.org/jira/browse/HDFS-9296 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > > In a typical configuration, group name is obtained from AD through SSSD/LDAP. > AD permits group names with space (e.g. "Domain Users"). > Unfortunately, the present implementation of ShellBasedUnixGroupMapping > parses the output of shell command "id -Gn", and assumes group names are > separated by space. > This could be achieved by using a combination of shell scripts, for example, > bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut > -d":" -f1' > But I am still looking for a more compact form, and potentially more > efficient one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
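The ambiguity under discussion — and why resolving numeric gids (the `id -G` + `getent` approach in the report) side-steps it — can be shown with a small sketch; the gid-to-name table here is fabricated illustration data, not a real group database:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Why space-splitting "id -Gn" output breaks for AD-style group names, and
// why splitting numeric "id -G" output and resolving each gid separately is
// unambiguous (numeric ids never contain spaces).
public class GroupNameSketch {
  public static void main(String[] args) {
    String idGnOutput = "wheel Domain Users";        // really two groups
    List<String> naive = Arrays.asList(idGnOutput.split(" "));
    System.out.println(naive);                       // three "groups" -- wrong

    String idGOutput = "10 512";                     // numeric gids, safe to split
    Map<String, String> gidToName = new HashMap<>(); // stand-in for `getent group <gid>`
    gidToName.put("10", "wheel");
    gidToName.put("512", "Domain Users");
    for (String gid : idGOutput.split(" ")) {
      System.out.println(gidToName.get(gid));        // names recovered intact
    }
  }
}
```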
[jira] [Commented] (HDFS-9272) Implement a unix-like cat utility
[ https://issues.apache.org/jira/browse/HDFS-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971177#comment-14971177 ] James Clampffer commented on HDFS-9272: --- Will do. Is there generally a preference in the HDFS community about taking a host and port as separate tokens vs taking a URI for these sorts of tests? > Implement a unix-like cat utility > - > > Key: HDFS-9272 > URL: https://issues.apache.org/jira/browse/HDFS-9272 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Minor > Attachments: HDFS-9272.HDFS-8707.000.patch > > > Implement the basic functionality of "cat" and have it build as a separate > executable. > 2 Reasons for this: > We don't have any real integration tests at the moment so something simple to > verify that the library actually works against a real cluster is useful. > Eventually I'll make more utilities like stat, mkdir etc. Once there are > enough of them it will be simple to make a C++ implementation of the hadoop > fs command line interface that doesn't take the latency hit of spinning up a > JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7284: Labels: supportability (was: ) > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. > So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space
Wei-Chiu Chuang created HDFS-9296: - Summary: ShellBasedUnixGroupMapping should support group names with space Key: HDFS-9296 URL: https://issues.apache.org/jira/browse/HDFS-9296 Project: Hadoop HDFS Issue Type: Bug Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang In a typical configuration, group name is obtained from AD through SSSD/LDAP. AD permits group names with space (e.g. "Domain Users"). Unfortunately, the present implementation of ShellBasedUnixGroupMapping parses the output of shell command "id -Gn", and assumes group names are separated by space. This could be achieved by using a combination of shell scripts, for example, bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut -d":" -f1' But I am still looking for a more compact form, and potentially more efficient one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9296) ShellBasedUnixGroupMapping should support group names with space
[ https://issues.apache.org/jira/browse/HDFS-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-9296. --- Resolution: Duplicate I filed in the wrong category. A new one is filed as HADOOP-12505 > ShellBasedUnixGroupMapping should support group names with space > > > Key: HDFS-9296 > URL: https://issues.apache.org/jira/browse/HDFS-9296 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > > In a typical configuration, group name is obtained from AD through SSSD/LDAP. > AD permits group names with space (e.g. "Domain Users"). > Unfortunately, the present implementation of ShellBasedUnixGroupMapping > parses the output of shell command "id -Gn", and assumes group names are > separated by space. > This could be achieved by using a combination of shell scripts, for example, > bash -c 'id -G weichiu | tr " " "\n" | xargs -I % getent group "%" | cut > -d":" -f1' > But I am still looking for a more compact form, and potentially more > efficient one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9295) Add a thorough test of the full KMS code path
[ https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-9295: --- Attachment: HDFS-9295.001.patch > Add a thorough test of the full KMS code path > - > > Key: HDFS-9295 > URL: https://issues.apache.org/jira/browse/HDFS-9295 > Project: Hadoop HDFS > Issue Type: Test > Components: security, test >Affects Versions: 2.6.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: HDFS-9295.001.patch > > > TestKMS does a good job of testing the ACLs directly, but they are tested out > of context. Additional tests are needed that test how the ACL impact key > creation, EZ creation, file creation in an EZ, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9243) TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout
[ https://issues.apache.org/jira/browse/HDFS-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970914#comment-14970914 ] Wei-Chiu Chuang commented on HDFS-9243: --- Thanks for the analysis. Please feel free to assign this jira to yourself. Because the failure appears quite frequently, it should be possible to improve on it, even though it may not be an issue in production. I am thinking it could be resolved by reducing certain timeout parameters, so that the test case doesn't need to wait as long. > TestUnderReplicatedBlocks#testSetrepIncWithUnderReplicatedBlocks test timeout > - > > Key: HDFS-9243 > URL: https://issues.apache.org/jira/browse/HDFS-9243 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Wei-Chiu Chuang >Priority: Minor > > org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks > sometimes times out. > This is happening on trunk as can be observed in several recent jenkins jobs. > (e.g. https://builds.apache.org/job/Hadoop-Hdfs-trunk/2423/ > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/ > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2351/ > https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/472/ > On my local Linux machine, this test case times out 6 out of 10 times. When > it does not time out, this test takes about 20 seconds, otherwise it takes > more than 60 seconds and then times out. > I suspect it's a deadlock issue, as dead lock had occurred at this test case > in HDFS-5527 before. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9295) Add a thorough test of the full KMS code path
Daniel Templeton created HDFS-9295: -- Summary: Add a thorough test of the full KMS code path Key: HDFS-9295 URL: https://issues.apache.org/jira/browse/HDFS-9295 Project: Hadoop HDFS Issue Type: Test Components: security, test Affects Versions: 2.6.1 Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Critical TestKMS does a good job of testing the ACLs directly, but they are tested out of context. Additional tests are needed that test how the ACL impact key creation, EZ creation, file creation in an EZ, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9295) Add a thorough test of the full KMS code path
[ https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-9295: --- Affects Version/s: (was: 2.6.1) 2.7.1 Status: Patch Available (was: Open) > Add a thorough test of the full KMS code path > - > > Key: HDFS-9295 > URL: https://issues.apache.org/jira/browse/HDFS-9295 > Project: Hadoop HDFS > Issue Type: Test > Components: security, test >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: HDFS-9295.001.patch > > > TestKMS does a good job of testing the ACLs directly, but they are tested out > of context. Additional tests are needed that test how the ACL impact key > creation, EZ creation, file creation in an EZ, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9285) testTruncateWithDataNodesRestartImmediately occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970919#comment-14970919 ] Wei-Chiu Chuang commented on HDFS-9285: --- Thanks [~walter.k.su] for the analysis. You have done a fix in HDFS-8729 so I am assuming you're the expert :) Please feel free to assign this jira to yourself! > testTruncateWithDataNodesRestartImmediately occasionally fails > -- > > Key: HDFS-9285 > URL: https://issues.apache.org/jira/browse/HDFS-9285 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Priority: Minor > > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2462/testReport/org.apache.hadoop.hdfs.server.namenode/TestFileTruncate/testTruncateWithDataNodesRestartImmediately/ > Note that this is similar, but appears to be a different failure than > HDFS-8729. > Error Message > inode should complete in ~3 ms. > Expected: is > but: was > Stacktrace > java.lang.AssertionError: inode should complete in ~3 ms. > Expected: is > but: was > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) > at org.junit.Assert.assertThat(Assert.java:865) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1192) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1176) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.checkBlockRecovery(TestFileTruncate.java:1171) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestartImmediately(TestFileTruncate.java:798) > Log excerpt: > 2015-10-22 06:34:47,281 [IPC Server handler 8 on 8020] INFO > FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true >ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open > src=/test/testTruncateWithDataNodesRestartImmediately dst=null > perm=null proto=rpc > 2015-10-22 06:34:47,382 [IPC Server handler 9 on 8020] INFO > FSNamesystem.audit 
(FSNamesystem.java:logAuditMessage(7358)) - allowed=true >ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open > src=/test/testTruncateWithDataNodesRestartImmediately dst=null > perm=null proto=rpc > 2015-10-22 06:34:47,484 [IPC Server handler 0 on 8020] INFO > FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true >ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open > src=/test/testTruncateWithDataNodesRestartImmediately dst=null > perm=null proto=rpc > 2015-10-22 06:34:47,585 [IPC Server handler 1 on 8020] INFO > FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7358)) - allowed=true >ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=open > src=/test/testTruncateWithDataNodesRestartImmediately dst=null > perm=null proto=rpc > 2015-10-22 06:34:47,689 [main] INFO hdfs.MiniDFSCluster > (MiniDFSCluster.java:shutdown(1889)) - Shutting down the Mini HDFS Cluster > 2015-10-22 06:34:47,690 [main] INFO hdfs.MiniDFSCluster > (MiniDFSCluster.java:shutdownDataNodes(1935)) - Shutting down DataNode 2 > 2015-10-22 06:34:47,690 [main] WARN datanode.DirectoryScanner > (DirectoryScanner.java:shutdown(529)) - DirectoryScanner: shutdown has been > called -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode
[ https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangliang Gu updated HDFS-9276: Attachment: HDFS-9276.02.patch > Failed to Update HDFS Delegation Token for long running application in HA mode > -- > > Key: HDFS-9276 > URL: https://issues.apache.org/jira/browse/HDFS-9276 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, ha, security >Affects Versions: 2.7.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu > Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, debug1.PNG, > debug2.PNG > > > The Scenario is as follows: > 1. NameNode HA is enabled. > 2. Kerberos is enabled. > 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with > NameNode. > 4. We want to update the HDFS Delegation Token for long running applications. > HDFS Client will generate private tokens for each NameNode. When we update > the HDFS Delegation Token, these private tokens will not be updated, which > will cause the token to expire. 
> This bug can be reproduced by the following program: > {code} > import java.security.PrivilegedExceptionAction > import org.apache.hadoop.conf.Configuration > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.hadoop.security.UserGroupInformation > object HadoopKerberosTest { > def main(args: Array[String]): Unit = { > val keytab = "/path/to/keytab/xxx.keytab" > val principal = "x...@abc.com" > val creds1 = new org.apache.hadoop.security.Credentials() > val ugi1 = > UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab) > ugi1.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > val fs = FileSystem.get(new Configuration()) > fs.addDelegationTokens("test", creds1) > null > } > }) > val ugi = UserGroupInformation.createRemoteUser("test") > ugi.addCredentials(creds1) > ugi.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > var i = 0 > while (true) { > val creds1 = new org.apache.hadoop.security.Credentials() > val ugi1 = > UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab) > ugi1.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > val fs = FileSystem.get(new Configuration()) > fs.addDelegationTokens("test", creds1) > null > } > }) > UserGroupInformation.getCurrentUser.addCredentials(creds1) > val fs = FileSystem.get( new Configuration()) > i += 1 > println() > println(i) > println(fs.listFiles(new Path("/user"), false)) > Thread.sleep(60 * 1000) > } > null > } > }) > } > } > {code} > To reproduce the bug, please set the following configuration to Name Node: > {code} > dfs.namenode.delegation.token.max-lifetime = 10min > dfs.namenode.delegation.key.update-interval = 3min > dfs.namenode.delegation.token.renew-interval = 3min > {code} > The bug will occure after 3 minutes. 
> The stacktrace is: > {code} > Exception in thread "main" > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679) > at > org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106) > at >
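The failure mode described above — per-NameNode private tokens that are not refreshed when the logical HA token is — can be sketched with a plain map. All service names, host names, and method names below are invented for illustration; this is not Hadoop's actual Credentials/Token API:

```java
import java.util.HashMap;
import java.util.Map;

public class StaleTokenDemo {
    // Credentials modeled as service -> token string.
    static final Map<String, String> creds = new HashMap<>();

    // Initial login: the logical HA token is cloned into one private
    // token per physical NameNode address.
    static void addTokens(String token) {
        creds.put("ha-hdfs:mycluster", token);     // logical service name
        creds.put("nn1.example.com:8020", token);  // private copy for NN1
        creds.put("nn2.example.com:8020", token);  // private copy for NN2
    }

    // Buggy refresh: only the logical entry is replaced; the private
    // copies keep the old, soon-to-expire token.
    static void refreshLogicalOnly(String newToken) {
        creds.put("ha-hdfs:mycluster", newToken);
    }

    // The token actually selected for an RPC to a concrete NameNode.
    static String tokenUsedFor(String service) {
        return creds.get(service);
    }

    public static void main(String[] args) {
        addTokens("token-v1");
        refreshLogicalOnly("token-v2");
        // RPCs resolve to a physical NN address, so the stale private
        // copy is used even though the logical token is fresh.
        System.out.println(tokenUsedFor("nn1.example.com:8020"));
    }
}
```

A fix along the lines of the attached patches would have to re-clone the refreshed token into every private entry, not just the logical one.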
[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970696#comment-14970696 ] 邓飞 commented on HDFS-9293: -- It is a dirty-reference case in FSEditLog: private ThreadLocal<OpInstanceCache> cache = new ThreadLocal<OpInstanceCache>() { @Override protected OpInstanceCache initialValue() { return new OpInstanceCache(); } }; Each NN handler thread initializes its OpInstanceCache instance once and reuses it for all later operations, for example logUpdateBlocks: public void logUpdateBlocks(String path, INodeFileUnderConstruction file, boolean toLogRpcIds) { UpdateBlocksOp op = UpdateBlocksOp.getInstance(cache.get()) .setPath(path) .setBlocks(file.getBlocks()); logRpcIds(op, toLogRpcIds); logEdit(op); } /** Record the RPC IDs if necessary */ private void logRpcIds(FSEditLogOp op, boolean toLogRpcIds) { if (toLogRpcIds) { op.setRpcClientId(Server.getClientId()); op.setRpcCallId(Server.getCallId()); } } When a client recovers the pipeline, the cached FSEditLogOp instance gets an RPC id set. Later uses of UpdateBlocksOp, such as for addBlock (which is marked @Idempotent), reuse the cached instance without resetting it, so the stale RPC id is recorded in the edit log again. That parks the standby NN's IPC handler threads and indirectly affects the active NN. We found that 2.7.1 has the same problem. > FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty > 'rpcId',which may cause standby NN too busy to communicate > -- > > Key: HDFS-9293 > URL: https://issues.apache.org/jira/browse/HDFS-9293 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.2.0, 2.7.1 >Reporter: 邓飞 >Assignee: 邓飞 > > In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found the standby NN tailing the > edit log slowly while holding the FSNamesystem write lock, which blocked the DNs' > heartbeat/blockreport IPC requests. This led the active NN to remove stale DNs > that could not send heartbeats because they were blocked registering with the > standby NN (fixed in 2.7.1).
> Below is the standby NN stack: > "Edit log tailer" prio=10 tid=0x7f28fcf35800 nid=0x1a7d runnable > [0x7f0dd1d76000] >java.lang.Thread.State: RUNNABLE > at java.util.PriorityQueue.remove(PriorityQueue.java:360) > at > org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217) > at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270) > - locked <0x7f12817714b8> (a org.apache.hadoop.ipc.RetryCache) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292) > > When applying an editLogOp, if a matching entry is found in the IPC retry cache, > the previous entry must be removed from the priority queue, an O(N) operation. An > updateBlocks call does not need to record an RPC id in the edit log except when > the client requests updatePipeline, yet we found many 'UpdateBlocksOp' entries > with repeated RPC ids. > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970689#comment-14970689 ] Walter Su commented on HDFS-9293: - relates to HDFS-7609, HDFS-8611
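The O(N) cost at the top of the stack trace above, java.util.PriorityQueue.remove(Object), can be demonstrated directly. The Entry wrapper below counts equals() calls to expose the linear scan inside remove(); the exact count depends on the JDK's heap layout, so treat this as an illustration rather than a guarantee:

```java
import java.util.PriorityQueue;

public class PqRemoveDemo {
    // Element that counts equals() calls so we can observe the linear
    // scan PriorityQueue.remove(Object) performs to locate the element.
    static class Entry {
        static int equalsCalls = 0;
        final int id;
        Entry(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return (o instanceof Entry) && ((Entry) o).id == id;
        }
        @Override public int hashCode() { return id; }
    }

    // Builds a heap of n entries (inserted in ascending order, so the
    // backing array stays sorted) and returns how many elements were
    // inspected while removing entry `target`.
    static int scanLength(int n, int target) {
        PriorityQueue<Entry> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.id, b.id));
        for (int i = 0; i < n; i++) pq.add(new Entry(i));
        Entry.equalsCalls = 0;
        pq.remove(new Entry(target));
        return Entry.equalsCalls;
    }

    public static void main(String[] args) {
        // Removing an entry near the end of the heap scans ~n elements;
        // removing the root is found almost immediately.
        System.out.println(scanLength(10000, 9999));
        System.out.println(scanLength(10000, 0));
    }
}
```

This is why evicting an arbitrary retry-cache entry per applied edit makes the tailer hold the write lock far longer than expected.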
[jira] [Resolved] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 邓飞 resolved HDFS-9293. -- Resolution: Fixed Fix Version/s: 2.7.1
[jira] [Updated] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode
[ https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liangliang Gu updated HDFS-9276: Attachment: HDFS-9276.03.patch
[jira] [Work started] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
[ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9293 started by 邓飞.
[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode
[ https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970736#comment-14970736 ] Hadoop QA commented on HDFS-9276: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768266/HDFS-9276.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 124a412 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13150/console | This message was automatically generated.
[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode
[ https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970768#comment-14970768 ] Steve Loughran commented on HDFS-9276: -- Given the patch is entirely in hadoop-common, filing a JIRA & patch there will run the hadoop-common build & test, which is a bit less brittle than the HDFS one.
[jira] [Commented] (HDFS-8914) Documentation conflict regarding fail-over of Namenode
[ https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970727#comment-14970727 ] Ravindra Babu commented on HDFS-8914: - Lars Francke : Are you committing this change as we have received +1 from Hadoop QA? > Documentation conflict regarding fail-over of Namenode > -- > > Key: HDFS-8914 > URL: https://issues.apache.org/jira/browse/HDFS-8914 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.1 > Environment: Documentation page in live >Reporter: Ravindra Babu >Assignee: Lars Francke >Priority: Trivial > Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch > > > Please refer to these two links and correct one of them. > http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html > The NameNode machine is a single point of failure for an HDFS cluster. If the > NameNode machine fails, manual intervention is necessary. Currently, > automatic restart and failover of the NameNode software to another machine is > not supported. > http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html > The HDFS High Availability feature addresses the above problems by providing > the option of running two redundant NameNodes in the same cluster in an > Active/Passive configuration with a hot standby. This allows a fast failover > to a new NameNode in the case that a machine crashes, or a graceful > administrator-initiated failover for the purpose of planned maintenance. > Please update hdfsDesign article with same facts to avoid confusion in > Reader's mind.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9254) HDFS Secure Mode Documentation updates
[ https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971312#comment-14971312 ] Arpit Agarwal edited comment on HDFS-9254 at 10/23/15 4:37 PM: --- So yes it looks like at least the {{SaslRpcClient}} doesn't like principals without a host component. {code} 192.168.56.80:8485: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Kerberos principal name does NOT have the expected hostname part: j...@example.com; Host Details : local host is: "cm0.example.com/192.168.56.80"; destination host is: "cm0.example.com":8485; at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:899) {code} Whereas SecurityUtil handles them fine. We should be consistent. I'll file a separate bug to fix the {{SaslRpcClient}}, and any other components I run into, but also update the doc patch for now. Thanks for the catch. was (Author: arpitagarwal): So yes it looks like at least the Journal Node doesn't like principals without a host component. 
{code} 192.168.56.80:8485: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Kerberos principal name does NOT have the expected hostname part: j...@example.com; Host Details : local host is: "cm0.example.com/192.168.56.80"; destination host is: "cm0.example.com":8485; at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:899) {code} Whereas SecurityUtil handles them fine. We should be consistent. I'll file a separate bug to fix the JN, and any other components I run into, but also update the doc patch for now. Thanks for the catch. > HDFS Secure Mode Documentation updates > -- > > Key: HDFS-9254 > URL: https://issues.apache.org/jira/browse/HDFS-9254 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.1 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9254.01.patch > > > Some Kerberos configuration parameters are not documented well enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
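The host-part requirement that trips {{SaslRpcClient}} above can be illustrated with a toy check. The parsing below is illustrative only, not Hadoop's implementation: a service principal of the form name/host@REALM passes, while name@REALM has no host component and would be rejected.

```java
public class PrincipalCheckDemo {
    // True iff the part before '@' contains a non-empty "/host" component,
    // i.e. the principal looks like "service/host@REALM".
    static boolean hasHostPart(String principal) {
        int at = principal.indexOf('@');
        String primary = at >= 0 ? principal.substring(0, at) : principal;
        int slash = primary.indexOf('/');
        return slash > 0 && slash < primary.length() - 1;
    }

    public static void main(String[] args) {
        // Accepted: service principal with an explicit host instance.
        System.out.println(hasHostPart("jn/cm0.example.com@EXAMPLE.COM"));
        // Rejected by SaslRpcClient per the stack trace above: no host part.
        System.out.println(hasHostPart("jn@EXAMPLE.COM"));
    }
}
```

Whether host-less principals should be accepted everywhere (as SecurityUtil apparently allows) is exactly the consistency question raised in the comment.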
[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
[ https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9231: Description: Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt blocks with the original file dir instead of the snapshot dir, and {{fsck -list-corruptfileblocks -includeSnapshots}} behave the same. This can be confusing because even when the original file is deleted, fsck will still show that deleted file as corrupted, although what's actually corrupted is the snapshot. As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs. was: For snapshot files, fsck shows corrupt blocks with the original file dir instead of the snapshot dir. This can be confusing since even when the original file is deleted, a new fsck run will still show that file as corrupted although what's actually corrupted is the snapshot. This is true even when given the -includeSnapshots option. > fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot > --- > > Key: HDFS-9231 > URL: https://issues.apache.org/jira/browse/HDFS-9231 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, > HDFS-9231.003.patch, HDFS-9231.004.patch > > > Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt > blocks with the original file dir instead of the snapshot dir, and {{fsck > -list-corruptfileblocks -includeSnapshots}} behave the same. > This can be confusing because even when the original file is deleted, fsck > will still show that deleted file as corrupted, although what's actually > corrupted is the snapshot. > As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Wu updated HDFS-9297: -- Status: Patch Available (was: Open) > Update TestBlockMissingException to use > corruptBlockOnDataNodesByDeletingBlockFile() > > > Key: HDFS-9297 > URL: https://issues.apache.org/jira/browse/HDFS-9297 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Trivial > Attachments: HDFS-9297.001.patch > > > TestBlockMissingException uses its own function to corrupt a block by > deleting all its block files. HDFS-7235 introduced a helper function > {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same > thing. We can update this test to use the helper function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9272) Implement a unix-like cat utility
[ https://issues.apache.org/jira/browse/HDFS-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971367#comment-14971367 ] Haohui Mai commented on HDFS-9272: -- bq. Will do. Is there generally a preference in the HDFS community about taking a host and port as separate tokens vs taking a URI for these sorts of tests? IMO either approach is reasonable. Please feel free to choose what is easier. > Implement a unix-like cat utility > - > > Key: HDFS-9272 > URL: https://issues.apache.org/jira/browse/HDFS-9272 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer >Priority: Minor > Attachments: HDFS-9272.HDFS-8707.000.patch > > > Implement the basic functionality of "cat" and have it build as a separate > executable. > 2 Reasons for this: > We don't have any real integration tests at the moment so something simple to > verify that the library actually works against a real cluster is useful. > Eventually I'll make more utilities like stat, mkdir etc. Once there are > enough of them it will be simple to make a C++ implementation of the hadoop > fs command line interface that doesn't take the latency hit of spinning up a > JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8914) Documentation conflict regarding fail-over of Namenode
[ https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971395#comment-14971395 ] Lars Francke commented on HDFS-8914: I'm not a Hadoop committer, we'll have to wait for one. > Documentation conflict regarding fail-over of Namenode > -- > > Key: HDFS-8914 > URL: https://issues.apache.org/jira/browse/HDFS-8914 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.1 > Environment: Documentation page in live >Reporter: Ravindra Babu >Assignee: Lars Francke >Priority: Trivial > Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch > > > Please refer to these two links and correct one of them. > http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html > The NameNode machine is a single point of failure for an HDFS cluster. If the > NameNode machine fails, manual intervention is necessary. Currently, > automatic restart and failover of the NameNode software to another machine is > not supported. > http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html > The HDFS High Availability feature addresses the above problems by providing > the option of running two redundant NameNodes in the same cluster in an > Active/Passive configuration with a hot standby. This allows a fast failover > to a new NameNode in the case that a machine crashes, or a graceful > administrator-initiated failover for the purpose of planned maintenance. > Please update hdfsDesign article with same facts to avoid confusion in > Reader's mind.. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-7284: -- Attachment: HDFS-7284.003.patch [~yzhangal] Good idea! The output of your suggested change is: 2015-10-23 10:21:18,647 [IPC Server handler 7 on 51002] DEBUG BlockStateChange (BlockInfo.java:setGenerationStampAndVerifyReplicas(396)) - BLOCK* Removing stale replica ReplicaUC[[DISK]DS-b87b985d-6dc7-448e-9d45-dcd6c2c8ec37:NORMAL:127.0.0.1:51003|RBW] of blk_1073741826_1002 Attaching rev3 based on Yongjun's suggestion. > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, > HDFS-7284.003.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. > So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
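The rev3 change above enriches the stale-replica message with the block id and generation stamp, as the sample DEBUG output shows. A rough, self-contained sketch of that idea follows; {{Replica}}, {{removeStaleReplicas}}, and the exact message format here are simplified stand-ins, not the actual {{BlockInfo}} code from the patch.

```java
// Illustrative sketch only: Replica and the message format are simplified
// stand-ins for BlockInfoUnderConstruction's internals, not the HDFS-7284
// patch itself.
import java.util.ArrayList;
import java.util.List;

class StaleReplicaLogDemo {
  static class Replica {
    final long genStamp;
    final String location;
    Replica(long genStamp, String location) {
      this.genStamp = genStamp;
      this.location = location;
    }
  }

  /** Builds the log lines that would be emitted while removing stale replicas. */
  static List<String> removeStaleReplicas(List<Replica> replicas,
                                          long expectedGenStamp,
                                          String blockName) {
    List<String> logLines = new ArrayList<>();
    for (Replica r : replicas) {
      if (r.genStamp != expectedGenStamp) {
        // The old message only named the location; the improved one also
        // identifies which block and which generation stamp went stale.
        logLines.add("BLOCK* Removing stale replica " + r.location
            + " of " + blockName + "_" + r.genStamp);
      }
    }
    return logLines;
  }

  public static void main(String[] args) {
    List<Replica> replicas = new ArrayList<>();
    replicas.add(new Replica(1001L, "127.0.0.1:51003"));
    replicas.add(new Replica(1002L, "127.0.0.1:51004"));
    for (String line : removeStaleReplicas(replicas, 1002L, "blk_1073741826")) {
      System.out.println(line);
    }
  }
}
```

The point of the change is purely supportability: with the block name and generation stamp in the message, a replica-loss investigation no longer has to correlate the log line with other sources to find out which block was affected.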
[jira] [Commented] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971302#comment-14971302 ] Chang Li commented on HDFS-9289: [~eclark], block on 10.210.31.38 should be marked as corrupt because it's from old pipeline right? > check genStamp when complete file > - > > Key: HDFS-9289 > URL: https://issues.apache.org/jira/browse/HDFS-9289 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Critical > Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch > > > we have seen a case of corrupt block which is caused by file complete after a > pipelineUpdate, but the file complete with the old block genStamp. This > caused the replicas of two datanodes in updated pipeline to be viewed as > corrupt. Propose to check genStamp when commit block -- This message was sent by Atlassian JIRA (v6.3.4#6332)
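The fix proposed in this issue boils down to comparing the generation stamp the client reports at file-complete time with the one recorded by the latest pipeline update. A minimal, hedged sketch of that check; the names are hypothetical, not the actual NameNode commitBlock path:

```java
// Hedged sketch of the proposed check; names are hypothetical, not the
// actual FSNamesystem/BlockManager methods.
class CommitBlockSketch {
  /**
   * Accept the commit only if the client's reported generation stamp
   * matches the one recorded by the latest pipeline update.
   */
  static boolean commitBlock(long storedGenStamp, long reportedGenStamp) {
    if (reportedGenStamp != storedGenStamp) {
      // A stale GS from before the pipeline update: reject the commit
      // instead of letting the up-to-date replicas be marked corrupt.
      return false;
    }
    return true;
  }
}
```

With this guard, a complete carrying the pre-update GS fails fast rather than making the replicas written by the updated pipeline look corrupt.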
[jira] [Updated] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Wu updated HDFS-9297: -- Attachment: HDFS-9297.001.patch In this patch: * Use {{corruptBlockOnDataNodesByDeletingBlockFile()}} to corrupt a block by removing all block files. * Removed the test's own implementation of the same function. > Update TestBlockMissingException to use > corruptBlockOnDataNodesByDeletingBlockFile() > > > Key: HDFS-9297 > URL: https://issues.apache.org/jira/browse/HDFS-9297 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Trivial > Attachments: HDFS-9297.001.patch > > > TestBlockMissingException uses its own function to corrupt a block by > deleting all its block files. HDFS-7235 introduced a helper function > {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same > thing. We can update this test to use the helper function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9295) Add a thorough test of the full KMS code path
[ https://issues.apache.org/jira/browse/HDFS-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-9295: --- Attachment: HDFS-9295.002.patch Fixed compiler warnings > Add a thorough test of the full KMS code path > - > > Key: HDFS-9295 > URL: https://issues.apache.org/jira/browse/HDFS-9295 > Project: Hadoop HDFS > Issue Type: Test > Components: security, test >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: HDFS-9295.001.patch, HDFS-9295.002.patch > > > TestKMS does a good job of testing the ACLs directly, but they are tested out > of context. Additional tests are needed that test how the ACL impact key > creation, EZ creation, file creation in an EZ, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9288) Add RapidXML to third-party
[ https://issues.apache.org/jira/browse/HDFS-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971362#comment-14971362 ] Haohui Mai commented on HDFS-9288: -- I think that requires us to fix pom.xml to exclude these files when checking ASF licenses. I think it's okay to separate it to another jira. > Add RapidXML to third-party > --- > > Key: HDFS-9288 > URL: https://issues.apache.org/jira/browse/HDFS-9288 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-9288.HDFS-8707.001.patch > > > Needed for Configuration class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers
[ https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971412#comment-14971412 ] Zhe Zhang commented on HDFS-9079: - Thanks Nicholas for the comment. bq. > ... => 2) Asks NN for new GS => 3) Gets new GS from NN => ... bq. What is the difference between #2 and #3? Is it just a single RPC? Yes it's a single RPC. I listed them as 2 steps because other events could happen between #2 and #3. E.g. while {{streamer_i}} is waiting for the NN response, {{streamer_j}} might start step #2. bq. Do you mean that client may update GS without letting NN knowing it? More details of the proposed protocol can be found [here | https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741972=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741972]. When {{streamer_i}} encounters a DN failure, the coordinator will ask the other streamers to have their DNs increment the GS. After all healthy DNs acknowledge that they have bumped their local GSes, the coordinator sends the {{updatePipeline}} RPC to the NN to update the NN's copy of the GS. So there will never be a "false stale" -- a fresh replica being considered stale. bq. How to save step #1? Good catch, I meant saving steps 2~3. 
> Erasure coding: preallocate multiple generation stamps and serialize updates > from data streamers > > > Key: HDFS-9079 > URL: https://issues.apache.org/jira/browse/HDFS-9079 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: HDFS-7285 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch, > HDFS-9079.02.patch, HDFS-9079.03.patch, HDFS-9079.04.patch, HDFS-9079.05.patch > > > A non-striped DataStreamer goes through the following steps in error handling: > {code} > 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) > Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) > Updates block on NN > {code} > To simplify the above we can preallocate GS when NN creates a new striped > block group ({{FSN#createNewBlock}}). For each new striped block group we can > reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can > be saved. If more than {{NUM_PARITY_BLOCKS}} errors have happened we > shouldn't try to further recover anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
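The ordering described in the comment above (bump the GS locally on every healthy streamer first, publish to the NameNode only after all acks) is what rules out "false stale" replicas. A hedged, self-contained sketch of that ordering; all names here are illustrative, not the actual HDFS-9079 classes:

```java
// Hedged sketch of the ordering described in the comment above: streamers
// bump their local GS first; the NameNode's copy is updated only after
// every healthy streamer acks. All names are illustrative, not HDFS-9079's.
import java.util.ArrayList;
import java.util.List;

class GsCoordinatorSketch {
  static class Streamer {
    long localGs;
    Streamer(long gs) { localGs = gs; }
    boolean bumpTo(long newGs) { localGs = newGs; return true; } // ack
  }

  long nameNodeGs;                       // the NN's copy of the GS
  final List<Streamer> healthy = new ArrayList<>();

  GsCoordinatorSketch(long gs) { nameNodeGs = gs; }

  /** Returns true iff updatePipeline was "sent", i.e. all streamers acked. */
  boolean handleDnFailure() {
    long newGs = nameNodeGs + 1;
    for (Streamer s : healthy) {
      if (!s.bumpTo(newGs)) {
        return false;                    // no ack: NN GS left untouched
      }
    }
    // Only now would updatePipeline go to the NN, so a replica carrying
    // the new GS can never look stale to the NameNode ("false stale").
    nameNodeGs = newGs;
    return true;
  }
}
```

Because the NN's GS only ever trails (never leads) the DNs' local GSes, the NN can never classify a freshly bumped replica as stale; the worst case under failure is a replica that is newer than the NN's record, which the protocol can reconcile.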
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971418#comment-14971418 ] Jitendra Nath Pandey commented on HDFS-9184: I think any check at the client side can be followed up as a separate jira. It is not so critical, because rogue clients can circumvent a client side check anyway. +1 for the latest patch. I also plan to commit it to branch-2, because this patch doesn't change the audit logs at all, unless explicitly enabled. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, > HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, > HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch, > HDFS-9184.008.patch, HDFS-9184.009.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. 
The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussions on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client chooses to provide its signature along with the caller id. > The operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
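The thread-local mechanism the description proposes can be sketched in a few lines. {{CallerContext}} below is a stand-in for illustration, not the class the HDFS-9184 patch actually adds; the RPC-header plumbing is only indicated in comments.

```java
// Minimal sketch of the thread-local caller-context idea from the
// description above; CallerContext here is a stand-in, not the class the
// HDFS-9184 patch adds.
class CallerContextSketch {
  static final class CallerContext {
    final String id;        // e.g. a Hive query id
    final byte[] signature; // optional; validated offline, not by the NN
    CallerContext(String id, byte[] signature) {
      this.id = id;
      this.signature = signature;
    }
  }

  // Client side: set before issuing RPCs, and copied by the RPC layer into
  // an optional header field. Server side: the Handler would restore it
  // into its own thread-local before the audit logger reads it.
  private static final ThreadLocal<CallerContext> CONTEXT = new ThreadLocal<>();

  static void set(CallerContext ctx) { CONTEXT.set(ctx); }
  static CallerContext get() { return CONTEXT.get(); }

  /** What an audit-log line might append for the current operation. */
  static String auditSuffix() {
    CallerContext ctx = CONTEXT.get();
    return ctx == null ? "" : "\tcallerContext=" + ctx.id;
  }
}
```

The design choice the description highlights carries through in the sketch: when no context is set, the audit line is unchanged, so existing log consumers are unaffected unless the feature is explicitly used.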
[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
[ https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9231: Attachment: HDFS-9231.005.patch > fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot > --- > > Key: HDFS-9231 > URL: https://issues.apache.org/jira/browse/HDFS-9231 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, > HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch > > > Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt > blocks with the original file dir instead of the snapshot dir, and {{fsck > -list-corruptfileblocks -includeSnapshots}} behave the same. > This can be confusing because even when the original file is deleted, fsck > will still show that deleted file as corrupted, although what's actually > corrupted is the snapshot. > As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
[ https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971469#comment-14971469 ] Xiao Chen commented on HDFS-9231: - Thanks a lot for the review [~yzhangal]! {quote} 1. The description is not quite accurate per our discussion, suggest to modify. Especially the patch actually does change (and fix) the behavior when without -includeSnapshots. {quote} It was great to talk to you. I have updated the description. Modified patch summary in the end of this comment. {quote} 2. A possible optimization in FSDirSnapshotOp#getSnapshotFiles. It seems that the sf variable could be calculated in caller for once before the loop in the caller, and pass to this method. {quote} My apologies for the confusion, I added some comments in this method. But getting sf for each snapshottable dir is needed, since /d1 and /d2 have different snapshotlist. {quote} 3. final INodesInPath iip = fsd.getINodesInPath4Write(snap, false); maybe substituted with call to getINodesInPath {quote} Good catch! I updated the code to call {{getINode}} which invokes {{getINodesInPath}}. {quote} 4. The check if (!corruptFileBlocks.isEmpty()) in listCorruptFileBlocksWithSnapshot is not needed {quote} Good call. Fixed. {quote} 5. Add comment in listCorruptFileBlocks() before the call namenode.getNamesystem().listCorruptFileBlocksWithSnapshot, to indicate that snapshottableDirs is only relevant when -includeSnapshots is specified. {quote} Added a link to {{FSNamesystem#listCorruptFileBlocksWithSnapshot}} which explains that parameter in javadoc. {quote} 6. In listCorruptFileBlocksWithSnapshot, we can add {code} if (snapshottableDirs == null) { continue; } {code} to avoid the call to getSnapshotFiles. {quote} I'm not sure this is necessary. On one hand, it definitely saves 1 call stack. On the other hand, with the existence of all those loops and checks, I think the performance gain of saving 1 call stack would be trivial. 
And the null check of snapshottableDirs is already performed as a first step in {{getSnapshotFiles}}. Attached patch 005 with the above modifications. Updated summary below: - {{fsck -list-corruptfileblocks -includeSnapshots}} will also show the full dir of snapshots - {{fsck -list-corruptfileblocks}} without -includeSnapshots will not show corrupt blocks that only have snapshot files - NameNode WebUI's way of showing corrupted files/blocks unchanged. - Added a sentence in NN WebUI to hint the admin to run fsck with -includeSnapshots, if there are snapshots present in the system. - Some refactoring to reuse existing code in new methods getSnapshottableDirs and listCorruptFileBlocksWithSnapshot - The reasoning for keeping the changes to the NN WebUI and to fsck without -includeSnapshots minimal is that getting all possible snapshots may be slow. > fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot > --- > > Key: HDFS-9231 > URL: https://issues.apache.org/jira/browse/HDFS-9231 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, > HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch > > > Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt > blocks with the original file dir instead of the snapshot dir, and {{fsck > -list-corruptfileblocks -includeSnapshots}} behave the same. > This can be confusing because even when the original file is deleted, fsck > will still show that deleted file as corrupted, although what's actually > corrupted is the snapshot. > As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
[ https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9231: Status: Patch Available (was: Open) > fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot > --- > > Key: HDFS-9231 > URL: https://issues.apache.org/jira/browse/HDFS-9231 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, > HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch > > > Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt > blocks with the original file dir instead of the snapshot dir, and {{fsck > -list-corruptfileblocks -includeSnapshots}} behave the same. > This can be confusing because even when the original file is deleted, fsck > will still show that deleted file as corrupted, although what's actually > corrupted is the snapshot. > As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9254) HDFS Secure Mode Documentation updates
[ https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971312#comment-14971312 ] Arpit Agarwal commented on HDFS-9254: - So yes it looks like at least the Journal Node doesn't like principals without a host component. {code} 192.168.56.80:8485: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Kerberos principal name does NOT have the expected hostname part: j...@example.com; Host Details : local host is: "cm0.example.com/192.168.56.80"; destination host is: "cm0.example.com":8485; at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:899) {code} Whereas SecurityUtil handles them fine. We should be consistent. I'll file a separate bug to fix the JN, and any other components I run into, but also update the doc patch for now. Thanks for the catch. > HDFS Secure Mode Documentation updates > -- > > Key: HDFS-9254 > URL: https://issues.apache.org/jira/browse/HDFS-9254 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.1 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9254.01.patch > > > Some Kerberos configuration parameters are not documented well enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9077) webhdfs client requires SPNEGO to do renew
[ https://issues.apache.org/jira/browse/HDFS-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] HeeSoo Kim updated HDFS-9077: - Attachment: HDFS-9077.002.patch > webhdfs client requires SPNEGO to do renew > -- > > Key: HDFS-9077 > URL: https://issues.apache.org/jira/browse/HDFS-9077 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: HeeSoo Kim > Attachments: HDFS-9077.001.patch, HDFS-9077.002.patch, HDFS-9077.patch > > > Simple bug. > webhdfs (the file system) doesn't pass delegation= in its REST call to renew > the same token. This forces a SPNEGO (or other auth) instead of just > renewing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9229) Expose size of NameNode directory as a metric
[ https://issues.apache.org/jira/browse/HDFS-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971375#comment-14971375 ] Surendra Singh Lilhore commented on HDFS-9229: -- Thanks [~wheat9] for the suggestion. Can I move this metric into {{NameNodeStatusMXBean}}? > Expose size of NameNode directory as a metric > - > > Key: HDFS-9229 > URL: https://issues.apache.org/jira/browse/HDFS-9229 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Zhe Zhang >Assignee: Surendra Singh Lilhore >Priority: Minor > Attachments: HDFS-9229.001.patch, HDFS-9229.002.patch, > HDFS-9229.003.patch > > > Useful for admins in reserving / managing NN local file system space. Also > useful when transferring NN backups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
[ https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971388#comment-14971388 ] Zhe Zhang commented on HDFS-8808: - Thanks ATM for reviewing again. I just triggered Jenkins since the last run was from a month ago. > dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby > > > Key: HDFS-8808 > URL: https://issues.apache.org/jira/browse/HDFS-8808 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Gautam Gopalakrishnan >Assignee: Zhe Zhang > Attachments: HDFS-8808-00.patch, HDFS-8808-01.patch, > HDFS-8808-02.patch, HDFS-8808-03.patch, HDFS-8808.04.patch > > > The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the > speed with which the fsimage is copied between the namenodes during regular > use. However, as a side effect, this also limits transfers when the > {{-bootstrapStandby}} option is used. This option is often used during > upgrades and could potentially slow down the entire workflow. The request > here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth > setting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
[ https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9231: Status: Open (was: Patch Available) > fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot > --- > > Key: HDFS-9231 > URL: https://issues.apache.org/jira/browse/HDFS-9231 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, > HDFS-9231.003.patch, HDFS-9231.004.patch > > > For snapshot files, fsck shows corrupt blocks with the original file dir > instead of the snapshot dir. > This can be confusing since even when the original file is deleted, a new > fsck run will still show that file as corrupted although what's actually > corrupted is the snapshot. > This is true even when given the -includeSnapshots option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971441#comment-14971441 ] Anu Engineer commented on HDFS-4015: none of the test failures seem to be related to this patch. > Safemode should count and report orphaned blocks > > > Key: HDFS-4015 > URL: https://issues.apache.org/jira/browse/HDFS-4015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Anu Engineer > Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, > HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, > HDFS-4015.006.patch > > > The safemode status currently reports the number of unique reported blocks > compared to the total number of blocks referenced by the namespace. However, > it does not report the inverse: blocks which are reported by datanodes but > not referenced by the namespace. > In the case that an admin accidentally starts up from an old image, this can > be confusing: safemode and fsck will show "corrupt files", which are the > files which actually have been deleted but got resurrected by restarting from > the old image. This will convince them that they can safely force leave > safemode and remove these files -- after all, they know that those files > should really have been deleted. However, they're not aware that leaving > safemode will also unrecoverably delete a bunch of other block files which > have been orphaned due to the namespace rollback. > I'd like to consider reporting something like: "90 of expected 100 > blocks have been reported. Additionally, 1 blocks have been reported > which do not correspond to any file in the namespace. Forcing exit of > safemode will unrecoverably remove those data blocks" > Whether this statistic is also used for some kind of "inverse safe mode" is > the logical next step, but just reporting it as a warning seems easy enough > to accomplish and worth doing. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
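The report format sketched in the description above ("N of expected M blocks have been reported. Additionally, K blocks have been reported which do not correspond to any file in the namespace") is essentially a set difference over block ids. A hedged, set-based illustration follows; the real patch would work against the BlockManager's block maps, not plain sets.

```java
// Rough sketch of the "inverse" statistic proposed above: blocks reported
// by DataNodes that no file in the namespace references. Set-based and
// illustrative only; the real patch works against the BlockManager's maps.
import java.util.HashSet;
import java.util.Set;

class OrphanBlockReportSketch {
  static String safemodeReport(Set<Long> namespaceBlocks, Set<Long> reportedBlocks) {
    Set<Long> reportedKnown = new HashSet<>(reportedBlocks);
    reportedKnown.retainAll(namespaceBlocks);   // blocks both reported and expected
    Set<Long> orphans = new HashSet<>(reportedBlocks);
    orphans.removeAll(namespaceBlocks);         // reported but unreferenced
    return reportedKnown.size() + " of expected " + namespaceBlocks.size()
        + " blocks have been reported. Additionally, " + orphans.size()
        + " blocks have been reported which do not correspond to any file"
        + " in the namespace.";
  }
}
```

Surfacing the orphan count in the safemode message is what warns an admin, before force-leaving safemode after a rollback to an old image, that those unreferenced block files would be deleted unrecoverably.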
[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971297#comment-14971297 ] Hadoop QA commented on HDFS-7284:
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 18m 25s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 34s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 25s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 32s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 11s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 50m 12s | Tests failed in hadoop-hdfs. |
| | | 96m 53s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
| | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
| | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| | org.apache.hadoop.hdfs.server.namenode.TestFileTruncate |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768308/HDFS-7284.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 35a303d |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13153/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13153/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13153/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13153/console |

This message was automatically generated.
> Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. > So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
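The enrichment HDFS-7284 asks for can be illustrated with a tiny formatter that puts the block id and both generation stamps into the one log line. This is a hedged sketch only; the class and method below are hypothetical stand-ins, not the actual BlockInfoUnderConstruction code or Hadoop's logging API.

```java
// Hypothetical sketch of the richer "Removing stale replica" message proposed
// in HDFS-7284: include the block id and both generation stamps alongside the
// replica location, so a single log line identifies the affected block.
public class StaleReplicaLogSketch {
    static String format(long blockId, long expectedGenStamp,
                         long replicaGenStamp, String location) {
        // Everything needed to correlate the removal with a specific block.
        return "BLOCK* Removing stale replica of block blk_" + blockId
            + " (replica genStamp=" + replicaGenStamp
            + ", expected genStamp=" + expectedGenStamp
            + ") from location: " + location;
    }

    public static void main(String[] args) {
        System.out.println(format(1073741825L, 1002L, 1001L, "x.x.x.x:50010"));
    }
}
```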
[jira] [Updated] (HDFS-8631) WebHDFS : Support list/setQuota
[ https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore updated HDFS-8631: - Attachment: HDFS-8631-002.patch Attached updated patch. Please review. > WebHDFS : Support list/setQuota > --- > > Key: HDFS-8631 > URL: https://issues.apache.org/jira/browse/HDFS-8631 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: nijel >Assignee: Surendra Singh Lilhore > Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch > > > Users are able to do quota management from the filesystem object. The same > operations can be allowed through the REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9289) check genStamp when complete file
[ https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-9289: --- Attachment: HDFS-9289.2.patch The .2 patch includes a test. It also includes the encountered genStamp and the expected genStamp in the exception. > check genStamp when complete file > - > > Key: HDFS-9289 > URL: https://issues.apache.org/jira/browse/HDFS-9289 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Critical > Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch > > > We have seen a case of a corrupt block caused by completing a file after a > pipelineUpdate, where the file completed with the old block genStamp. This > caused the replicas on two datanodes in the updated pipeline to be viewed as > corrupt. Propose to check the genStamp when committing the block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
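The check HDFS-9289 proposes can be sketched as follows: refuse to commit a block whose reported generation stamp differs from the expected one, and report both stamps in the exception as the .2 patch does. All names here are hypothetical; the real change lives in the NameNode's commit/complete path.

```java
// Illustrative sketch of checking the genStamp at block commit time
// (HDFS-9289). Not the actual FSNamesystem/BlockManager code.
public class GenStampCheckSketch {
    static void commitBlock(long reportedGenStamp, long expectedGenStamp) {
        if (reportedGenStamp != expectedGenStamp) {
            // Surfacing both stamps makes the stale-pipeline case diagnosable
            // from the exception alone, without reading HDFS source code.
            throw new IllegalStateException("Commit failed: reported genStamp "
                + reportedGenStamp + " != expected genStamp " + expectedGenStamp);
        }
        // ... in the real code, the commit would proceed here ...
    }

    public static void main(String[] args) {
        commitBlock(1002L, 1002L); // up-to-date stamp: commit allowed
        try {
            commitBlock(1001L, 1002L); // stale stamp after pipelineUpdate: rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```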
[jira] [Created] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
Tony Wu created HDFS-9297: - Summary: Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile() Key: HDFS-9297 URL: https://issues.apache.org/jira/browse/HDFS-9297 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS, test Affects Versions: 2.7.1 Reporter: Tony Wu Assignee: Tony Wu Priority: Trivial TestBlockMissingException uses its own function to corrupt a block by deleting all its block files. HDFS-7235 introduced a helper function {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same thing. We can update this test to use the helper function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
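Conceptually, the helper HDFS-9297 wants the test to reuse just deletes every on-disk replica file of a block so subsequent reads fail with a missing-block error. The sketch below simulates that with plain temp files; it is not the MiniDFSCluster helper itself, and the class name is made up.

```java
// Conceptual sketch of what corruptBlockOnDataNodesByDeletingBlockFile() does:
// remove each replica's block file from disk. Plain temp files stand in for
// datanode storage directories here.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DeleteReplicaFilesSketch {
    static int deleteReplicas(List<Path> replicaFiles) throws IOException {
        int deleted = 0;
        for (Path p : replicaFiles) {
            // deleteIfExists returns true only when a file was actually removed.
            if (Files.deleteIfExists(p)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("blk_", ".data");
        Path b = Files.createTempFile("blk_", ".data");
        System.out.println(deleteReplicas(List.of(a, b)));
    }
}
```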
[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes
[ https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972300#comment-14972300 ] Hudson commented on HDFS-9290: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2522 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2522/]) HDFS-9290. DFSClient#callAppend() is not backward compatible for (kihwal: rev b9e0417bdf2b9655dc4256bdb43683eca1ab46be) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > DFSClient#callAppend() is not backward compatible for slightly older NameNodes > -- > > Key: HDFS-9290 > URL: https://issues.apache.org/jira/browse/HDFS-9290 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Blocker > Fix For: 3.0.0, 2.7.2 > > Attachments: HDFS-9290.001.patch, HDFS-9290.002.patch > > > HDFS-7210 combined 2 RPC calls used at file append into a single one. > Specifically {{getFileInfo()}} is combined with {{append()}}. While backward > compatibility for older clients is handled by the new NameNode (protobuf), a > newer client's {{append()}} call does not work with older NameNodes. 
One will > run into an exception like the following: > {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741) > at > org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550) > at > org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1560) > at > org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1670) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717) > at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861) > at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922) > at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336) > at > org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318) > at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164) > {code} > The cause is that the new client code is expecting both the last block and > file info in the same RPC but the old NameNode only replied with the first. > The exception itself does not reflect this and one will have to look at the > HDFS source code to really understand what happened. > We can have the client detect it's talking to an old NameNode and send an > extra {{getFileInfo()}} RPC. Or we should improve the exception being thrown > to accurately reflect the cause of failure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
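The fallback idea described in the last paragraph can be sketched like this: if the combined append RPC returns no file status (an older NameNode replied with only the last block), fetch the file info with a separate call instead of dereferencing null. All types and names below are stand-ins, not DFSClient's actual API.

```java
// Hedged sketch of the HDFS-9290 client-side fallback. AppendResult and
// fileStatusFor are hypothetical; the real fix touches DFSClient#callAppend.
import java.util.function.Supplier;

public class AppendFallbackSketch {
    static class AppendResult {
        String lastBlock;   // always present in the RPC reply
        String fileStatus;  // null when the NameNode predates HDFS-7210
        AppendResult(String lastBlock, String fileStatus) {
            this.lastBlock = lastBlock;
            this.fileStatus = fileStatus;
        }
    }

    static String fileStatusFor(AppendResult r, Supplier<String> getFileInfoRpc) {
        // Missing file status means an old NameNode: issue the extra
        // getFileInfo-style RPC rather than failing with a NullPointerException.
        return r.fileStatus != null ? r.fileStatus : getFileInfoRpc.get();
    }

    public static void main(String[] args) {
        AppendResult fromOldNn = new AppendResult("blk_1", null);
        System.out.println(fileStatusFor(fromOldNn, () -> "status-from-extra-rpc"));
    }
}
```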
[jira] [Commented] (HDFS-9301) HDFS clients can't construct HdfsConfiguration instances
[ https://issues.apache.org/jira/browse/HDFS-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972301#comment-14972301 ] Hudson commented on HDFS-9301: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2522 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2522/]) HDFS-9301. HDFS clients can't construct HdfsConfiguration instances. (wheat9: rev 15eb84b37e6c0195d59d3a29fbc5b7417bf022ff) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfigurationLoader.java > HDFS clients can't construct HdfsConfiguration instances > > > Key: HDFS-9301 > URL: https://issues.apache.org/jira/browse/HDFS-9301 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Steve Loughran >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, > HDFS-9241.002.patch, HDFS-9241.003.patch, HDFS-9241.004.patch, > HDFS-9241.005.patch > > > the changes for the hdfs client classpath make instantiating > {{HdfsConfiguration}} from the client impossible; it only lives server side. > This breaks any app which creates one. 
> I know people will look at the {{@Private}} tag and say "don't do that then", > but it's worth considering precisely why I, at least, do this: it's the only > way to guarantee that the hdfs-default and hdfs-site resources get on the > classpath, including all the security settings. It's precisely the use case > which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code. > What am I meant to do now? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
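What constructing {{HdfsConfiguration}} buys a client, per the report above, is the registration of hdfs-default.xml and hdfs-site.xml as default resources so later lookups see them. The sketch below simulates that one-time registration with a plain set; Hadoop's real Configuration/HdfsConfigurationLoader machinery is more involved, and the class here is purely illustrative.

```java
// Rough illustration of default-resource registration as described in
// HDFS-9301. Not Hadoop's Configuration class; a simulated stand-in.
import java.util.LinkedHashSet;
import java.util.Set;

public class DefaultResourceSketch {
    private static final Set<String> DEFAULT_RESOURCES = new LinkedHashSet<>();

    static void addDefaultResource(String name) {
        DEFAULT_RESOURCES.add(name); // later configuration instances load these
    }

    static boolean isRegistered(String name) {
        return DEFAULT_RESOURCES.contains(name);
    }

    // Mirrors the one-time registration HdfsConfigurationLoader.init() performs
    // internally, which client code could previously trigger by constructing
    // an HdfsConfiguration.
    static void init() {
        addDefaultResource("hdfs-default.xml");
        addDefaultResource("hdfs-site.xml");
    }

    public static void main(String[] args) {
        init();
        System.out.println(DEFAULT_RESOURCES);
    }
}
```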
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-4015: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to branch-2 for 2.8.0. Thanks for contributing this improvement [~anu], and thanks for the reviews [~liuml07] and [~jnp]. > Safemode should count and report orphaned blocks > > > Key: HDFS-4015 > URL: https://issues.apache.org/jira/browse/HDFS-4015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Anu Engineer > Fix For: 2.8.0 > > Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, > HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, > HDFS-4015.006.patch, HDFS-4015.007.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972342#comment-14972342 ] Hudson commented on HDFS-4015: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/]) HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > Safemode should count and report 
orphaned blocks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes
[ https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972341#comment-14972341 ] Hudson commented on HDFS-9290: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/]) HDFS-9290. DFSClient#callAppend() is not backward compatible for (kihwal: rev b9e0417bdf2b9655dc4256bdb43683eca1ab46be) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9301) HDFS clients can't construct HdfsConfiguration instances
[ https://issues.apache.org/jira/browse/HDFS-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972343#comment-14972343 ] Hudson commented on HDFS-9301: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/]) HDFS-9301. HDFS clients can't construct HdfsConfiguration instances. (wheat9: rev 15eb84b37e6c0195d59d3a29fbc5b7417bf022ff) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfigurationLoader.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972344#comment-14972344 ] Hudson commented on HDFS-9297: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #532 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/532/]) HDFS-9297. Update TestBlockMissingException to use (lei: rev 5679e46b7f867f8f7f8195c86c37e3db7b23d7d7) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972285#comment-14972285 ] Hudson commented on HDFS-9297: -- FAILURE: Integrated in Hadoop-trunk-Commit #8699 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8699/]) HDFS-9297. Update TestBlockMissingException to use (lei: rev 5679e46b7f867f8f7f8195c86c37e3db7b23d7d7) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972385#comment-14972385 ] Hudson commented on HDFS-4015: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #578 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/578/]) HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto > Safemode should count 
and report orphaned blocks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972386#comment-14972386 ] Hudson commented on HDFS-9297: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #578 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/578/]) HDFS-9297. Update TestBlockMissingException to use (lei: rev 5679e46b7f867f8f7f8195c86c37e3db7b23d7d7) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972284#comment-14972284 ] Hudson commented on HDFS-4015: -- FAILURE: Integrated in Hadoop-trunk-Commit #8699 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8699/]) HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMetadataConsistency.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeStatusMXBean.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java > Safemode should count and report 
orphaned blocks > > > Key: HDFS-4015 > URL: https://issues.apache.org/jira/browse/HDFS-4015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Anu Engineer > Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, > HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch, > HDFS-4015.006.patch, HDFS-4015.007.patch > > > The safemode status currently reports the number of unique reported blocks > compared to the total number of blocks referenced by the namespace. However, > it does not report the inverse: blocks which are reported by datanodes but > not referenced by the namespace. > In the case that an admin accidentally starts up from an old image, this can > be confusing: safemode and fsck will show "corrupt files", which are the > files which actually have been deleted but got resurrected by restarting from > the old image. This will convince them that they can safely force leave > safemode and remove these files -- after all, they know that those files > should really have been deleted. However, they're not aware that leaving > safemode will also unrecoverably delete a bunch of other block files which > have been orphaned due to the namespace rollback. > I'd like to consider reporting something like: "90 of expected 100 > blocks have been reported. Additionally, 1 blocks have been reported > which do not correspond to any file in the namespace. Forcing exit of > safemode will unrecoverably remove those data blocks" > Whether this statistic is also used for some kind of "inverse safe mode" is > the logical next step, but just reporting it as a warning seems easy enough > to accomplish and worth doing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
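The counting the description asks for can be sketched as a set difference between the block IDs datanodes have reported and the block IDs the namespace references. This is a minimal, self-contained illustration, not the actual HDFS-4015 patch; the class and method names are hypothetical stand-ins for the BlockManager internals.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hedged sketch of counting "orphaned" blocks: blocks reported by
 * datanodes that do not correspond to any file in the namespace.
 */
public class OrphanedBlockCount {

  /** Returns how many reported block IDs are absent from the namespace. */
  static long countOrphaned(Set<Long> reportedBlockIds,
                            Set<Long> namespaceBlockIds) {
    long orphaned = 0;
    for (long id : reportedBlockIds) {
      if (!namespaceBlockIds.contains(id)) {
        orphaned++;
      }
    }
    return orphaned;
  }

  public static void main(String[] args) {
    Set<Long> namespace = new HashSet<>();
    namespace.add(1L);
    namespace.add(2L);
    Set<Long> reported = new HashSet<>();
    reported.add(1L);
    reported.add(3L); // reported but not in the namespace: orphaned
    System.out.println(countOrphaned(reported, namespace));
  }
}
```

In the real NameNode this would be maintained incrementally during block reports rather than recomputed over full sets, but the safemode message proposed in the description only needs the resulting count.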
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972348#comment-14972348 ] Hudson commented on HDFS-4015: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1313 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1313/]) HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972349#comment-14972349 ] Hudson commented on HDFS-9297: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1313 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1313/]) HDFS-9297. Update TestBlockMissingException to use (lei: rev 5679e46b7f867f8f7f8195c86c37e3db7b23d7d7) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Update TestBlockMissingException to use > corruptBlockOnDataNodesByDeletingBlockFile() > > > Key: HDFS-9297 > URL: https://issues.apache.org/jira/browse/HDFS-9297 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu >Priority: Trivial > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9297.001.patch > > > TestBlockMissingException uses its own function to corrupt a block by > deleting all its block files. HDFS-7235 introduced a helper function > {{corruptBlockOnDataNodesByDeletingBlockFile()}} that does exactly the same > thing. We can update this test to use the helper function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972363#comment-14972363 ] Hudson commented on HDFS-4015: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2523 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2523/]) HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972365#comment-14972365 ] Hudson commented on HDFS-9297: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2523 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2523/]) HDFS-9297. Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile() (lei: rev 5679e46b7f867f8f7f8195c86c37e3db7b23d7d7) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
[ https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972426#comment-14972426 ] Hadoop QA commented on HDFS-7284: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 17s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 17s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 25s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 35s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 51m 17s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 36s | Tests passed in hadoop-hdfs-client. 
| | | | 103m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.util.TestByteArrayManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768378/HDFS-7284.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7781fe1 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13178/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13178/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13178/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13178/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13178/console | This message was automatically generated. > Add more debug info to > BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas > - > > Key: HDFS-7284 > URL: https://issues.apache.org/jira/browse/HDFS-7284 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.5.1 >Reporter: Hu Liu, >Assignee: Wei-Chiu Chuang > Labels: supportability > Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, > HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch > > > When I was looking at some replica loss issue, I got the following info from > log > {code} > 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica > from location x.x.x.x > {code} > I could just know that a replica is removed, but I don't know which block and > its timestamp. I need to know the id and timestamp of the block from the log > file. 
> So it's better to add more info including block id and timestamp to the code > snippet > {code} > for (ReplicaUnderConstruction r : replicas) { > if (genStamp != r.getGenerationStamp()) { > r.getExpectedLocation().removeBlock(this); > NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica " > + "from location: " + r.getExpectedLocation()); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
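The requested improvement is simply to include the block id and generation stamp in the message alongside the location. A self-contained sketch of the richer log line follows; the method and its message layout are hypothetical illustrations, not the actual HDFS-7284 patch, which would format the message inside BlockInfoUnderConstruction via the NameNode's block-state-change logger.

```java
/**
 * Hedged sketch of the HDFS-7284 proposal: a "Removing stale replica"
 * message that identifies the block and its generation stamp, so the
 * log alone is enough to trace which replica was removed and why.
 */
public class StaleReplicaLog {

  /** Builds the log message with block id, generation stamp, and location. */
  static String staleReplicaMessage(long blockId, long genStamp, String location) {
    return String.format(
        "BLOCK* Removing stale replica of block blk_%d with genStamp %d"
            + " from location: %s",
        blockId, genStamp, location);
  }

  public static void main(String[] args) {
    // Values are made up for illustration.
    System.out.println(staleReplicaMessage(1073741825L, 1002L, "x.x.x.x:50010"));
  }
}
```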
[jira] [Updated] (HDFS-8831) Trash Support for deletion in HDFS encryption zone
[ https://issues.apache.org/jira/browse/HDFS-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8831: - Attachment: HDFS-8831.02.patch > Trash Support for deletion in HDFS encryption zone > -- > > Key: HDFS-8831 > URL: https://issues.apache.org/jira/browse/HDFS-8831 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: encryption >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-8831-10152015.pdf, HDFS-8831.00.patch, > HDFS-8831.01.patch, HDFS-8831.02.patch > > > Currently, "Soft Delete" is only supported if the whole encryption zone is > deleted. If you delete files within the zone with the trash feature enabled, you > will get an error similar to the following > {code} > rm: Failed to move to trash: hdfs://HW11217.local:9000/z1_1/startnn.sh: > /z1_1/startnn.sh can't be moved from an encryption zone. > {code} > With HDFS-8830, we can support "Soft Delete" by placing the .Trash folder of > the file being deleted inside the same encryption zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
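The core idea behind the per-zone trash is a path rewrite: instead of moving a deleted file to the user's home-directory trash (which would cross the encryption-zone boundary and fail), move it to a .Trash directory rooted inside the same zone. The sketch below illustrates that rewrite only; the exact trash path layout is an assumption for illustration, not the layout the HDFS-8831 patch implements.

```java
/**
 * Hedged sketch of the per-zone trash path idea from HDFS-8831.
 * Given a file inside an encryption zone, compute a trash location
 * that stays inside the same zone. Layout is hypothetical.
 */
public class ZoneTrashPath {

  /** e.g. /z1_1/startnn.sh -> /z1_1/.Trash/<user>/Current/startnn.sh */
  static String trashPathInZone(String zoneRoot, String filePath, String user) {
    // Keep the file's path relative to the zone root so the move
    // never leaves the encryption zone.
    String relative = filePath.substring(zoneRoot.length());
    return zoneRoot + "/.Trash/" + user + "/Current" + relative;
  }

  public static void main(String[] args) {
    // Uses the path from the error message quoted above.
    System.out.println(trashPathInZone("/z1_1", "/z1_1/startnn.sh", "hdfs"));
  }
}
```

Because source and destination share the zone root, the rename is a same-zone move and the "can't be moved from an encryption zone" restriction quoted above no longer applies.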
[jira] [Commented] (HDFS-9297) Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile()
[ https://issues.apache.org/jira/browse/HDFS-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972372#comment-14972372 ] Hudson commented on HDFS-9297: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #591 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/591/]) HDFS-9297. Update TestBlockMissingException to use corruptBlockOnDataNodesByDeletingBlockFile() (lei: rev 5679e46b7f867f8f7f8195c86c37e3db7b23d7d7) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972370#comment-14972370 ] Hudson commented on HDFS-4015: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #591 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/591/]) HDFS-4015. Safemode should count and report orphaned blocks. (arp: rev 86c92227fc56b6e06d879d250728e8dc8cbe98fe) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972367#comment-14972367 ] Arpit Agarwal commented on HDFS-4015: - Committed to trunk. Keeping Jira open for the branch-2 commit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9279) Decomissioned capacity should not be considered for configured/used capacity
[ https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972309#comment-14972309 ] Hadoop QA commented on HDFS-9279: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 29s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 4s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 31s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 33s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 64m 26s | Tests failed in hadoop-hdfs. 
| | | | 108m 34s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.TestDecommission | | | hadoop.hdfs.server.namenode.TestCacheDirectives | | | hadoop.hdfs.TestLeaseRecovery2 | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768433/HDFS-9279-v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5679e46 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13176/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13176/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13176/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13176/console | This message was automatically generated. > Decomissioned capacity should not be considered for configured/used capacity > > > Key: HDFS-9279 > URL: https://issues.apache.org/jira/browse/HDFS-9279 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch > > > Capacity of a decommissioned node is being accounted as configured and used > capacity metrics. 
This gives incorrect perception of cluster usage. > Once a node is decommissioned, its capacity should be considered similar to a > dead node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
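The fix direction described here is to exclude decommissioned nodes from capacity aggregation the same way dead nodes are excluded. A minimal, self-contained sketch follows; the Node class is a hypothetical stand-in for the per-datanode stats that HeartbeatManager aggregates, not the actual HDFS-9279 patch.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hedged sketch of the HDFS-9279 idea: sum configured capacity over
 * live, in-service datanodes only, skipping decommissioned nodes the
 * same way dead nodes are skipped.
 */
public class CapacityReport {

  /** Hypothetical stand-in for a datanode's reported state. */
  static class Node {
    final long capacity;
    final boolean alive;
    final boolean decommissioned;

    Node(long capacity, boolean alive, boolean decommissioned) {
      this.capacity = capacity;
      this.alive = alive;
      this.decommissioned = decommissioned;
    }
  }

  /** Count only live, in-service nodes toward configured capacity. */
  static long configuredCapacity(List<Node> nodes) {
    long total = 0;
    for (Node n : nodes) {
      if (n.alive && !n.decommissioned) {
        total += n.capacity;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node(100, true, false),   // live, in service: counted
        new Node(100, true, true),    // decommissioned: excluded
        new Node(100, false, false)); // dead: excluded
    System.out.println(configuredCapacity(nodes)); // prints 100
  }
}
```

With this treatment, decommissioning a node immediately removes its capacity from the configured/used metrics, so cluster usage percentages reflect only storage the cluster can actually rely on.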