[jira] [Commented] (HDFS-15187) CORRUPT replica mismatch between namenodes after failover
[ https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043614#comment-17043614 ] Hudson commented on HDFS-15187: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17983 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17983/]) HDFS-15187. CORRUPT replica mismatch between namenodes after failover. (ayushsaxena: rev 7f8685f4760f1358bb30927a7da9a5041e8c39e1) * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestCorruptionWithFailover.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > CORRUPT replica mismatch between namenodes after failover > - > > Key: HDFS-15187 > URL: https://issues.apache.org/jira/browse/HDFS-15187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Critical > Attachments: HDFS-15187-01.patch, HDFS-15187-02.patch, > HDFS-15187-03.patch > > > A corrupt replica identified by the Active Namenode isn't identified by the > other Namenode once it becomes Active after a failover, in the case where the > replica is marked corrupt due to updatePipeline. > Scenario to reproduce: > 1. Create a file; while writing, shut one datanode down to trigger an update > pipeline. > 2. Write some more data. > 3. Close the file. > 4. Restart the datanode that was shut down. > 5. The replica on that datanode will be identified as CORRUPT and the corrupt > count will be 1. > 6. Failover to the other Namenode. > 7. Wait for all pending IBR processing. > 8. The corrupt count will not be the same, and FSCK won't show the corrupt > replica. > 9. Failover back to the first namenode. > 10. The corrupt count and corrupt replica are reported again. > The two Namenodes report different state. 
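The mismatch above comes down to generation-stamp checking of stale replicas. The following self-contained toy model (plain Java written for this discussion, not Hadoop code; class and method names are invented for illustration) shows the invariant the fix restores: a replica whose generation stamp predates the updatePipeline bump is corrupt, and both namenode views of the same block state must reach that verdict.

```java
// Toy model (not Hadoop code): a replica whose generation stamp predates an
// updatePipeline bump must be classified CORRUPT, and an active and a
// post-failover view of the same block state should agree on that verdict.
public class GenStampModel {
    static final class Block {
        long genStamp;
        Block(long gs) { genStamp = gs; }
    }

    /** A replica report is corrupt if its genstamp is older than the block's. */
    static boolean isCorrupt(Block stored, long reportedGenStamp) {
        return reportedGenStamp < stored.genStamp;
    }

    public static void main(String[] args) {
        Block active = new Block(1001);   // active NN view of the block
        Block standby = new Block(1001);  // standby NN view, synced via edits

        // updatePipeline bumps the genstamp after a datanode drops out.
        active.genStamp = 1002;
        standby.genStamp = 1002;

        // The restarted datanode reports the stale replica (genstamp 1001).
        boolean activeVerdict = isCorrupt(active, 1001);
        boolean standbyVerdict = isCorrupt(standby, 1001);

        // Both views must agree; the bug was a path where the post-failover
        // NN skipped this classification for replicas arriving via IBRs.
        System.out.println(activeVerdict + " " + standbyVerdict);
    }
}
```

Under this model the bug report says the post-failover Namenode effectively answered `false` for the same stale report, which is why the corrupt counts diverged.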
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15166) Remove redundant field fStream in ByteStringLog
[ https://issues.apache.org/jira/browse/HDFS-15166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043605#comment-17043605 ] Hudson commented on HDFS-15166: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17982 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17982/]) HDFS-15166. Remove redundant field fStream in ByteStringLog. Contributed (ayushsaxena: rev 93b8f453b96470f1a6cc9ac098f4934ddd631657) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileInputStream.java > Remove redundant field fStream in ByteStringLog > --- > > Key: HDFS-15166 > URL: https://issues.apache.org/jira/browse/HDFS-15166 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Xieming Li >Priority: Major > Labels: newbie, newbie++ > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15166.000.patch > > > {{ByteStringLog.fStream}} is only used in {{init()}} method and can be > replaced by a local variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15176) Enable GcTimePercentage Metric in NameNode's JvmMetrics.
[ https://issues.apache.org/jira/browse/HDFS-15176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043028#comment-17043028 ] Hudson commented on HDFS-15176: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17980 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17980/]) HDFS-15176. Enable GcTimePercentage Metric in NameNode's JvmMetrics. (ayushsaxena: rev b5698e0c33efd546dfea99980840c6e726795df3) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/GcTimeMonitor.java * (edit) hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml > Enable GcTimePercentage Metric in NameNode's JvmMetrics. > > > Key: HDFS-15176 > URL: https://issues.apache.org/jira/browse/HDFS-15176 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-15176.001.patch, HDFS-15176.002.patch, > HDFS-15176.003.patch, HDFS-15176.004.patch, HDFS-15176.005.patch, > HDFS-15176.006.patch > > > The GcTimePercentage(computed by GcTimeMonitor) could be used as a dimension > to analyze the NameNode GC. We should add a switch config to enable the > GcTimePercentage metric in HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
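The metric added here reports what fraction of recent wall-clock time the JVM spent in GC pauses. A minimal sketch of that computation (a simplified re-implementation for illustration, not Hadoop's GcTimeMonitor; the method name is invented):

```java
// Sketch of the GcTimePercentage idea: given the accumulated GC pause time
// inside a sliding observation window, report the percentage of wall-clock
// time spent in GC. Hadoop's GcTimeMonitor maintains the window from
// GarbageCollectorMXBean readings; here the inputs are passed in directly.
public class GcPercentage {
    /** Percentage (0-100) of the window spent in GC, truncated. */
    static int gcTimePercentage(long gcMillisInWindow, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return (int) (100 * gcMillisInWindow / windowMillis);
    }

    public static void main(String[] args) {
        // e.g. 1.2s of GC pauses inside a 60s observation window -> 2%
        System.out.println(gcTimePercentage(1200, 60_000));
    }
}
```

A sustained high value of this percentage on the NameNode is the signal the JIRA wants to expose, since long GC pauses stall all RPC handling under the namesystem lock.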
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042847#comment-17042847 ] Hudson commented on HDFS-15041: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17979 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17979/]) HDFS-15041. Make MAX_LOCK_HOLD_MS and full queue size configurable. (ayushsaxena: rev 9eb7a8bdf8f3b1dc76efc22db9651474303d309e) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15041.001.patch, HDFS-15041.002.patch, > HDFS-15041.003.patch, HDFS-15041.004.patch > > > Currently MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > clusters have different needs for latency and for what counts as a healthy > queue. We should make these two parameters configurable.
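With the patch applied, the two limits become tunable through hdfs-site.xml. The property names below are illustrative placeholders only (they are not quoted anywhere in this thread); consult the hdfs-default.xml edited by the committed patch for the actual keys and default values:

```xml
<!-- Hypothetical key names, for illustration only. Check the
     hdfs-default.xml changed by HDFS-15041 for the real keys. -->
<property>
  <name>dfs.namenode.blockreport.queue.size</name>
  <value>1024</value>
  <description>Capacity of the block-report processing queue.</description>
</property>
<property>
  <name>dfs.namenode.blockreport.max.lock.hold.time.ms</name>
  <value>4</value>
  <description>Max time the block-report thread holds the namesystem
  write lock before yielding.</description>
</property>
```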
[jira] [Commented] (HDFS-15182) TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk
[ https://issues.apache.org/jira/browse/HDFS-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042474#comment-17042474 ] Hudson commented on HDFS-15182: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17978 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17978/]) HDFS-15182. TestBlockManager#testOneOfTwoRacksDecommissioned() fail in (ayushsaxena: rev ba9025c7cd8303dadaa792b6372a877414564cd7) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java > TestBlockManager#testOneOfTwoRacksDecommissioned() fail in trunk > > > Key: HDFS-15182 > URL: https://issues.apache.org/jira/browse/HDFS-15182 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-15182-001.patch, HDFS-15182-002.patch, > HDFS-15182-003.patch, HDFS-15182-004.patch > > > When only the single UT TestBlockManager#testOneOfTwoRacksDecommissioned() is > run, it fails with a NullPointerException. > NameNode#metrics is a static variable; when all UTs in TestBlockManager run, > an earlier UT has already initialized the metrics. > But running testOneOfTwoRacksDecommissioned alone, without the metrics having > been initialized, throws a NullPointerException. 
> {code:java} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4088) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.fulfillPipeline(TestBlockManager.java:518) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestOneOfTwoRacksDecommissioned(TestBlockManager.java:388) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testOneOfTwoRacksDecommissioned(TestBlockManager.java:353) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} > And testAllNodesHoldingReplicasDecommissioned , > testTwoOfThreeNodesDecommissioned , testSufficientlyReplBlocksUsesNewRack > also have the same problem.
[jira] [Commented] (HDFS-14731) [FGL] Remove redundant locking on NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042341#comment-17042341 ] Hudson commented on HDFS-14731: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17977 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17977/]) HDFS-14731. [FGL] Remove redundant locking on NameNode. Contributed by (shv: rev ecbcb058b8bc0fbc3903acb56814c6d9608bc396) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlocksWithNotEnoughRacks.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java > [FGL] Remove redundant locking on NameNode. > --- > > Key: HDFS-14731 > URL: https://issues.apache.org/jira/browse/HDFS-14731 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Attachments: HDFS-14731.001.patch > > > Currently NameNode has two global locks: FSNamesystemLock and > FSDirectoryLock. An analysis shows that single FSNamesystemLock is sufficient > to guarantee consistency of the NameNode state. FSDirectoryLock can be > removed. 
[jira] [Commented] (HDFS-15172) Remove unnecessary deadNodeDetectInterval in DeadNodeDetector#checkDeadNodes()
[ https://issues.apache.org/jira/browse/HDFS-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042308#comment-17042308 ] Hudson commented on HDFS-15172: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17976 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17976/]) HDFS-15172. Remove unnecessary deadNodeDetectInterval in (inigoiri: rev ed70c115a8f5303766122ede97b4bb57f22754c8) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DeadNodeDetector.java > Remove unnecessary deadNodeDetectInterval in > DeadNodeDetector#checkDeadNodes() > --- > > Key: HDFS-15172 > URL: https://issues.apache.org/jira/browse/HDFS-15172 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15172-001.patch, HDFS-15172-002.patch > > > Every call to checkDeadNodes() will change the state to IDLE forcing the > DeadNodeDetector to sleep for IDLE_SLEEP_MS. So we don't need > deadNodeDetectInterval between every checkDeadNodes(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15185) StartupProgress reports edits segments until the entire startup completes
[ https://issues.apache.org/jira/browse/HDFS-15185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042120#comment-17042120 ] Hudson commented on HDFS-15185: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17975 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17975/]) HDFS-15185. StartupProgress reports edits segments until the entire (shv: rev 6f84269bcd5cdb08ca68b2d8276f66d34a2a7a0d) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/TestStartupProgress.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/StartupProgress.java > StartupProgress reports edits segments until the entire startup completes > - > > Key: HDFS-15185 > URL: https://issues.apache.org/jira/browse/HDFS-15185 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15185.001.patch > > > Startup Progress page keeps reporting edits segments after the {{LOAD_EDITS}} > stage is complete. New steps are added to StartupProgress while journal > tailing until all startup phases are completed. This adds a lot of edits > steps, since {{SAFEMODE}} phase can take a long time on a large cluster. > With fast tailing the segments are small, but the number of them is large - > 160K. This makes the page load forever. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15052) WebHDFS getTrashRoot leads to OOM due to FileSystem object creation
[ https://issues.apache.org/jira/browse/HDFS-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041489#comment-17041489 ] Hudson commented on HDFS-15052: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17971 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17971/]) HDFS-15052. WebHDFS getTrashRoot leads to OOM due to FileSystem object (github: rev 2338d25dc7150d75fbda84cc95422380b564) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHDFS.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java > WebHDFS getTrashRoot leads to OOM due to FileSystem object creation > --- > > Key: HDFS-15052 > URL: https://issues.apache.org/jira/browse/HDFS-15052 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Wei-Chiu Chuang >Assignee: Masatake Iwasaki >Priority: Major > > Quoting [~daryn] in HDFS-10756 : > {quote}Surprised nobody has discovered this will lead to an inevitable OOM in > the NN. The NN should not be creating filesystems to itself, and must never > create filesystems in a remote user's context or the cache will explode. > {quote} > I guess the problem lies in side NamenodeWebHdfsMethods#getTrashRoot > {code:java} > private static String getTrashRoot(String fullPath, > Configuration conf) throws IOException { > FileSystem fs = FileSystem.get(conf != null ? 
conf : new > Configuration()); > return fs.getTrashRoot( > new org.apache.hadoop.fs.Path(fullPath)).toUri().getPath(); > } > {code}
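The quoted warning is about the FileSystem cache: `FileSystem.get()` caches instances keyed by scheme, authority, and the calling user, so calling it inside each remote user's context mints one cached instance per distinct user and the cache grows without bound. A self-contained toy model of that failure mode (plain Java with invented names, not Hadoop's actual `FileSystem.CACHE` implementation):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model of the FileSystem-cache failure mode: because the cache key
// includes the calling user, a per-request user context creates a new cache
// entry per distinct user, and nothing ever evicts them.
public class FsCacheModel {
    static final class Key {
        final String scheme, authority, user;
        Key(String s, String a, String u) { scheme = s; authority = a; user = u; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme) && authority.equals(k.authority)
                && user.equals(k.user);
        }
        @Override public int hashCode() {
            return Objects.hash(scheme, authority, user);
        }
    }

    final Map<Key, Object> cache = new HashMap<>();

    /** Mimics FileSystem.get(): one cached instance per (scheme, authority, user). */
    Object get(String scheme, String authority, String user) {
        return cache.computeIfAbsent(new Key(scheme, authority, user), k -> new Object());
    }

    public static void main(String[] args) {
        FsCacheModel fs = new FsCacheModel();
        // 10,000 distinct WebHDFS callers hitting the same namenode URI:
        for (int i = 0; i < 10_000; i++) {
            fs.get("hdfs", "nn1:8020", "user" + i);
        }
        System.out.println(fs.cache.size()); // one entry per user -> 10000
    }
}
```

The fix avoids creating a FileSystem inside the remote user's context at all, so the NN never accumulates per-user entries.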
[jira] [Commented] (HDFS-15165) In Du missed calling getAttributesProvider
[ https://issues.apache.org/jira/browse/HDFS-15165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040380#comment-17040380 ] Hudson commented on HDFS-15165: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17968 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17968/]) HDFS-15165. In Du missed calling getAttributesProvider. Contributed by (inigoiri: rev ec7507162c7e23c0cd251e09b6be0030a500f1ca) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeAttributeProvider.java > In Du missed calling getAttributesProvider > -- > > Key: HDFS-15165 > URL: https://issues.apache.org/jira/browse/HDFS-15165 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15165.00.patch, HDFS-15165.01.patch, > example-test.patch > > > HDFS-12130 changed the behavior of DU command. > It merged both check permission and computation in to a single step. > During this change, when it is required to getInodeAttributes, it just used > inode.getAttributes(). But when attribute provider class is configured, we > should call attribute provider configured object to get InodeAttributes and > use the returned InodeAttributes during checkPermission. > So, if we see after HDFS-12130, code is changed as below. 
> > {code:java} > byte[][] localComponents = {inode.getLocalNameBytes()}; > INodeAttributes[] iNodeAttr = {inode.getSnapshotINode(snapshotId)}; > enforcer.checkPermission( > fsOwner, supergroup, callerUgi, > iNodeAttr, // single inode attr in the array > new INode[]{inode}, // single inode in the array > localComponents, snapshotId, > null, -1, // this will skip checkTraverse() because > // not checking ancestor here > false, null, null, > access, // the target access to be checked against the inode > null, // passing null sub access avoids checking children > false); > {code} > > The 2nd line is missing the check that, if an attribute provider class is > configured, that provider should be used to obtain the INodeAttributes. Because > of this, when an HDFS path is managed by Sentry and the INodeAttributeProvider > class is configured as SentryINodeAttributeProvider, the code does not obtain > the SentryINodeAttributeProvider object and does not use its AclFeature if any > ACLs are set. This causes an AccessControlException when the du command is run > against an HDFS path managed by Sentry. > > {code:java} > [root@gg-620-1 ~]# hdfs dfs -du /dev/edl/sc/consumer/lpfg/str/edf/abc/ > du: Permission denied: user=systest, access=READ_EXECUTE, > inode="/dev/edl/sc/consumer/lpfg/str/lpfg_wrk/PRISMA_TO_ICERTIS_OUTBOUND_RM_MASTER/_impala_insert_staging":impala:hive:drwxrwx--x{code}
[jira] [Commented] (HDFS-13739) Add option to disable rack local write preference
[ https://issues.apache.org/jira/browse/HDFS-13739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039675#comment-17039675 ] Hudson commented on HDFS-13739: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17964 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17964/]) HDFS-13739. Add option to disable rack local write preference. (ayushsaxena: rev ac4b556e2d44d3cd10b81c190ecee23e2dd66c10) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/AddBlockFlag.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CreateFlag.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java > Add option to disable rack local write preference > - > > Key: HDFS-13739 > URL: https://issues.apache.org/jira/browse/HDFS-13739 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover, block placement, datanode, fs, > hdfs, hdfs-client, namenode, nn, performance >Affects Versions: 2.7.3 > Environment: Hortonworks HDP 2.6 >Reporter: Hari Sekhon >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13739-01.patch > > > Request to be able to disable Rack Local Write preference / Write All > Replicas to different Racks. 
> Current HDFS write pattern of "local node, rack local node, other rack node" > is good for most purposes but there are at least 2 scenarios where this is > not ideal: > # Rack-by-Rack Maintenance leaves data at risk of losing last remaining > replica. If a single datanode failed it would likely cause some data outage > or even data loss if the rack is lost or an upgrade fails (or perhaps it's a > rack rebuild). Setting replicas to 4 would reduce write performance and waste > storage which is currently the only workaround to that issue. > # Major Storage Imbalance across datanodes when there is an uneven layout of > datanodes across racks - some nodes fill up while others are half empty. > I have observed this storage imbalance on a cluster where half the nodes were > 85% full and the other half were only 50% full. > Rack layouts like the following illustrate this - the nodes in the same rack > will only choose to send half their block replicas to each other, so they > will fill up first, while other nodes will receive far fewer replica blocks: > {code:java} > NumNodes - Rack > 2 - rack 1 > 2 - rack 2 > 1 - rack 3 > 1 - rack 4 > 1 - rack 5 > 1 - rack 6{code} > In this case if I reduce the number of replicas to 2 then I get an almost > perfect spread of blocks across all datanodes because HDFS has no choice but > to maintain the only 2nd replica on a different rack. If I increase the > replicas back to 3 it goes back to 85% on half the nodes and 50% on the other > half, because the extra replicas choose to replicate only to rack local nodes. > Why not just run the HDFS balancer to fix it you might say? 
This is a heavily > loaded HBase cluster - aside from destroying HBase's data locality and > performance by moving blocks out from underneath RegionServers - as soon as > an HBase major compaction occurs (at least weekly), all blocks will get > re-written by HBase and the HDFS client will again write to local node, rack > local node, other rack node - resulting in the same storage imbalance again. > Hence this cannot be solved by running HDFS balancer on HBase clusters - or > for any application sitting on top of HDFS that has any HDFS block churn. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
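The storage imbalance described above can be made concrete with a small probabilistic model. The sketch below (written for this discussion; it simplifies Hadoop's BlockPlacementPolicyDefault, and the fallback rule for single-node remote racks is an assumption) computes the expected replica load per datanode for the 2/2/1/1/1/1 rack layout, under "local node, then a random remote-rack node, then another node on that same remote rack":

```java
// Expected replicas per node when every node writes one block under the
// default placement, for racks {0,1},{2,3},{4},{5},{6},{7}. Simplified model:
// replica 2 is uniform over remote nodes; replica 3 goes to a partner on
// replica 2's rack if one exists, else (assumed fallback) uniformly to the
// remaining remote nodes.
public class PlacementModel {
    static final int[] RACK = {0, 0, 1, 1, 2, 3, 4, 5}; // node -> rack
    static final int N = RACK.length;

    public static double[] expectedLoad() {
        double[] load = new double[N];
        for (int w = 0; w < N; w++) {      // each node writes one block
            load[w] += 1.0;                // replica 1: local node
            int remote = 0;                // nodes outside the writer's rack
            for (int n = 0; n < N; n++) if (RACK[n] != RACK[w]) remote++;
            for (int n = 0; n < N; n++) {
                if (RACK[n] == RACK[w]) continue;
                double p = 1.0 / remote;   // replica 2: uniform remote node
                load[n] += p;
                int partner = -1;          // another node on replica 2's rack
                for (int m = 0; m < N; m++)
                    if (m != n && RACK[m] == RACK[n]) partner = m;
                if (partner >= 0) {
                    load[partner] += p;    // replica 3 pairs up on that rack
                } else {                   // single-node rack: assumed fallback
                    for (int m = 0; m < N; m++)
                        if (m != n && RACK[m] != RACK[w])
                            load[m] += p / (remote - 1);
                }
            }
        }
        return load;
    }

    public static void main(String[] args) {
        double[] load = expectedLoad();
        System.out.printf("two-node-rack node: %.2f, one-node-rack node: %.2f%n",
                load[0], load[4]);
    }
}
```

Under this model the nodes in the two-node racks expect roughly 3.4 replicas per round of writes versus roughly 2.6 for the single-node-rack nodes, because the paired replicas 2 and 3 can only land together on a rack with at least two nodes — the same mechanism behind the observed 85% vs 50% fill levels.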
[jira] [Commented] (HDFS-15173) RBF: Delete repeated configuration 'dfs.federation.router.metrics.enable'
[ https://issues.apache.org/jira/browse/HDFS-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038121#comment-17038121 ] Hudson commented on HDFS-15173: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17958 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17958/]) HDFS-15173. RBF: Delete repeated configuration (github: rev 439d935e1df601ed998521443fbe6752040e7a84) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/resources/hdfs-rbf-default.xml > RBF: Delete repeated configuration 'dfs.federation.router.metrics.enable' > - > > Key: HDFS-15173 > URL: https://issues.apache.org/jira/browse/HDFS-15173 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation, rbf >Affects Versions: 3.1.1, 3.2.1 >Reporter: panlijie >Assignee: panlijie >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > > In the HDFS RBF default config hdfs-rbf-default.xml, the configuration > 'dfs.federation.router.metrics.enable' appears twice.
[jira] [Commented] (HDFS-15135) EC : ArrayIndexOutOfBoundsException in BlockRecoveryWorker#RecoveryTaskStriped.
[ https://issues.apache.org/jira/browse/HDFS-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037721#comment-17037721 ] Hudson commented on HDFS-15135: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17957 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17957/]) HDFS-15135. EC : ArrayIndexOutOfBoundsException in (surendralilhore: rev 810783d443cce4dd560acfc3e652a912d57d6a77) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java > EC : ArrayIndexOutOfBoundsException in > BlockRecoveryWorker#RecoveryTaskStriped. > --- > > Key: HDFS-15135 > URL: https://issues.apache.org/jira/browse/HDFS-15135 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Surendra Singh Lilhore >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15135.001.patch, HDFS-15135.002.patch, > HDFS-15135.003.patch, HDFS-15135.004.patch, HDFS-15135.005.patch > > > {noformat} > java.lang.ArrayIndexOutOfBoundsException: 8 >at > org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskStriped.recover(BlockRecoveryWorker.java:464) >at > org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:602) >at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15164) Fix TestDelegationTokensWithHA
[ https://issues.apache.org/jira/browse/HDFS-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037116#comment-17037116 ] Hudson commented on HDFS-15164: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17956 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17956/]) HDFS-15164. Fix TestDelegationTokensWithHA. Contributed by Ayush Saxena. (ayushsaxena: rev c75756fe130a50905a195799ab2f8ba4329961fc) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDelegationTokensWithHA.java > Fix TestDelegationTokensWithHA > -- > > Key: HDFS-15164 > URL: https://issues.apache.org/jira/browse/HDFS-15164 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15164-01.patch, HDFS-15164-02.patch > > > {noformat} > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA.testObserverReadProxyProviderWithDT(TestDelegationTokensWithHA.java:156){noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15086) Block scheduled counter never get decremet if the block got deleted before replication.
[ https://issues.apache.org/jira/browse/HDFS-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036157#comment-17036157 ] Hudson commented on HDFS-15086: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17951 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17951/]) HDFS-15086. Block scheduled counter never get decremet if the block got (surendralilhore: rev a98352ced18e51003b443e1a652d19ec00b2f2d2) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlocksScheduledCounter.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestPendingReconstruction.java > Block scheduled counter never get decremet if the block got deleted before > replication. > --- > > Key: HDFS-15086 > URL: https://issues.apache.org/jira/browse/HDFS-15086 > Project: Hadoop HDFS > Issue Type: Improvement > Components: 3.1.1 >Reporter: Surendra Singh Lilhore >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-15086.001.patch, HDFS-15086.002.patch, > HDFS-15086.003.patch, HDFS-15086.004.patch, HDFS-15086.005.patch > > > If a block is scheduled for replication and the same file gets deleted, this > block will be reported as a bad block by the DN. > For this failed replication work, the blocks-scheduled counter is never decremented.
[jira] [Commented] (HDFS-13989) RBF: Add FSCK to the Router
[ https://issues.apache.org/jira/browse/HDFS-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035839#comment-17035839 ] Hudson commented on HDFS-13989: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17948 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17948/]) HDFS-13989. RBF: Add FSCK to the Router (#1832) (github: rev 0ddb5f0881dee26d9258b3d5f4e0ac3733727820) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterHttpServer.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterFsck.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/DfsServlet.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterFsck.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterFsckServlet.java > RBF: Add FSCK to the Router > --- > > Key: HDFS-13989 > URL: https://issues.apache.org/jira/browse/HDFS-13989 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Akira Ajisaka >Priority: Major > Attachments: HDFS-13989.001.patch > > > The namenode supports FSCK. > The Router should be able to forward FSCK to the right Namenode and aggregate > the results. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15161) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close()
[ https://issues.apache.org/jira/browse/HDFS-15161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035447#comment-17035447 ] Hudson commented on HDFS-15161: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17947 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17947/]) HDFS-15161. When evictableMmapped or evictable size is zero, do not (ayushsaxena: rev f09710bbb8e56d066f9d7a2e70a41ed82d5aa781) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java > When evictableMmapped or evictable size is zero, do not throw > NoSuchElementException in ShortCircuitCache#close() > -- > > Key: HDFS-15161 > URL: https://issues.apache.org/jira/browse/HDFS-15161 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.0, 2.9.3, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15161.001.patch, HDFS-15161.002.patch > > > For details, see HDFS-14541.
> {code:java}
> /**
>  * Close the cache and free all associated resources.
>  */
> @Override
> public void close() {
>   try {
>     lock.lock();
>     if (closed) return;
>     closed = true;
>     LOG.info(this + ": closing");
>     maxNonMmappedEvictableLifespanMs = 0;
>     maxEvictableMmapedSize = 0;
>     // Close and join cacheCleaner thread.
>     IOUtilsClient.cleanupWithLogger(LOG, cacheCleaner);
>     // Purge all replicas.
>     while (true) {
>       Object eldestKey;
>       try {
>         eldestKey = evictable.firstKey();
>       } catch (NoSuchElementException e) {
>         break;
>       }
>       purge((ShortCircuitReplica)evictable.get(eldestKey));
>     }
>     while (true) {
>       Object eldestKey;
>       try {
>         eldestKey = evictableMmapped.firstKey();
>       } catch (NoSuchElementException e) {
>         break;
>       }
>       purge((ShortCircuitReplica)evictableMmapped.get(eldestKey));
>     }
>   } finally {
>     lock.unlock();
>   }
> }
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
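The fix implied by the issue title — check the map for emptiness rather than calling firstKey() and catching NoSuchElementException — can be sketched in isolation. This is an illustrative, self-contained model using a plain TreeMap; the names are stand-ins, not the actual ShortCircuitCache fields:

```java
import java.util.TreeMap;

public class PurgeLoopSketch {
    // Stand-in for the cache's evictable map; in ShortCircuitCache the
    // values would be ShortCircuitReplica instances.
    static final TreeMap<Long, String> evictable = new TreeMap<>();

    // Purge all entries. Testing isEmpty() first means an empty map is
    // handled by normal control flow, so close() never needs to catch
    // NoSuchElementException from firstKey().
    static int purgeAll() {
        int purged = 0;
        while (!evictable.isEmpty()) {
            Long eldestKey = evictable.firstKey();
            evictable.remove(eldestKey); // stands in for purge(replica)
            purged++;
        }
        return purged;
    }
}
```

Calling purgeAll() on an already-empty map simply returns 0; no exception is raised or logged during close().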
[jira] [Commented] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.
[ https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035404#comment-17035404 ] Hudson commented on HDFS-15127: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17945 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17945/]) HDFS-15127. RBF: Do not allow writes when a subcluster is unavailable (ayushsaxena: rev 3df0adaaea485bcbd4ae1a04fe160f3148c14437) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterFaultTolerant.java > RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount > points. > > > Key: HDFS-15127 > URL: https://issues.apache.org/jira/browse/HDFS-15127 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15127.000.patch, HDFS-15127.001.patch, > HDFS-15127.002.patch, HDFS-15127.003.patch > > > A HASH_ALL mount point should not allow creating new files if one subcluster > is down. > If the file already existed in the past, this could lead to inconsistencies. > We should return an unavailable exception. > {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be > changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14758) Decrease lease hard limit
[ https://issues.apache.org/jira/browse/HDFS-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034789#comment-17034789 ] Hudson commented on HDFS-14758: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17942 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17942/]) HDFS-14758. Make lease hard limit configurable and reduce the default. (kihwal: rev 9b8a78d97bfd825ce840c6033371c7f10e49a5b8) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLeaseRecovery2.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLease.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend4.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestINodeFileUnderConstructionWithSnapshot.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java > Decrease lease hard limit > - > > Key: HDFS-14758 > URL: https://issues.apache.org/jira/browse/HDFS-14758 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Eric Payne >Assignee: hemanthboyina >Priority: Minor > Attachments: 
HDFS-14758.001.patch, HDFS-14758.002.patch, > HDFS-14758.003.patch, HDFS-14758.004.patch, HDFS-14758.005.patch, > HDFS-14758.005.patch, HDFS-14758.006.patch > > > The hard limit is currently hard-coded to be 1 hour. This also determines the > NN automatic lease recovery interval. Something like 20 min would make more > sense. > After the 5 min soft limit, other clients can recover the lease. If no one > else takes the lease away, the original client can still renew the lease > within the hard limit. So even after a NN full GC of 8 minutes, leases can > still be valid. > However, there is one risk in reducing the hard limit, e.g. to 20 > min: if the NN crashes and the manual failover takes more than 20 minutes, > clients will abort. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
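The interplay of the two limits discussed above can be sketched as a small state check. The constants and method here are illustrative only — the real logic lives in LeaseManager, and after this change the hard limit comes from configuration rather than a hard-coded constant:

```java
public class LeaseLimitSketch {
    static final long SOFT_LIMIT_MS = 5 * 60 * 1000L;   // 5 min soft limit
    static final long HARD_LIMIT_MS = 20 * 60 * 1000L;  // proposed 20 min hard limit

    // Before the soft limit only the holder may write; between the limits
    // other clients may recover the lease but the holder can still renew;
    // past the hard limit the NN may recover the lease automatically.
    static String leaseState(long msSinceLastRenewal) {
        if (msSinceLastRenewal < SOFT_LIMIT_MS) {
            return "HELD";
        } else if (msSinceLastRenewal < HARD_LIMIT_MS) {
            return "SOFT_EXPIRED";
        }
        return "HARD_EXPIRED";
    }
}
```

With these numbers, the 8-minute NN full GC mentioned above lands in SOFT_EXPIRED, where the original client can still renew — matching the argument that 20 minutes is safe unless a failover takes longer than that.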
[jira] [Commented] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034582#comment-17034582 ] Hudson commented on HDFS-15150: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17940 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17940/]) HDFS-15150. Introduce read write lock to Datanode. Contributed Stephen (weichiu: rev d7c136b9ed6d99e1b03f5b89723b3a20df359ba8) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImplTestUtils.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestProvidedImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ReplicaMap.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/InstrumentedReadWriteLock.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestReplicaMap.java * (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestInterDatanodeProtocol.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/ProvidedVolumeImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150.001.patch, HDFS-15150.002.patch, > HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock, so > any locker blocks all others. > One place I think a read lock would benefit us significantly is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any block right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step would be to change the current lock and then > make all accesses to it obtain the write lock.
That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
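The conservative first step described above — swap the lock type but keep exclusive semantics everywhere — can be sketched as follows. Names here are illustrative; the actual patch wires an InstrumentedReadWriteLock into FsDatasetImpl:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatasetLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // Step 1: every existing call site takes the write lock, so behaviour
    // is identical to the old single ReentrantLock (mutual exclusion).
    public Runnable acquireWrite() {
        lock.writeLock().lock();
        return () -> lock.writeLock().unlock();
    }

    // Step 2 (follow-up Jiras): read-mostly paths, such as starting a
    // block read, switch to the shared read lock so concurrent readers
    // no longer block one another.
    public Runnable acquireRead() {
        lock.readLock().lock();
        return () -> lock.readLock().unlock();
    }

    public int readHoldCount() {
        return lock.getReadHoldCount();
    }
}
```

Returning a release Runnable keeps lock/unlock pairing explicit at the call site; the migration to readLock() can then happen one call site at a time without changing correctness.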
[jira] [Commented] (HDFS-15158) The number of failed volumes mismatch with volumeFailures of Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033273#comment-17033273 ] Hudson commented on HDFS-15158: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17934 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17934/]) HDFS-15158. The number of failed volumes mismatch with volumeFailures of (ayushsaxena: rev 6191d4b4a0919863fda78e549ab6c60022e3ebc2) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java > The number of failed volumes mismatch with volumeFailures of Datanode > metrics > --- > > Key: HDFS-15158 > URL: https://issues.apache.org/jira/browse/HDFS-15158 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-15158.patch, HDFS-15158.patch, HDFS-15158.patch > > > The Datanode metric only increments by 1, even if more than one volume fails > during a disk check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
[ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032826#comment-17032826 ] Hudson commented on HDFS-15115: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17931 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17931/]) HDFS-15115. Namenode crash caused by NPE in BlockPlacementPolicyDefault (ayushsaxena: rev d23317b1024c90be0d22adbd9d4db094bf6c8cb8) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockPlacementPolicyDebugLoggingBuilder.java > Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically > change logger to debug > --- > > Key: HDFS-15115 > URL: https://issues.apache.org/jira/browse/HDFS-15115 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: wangzhixiang >Assignee: wangzhixiang >Priority: Major > Fix For: 3.3.0, 3.2.2 > > Attachments: HDFS-15115.001.patch, HDFS-15115.003.patch, > HDFS-15115.004.patch, HDFS-15115.005.patch, HDFS-15115.2.patch > > > To get debug info, we dynamically change the logger of > BlockPlacementPolicyDefault to debug while the namenode is running. However, the > Namenode crashes. From the log, we find NPEs in > BlockPlacementPolicyDefault.chooseRandom, because the *StringBuilder builder* > is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, while the > *builder* is only initialized at its first use in the method. If we > change the logger of BlockPlacementPolicyDefault to debug after that point, the > *builder* in the remaining parts is *NULL* and causes an *NPE* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
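The failure mode described above is easy to reproduce in miniature. This is illustrative code, not the actual BlockPlacementPolicyDefault source: the builder is created only if debug logging was enabled at the first check, while each later use re-checks the logger state, which may have been flipped at runtime in between:

```java
public class DebugBuilderSketch {
    // Simulates the logger level at two points in one call:
    // debug was off at the first check, then flipped on at runtime.
    static boolean debugAtFirstCheck = false;
    static boolean debugAtLaterCheck = true;

    static String chooseRandomUnsafe() {
        StringBuilder builder = null;
        if (debugAtFirstCheck) {
            builder = new StringBuilder("[");
        }
        // ... node selection work happens here ...
        if (debugAtLaterCheck) {
            return builder.append("dn1]").toString(); // NPE: builder is still null
        }
        return "";
    }

    static String chooseRandomSafe() {
        StringBuilder builder = null;
        if (debugAtFirstCheck) {
            builder = new StringBuilder("[");
        }
        if (debugAtLaterCheck) {
            if (builder == null) {              // the essential fix: guard every use
                builder = new StringBuilder("[");
            }
            return builder.append("dn1]").toString();
        }
        return "";
    }
}
```

The unsafe variant throws a NullPointerException exactly when the logger is switched to debug between the two checks; the safe variant lazily creates the builder at each use site.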
[jira] [Commented] (HDFS-15136) LOG flooding in secure mode when Cookies are not set in request header
[ https://issues.apache.org/jira/browse/HDFS-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032634#comment-17032634 ] Hudson commented on HDFS-15136: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17930 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17930/]) HDFS-15136. LOG flooding in secure mode when Cookies are not set in (ayushsaxena: rev 23787e4bddc84f576daa1299f5a99c8cc753aecb) * (edit) hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/client/AuthenticatedURL.java > LOG flooding in secure mode when Cookies are not set in request header > -- > > Key: HDFS-15136 > URL: https://issues.apache.org/jira/browse/HDFS-15136 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15136.0001.patch, HDFS-15136.0002.patch, > HDFS-15136.0003.patch > > > In debug mode, the exception below gets logged when the Cookie is not set in the > request header. This exception stack gets repeated and has no meaning > here. > Instead, log the error in debug mode and continue, without throwing, catching, and logging the > exception. > 2020-01-20 18:25:57,792 DEBUG > org.apache.hadoop.security.UserGroupInformation: PrivilegedAction > as:test/t...@hadoop.com (auth:KERBEROS) > from:org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:518) > 2020-01-20 18:25:57,792 DEBUG > org.apache.hadoop.hdfs.web.URLConnectionFactory: open AuthenticatedURL > connection > https://IP:PORT/getJournal?jid=hacluster&segmentTxId=295&storageInfo=-64%3A39449123%3A1579244618105%3Amyhacluster&inProgressOk=true > 2020-01-20 18:25:57,803 DEBUG > org.apache.hadoop.security.authentication.client.KerberosAuthenticator: JDK > performed authentication on our behalf.
> 2020-01-20 18:25:57,803 DEBUG > org.apache.hadoop.security.authentication.client.AuthenticatedURL: Cannot > parse cookie header: > java.lang.IllegalArgumentException: Empty cookie header string > at java.net.HttpCookie.parseInternal(HttpCookie.java:826) > at java.net.HttpCookie.parse(HttpCookie.java:202) > at java.net.HttpCookie.parse(HttpCookie.java:178) > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL$AuthCookieHandler.put(AuthenticatedURL.java:99) > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:390) > at > org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:197) > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348) > at > org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:186) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:470) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:464) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:518) > at > org.apache.hadoop.security.SecurityUtil.doAsCurrentUser(SecurityUtil.java:512) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog.getInputStream(EditLogFileInputStream.java:463) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.init(EditLogFileInputStream.java:157) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOpImpl(EditLogFileInputStream.java:208) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.nextOp(EditLogFileInputStream.java:266) > at > 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:151) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:198) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:151) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:198) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStr
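The root cause in the trace above is that java.net.HttpCookie.parse throws IllegalArgumentException on an empty header string. A guard along these lines — illustrative, not the exact AuthenticatedURL change — avoids raising and stack-logging the exception at all:

```java
import java.net.HttpCookie;
import java.util.Collections;
import java.util.List;

public class CookieGuardSketch {
    // Returns parsed cookies, treating a missing or empty cookie header
    // as "no cookies" instead of letting HttpCookie.parse throw.
    static List<HttpCookie> parseQuietly(String header) {
        if (header == null || header.isEmpty()) {
            return Collections.emptyList();
        }
        return HttpCookie.parse(header);
    }
}
```

With the guard, an absent or empty cookie header becomes an empty list and the debug log stays quiet; only genuinely malformed (non-empty) headers would still surface a parse error.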
[jira] [Commented] (HDFS-15148) dfs.namenode.send.qop.enabled should not apply to primary NN port
[ https://issues.apache.org/jira/browse/HDFS-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030102#comment-17030102 ] Hudson commented on HDFS-15148: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17925 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17925/]) HDFS-15148. dfs.namenode.send.qop.enabled should not apply to primary NN (cliang: rev ce7b8b5634ef84602019cac4ce52337fbe4f9d42) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestMultipleNNPortQOP.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/block/BlockTokenSecretManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockTokenWrappingQOP.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java > dfs.namenode.send.qop.enabled should not apply to primary NN port > - > > Key: HDFS-15148 > URL: https://issues.apache.org/jira/browse/HDFS-15148 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1, 3.3.1 >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-15148.001.patch, HDFS-15148.002.patch, > HDFS-15148.003.patch, HDFS-15148.004.patch > > > In HDFS-13617, the NameNode can be configured to wrap its established QOP into > the block access token as an encrypted message. Later on, the DataNode will use this > message to create the SASL connection. But this new behavior should only apply to > the new auxiliary NameNode ports, not the primary port (the one configured in > fs.defaultFS), as it may cause conflicting behavior with other existing SASL-related > configuration (e.g. dfs.data.transfer.protection). Since this > configuration is introduced for auxiliary ports only, we should restrict this > new behavior so it does not apply to the primary port.
[jira] [Commented] (HDFS-12491) Support wildcard in CLASSPATH for libhdfs
[ https://issues.apache.org/jira/browse/HDFS-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030053#comment-17030053 ] Hudson commented on HDFS-12491: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17924 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17924/]) HDFS-12491. Support wildcard in CLASSPATH for libhdfs. Contributed by (kihwal: rev 10a60fbe20bb08cdd71076ea9bf2ebb3a2f6226e) * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.h * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/LibHdfs.md > Support wildcard in CLASSPATH for libhdfs > - > > Key: HDFS-12491 > URL: https://issues.apache.org/jira/browse/HDFS-12491 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs >Affects Versions: 2.8.0 >Reporter: John Zhuge >Assignee: Muhammad Samir Khan >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-12491.001.patch, HDFS-12491.002.patch, > testWildCard.sh > > > According to the libhdfs doc, wildcard in CLASSPATH is not supported: > bq. The most common problem is the CLASSPATH is not set properly when calling > a program that uses libhdfs. Make sure you set it to all the Hadoop jars > needed to run Hadoop itself as well as the right configuration directory > containing hdfs-site.xml. It is not valid to use wildcard syntax for > specifying multiple jars. It may be useful to run hadoop classpath --glob or > hadoop classpath --jar to generate the correct classpath for your > deployment. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
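The wildcard semantics being added match the JVM's own classpath rules: an entry ending in "*" stands for every JAR directly inside that directory (non-recursive, .jar only). The patch implements this in C inside jni_helper.c; the sketch below models the same expansion rule in Java, with a hypothetical helper name chosen for illustration:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class WildcardClasspathSketch {
    // Expand one CLASSPATH entry: "dir/*" becomes every *.jar in dir;
    // any entry not ending in "*" passes through unchanged.
    static List<String> expandEntry(String entry) {
        List<String> out = new ArrayList<>();
        if (!entry.endsWith("*")) {
            out.add(entry);
            return out;
        }
        File dir = new File(entry.substring(0, entry.length() - 1));
        File[] children = dir.listFiles();
        if (children != null) {
            for (File f : children) {
                if (f.getName().toLowerCase().endsWith(".jar")) {
                    out.add(f.getPath());
                }
            }
        }
        return out;
    }
}
```

A non-wildcard entry (e.g. a single jar or a config directory) is returned as-is, which is why the config directory holding hdfs-site.xml keeps working alongside expanded jar directories.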
[jira] [Commented] (HDFS-7175) Client-side SocketTimeoutException during Fsck
[ https://issues.apache.org/jira/browse/HDFS-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027905#comment-17027905 ] Hudson commented on HDFS-7175: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17923 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17923/]) HDFS-7175. Client-side SocketTimeoutException during Fsck. Contributed (weichiu: rev 1e3a0b0d931676b191cb4813ed1a283ebb24d4eb) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md > Client-side SocketTimeoutException during Fsck > -- > > Key: HDFS-7175 > URL: https://issues.apache.org/jira/browse/HDFS-7175 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.0 >Reporter: Carl Steinbach >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-7157.004.patch, HDFS-7175.2.patch, > HDFS-7175.3.patch, HDFS-7175.patch, HDFS-7175.patch > > > HDFS-2538 disabled status reporting for the fsck command (it can optionally > be enabled with the -showprogress option). 
We have observed that without > status reporting the client will abort with read timeout: > {noformat} > [hdfs@lva1-hcl0030 ~]$ hdfs fsck / > Connecting to namenode via http://lva1-tarocknn01.grid.linkedin.com:50070 > 14/09/30 06:03:41 WARN security.UserGroupInformation: > PriviledgedActionException as:h...@grid.linkedin.com (auth:KERBEROS) > cause:java.net.SocketTimeoutException: Read timed out > Exception in thread "main" java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:152) > at java.net.SocketInputStream.read(SocketInputStream.java:122) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:275) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633) > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:312) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:149) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:146) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:145) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:346) > {noformat} > Since there's nothing for the client to read it will abort if the time > required to complete the fsck operation is longer than the client's read > timeout setting. 
> I can think of a couple ways to fix this: > # Set an infinite read timeout on the client side (not a good idea!). > # Have the server-side write (and flush) zeros to the wire and instruct the > client to ignore these characters instead of echoing them. > # It's possible that flushing an empty buffer on the server-side will trigger > an HTTP response with a zero length payload. This may be enough to keep the > client from hanging up. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15146) TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026057#comment-17026057 ] Hudson commented on HDFS-15146: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17918 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17918/]) HDFS-15146. TestBalancerRPCDelay.testBalancerRPCDelay fails (kihwal: rev 799d4c1cf4e8fe78eb9ab607a0449cdd075041fb) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java > TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently > -- > > Key: HDFS-15146 > URL: https://issues.apache.org/jira/browse/HDFS-15146 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0 > > Attachments: HDFS-15146-branch-2.10.001.patch, HDFS-15146.001.patch > > > TestBalancerRPCDelay.testBalancerRPCDelay fails intermittently when the > number of blocks does not match the expected value. In {{testBalancerRPCDelay}}, it > seems like some datanodes will not be up by the time we fetch the block > locations. > I see the following stack trace: > {code:bash} > [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 39.969 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay > [ERROR] > testBalancerRPCDelayQpsDefault(org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay) > Time elapsed: 12.035 s <<< FAILURE!
> java.lang.AssertionError: Number of getBlocks should be not less than 20 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2197) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13179) TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-13179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025364#comment-17025364 ] Hudson commented on HDFS-13179: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17910 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17910/]) HDFS-13179. (inigoiri: rev 1839c467f60cbb8592d446694ec3d7710cda5142) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestLazyPersistReplicaRecovery.java > TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails > intermittently > -- > > Key: HDFS-13179 > URL: https://issues.apache.org/jira/browse/HDFS-13179 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 3.0.0 >Reporter: Gabor Bota >Assignee: Ahmed Hussein >Priority: Critical > Fix For: 3.3.0 > > Attachments: HDFS-13179.001.patch, HDFS-13179.002.patch, > HDFS-13179.003.patch, test runs.zip > > > The error caused by TimeoutException because the test is waiting to ensure > that the file is replicated to DISK storage but the replication can't be > finished to DISK during the 30s timeout in ensureFileReplicasOnStorageType(), > but the file is still on RAM_DISK - so there is no data loss. > Adding the following to TestLazyPersistReplicaRecovery.java:56 essentially > fixes the flakiness. > {code:java} > try { > ensureFileReplicasOnStorageType(path1, DEFAULT); > }catch (TimeoutException t){ > LOG.warn("We got \"" + t.getMessage() + "\" so trying to find data on > RAM_DISK"); > ensureFileReplicasOnStorageType(path1, RAM_DISK); > } > } > {code} > Some thoughts: > * Successful and failed tests run similar to the point when datanode > restarts. 
Restart line is the following in the log: LazyPersistTestCase - > Restarting the DataNode > * There is a line which only occurs in the failed test: *addStoredBlock: > Redundant addStoredBlock request received for blk_1073741825_1001 on node > 127.0.0.1:49455 size 5242880* > * This redundant request at BlockManager#addStoredBlock could be the main > reason for the test fail. Something wrong with the gen stamp? Corrupt > replicas? > = > Current fail ratio based on my test of TestLazyPersistReplicaRecovery: > 1000 runs, 34 failures (3.4% fail) > Failure rate analysis: > TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas: 3.4% > 33 failures caused by: {noformat} > java.util.concurrent.TimeoutException: Timed out waiting for condition. > Thread diagnostics: Timestamp: 2018-01-05 11:50:34,964 "IPC Server handler 6 > on 39589" > {noformat} > 1 failure caused by: {noformat} > java.net.BindException: Problem binding to [localhost:56729] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49) > Caused by: java.net.BindException: Address already in use at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:49) > {noformat} > = > Example stacktrace: > {noformat} > Timed out waiting for condition. 
Thread diagnostics: > Timestamp: 2017-11-01 10:36:49,499 > "Thread-1" prio=5 tid=13 runnable > java.lang.Thread.State: RUNNABLE > at java.lang.Thread.dumpThreads(Native Method) > at java.lang.Thread.getAllStackTraces(Thread.java:1610) > at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87) > at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73) > at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:369) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase.ensureFileReplicasOnStorageType(LazyPersistTestCase.java:140) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery.testDnRestartWithSavedReplicas(TestLazyPersistReplicaRecovery.java:54) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) ---
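The fallback quoted in the description can be seen in isolation in the sketch below: if the replica has not reached the DISK tier in time, accept finding it on RAM_DISK, since lazy-persist write-back simply has not finished and there is no data loss. The map and helper method are stand-ins for MiniDFSCluster state and LazyPersistTestCase's helper, assumed for illustration only.

```java
import java.util.Map;
import java.util.concurrent.TimeoutException;

/**
 * Self-contained sketch of the fallback quoted in the description.
 * The map stands in for MiniDFSCluster replica state; names are
 * illustrative, not the LazyPersistTestCase API.
 */
public class StorageFallback {
  enum StorageType { DISK, RAM_DISK }

  /** Stand-in for LazyPersistTestCase#ensureFileReplicasOnStorageType. */
  static void ensureReplicaOnStorageType(Map<String, StorageType> replicas,
                                         String path, StorageType wanted)
      throws TimeoutException {
    if (replicas.get(path) != wanted) {
      throw new TimeoutException("replica for " + path + " not on " + wanted);
    }
  }

  static StorageType checkWithFallback(Map<String, StorageType> replicas,
                                       String path) throws TimeoutException {
    try {
      ensureReplicaOnStorageType(replicas, path, StorageType.DISK);
      return StorageType.DISK;
    } catch (TimeoutException t) {
      // Not yet persisted to DISK -- the data should still be on RAM_DISK.
      ensureReplicaOnStorageType(replicas, path, StorageType.RAM_DISK);
      return StorageType.RAM_DISK;
    }
  }

  public static void main(String[] args) throws TimeoutException {
    Map<String, StorageType> replicas = Map.of("/lazy/file1", StorageType.RAM_DISK);
    System.out.println("found on: " + checkWithFallback(replicas, "/lazy/file1"));
  }
}
```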
[jira] [Commented] (HDFS-15145) HttpFS: getAclStatus() returns permission as null
[ https://issues.apache.org/jira/browse/HDFS-15145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025354#comment-17025354 ] Hudson commented on HDFS-15145: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17909 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17909/]) HDFS-15145. HttpFS: getAclStatus() returns permission as null. (inigoiri: rev 061421fc6d66405e7109d17b8818ea023ef3acc2) * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java > HttpFS: getAclStatus() returns permission as null > - > > Key: HDFS-15145 > URL: https://issues.apache.org/jira/browse/HDFS-15145 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15145.001.patch, HDFS-15145.002.patch > > > getAclStatus always return permission as null -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14993) checkDiskError doesn't work during datanode startup
[ https://issues.apache.org/jira/browse/HDFS-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025336#comment-17025336 ] Hudson commented on HDFS-14993: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17907 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17907/]) HDFS-14993. checkDiskError doesn't work during datanode startup. (ayushsaxena: rev 87c198468bb6a6312bbb27b174c18822b6b9ccf8) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java > checkDiskError doesn't work during datanode startup > --- > > Key: HDFS-14993 > URL: https://issues.apache.org/jira/browse/HDFS-14993 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14993.patch, HDFS-14993.patch, HDFS-14993.patch > > > the function checkDiskError() is called before addBlockPool, but list > bpSlices is empty this time. So the function check() in FsVolumeImpl.java > does nothing. > @Override > public VolumeCheckResult check(VolumeCheckContext ignored) > throws DiskErrorException { > // TODO:FEDERATION valid synchronization > for (BlockPoolSlice s : bpSlices.values()) { > s.checkDirs(); > } > return VolumeCheckResult.HEALTHY; > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
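To see why the ordering matters, the sketch below reduces the report to its shape: a health check that only iterates registered block-pool slices is vacuously HEALTHY when it runs before addBlockPool has populated them. Class and method names are simplified stand-ins, not the FsVolumeImpl code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Reduced model of the startup-ordering bug: with zero registered
 * block-pool slices, the check loop never executes, so even a broken
 * disk is reported HEALTHY. Names stand in for FsVolumeImpl and
 * BlockPoolSlice.
 */
public class VolumeCheckSketch {
  enum Result { HEALTHY, FAILED }

  static class Volume {
    final List<Runnable> bpSlices = new ArrayList<>();

    Result check() {
      // With zero slices this loop never runs, so nothing is inspected.
      for (Runnable slice : bpSlices) {
        slice.run(); // stand-in for BlockPoolSlice#checkDirs()
      }
      return Result.HEALTHY;
    }
  }

  public static void main(String[] args) {
    Volume v = new Volume();
    // checkDiskError() before addBlockPool(): the check is a no-op.
    System.out.println("slices=" + v.bpSlices.size() + " result=" + v.check());
  }
}
```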
[jira] [Commented] (HDFS-15143) LocatedStripedBlock returns wrong block type
[ https://issues.apache.org/jira/browse/HDFS-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025154#comment-17025154 ] Hudson commented on HDFS-15143: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17903 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17903/]) HDFS-15143. LocatedStripedBlock returns wrong block type. Contributed by (ayushsaxena: rev f876dc228b263d823baf6ab198d211e57e100985) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedStripedBlock.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/TestStripedBlockUtil.java > LocatedStripedBlock returns wrong block type > > > Key: HDFS-15143 > URL: https://issues.apache.org/jira/browse/HDFS-15143 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15143-01.patch, HDFS-15143-02.patch > > > LocatedStripedBlock returns block type as {{CONTIGUOUS}} which actually > should be {{STRIPED}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
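The shape of this bug is a missing override: without one, the striped subclass inherits the replicated (CONTIGUOUS) default from its parent. The classes below are simplified stand-ins for LocatedBlock and LocatedStripedBlock, not the actual patch.

```java
/**
 * Simplified stand-ins showing the inheritance bug: the striped
 * subclass must override the block-type accessor or it reports the
 * CONTIGUOUS default inherited from the parent class.
 */
public class BlockTypeSketch {
  enum BlockType { CONTIGUOUS, STRIPED }

  static class LocatedBlock {
    public BlockType getBlockType() {
      return BlockType.CONTIGUOUS; // replicated layout is the default
    }
  }

  static class LocatedStripedBlock extends LocatedBlock {
    @Override
    public BlockType getBlockType() {
      return BlockType.STRIPED; // the override the fix needs to add
    }
  }

  public static void main(String[] args) {
    LocatedBlock striped = new LocatedStripedBlock();
    System.out.println(striped.getBlockType());
  }
}
```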
[jira] [Commented] (HDFS-15128) Unit test failing to clean testing data and crashed future Maven test run due to failure in TestDataNodeVolumeFailureToleration
[ https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023434#comment-17023434 ] Hudson commented on HDFS-15128: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17899 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17899/]) HDFS-15128. Unit test failing to clean testing data and crashed future (ayushsaxena: rev 6d008c0d39185f18dbec4676f4d0e7ef77104eb7) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java > Unit test failing to clean testing data and crashed future Maven test run due > to failure in TestDataNodeVolumeFailureToleration > --- > > Key: HDFS-15128 > URL: https://issues.apache.org/jira/browse/HDFS-15128 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, test >Affects Versions: 3.2.1 >Reporter: Ctest >Assignee: Ctest >Priority: Critical > Labels: easyfix, patch, test > Fix For: 3.3.0 > > Attachments: HDFS-15128-000.patch, HDFS-15128-001.patch > > > Actively-used test helper function `testVolumeConfig` in > `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` > chmod a directory with invalid perm 000 for testing purposes but later failed > to chmod back this directory with a valid perm if the assertion inside this > function failed. Any subsequent `mvn test` command would fail to run if this > test had failed before. It is because Maven failed to build itself as it did > not have permission to clean the temporarily-generated directory that has > perm 000. See below for the code snippet that is buggy. > {code:java} > try { > for (int i = 0; i < volumesFailed; i++) { > prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000 > } > restartDatanodes(volumesTolerated, manageDfsDirs); > } catch (DiskErrorException e) { > ... > } finally { > ... 
> } > > assertEquals(expectedBPServiceState, bpServiceState); > > for (File dir : dirs) { > FileUtil.chmod(dir.toString(), "755"); > } > } > {code} > The failure of the statement `assertEquals(expectedBPServiceState, > bpServiceState)` caused function to terminate without executing > `FileUtil.chmod(dir.toString(), "755")` for each temporary directory with > invalid perm 000 the test has created. > > *Consequence* > Any subsequent `mvn test` command would fail to run if this test had failed > before. It is because Maven failed to build itself since it does not have > permission to clean this temporarily-generated directory. For details of the > failure, see below: > {noformat} > [INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs --- > [INFO] Executing tasks > > main: > [delete] Deleting directory > /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 8.349 s > [INFO] Finished at: 2019-12-27T03:53:04-06:00 > [INFO] > > [ERROR] Failed to execute > goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on > project hadoop-hdfs: An Ant BuildException has occured: Unable to delete > directory > /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current > [ERROR] around Ant part ... dir="/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data"/>... > @ 4:105 in > /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat} > > *Root Cause* > The test helper function > `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig` > purposely set the directory > `/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current` > to have perm 000. And at the end of this function, it changed the perm of > this directory to 755. However, there is an assertion in this function before > the perm was able to changed to 755. Once
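The fix implied by the root-cause analysis is to move the permission restoration into the finally block so it runs even when the assertion fails. The sketch below models that with a stand-in Dir type rather than real directories; no filesystem is touched.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the cleanup fix: restoring permissions inside finally runs
 * on both success and assertion failure, so a later `mvn test` can
 * still delete target/test/data. Dir stands in for the directories
 * chmod'ed to 000.
 */
public class CleanupSketch {
  static class Dir {
    String perm = "755";
  }

  static void runTest(List<Dir> dirs, boolean assertionHolds) {
    try {
      for (Dir d : dirs) {
        d.perm = "000"; // prepareDirToFail(): make the dir unusable
      }
      if (!assertionHolds) {
        throw new AssertionError("expectedBPServiceState mismatch");
      }
    } finally {
      // Runs on success *and* failure -- the placement the fix changes.
      for (Dir d : dirs) {
        d.perm = "755";
      }
    }
  }

  public static void main(String[] args) {
    List<Dir> dirs = new ArrayList<>();
    dirs.add(new Dir());
    try {
      runTest(dirs, false); // the assertion fails...
    } catch (AssertionError expected) {
      // ...but the permissions were restored anyway.
    }
    System.out.println("perm after failed run: " + dirs.get(0).perm);
  }
}
```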
[jira] [Commented] (HDFS-15119) Allow expiration of cached locations in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023023#comment-17023023 ] Hudson commented on HDFS-15119: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17897 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17897/]) HDFS-15119. Allow expiration of cached locations in DFSInputStream. (kihwal: rev d10f77e3c91225f86ed9c0f0e6a9adf2e1434674) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInputStreamBlockLocations.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java > Allow expiration of cached locations in DFSInputStream > -- > > Key: HDFS-15119 > URL: https://issues.apache.org/jira/browse/HDFS-15119 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Attachments: HDFS-15119.001.patch, HDFS-15119.002.patch, > HDFS-15119.003.patch > > > Staleness and other transient conditions can affect reads for a long time > since the block locations may not be re-fetched. It makes sense to make > cached locations to expire. > For example, we may not take advantage of local-reads since the nodes are > blacklisted and have not been updated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
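The idea of the change, in self-contained form: remember when block locations were fetched and refetch once a configured interval has passed, so blacklisted or stale nodes get re-evaluated. The real change wires this into DFSInputStream via a new client conf key; the names and the String[] location type below are illustrative only.

```java
/**
 * Self-contained sketch of time-based expiry for cached block
 * locations. Names are illustrative, not the DFSInputStream code.
 */
public class LocationCacheSketch {
  private final long expiryMillis;
  private long fetchedAtMillis;
  private String[] cachedLocations;

  LocationCacheSketch(long expiryMillis) {
    this.expiryMillis = expiryMillis;
  }

  boolean isStale(long nowMillis) {
    return cachedLocations == null
        || nowMillis - fetchedAtMillis >= expiryMillis;
  }

  String[] getLocations(long nowMillis) {
    if (isStale(nowMillis)) {
      // Stand-in for refetching locations from the namenode over RPC;
      // a formerly blacklisted local node can now be preferred again.
      cachedLocations = new String[] {"dn1:9866", "dn2:9866"};
      fetchedAtMillis = nowMillis;
    }
    return cachedLocations;
  }

  public static void main(String[] args) {
    LocationCacheSketch cache = new LocationCacheSketch(10_000);
    cache.getLocations(0);
    System.out.println("stale at t=5s:  " + cache.isStale(5_000));
    System.out.println("stale at t=15s: " + cache.isStale(15_000));
  }
}
```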
[jira] [Commented] (HDFS-15117) EC: Add getECTopologyResultForPolicies to DistributedFileSystem
[ https://issues.apache.org/jira/browse/HDFS-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022093#comment-17022093 ] Hudson commented on HDFS-15117: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17893 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17893/]) HDFS-15117. EC: Add getECTopologyResultForPolicies to (ayushsaxena: rev 92c58901d767f4fea571274544a590608c911cb8) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/ECTopologyVerifier.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterMultiRack.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ErasureCoding.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * (delete) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ECTopologyVerifierResult.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java * (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirErasureCodingOp.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto * (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ECTopologyVerifierResult.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/erasurecoding.proto * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/protocol/TestReadOnly.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/MiniRouterDFSCluster.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java > EC: Add getECTopologyResultForPolicies to DistributedFileSystem > --- > > Key: HDFS-15117 > URL: https://issues.apache.org/jira/browse/HDFS-15117 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: ec > Fix For: 3.3.0 > > Attachments: HDFS-15117-01.patch, HDFS-15117-02.patch, > HDFS-15117-03.patch, HDFS-15117-04.patch, HDFS-15117-05.patch, > HDFS-15117-06.patch, HDFS-15117-07.patch, HDFS-15117-08.patch > > > Add getECTopologyResultForPolicies API to distributed filesystem. > It is as of now only present as part of ECAdmin. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
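As a rough sketch of what such a topology verification computes, reduced to a datanode-count check (the real verifier also considers rack distribution): an RS(d,p) policy needs at least d+p datanodes to place one full block group. The method shape and messages below are assumptions for illustration, not the ECTopologyVerifier API.

```java
/**
 * Toy EC topology check: does the cluster have enough datanodes for a
 * Reed-Solomon policy's d data + p parity units? Simplified; rack
 * awareness is deliberately omitted.
 */
public class EcTopologySketch {
  static String verify(int dataUnits, int parityUnits, int numDataNodes) {
    int needed = dataUnits + parityUnits;
    if (numDataNodes < needed) {
      return "NOT SUPPORTED: RS-" + dataUnits + "-" + parityUnits
          + " needs " + needed + " datanodes, cluster has " + numDataNodes;
    }
    return "OK";
  }

  public static void main(String[] args) {
    System.out.println(verify(6, 3, 5)); // RS-6-3 on a 5-node cluster
    System.out.println(verify(3, 2, 9)); // RS-3-2 on a 9-node cluster
  }
}
```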
[jira] [Commented] (HDFS-14968) Add ability to know datanode staleness
[ https://issues.apache.org/jira/browse/HDFS-14968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021200#comment-17021200 ] Hudson commented on HDFS-14968: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17891 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17891/]) HDFS-14968. Add ability to log stale datanodes. Contributed by Ahmed (kihwal: rev bd03053ea2f32ef982e37fbf2ffd679cb7dda797) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java > Add ability to know datanode staleness > -- > > Key: HDFS-14968 > URL: https://issues.apache.org/jira/browse/HDFS-14968 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, logging, namenode >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Attachments: HDFS-14968.001.patch, HDFS-14968.002.patch, > HDFS-14968.003.patch > > > There is no way to know whether a DataNode was marked stale or no longer > stale by the NameNode. > It will be good to have the option to enable logging the DataNode staleness > to figure out if the staleness was the reason behind remote reads. > Therefore, analyze performance and decision making of the local vs remote > reads. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
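A minimal sketch of transition-only logging, the behavior described above: log when a datanode crosses the stale boundary rather than on every heartbeat check, so the log stays readable and can be correlated with remote reads. Field and method names are illustrative, not the HeartbeatManager code.

```java
/**
 * Transition-only staleness logging: a log line is emitted only when
 * the stale/fresh state changes, never repeated while it persists.
 */
public class StalenessLogSketch {
  private final long staleIntervalMillis;
  private boolean wasStale;
  int transitionsLogged;

  StalenessLogSketch(long staleIntervalMillis) {
    this.staleIntervalMillis = staleIntervalMillis;
  }

  void checkHeartbeat(long nowMillis, long lastHeartbeatMillis) {
    boolean stale = nowMillis - lastHeartbeatMillis > staleIntervalMillis;
    if (stale != wasStale) {
      // Only state changes are logged.
      System.out.println("datanode is now " + (stale ? "STALE" : "FRESH"));
      transitionsLogged++;
      wasStale = stale;
    }
  }

  public static void main(String[] args) {
    StalenessLogSketch s = new StalenessLogSketch(30_000);
    s.checkHeartbeat(10_000, 0);      // fresh: no log
    s.checkHeartbeat(40_000, 0);      // crosses the threshold: logged
    s.checkHeartbeat(50_000, 0);      // still stale: not repeated
    s.checkHeartbeat(60_000, 55_000); // heartbeat arrived: logged again
    System.out.println("transitions logged: " + s.transitionsLogged);
  }
}
```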
[jira] [Commented] (HDFS-15092) TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-15092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020590#comment-17020590 ] Hudson commented on HDFS-15092: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17885 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17885/]) HDFS-15092. TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock (inigoiri: rev 8cfc3673dcbf1901ca6fad11b5c996e54e32ed6b) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestRedudantBlocks.java > TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes fails > > > Key: HDFS-15092 > URL: https://issues.apache.org/jira/browse/HDFS-15092 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-15092.001.patch, HDFS-15092.002.patch > > > TestRedudantBlocks#testProcessOverReplicatedAndRedudantBlock sometimes failed > {quote} > java.lang.AssertionError: > Expected :5 > Actual :4 > > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.hdfs.server.namenode.TestRedudantBlocks.testProcessOverReplicatedAndRedudantBlock(TestRedudantBlocks.java:138) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {quote} > Maybe we should increase sleep time -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15126) TestDatanodeRegistration#testForcedRegistration fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020583#comment-17020583 ] Hudson commented on HDFS-15126: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17884 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17884/]) HDFS-15126. TestDatanodeRegistration#testForcedRegistration fails (inigoiri: rev b657822b98781f042fad5281c20123e803ebae0f) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java > TestDatanodeRegistration#testForcedRegistration fails intermittently > > > Key: HDFS-15126 > URL: https://issues.apache.org/jira/browse/HDFS-15126 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15126.001.patch, HDFS-15126.002.patch, > HDFS-15126.003.patch, HDFS-15126.004.patch > > > Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail > in the recommit builds: > {code:bash} > [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 25.333 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration > [ERROR] > testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration) Time > elapsed: 11.39 s <<< FAILURE! 
> java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable
[ https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017409#comment-17017409 ] Hudson commented on HDFS-15112: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17873 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17873/]) HDFS-15112. RBF: Do not return FileNotFoundException when a subcluster (inigoiri: rev 263413e83840c7795a988e3939cd292d020c8d5f) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterFaultTolerant.java > RBF: Do not return FileNotFoundException when a subcluster is unavailable > -- > > Key: HDFS-15112 > URL: https://issues.apache.org/jira/browse/HDFS-15112 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, > HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, > HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, > HDFS-15112.009.patch, HDFS-15112.patch > > > If we have a mount point using HASH_ALL across two subclusters and one of > them is down, we may return FileNotFoundException while the file is just in > the unavailable subcluster. > We should not return FileNotFoundException but something that shows that the > subcluster is unavailable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
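The routing rule being requested can be sketched as: only report file-not-found when every subcluster actually answered; if any location was unreachable, surface the outage instead. The types below are simplified stand-ins for the router's remote-invocation results, not the RouterRpcClient API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * Sketch: a "not found" verdict is only trustworthy when all
 * subclusters responded. Map value = file status, null = answered
 * "not found"; an absent key means the subcluster was unreachable.
 */
public class FaultTolerantLookupSketch {
  static String resolve(List<String> subclusters,
                        Map<String, String> responses) throws IOException {
    boolean anyUnavailable = false;
    for (String sc : subclusters) {
      if (!responses.containsKey(sc)) {
        anyUnavailable = true;          // no answer from this subcluster
      } else if (responses.get(sc) != null) {
        return responses.get(sc);       // found it
      }
    }
    if (anyUnavailable) {
      // Do NOT claim the file is missing; it may live in the down subcluster.
      throw new IOException("a subcluster did not respond; result inconclusive");
    }
    throw new FileNotFoundException("file not found in any subcluster");
  }

  public static void main(String[] args) {
    // ns1 is down (no entry); ns0 answered "not found" (null value).
    Map<String, String> responses = new java.util.HashMap<>();
    responses.put("ns0", null);
    try {
      resolve(List.of("ns0", "ns1"), responses);
    } catch (FileNotFoundException fnfe) {
      System.out.println("wrong: claimed missing while ns1 was down");
    } catch (IOException expected) {
      System.out.println("correct: " + expected.getMessage());
    }
  }
}
```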
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016452#comment-17016452 ] Hudson commented on HDFS-13616: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17868 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17868/]) HDFS-13616. Batch listing of multiple directories (#1725) (weichiu: rev d7c4f8ab21c56a52afcfbd0a56d9120e61376d0c) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/protocol/TestReadOnly.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml * (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/BatchedDirectoryListing.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/ListingBenchmark.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBatchedListDirectories.java * (add) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsPartialListing.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFilterFileSystem.java * (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PartialListing.java > Batch listing of multiple directories > - > > Key: HDFS-13616 > URL: https://issues.apache.org/jira/browse/HDFS-13616 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.2.0 >Reporter: Andrew Wang >Assignee: Chao Sun >Priority: Major > Fix For: 3.3.0 > > Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch, > HDFS-13616.002.patch > > > One of the dominant workloads for external metadata services is listing of > partition directories. This can end up being bottlenecked on RTT time when > partition directories contain a small number of files. This is fairly common, > since fine-grained partitioning is used for partition pruning by the query > engines. > A batched listing API that takes multiple paths amortizes the RTT cost. 
> Initial benchmarks show a 10-20x improvement in metadata loading performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
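The RTT-amortization claim behind the batched listing API can be sketched with simple arithmetic (a hypothetical illustration only; the batch-size parameter name is not taken from the patch):

```java
// Hypothetical sketch: why batching directory listings amortizes round-trip time.
// Listing n partition directories one RPC per path costs n round trips; sharing
// up to batchSize paths per RPC costs ceil(n / batchSize) round trips.
public class BatchListingRtt {
    // Round trips when each path is listed with its own RPC.
    static int perPathCalls(int n) {
        return n;
    }

    // Round trips when up to batchSize paths share one RPC.
    static int batchedCalls(int n, int batchSize) {
        return (n + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int dirs = 1000;
        int batch = 100;
        System.out.println("per-path RPCs: " + perPathCalls(dirs));    // 1000
        System.out.println("batched RPCs:  " + batchedCalls(dirs, batch)); // 10
    }
}
```

With small directories the listing time is dominated by round trips, so cutting RPCs by the batch factor is where the reported 10-20x comes from.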
[jira] [Commented] (HDFS-15097) Purge log in KMS and HttpFS
[ https://issues.apache.org/jira/browse/HDFS-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014589#comment-17014589 ] Hudson commented on HDFS-15097: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17860 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17860/]) HDFS-15097. Purge log in KMS and HttpFS. Contributed by Doris Gu. (weichiu: rev 6b86a5110e4b3f45bf55f97bd402680bf26cefb9) * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServerWebServer.java * (edit) hadoop-common-project/hadoop-kms/src/main/java/org/apache/hadoop/crypto/key/kms/server/KMSWebServer.java > Purge log in KMS and HttpFS > --- > > Key: HDFS-15097 > URL: https://issues.apache.org/jira/browse/HDFS-15097 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs, kms >Affects Versions: 3.0.3, 3.3.0, 3.2.1, 3.1.3 >Reporter: Doris Gu >Assignee: Doris Gu >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15097.001.patch > > > KMS and HttpFS use ConfigurationWithLogging instead of Configuration, which > logs every configuration access. This behavior is better suited to > development use.
> {code:java} > 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: > Got hadoop.security.instrumentation.requires.admin = 'false' > 2020-01-07 16:52:00,456 INFO org.apache.hadoop.conf.ConfigurationWithLogging: > Got hadoop.security.instrumentation.requires.admin = 'false' (default > 'false') > 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: > Got hadoop.security.instrumentation.requires.admin = 'false' > 2020-01-07 16:52:15,091 INFO org.apache.hadoop.conf.ConfigurationWithLogging: > Got hadoop.security.instrumentation.requires.admin = 'false' (default 'false') > {code}
[jira] [Commented] (HDFS-14578) AvailableSpaceBlockPlacementPolicy always prefers local node
[ https://issues.apache.org/jira/browse/HDFS-14578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013386#comment-17013386 ] Hudson commented on HDFS-14578: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17853 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17853/]) HDFS-14578. AvailableSpaceBlockPlacementPolicy always prefers local (ayushsaxena: rev cebce0a348d60ded20eb4a55d1c26ee20017ed17) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestAvailableSpaceBPPBalanceLocal.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/AvailableSpaceBlockPlacementPolicy.java > AvailableSpaceBlockPlacementPolicy always prefers local node > > > Key: HDFS-14578 > URL: https://issues.apache.org/jira/browse/HDFS-14578 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14578-02.patch, HDFS-14578-03.patch, > HDFS-14578-04.patch, HDFS-14578-05.patch, HDFS-14578-06.patch, > HDFS-14578-07.patch, HDFS-14578-WIP-01.patch, HDFS-14758-01.patch > > > It looks like AvailableSpaceBlockPlacementPolicy prefers local disk just like > in the BlockPlacementPolicyDefault > > As Yongjun mentioned in > [HDFS-8131|https://issues.apache.org/jira/browse/HDFS-8131?focusedCommentId=16558739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16558739], > > {quote}Class AvailableSpaceBlockPlacementPolicy extends > BlockPlacementPolicyDefault. 
But it doesn't change the behavior of choosing > the first node in BlockPlacementPolicyDefault, so even with this new feature, > the local DN is always chosen as the first DN (of course when it is not > excluded), and the new feature only changes the selection of the rest of the > two DNs. > {quote} > I'm filing this Jira after grooming Cloudera's internal Jira and finding this > unreported issue. We do have a customer hitting this problem. I don't have a > fix, but thought it would be beneficial to report it to Apache Jira.
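For context, the space-aware selection that the fix extends to the first (local) node choice can be sketched as follows. This is an illustrative model only, not the Hadoop implementation; `preferFraction` stands in for the balanced-space-preference-fraction setting of AvailableSpaceBlockPlacementPolicy:

```java
import java.util.Random;

// Hedged sketch of the balancing idea: given two candidate datanodes, pick
// the one with more free space with a configurable probability. Before the
// fix, this choice was never applied to the always-preferred local node.
public class SpaceAwareChooser {
    private final double preferFraction; // probability of picking the emptier node
    private final Random random;

    SpaceAwareChooser(double preferFraction, Random random) {
        this.preferFraction = preferFraction;
        this.random = random;
    }

    /** Returns 0 or 1: the index of the chosen node, given free-space ratios. */
    int choose(double freeRatioA, double freeRatioB) {
        if (freeRatioA == freeRatioB) {
            return random.nextInt(2); // tie: pick uniformly
        }
        int emptier = freeRatioA > freeRatioB ? 0 : 1;
        // Prefer the emptier node with probability preferFraction.
        return random.nextDouble() < preferFraction ? emptier : 1 - emptier;
    }

    public static void main(String[] args) {
        // With preferFraction = 1.0 the emptier node always wins.
        SpaceAwareChooser c = new SpaceAwareChooser(1.0, new Random(42));
        System.out.println(c.choose(0.9, 0.1)); // node 0 has more free space
    }
}
```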
[jira] [Commented] (HDFS-15108) RBF: MembershipNamenodeResolver should invalidate cache in case of active namenode update
[ https://issues.apache.org/jira/browse/HDFS-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013380#comment-17013380 ] Hudson commented on HDFS-15108: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17852 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17852/]) HDFS-15108. RBF: MembershipNamenodeResolver should invalidate cache (ayushsaxena: rev 7b62409ace165603ee137561d7d75b1e742ed9a2) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/resolver/MembershipNamenodeResolver.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/resolver/TestNamenodeResolver.java > RBF: MembershipNamenodeResolver should invalidate cache in case of active > namenode update > > > Key: HDFS-15108 > URL: https://issues.apache.org/jira/browse/HDFS-15108 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15108-01.patch, HDFS-15108-02.patch, > HDFS-15108-03.patch, HDFS-15108-04.patch, HDFS-15108-05.patch > > > If a failover happens, {{namenodeResolver.updateActiveNamenode(nsId, > address);}} is called, but this doesn't invalidate the cache, so the correct > active is not fetched the next time.
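The shape of the bug and the fix can be sketched with a toy resolver (illustrative names and a plain map cache; this is not the MembershipNamenodeResolver code):

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: updating the active namenode must also invalidate the
// resolver's cache, otherwise the next lookup returns a stale active.
public class ActiveResolver {
    private final Map<String, String> stateStore = new HashMap<>(); // source of truth
    private final Map<String, String> cache = new HashMap<>();      // per-nameservice cache

    String resolveActive(String nsId) {
        // Serve from cache; fall back to the state store on a miss.
        return cache.computeIfAbsent(nsId, stateStore::get);
    }

    void updateActiveNamenode(String nsId, String address) {
        stateStore.put(nsId, address);
        cache.remove(nsId); // the missing invalidation in the reported bug
    }

    public static void main(String[] args) {
        ActiveResolver r = new ActiveResolver();
        r.updateActiveNamenode("ns0", "nn1:8020");
        System.out.println(r.resolveActive("ns0")); // nn1:8020
        r.updateActiveNamenode("ns0", "nn2:8020");  // failover
        System.out.println(r.resolveActive("ns0")); // nn2:8020, not a stale nn1
    }
}
```

Without the `cache.remove(nsId)` line, the second `resolveActive` would keep returning the pre-failover address.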
[jira] [Commented] (HDFS-15099) [SBN Read] checkOperation(WRITE) should throw ObserverRetryOnActiveException on ObserverNode
[ https://issues.apache.org/jira/browse/HDFS-15099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013329#comment-17013329 ] Hudson commented on HDFS-15099: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17851 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17851/]) HDFS-15099. [SBN Read] checkOperation(WRITE) should throw (shv: rev 26a969ec734dbdbf1d544f486dfa33f15c291789) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ObserverRetryOnActiveException.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestObserverNode.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyState.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java > [SBN Read] checkOperation(WRITE) should throw ObserverRetryOnActiveException > on ObserverNode > > > Key: HDFS-15099 > URL: https://issues.apache.org/jira/browse/HDFS-15099 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-15099-branch-2.10.001.patch, > HDFS-15099-branch-2.10.002.patch, HDFS-15099-branch-2.10.003.patch > > > The precision of updating an INode's aTime while executing > {{getBlockLocations()}} is 1 hour by default. Updates cannot be handled by > ObserverNode, so the call should be redirected to Active NameNode. In order > to redirect to the active, the ObserverNode should throw > {{ObserverRetryOnActiveException}}.
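The intended control flow can be sketched in a few lines (a simplified stand-in, not the actual StandbyState or RetryPolicies code): a WRITE operation checked on an Observer throws a dedicated exception, which the client's retry policy interprets as "retry this call on the Active".

```java
// Hedged sketch of the Observer-side check. The real exception lives in
// org.apache.hadoop.ipc; this local class just models the behavior.
public class ObserverCheck {
    enum OpCategory { READ, WRITE }

    static class ObserverRetryOnActiveException extends Exception {
        ObserverRetryOnActiveException(String msg) { super(msg); }
    }

    static void checkOperation(OpCategory op) throws ObserverRetryOnActiveException {
        if (op == OpCategory.WRITE) {
            // e.g. the aTime update triggered by getBlockLocations()
            throw new ObserverRetryOnActiveException(
                "WRITE is not supported on Observer; retry on Active");
        }
        // READ is served locally by the Observer.
    }

    public static void main(String[] args) {
        try {
            checkOperation(OpCategory.WRITE);
        } catch (ObserverRetryOnActiveException e) {
            System.out.println("redirecting to Active: " + e.getMessage());
        }
    }
}
```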
[jira] [Commented] (HDFS-15095) Fix accidental comment in flaky test TestDecommissioningStatus
[ https://issues.apache.org/jira/browse/HDFS-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013153#comment-17013153 ] Hudson commented on HDFS-15095: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17847 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17847/]) HDFS-15095. Fix TestDecommissioningStatus. Contributed by Ahmed Hussein. (kihwal: rev 5fb901ac4017b4f13b089ecd920e864cd53ad3a6) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java > Fix accidental comment in flaky test TestDecommissioningStatus > -- > > Key: HDFS-15095 > URL: https://issues.apache.org/jira/browse/HDFS-15095 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15095.001.patch, HDFS-15095.002.patch > > > There are some old Jiras suggesting that {{testDecommissionStatus}} is > flaky. > * HDFS-12188 > * HDFS-9599 > * HDFS-9950 > * HDFS-10755 > However, the HDFS-14854 fix accidentally commented out one of the checks in > {{TestDecommissioningStatus.testDecommissionStatus()}}. This Jira will > restore the commented-out code and add a blocking queue to make the test > case deterministic. > My intuition is that the monitor task launched by AdminManager may not have > enough time to act before we start verifying the status. I suggest forcing > the main thread to block until the node is added to the blocking queue.
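The blocking-queue handoff described above can be sketched as follows (an illustrative pattern, not the actual TestDecommissioningStatus change): the background monitor offers the node to a queue, and the test thread blocks on `poll` instead of sleeping, so verification cannot race the monitor.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hedged sketch: replace sleep-and-hope synchronization with a blocking
// handoff between the monitor thread and the verifying test thread.
public class BlockingHandoff {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> decommissioned = new ArrayBlockingQueue<>(1);

        Thread monitor = new Thread(() -> {
            // ... the monitor decides the node is decommissioned ...
            decommissioned.offer("dn-127.0.0.1:9866");
        });
        monitor.start();

        // The test thread blocks here until the monitor has actually acted,
        // with a timeout so a broken monitor fails the test instead of hanging.
        String node = decommissioned.poll(10, TimeUnit.SECONDS);
        System.out.println("decommissioned: " + node);
        monitor.join();
    }
}
```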
[jira] [Commented] (HDFS-15110) HttpFS : post requests are not supported for path "/"
[ https://issues.apache.org/jira/browse/HDFS-15110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012605#comment-17012605 ] Hudson commented on HDFS-15110: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17844 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17844/]) HDFS-15110. HttpFS: post requests are not supported for path "/". (tasanuma: rev 9da294a140a919d9ba648637d09340bccfd5edd6) * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/server/TestHttpFSServer.java > HttpFS : post requests are not supported for path "/" > -- > > Key: HDFS-15110 > URL: https://issues.apache.org/jira/browse/HDFS-15110 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15110.001.patch, HDFS-15110.002.patch > > > POST requests in HttpFS with the path "/" were not supported.
[jira] [Commented] (HDFS-15100) RBF: Print stacktrace when DFSRouter fails to fetch/parse JMX output from NameNode
[ https://issues.apache.org/jira/browse/HDFS-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012448#comment-17012448 ] Hudson commented on HDFS-15100: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17842 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17842/]) HDFS-15100. RBF: Print stacktrace when DFSRouter fails to fetch/parse (tasanuma: rev 0315ef844862ee863d646b562ba6d8889876ffa9) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/FederationUtil.java > RBF: Print stacktrace when DFSRouter fails to fetch/parse JMX output from > NameNode > -- > > Key: HDFS-15100 > URL: https://issues.apache.org/jira/browse/HDFS-15100 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: supportability > Fix For: 3.3.0 > > > When DFSRouter fails to fetch or parse JMX output from NameNode, it prints > only the error message. Therefore we had to modify the source code to print > the stacktrace of the exception to find the root cause. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15107) dfs.client.server-defaults.validity.period.ms to support time units
[ https://issues.apache.org/jira/browse/HDFS-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012417#comment-17012417 ] Hudson commented on HDFS-15107: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17841 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17841/]) HDFS-15107. dfs.client.server-defaults.validity.period.ms to support (ayushsaxena: rev b32757c616cc89c6df2312edd1aa05b7dab6ee6c) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > dfs.client.server-defaults.validity.period.ms to support time units > --- > > Key: HDFS-15107 > URL: https://issues.apache.org/jira/browse/HDFS-15107 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15107-01.patch > > > Add support for time units for dfs.client.server-defaults.validity.period.ms
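Hadoop configs gain this behavior via Configuration.getTimeDuration; the suffix handling it enables can be sketched with a small standalone parser (illustrative only, not Hadoop's parsing code, and covering only a few suffixes):

```java
import java.util.concurrent.TimeUnit;

// Hedged sketch: accept values like "1h", "30s", or a bare millisecond count
// for a duration setting such as dfs.client.server-defaults.validity.period.ms.
public class DurationParse {
    static long toMillis(String value) {
        String v = value.trim().toLowerCase();
        // Check "ms" before the single-letter suffixes so "100ms" isn't read as seconds.
        if (v.endsWith("ms")) return Long.parseLong(v.substring(0, v.length() - 2));
        if (v.endsWith("s")) return TimeUnit.SECONDS.toMillis(Long.parseLong(v.substring(0, v.length() - 1)));
        if (v.endsWith("m")) return TimeUnit.MINUTES.toMillis(Long.parseLong(v.substring(0, v.length() - 1)));
        if (v.endsWith("h")) return TimeUnit.HOURS.toMillis(Long.parseLong(v.substring(0, v.length() - 1)));
        return Long.parseLong(v); // bare number: already milliseconds
    }

    public static void main(String[] args) {
        System.out.println(toMillis("3600000")); // prints 3600000
        System.out.println(toMillis("1h"));      // prints 3600000
    }
}
```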
[jira] [Commented] (HDFS-15102) HttpFS: put requests are not supported for path "/"
[ https://issues.apache.org/jira/browse/HDFS-15102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012339#comment-17012339 ] Hudson commented on HDFS-15102: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17840 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17840/]) HDFS-15102. HttpFS: put requests are not supported for path "/". (tasanuma: rev 782c0556fb413d54c9d028ddc11d67cdc32585ff) * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/server/TestHttpFSServer.java > HttpFS: put requests are not supported for path "/" > --- > > Key: HDFS-15102 > URL: https://issues.apache.org/jira/browse/HDFS-15102 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15102.001.patch > > > PUT requests in HttpFS with the path "/" were not supported.
[jira] [Commented] (HDFS-14957) INodeReference Space Consumed was not same in QuotaUsage and ContentSummary
[ https://issues.apache.org/jira/browse/HDFS-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011487#comment-17011487 ] Hudson commented on HDFS-14957: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17837 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17837/]) HDFS-14957. INodeReference Space Consumed was not same in QuotaUsage and (surendralilhore: rev bf45f3b80a88ca6e6ab1289dc5b71d9d6e6f6c10) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestRenameWithSnapshots.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java > INodeReference Space Consumed was not same in QuotaUsage and ContentSummary > --- > > Key: HDFS-14957 > URL: https://issues.apache.org/jira/browse/HDFS-14957 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.1.4 >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14957.001.patch, HDFS-14957.002.patch, > HDFS-14957.003.patch, HDFS-14957.JPG > > > For INodeReferences, the space consumed differed between QuotaUsage and > ContentSummary.
[jira] [Commented] (HDFS-15094) RBF: Reuse ugi string in ConnectionPoolID
[ https://issues.apache.org/jira/browse/HDFS-15094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011381#comment-17011381 ] Hudson commented on HDFS-15094: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17836 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17836/]) HDFS-15094. RBF: Reuse ugi string in ConnectionPoolID. Contributed by (ayushsaxena: rev 8fe01db34afd681baa7f8d8d4a45bd080278f0f3) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionPoolId.java > RBF: Reuse ugi string in ConnectionPoolID > - > > Key: HDFS-15094 > URL: https://issues.apache.org/jira/browse/HDFS-15094 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15094-01.patch, HDFS-15094-02.patch, > UGI_AFTER-01.PNG, UGI_BEFORE-01.PNG > > > The ConnectionPoolId hashCode and equals methods include ugi.toString(); these > methods are used as part of getConnection() in ConnectionManager on > every call. > The ugi.toString() call eats up a considerable amount of time; the hash > calculation itself is ~10 percent of the total time of the call, and even > more for the equals method.
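The optimization pattern is to compute the expensive string once at construction and reuse it in `hashCode`/`equals`, rather than re-deriving it on every pool lookup. A hedged sketch with illustrative names (not the ConnectionPoolId code):

```java
import java.util.Objects;

// Hedged sketch: an immutable pool key that precomputes the ugi string once.
// In Hadoop the expensive call is ugi.toString(); here it's just a field.
public class PoolId {
    private final String ugiString; // computed once, reused by hashCode/equals
    private final String address;

    PoolId(String ugiString, String address) {
        this.ugiString = ugiString;
        this.address = address;
    }

    @Override
    public int hashCode() {
        return Objects.hash(ugiString, address); // no repeated toString() work
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PoolId)) return false;
        PoolId other = (PoolId) o;
        return ugiString.equals(other.ugiString) && address.equals(other.address);
    }

    public static void main(String[] args) {
        PoolId a = new PoolId("user1@REALM (auth:KERBEROS)", "ns0");
        PoolId b = new PoolId("user1@REALM (auth:KERBEROS)", "ns0");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```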
[jira] [Commented] (HDFS-15096) RBF: GetServerDefaults Should be Cached At Router
[ https://issues.apache.org/jira/browse/HDFS-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011372#comment-17011372 ] Hudson commented on HDFS-15096: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17835 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17835/]) HDFS-15096. RBF: GetServerDefaults Should be Cached At Router. (ayushsaxena: rev fd30f4c52b73705f6837da39c2c7e35f7052454e) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java > RBF: GetServerDefaults Should be Cached At Router > - > > Key: HDFS-15096 > URL: https://issues.apache.org/jira/browse/HDFS-15096 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: GetServerDefault-Cached.PNG, HDFS-15096-01.patch, > HDFS-15096-02.patch, HDFS-15096-03.patch, Server-Before.PNG > > > Presently, every getServerDefaults() call is forwarded to the namespace. The > DFSClient caches getServerDefaults(); similarly, the Router can also cache > it. As the Router serves multiple clients, this improves performance for > subsequent calls.
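The caching pattern — a value with a validity window, refetched only on expiry — can be sketched generically (an illustrative standalone class, not the RouterClientProtocol code):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hedged sketch of validity-window caching, the pattern DFSClient uses for
// getServerDefaults() and that the Router adopts here.
public class CachedValue<T> {
    private final Supplier<T> fetch;      // stands in for the namenode RPC
    private final long validityMillis;
    private T cached;
    private long fetchedAt;

    CachedValue(Supplier<T> fetch, long validityMillis) {
        this.fetch = fetch;
        this.validityMillis = validityMillis;
    }

    synchronized T get(long nowMillis) {
        if (cached == null || nowMillis - fetchedAt > validityMillis) {
            cached = fetch.get();         // only hit the remote side on expiry
            fetchedAt = nowMillis;
        }
        return cached;
    }

    public static void main(String[] args) {
        AtomicInteger rpcs = new AtomicInteger();
        CachedValue<String> defaults =
            new CachedValue<>(() -> { rpcs.incrementAndGet(); return "serverDefaults"; }, 1000);
        defaults.get(0);     // first call fetches
        defaults.get(500);   // still valid, served from cache
        defaults.get(2000);  // expired, fetches again
        System.out.println("RPCs issued: " + rpcs.get()); // prints 2
    }
}
```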
[jira] [Commented] (HDFS-15080) Fix the issue in reading persistent memory cached data with an offset
[ https://issues.apache.org/jira/browse/HDFS-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010497#comment-17010497 ] Hudson commented on HDFS-15080: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17828 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17828/]) HDFS-15080. Fix the issue in reading persistent memory cached data with (rakeshr: rev 7030722e5d9f376245a9ab0a6a883538b6c55f82) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java > Fix the issue in reading persistent memory cached data with an offset > - > > Key: HDFS-15080 > URL: https://issues.apache.org/jira/browse/HDFS-15080 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15080-000.patch, HDFS-15080-branch-3.1-000.patch, > HDFS-15080-branch-3.2-000.patch > > > Some applications can read a segment of the pmem cache with an offset > specified. The previous implementation of pmem cache read with > DirectByteBuffer didn't cover this situation. > Let me explain further. In our test, we used Spark SQL to run a TPC-DS > workload that read the cache data and hit a read exception. This was due to > the missing seek offset argument, which is used in Spark SQL to read data > packet by packet.
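The bug class — serving a sub-range of a cached buffer while honoring the caller's offset — can be illustrated with the standard ByteBuffer idiom of slicing a duplicate (illustrative only, not the FsDatasetImpl code):

```java
import java.nio.ByteBuffer;

// Hedged sketch: read `length` bytes starting at `offset` from a shared
// cached buffer. duplicate() gives an independent position/limit, so the
// shared buffer's state is never disturbed by concurrent readers.
public class OffsetRead {
    static byte[] readAt(ByteBuffer cache, int offset, int length) {
        ByteBuffer view = cache.duplicate(); // shares bytes, not position/limit
        view.position(offset);               // honor the caller's seek offset
        view.limit(offset + length);
        byte[] out = new byte[length];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer cache = ByteBuffer.wrap("hello pmem cache".getBytes());
        System.out.println(new String(readAt(cache, 6, 4))); // prints "pmem"
    }
}
```

Ignoring the offset (always reading from position 0) is exactly the failure mode described: packet-by-packet readers like the Spark SQL path request successive offsets and get wrong data or an exception.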
[jira] [Commented] (HDFS-15077) Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout
[ https://issues.apache.org/jira/browse/HDFS-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010448#comment-17010448 ] Hudson commented on HDFS-15077: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17827 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17827/]) HDFS-15077. Fix intermittent failure of (github: rev aba3f6c3e1fbb150ea7ff0411c41ffd3a2796208) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/LeaseRenewer.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSClientRetries.java > Fix intermittent failure of TestDFSClientRetries#testLeaseRenewSocketTimeout > > > Key: HDFS-15077 > URL: https://issues.apache.org/jira/browse/HDFS-15077 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > > {{TestDFSClientRetries#testLeaseRenewSocketTimeout}} intermittently fails due > to a race between the test thread and the LeaseRenewer thread.
[jira] [Commented] (HDFS-15072) HDFS MiniCluster fails to start when run in directory path with a %
[ https://issues.apache.org/jira/browse/HDFS-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010290#comment-17010290 ] Hudson commented on HDFS-15072: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17825 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17825/]) HDFS-15072. HDFS MiniCluster fails to start when run in directory path (aajisaka: rev a43c177f1d4c2b6149a2680dd23d91103eca3be0) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java > HDFS MiniCluster fails to start when run in directory path with a % > --- > > Key: HDFS-15072 > URL: https://issues.apache.org/jira/browse/HDFS-15072 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.5, 3.3.0 > Environment: I encountered this on a Mac while running an HBase > minicluster that was using Hadoop 2.7.5. However, the code looks the same in > trunk so it likely affects most or all current versions. >Reporter: Geoffrey Jacoby >Assignee: Masatake Iwasaki >Priority: Minor > > FsVolumeImpl.initializeCacheExecutor calls Guava's > ThreadFactoryBuilder.setNameFormat, passing in the String representation of > the parent File. Guava will take the String whole and pass it to > String.format, which uses % as a special character. That means that if > parent.toString() contains a percent sign followed by a character that's > illegal to use as a conversion in String.format(), you'll get an exception > that stops the MiniCluster from starting up. > I did not check to see if this would also happen on a normal DataNode daemon. > initializeCacheExecutor should escape the parent file name before passing it > in.
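The failure mode can be reproduced with plain String.format, and escaping `%` as `%%` — the fix suggested above — avoids it (`escapeForFormat` is an illustrative helper, not the patched Hadoop method):

```java
// Hedged sketch: String.format treats '%' as a conversion introducer, so a
// directory name like "/data/50%-full" blows up when used verbatim inside a
// thread-name format string. Doubling each '%' neutralizes it.
public class PercentEscape {
    static String escapeForFormat(String s) {
        return s.replace("%", "%%");
    }

    public static void main(String[] args) {
        String dir = "/data/50%-full";
        boolean failed = false;
        try {
            // "%-f..." is an illegal format specifier here.
            String.format(dir + "-thread-%d", 1);
        } catch (java.util.IllegalFormatException e) {
            failed = true;
        }
        System.out.println("raw format failed: " + failed); // true
        // Escaped, the path survives formatting intact.
        System.out.println(String.format(escapeForFormat(dir) + "-thread-%d", 1));
    }
}
```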
[jira] [Commented] (HDFS-15066) HttpFS: Implement setErasureCodingPolicy , unsetErasureCodingPolicy , getErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009306#comment-17009306 ] Hudson commented on HDFS-15066: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17819 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17819/]) HDFS-15066. HttpFS: Implement setErasureCodingPolicy , (tasanuma: rev 59aac002834aaeb6475faad4c894b8c764957f68) * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/server/TestHttpFSServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/JsonUtilClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSParametersProvider.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/BaseTestHttpFSWith.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java > HttpFS: Implement setErasureCodingPolicy , unsetErasureCodingPolicy , > getErasureCodingPolicy > > > Key: HDFS-15066 > URL: https://issues.apache.org/jira/browse/HDFS-15066 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15066.001.patch, HDFS-15066.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009117#comment-17009117 ] Hudson commented on HDFS-14788: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17818 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17818/]) HDFS-14788. Use dynamic regex filter to ignore copy of source files in (stevel: rev 819159fa060897bcf7c9ae09bf4b2fc97292f92b) * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java * (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyFilter.java * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyFilter.java * (add) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/RegexpInConfigurationFilter.java * (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestRegexpInConfigurationFilter.java * (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm > Use dynamic regex filter to ignore copy of source files in Distcp > - > > Key: HDFS-14788 > URL: https://issues.apache.org/jira/browse/HDFS-14788 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp >Affects Versions: 3.2.1 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Fix For: 3.3.0 > > > There is a feature in Distcp where we can exclude specific source files from > being copied to the destination. This is currently based on a filter regex > which is read from a specific file. The process of creating a different regex > file for different Distcp jobs seems like a tedious task. What we are > proposing is to expose a regex_filter parameter which can be set during > Distcp job creation and use this filter in a new CopyFilter implementation.
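The proposed filter can be sketched as a class whose exclusion pattern comes from a single job parameter instead of a per-job filter file (illustrative names and regexes; not the RegexpInConfigurationFilter code):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hedged sketch of a regex-based copy filter: paths matching the configured
// pattern are skipped, everything else is copied.
public class RegexCopyFilter {
    private final Pattern exclude;

    RegexCopyFilter(String regexFromJobConf) {
        this.exclude = Pattern.compile(regexFromJobConf);
    }

    boolean shouldCopy(String path) {
        return !exclude.matcher(path).matches();
    }

    public static void main(String[] args) {
        // Pattern supplied as a job parameter, e.g. skip temp files and staging dirs.
        RegexCopyFilter filter = new RegexCopyFilter(".*\\.tmp|.*/_temporary/.*");
        List<String> sources = List.of("/a/part-0000", "/a/run.tmp", "/a/_temporary/0/x");
        System.out.println(sources.stream()
            .filter(filter::shouldCopy)
            .collect(Collectors.toList())); // prints [/a/part-0000]
    }
}
```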
[jira] [Commented] (HDFS-15090) RBF: MountPoint Listing Should Return Flag Values Of Destination
[ https://issues.apache.org/jira/browse/HDFS-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008665#comment-17008665 ] Hudson commented on HDFS-15090: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17814 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17814/]) HDFS-15090. RBF: MountPoint Listing Should Return Flag Values Of (tasanuma: rev 4a76ab777fdd2b72c438c73d45ffbe2f6bb8bb0d) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterMountTable.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java > RBF: MountPoint Listing Should Return Flag Values Of Destination > > > Key: HDFS-15090 > URL: https://issues.apache.org/jira/browse/HDFS-15090 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15090-01.patch > > > When listing, if a mount point exists and its actual destination also exists, > the owner and group details are taken from the destination; similarly, the flag > values can also be returned from the destination. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15089) RBF: SmallFix for RBFMetrics in doc
[ https://issues.apache.org/jira/browse/HDFS-15089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008527#comment-17008527 ] Hudson commented on HDFS-15089: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17812 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17812/]) HDFS-15089. RBF: SmallFix for RBFMetrics in doc (#1786) (aajisaka: rev 77ae7b9ce20ab013c6d492d04f444784b43fa871) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java > RBF: SmallFix for RBFMetrics in doc > --- > > Key: HDFS-15089 > URL: https://issues.apache.org/jira/browse/HDFS-15089 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, rbf >Reporter: luhuachao >Assignee: luhuachao >Priority: Trivial > Fix For: 3.3.0 > > > SmallFix for RBFMetrics in doc -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15091) Cache Admin and Quota Commands Should Check SuperUser Before Taking Lock
[ https://issues.apache.org/jira/browse/HDFS-15091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008034#comment-17008034 ] Hudson commented on HDFS-15091: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17811 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17811/]) HDFS-15091. Cache Admin and Quota Commands Should Check SuperUser Before (ayushsaxena: rev f8644fbe9f76fb2990bcc997a346649e4d432d91) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAttrOp.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNDNCacheOp.java > Cache Admin and Quota Commands Should Check SuperUser Before Taking Lock > > > Key: HDFS-15091 > URL: https://issues.apache.org/jira/browse/HDFS-15091 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15091-01.patch, HDFS-15091-02.patch > > > As of now, all APIs check for superuser before taking the lock; the same can be > done for the cache commands and setQuota. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
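The pattern the HDFS-15091 change applies, per the description above, is to run the cheap authorization check before acquiring the namesystem lock, so an unauthorized caller never contends on it. A minimal, hypothetical sketch of that ordering (stand-in names, not the actual FSNamesystem code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckBeforeLock {
    // Stand-in for the global namesystem lock.
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    static boolean isSuperUser = false;  // simulated caller privilege

    static void setQuota() {
        // Check superuser FIRST: an unauthorized caller fails fast and never
        // touches the (potentially contended) write lock.
        if (!isSuperUser) {
            throw new SecurityException("superuser privilege required");
        }
        fsLock.writeLock().lock();
        try {
            // apply the quota change under the lock
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        try {
            setQuota();
            System.out.println("allowed");
        } catch (SecurityException e) {
            System.out.println("rejected-before-lock");
        }
    }
}
```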
[jira] [Commented] (HDFS-15068) DataNode could meet deadlock if invoke refreshVolumes when register
[ https://issues.apache.org/jira/browse/HDFS-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007632#comment-17007632 ] Hudson commented on HDFS-15068: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17810 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17810/]) HDFS-15068. DataNode could meet deadlock if invoke refreshVolumes when (iwasakims: rev 037ec8cfb1406ea3a8225a1b6306c2e78440353b) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java > DataNode could meet deadlock if invoke refreshVolumes when register > --- > > Key: HDFS-15068 > URL: https://issues.apache.org/jira/browse/HDFS-15068 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Xiaoqiao He >Assignee: Aiphago >Priority: Critical > Attachments: HDFS-15068.001.patch, HDFS-15068.002.patch, > HDFS-15068.003.patch, HDFS-15068.004.patch, HDFS-15068.005.patch > > > DataNode could meet a deadlock when `dfsadmin -reconfig datanode ip:host > start` is invoked to trigger #refreshVolumes. > 1. DataNode#refreshVolumes first holds the DataNode instance monitor > ({{synchronized}}) on entering the method, then tries to take the BPOfferService > {{readlock}} via `bpos.getNamespaceInfo()` in the following code segment. > {code:java} > for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) { > nsInfos.add(bpos.getNamespaceInfo()); > } > {code} > 2. 
BPOfferService#registrationSucceeded (which is invoked by #register when > DataNode start or #reregister when processCommandFromActor) hold > BPOfferService {{writelock}} first, then try to hold datanode instance > ownable {{synchronizer}} in following method. > {code:java} > synchronized void bpRegistrationSucceeded(DatanodeRegistration > bpRegistration, > String blockPoolId) throws IOException { > id = bpRegistration; > if(!storage.getDatanodeUuid().equals(bpRegistration.getDatanodeUuid())) { > throw new IOException("Inconsistent Datanode IDs. Name-node returned " > + bpRegistration.getDatanodeUuid() > + ". Expecting " + storage.getDatanodeUuid()); > } > > registerBlockPoolWithSecretManager(bpRegistration, blockPoolId); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
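The HDFS-15068 report above describes a classic lock-ordering inversion: one path takes the DataNode monitor and then the BPOfferService read lock, while the other takes the write lock and then the monitor. A minimal standalone sketch, with stand-in lock objects rather than the real DataNode/BPOfferService classes, of the usual remedy — acquiring the two locks in one consistent global order so the paths cannot deadlock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrdering {
    // Stand-ins for the DataNode instance monitor and the BPOfferService rw-lock.
    static final Object dnMonitor = new Object();
    static final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

    // refreshVolumes-like path: monitor first, then the read lock.
    static void refreshVolumes() {
        synchronized (dnMonitor) {
            bposLock.readLock().lock();
            try {
                // read namespace info
            } finally {
                bposLock.readLock().unlock();
            }
        }
    }

    // registration-like path, reordered so it never holds the write lock
    // while waiting for the monitor (same global order as above).
    static void registrationSucceeded() {
        bposLock.writeLock().lock();
        try {
            // update registration state
        } finally {
            bposLock.writeLock().unlock();
        }
        synchronized (dnMonitor) {
            // register with the secret manager outside the write lock
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(LockOrdering::refreshVolumes);
        Thread t2 = new Thread(LockOrdering::registrationSucceeded);
        t1.start();
        t2.start();
        t1.join(2000);
        t2.join(2000);
        System.out.println(t1.isAlive() || t2.isAlive() ? "deadlock" : "completed");
    }
}
```

With the original opposite acquisition orders, the two threads could each hold one lock and wait forever on the other.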
[jira] [Commented] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006612#comment-17006612 ] Hudson commented on HDFS-14740: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17803 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17803/]) HDFS-14740. Recover data blocks from persistent memory read cache during (rakeshr: rev d79cce20abbbf321f6dcce03f4087544124a7cd2) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/NativePmemMappedBlock.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlockLoader.java * (edit) hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestCacheByPmemMappableBlockLoader.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MemoryMappableBlockLoader.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/PmemVolumeManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MappableBlock.java * (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/nativeio/TestNativeIO.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/PmemMappableBlockLoader.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/MemoryMappedBlock.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/NativePmemMappableBlockLoader.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/PmemMappedBlock.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestPmemCacheRecovery.java > Recover data blocks from persistent memory read cache during datanode restarts > -- > > Key: HDFS-14740 > URL: https://issues.apache.org/jira/browse/HDFS-14740 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14740-branch-3.1-000.patch, > HDFS-14740-branch-3.1-001.patch, HDFS-14740-branch-3.2-000.patch, > HDFS-14740-branch-3.2-001.patch, HDFS-14740.000.patch, HDFS-14740.001.patch, > HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch, > HDFS-14740.005.patch, HDFS-14740.006.patch, HDFS-14740.007.patch, > HDFS-14740.008.patch, HDFS-14740.009.patch, > HDFS_Persistent_Read-Cache_Design-v1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.pdf, HDFS_Persistent_Read-Cache_Test-v2.pdf > > > In HDFS-13762, persistent memory (PM) is enabled in HDFS centralized cache > management. 
Even though PM can persist cache data, to simplify the > initial implementation the previous cache data is cleaned up during > DataNode restarts. Here, we propose to improve the HDFS PM cache by taking > advantage of PM's data persistence characteristic, i.e., recovering the > status of cached data, if any, when the DataNode restarts, so that cache warm-up > time can be saved for users. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15063) HttpFS : getFileStatus doesn't return ecPolicy
[ https://issues.apache.org/jira/browse/HDFS-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006285#comment-17006285 ] Hudson commented on HDFS-15063: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17802 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17802/]) HDFS-15063. HttpFS: getFileStatus doesn't return ecPolicy. Contributed (tasanuma: rev 074050ca595a81927c867951e48cef132a0284be) * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/server/TestHttpFSServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/web/JsonUtilClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/client/HttpFSFileSystem.java > HttpFS : getFileStatus doesn't return ecPolicy > -- > > Key: HDFS-15063 > URL: https://issues.apache.org/jira/browse/HDFS-15063 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15063.001.patch, HDFS-15063.002.patch, > HDFS-15063.003.patch, HDFS-15063.004.patch, HDFS-15063.005.patch > > > Currently LISTSTATUS call to HttpFS returns a json. These jsonArray elements > have the ecPolicy name. > But when HttpFsFileSystem converts it back into a FileStatus object, then > ecPolicy is not added -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptedException
[ https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004698#comment-17004698 ] Hudson commented on HDFS-14937: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17801 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17801/]) HDFS-14937. [SBN read] ObserverReadProxyProvider should throw (ayushsaxena: rev 62423910a4020bea6200c44c12fe96b6e14bd59c) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java > [SBN read] ObserverReadProxyProvider should throw InterruptedException > > > Key: HDFS-14937 > URL: https://issues.apache.org/jira/browse/HDFS-14937 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14937-trunk-001.patch, HDFS-14937-trunk-002.patch > > > ObserverReadProxyProvider should throw the InterruptedException immediately if an > Observer catches an InterruptedException during invocation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15074) DataNode.DataTransfer thread should catch all the exceptions and log them.
[ https://issues.apache.org/jira/browse/HDFS-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004656#comment-17004656 ] Hudson commented on HDFS-15074: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17800 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17800/]) HDFS-15074. DataNode.DataTransfer thread should catch all the exceptions (surendralilhore: rev ee51eadda01e02ac5759ca19756f6f961c8eb0cd) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java > DataNode.DataTransfer thread should catch all the exceptions and log them. > > > Key: HDFS-15074 > URL: https://issues.apache.org/jira/browse/HDFS-15074 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.1.1 >Reporter: Surendra Singh Lilhore >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15074.001.patch, HDFS-15074.002.patch > > > Sometimes, if this thread throws an exception other than an IOException, it will > not be caught and logged. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14934) [SBN Read] Standby NN throws many InterruptedExceptions when dfs.ha.tail-edits.period is 0
[ https://issues.apache.org/jira/browse/HDFS-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004477#comment-17004477 ] Hudson commented on HDFS-14934: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17799 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17799/]) HDFS-14934. [SBN Read] Standby NN throws many InterruptedExceptions when (tasanuma: rev dc32f583afffc372f78fb45211c3e7ce13f6a4be) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java > [SBN Read] Standby NN throws many InterruptedExceptions when > dfs.ha.tail-edits.period is 0 > -- > > Key: HDFS-14934 > URL: https://issues.apache.org/jira/browse/HDFS-14934 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Priority: Major > Attachments: HDFS-14934-01.patch > > > When dfs.ha.tail-edits.period is 0 ms (or a very short time), there are many > WARN logs in the standby NN. > {noformat} > 2019-10-25 16:25:46,945 [Logger channel (from parallel executor) to <hostname>/<ip>:<port>] WARN concurrent.ExecutorHelper > (ExecutorHelper.java:logThrowableFromAfterExecute(55)) - Thread > (Thread[Logger channel (from parallel executor) to <hostname>/<host address>:<port>,5,main]) interrupted: > java.lang.InterruptedException > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:509) > at > com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:82) > at > org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:48) > at > org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor.afterExecute(HadoopThreadPoolExecutor.java:90) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To
unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15081) Typo in RetryCache#waitForCompletion annotation
[ https://issues.apache.org/jira/browse/HDFS-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004185#comment-17004185 ] Hudson commented on HDFS-15081: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17798 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17798/]) HDFS-15081. Typo in RetryCache#waitForCompletion annotation. Contributed (ayushsaxena: rev 926d0b48f0d829f679d3d51e162fa494e577ea66) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RetryCache.java > Typo in RetryCache#waitForCompletion annotation > --- > > Key: HDFS-15081 > URL: https://issues.apache.org/jira/browse/HDFS-15081 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Trivial > Fix For: 3.3.0 > > Attachments: HDFS-15081.001.patch > > > Typo in RetryCache#waitForCompletion annotation > {code} > // Previous request has failed, the expectation is is that it will be > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.
[ https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003880#comment-17003880 ] Hudson commented on HDFS-15003: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17795 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17795/]) HDFS-15003. RBF: Make Router support storage type quota. Contributed by (ayushsaxena: rev 8730a7bf6025a3b2b7d6e6686533283b854af192) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaUsage.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterAdminServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaUpdateService.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/records/impl/pb/MountTablePBImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/Quota.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterAdminCLI.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MountTableStoreImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterQuota.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/tools/federation/RouterAdmin.java * (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md > RBF: Make Router support storage type quota. > > > Key: HDFS-15003 > URL: https://issues.apache.org/jira/browse/HDFS-15003 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15003.001.patch, HDFS-15003.002.patch, > HDFS-15003.003.patch, HDFS-15003.004.patch, HDFS-15003.005.patch, > HDFS-15003.006.patch, HDFS-15003.007.patch, HDFS-15003.008.patch > > > Make Router support storage type quota. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously
[ https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003874#comment-17003874 ] Hudson commented on HDFS-14997: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17794 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17794/]) HDFS-14997. Addendum: BPServiceActor processes commands from NameNode (ayushsaxena: rev 80f91d14ab0fb385252d4eeb19141bd059303d59) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java > BPServiceActor processes commands from NameNode asynchronously > -- > > Key: HDFS-14997 > URL: https://issues.apache.org/jira/browse/HDFS-14997 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, > HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch, > HDFS-14997.addendum.patch, image-2019-12-26-16-15-44-814.png > > > There are two core functions, report (#sendHeartbeat, #blockReport, > #cacheReport) and #processCommand, in the #BPServiceActor main process flow. If > #processCommand takes a long time, it blocks the report flow, and it can indeed > take a long time (over 1000s in the worst case I have met) when the IO load of > the DataNode is very high: since some IO operations are under #datasetLock, it > has to wait a long time to acquire #datasetLock when processing some commands > (such as #DNA_INVALIDATE). In such cases, the #heartbeat is not sent to the > NameNode in time, which triggers other disasters. I propose to make > #processCommand asynchronous so that it does not block #BPServiceActor from > sending heartbeats back to the NameNode under high IO load. > Notes: > 1. Lifeline could be one effective solution; however, some old branches do not > support this feature. > 2. IO operations under #datasetLock are another issue; I think we should solve > it in another JIRA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
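The HDFS-14997 proposal above — decoupling command processing from the heartbeat loop — can be illustrated with a tiny sketch: commands are handed to a single-threaded executor, so the actor thread returns to its report loop immediately. Names are illustrative, not the actual BPServiceActor code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncCommandDemo {
    // Single-threaded command queue: submit() returns immediately, so a slow
    // command never delays the next heartbeat, while commands still run in order.
    static final ExecutorService commandExecutor = Executors.newSingleThreadExecutor();

    public static void main(String[] args) throws Exception {
        // Simulate a slow DNA_INVALIDATE-style command waiting on a contended lock.
        commandExecutor.submit(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException ignored) {
            }
            System.out.println("command-done");
        });
        // The heartbeat path proceeds right away instead of waiting 300 ms.
        System.out.println("heartbeat-sent");
        commandExecutor.shutdown();
        commandExecutor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A single-threaded executor preserves command ordering per actor, which matters when later commands depend on earlier ones.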
[jira] [Commented] (HDFS-15054) Delete Snapshot not updating new modification time
[ https://issues.apache.org/jira/browse/HDFS-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003130#comment-17003130 ] Hudson commented on HDFS-15054: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17793 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17793/]) HDFS-15054. Delete Snapshot not updating new modification time. (ayushsaxena: rev 300505c56277982ea4369dce1a2b323b4822fe47) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectorySnapshottableFeature.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSnapshotOp.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshot.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/resources/editsStored > Delete Snapshot not updating new modification time > -- > > Key: HDFS-15054 > URL: https://issues.apache.org/jira/browse/HDFS-15054 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15054.001.patch, HDFS-15054.002.patch > 
> > On creating a snapshot, we set the modification time for the snapshot, and along > with that we update the modification time of the directory on which the snapshot > was created: > {code:java} > snapshotRoot.updateModificationTime(now, Snapshot.CURRENT_STATE_ID); > s.getRoot().setModificationTime(now, Snapshot.CURRENT_STATE_ID); {code} > So on deleting a snapshot, we should likewise update the modification time of > the directory on which the snapshot was created. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12999) When reaching the end of the block group, it may not be necessary to flush all the data packets (flushAllInternals) twice.
[ https://issues.apache.org/jira/browse/HDFS-12999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003094#comment-17003094 ] Hudson commented on HDFS-12999: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17792 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17792/]) HDFS-12999. When reach the end of the block group, it may not need to (ayushsaxena: rev df622cf4a32ee172ded6c4b3b97a1e49befc4f10) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedOutputStream.java > When reaching the end of the block group, it may not be necessary to flush all > the data packets (flushAllInternals) twice. > --- > > Key: HDFS-12999 > URL: https://issues.apache.org/jira/browse/HDFS-12999 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, hdfs-client >Affects Versions: 3.0.0-beta1, 3.1.0 >Reporter: lufei >Assignee: lufei >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-12999.001.patch, HDFS-12999.002.patch, > HDFS-12999.003.patch > > > To simplify the process, there is no need to flush all the data packets > (flushAllInternals) twice when reaching the end of the block group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15073) Replace curator-shaded guava import with the standard one
[ https://issues.apache.org/jira/browse/HDFS-15073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003043#comment-17003043 ] Hudson commented on HDFS-15073: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17790 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17790/]) HDFS-15073. Replace curator-shaded guava import with the standard one (aajisaka: rev d8cd7098b4bcfbfd76915b9ecefb2c7ea320e149) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReportListing.java > Replace curator-shaded guava import with the standard one > - > > Key: HDFS-15073 > URL: https://issues.apache.org/jira/browse/HDFS-15073 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Akira Ajisaka >Assignee: Chandra Sanivarapu >Priority: Minor > Labels: newbie > > In SnapshotDiffReportListing.java, > {code} > import org.apache.curator.shaded.com.google.common.base.Preconditions; > {code} > should be > {code} > import com.google.common.base.Preconditions; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15076) Fix tests that hold FSDirectory lock, without holding FSNamesystem lock.
[ https://issues.apache.org/jira/browse/HDFS-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002994#comment-17002994 ] Hudson commented on HDFS-15076: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17789 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17789/]) HDFS-15076. Fix tests that hold FSDirectory lock, without holding (shv: rev b98ac2a3af50ccf2af07790ab0760d4c51820836) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDiskspaceQuotaUpdate.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGetBlockLocations.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java > Fix tests that hold FSDirectory lock, without holding FSNamesystem lock. > > > Key: HDFS-15076 > URL: https://issues.apache.org/jira/browse/HDFS-15076 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15076.001.patch > > > Three tests {{TestGetBlockLocations}}, {{TestFSNamesystem}}, > {{TestDiskspaceQuotaUpdate}} use {{FSDirectory}} methods, which hold > FSDirectory lock. They should also hold the global Namesystem lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15062) Add LOG when sendIBRs failed
[ https://issues.apache.org/jira/browse/HDFS-15062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000268#comment-17000268 ] Hudson commented on HDFS-15062: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17779 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17779/]) HDFS-15062. Add LOG when sendIBRs failed. Contributed by Fei Hui. (inigoiri: rev 52d7b745c6d95e799542d6409dac30d0418ce8a8) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java > Add LOG when sendIBRs failed > > > Key: HDFS-15062 > URL: https://issues.apache.org/jira/browse/HDFS-15062 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.3, 3.2.1, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Fix For: 3.0.4, 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15062.001.patch, HDFS-15062.002.patch, > HDFS-15062.003.patch > > > {code} > /** Send IBRs to namenode. */ > void sendIBRs(DatanodeProtocol namenode, DatanodeRegistration registration, > String bpid, String nnRpcLatencySuffix) throws IOException { > // Generate a list of the pending reports for each storage under the lock > final StorageReceivedDeletedBlocks[] reports = generateIBRs(); > if (reports.length == 0) { > // Nothing new to report. 
> return; > } > // Send incremental block reports to the Namenode outside the lock > if (LOG.isDebugEnabled()) { > LOG.debug("call blockReceivedAndDeleted: " + Arrays.toString(reports)); > } > boolean success = false; > final long startTime = monotonicNow(); > try { > namenode.blockReceivedAndDeleted(registration, bpid, reports); > success = true; > } finally { > if (success) { > dnMetrics.addIncrementalBlockReport(monotonicNow() - startTime, > nnRpcLatencySuffix); > lastIBR = startTime; > } else { > // If we didn't succeed in sending the report, put all of the > // blocks back onto our queue, but only in the case where we > // didn't put something newer in the meantime. > putMissing(reports); > } > } > } > {code} > When the call to namenode.blockReceivedAndDeleted fails, the reports are put back into > pendingIBRs. Maybe we should add a log for the failed case; it would be helpful for > troubleshooting. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
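The failure path of sendIBRs() quoted above can be modeled as below: on an RPC failure the reports are re-queued and, as the issue proposes, the failure is logged instead of being silent. This is a self-contained sketch, not the Hadoop code; the class, the `pending` queue, and the boolean failure switch are all illustrative stand-ins.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Simplified model of the retry-on-failure path in sendIBRs().
public class IbrRetrySketch {
    // Stand-in for the pending IBR queue (putMissing() target).
    static Deque<String> pending = new ArrayDeque<>();

    static boolean sendIBRs(String[] reports, boolean rpcSucceeds) {
        boolean success = false;
        try {
            if (!rpcSucceeds) {
                // Simulates namenode.blockReceivedAndDeleted() throwing.
                throw new RuntimeException("blockReceivedAndDeleted failed");
            }
            success = true;
        } catch (RuntimeException e) {
            // The point of the patch: make the failure visible in the log.
            System.err.println("Failed to send IBRs " + Arrays.toString(reports)
                + ": " + e.getMessage());
        } finally {
            if (!success) {
                // Re-queue the unsent reports for the next attempt.
                for (String r : reports) {
                    pending.addLast(r);
                }
            }
        }
        return success;
    }
}
```

Without the log line, a failed report is silently re-queued and the only symptom is delayed block reporting, which is exactly the troubleshooting gap the issue describes.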
[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously
[ https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000249#comment-17000249 ] Hudson commented on HDFS-14997: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17778 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17778/]) HDFS-14997. BPServiceActor processes commands from NameNode (inigoiri: rev b86895485d95440de099831e0db38db037f16bdd) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java > BPServiceActor processes commands from NameNode asynchronously > -- > > Key: HDFS-14997 > URL: https://issues.apache.org/jira/browse/HDFS-14997 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, > HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch > > > There are two core functions, report(#sendHeartbeat, #blockReport, > #cacheReport) and #processCommand in #BPServiceActor main process flow. If > processCommand cost long time it will block send report flow. Meanwhile > processCommand could cost long time(over 1000s the worst case I meet) when IO > load of DataNode is very high. Since some IO operations are under > #datasetLock, So it has to wait to acquire #datasetLock long time when > process some of commands(such as #DNA_INVALIDATE). In such case, #heartbeat > will not send to NameNode in-time, and trigger other disasters. > I propose to improve #processCommand asynchronously and not block > #BPServiceActor to send heartbeat back to NameNode when meet high IO load. > Notes: > 1. 
Lifeline could be one effective solution; however, some old branches do > not support this feature. > 2. IO operations under #datasetLock are another issue; I think we should solve > it in another JIRA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
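The decoupling proposed in HDFS-14997 (the heartbeat thread hands NameNode commands to a separate worker instead of executing them inline) can be sketched with a single-threaded executor. The class and method names are hypothetical, not the patch's API.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: commands are queued and run on their own thread, so a slow
// command (e.g. DNA_INVALIDATE blocked on the dataset lock) cannot
// delay the next heartbeat.
public class AsyncCommandSketch implements AutoCloseable {
    private final ExecutorService commandExecutor =
        Executors.newSingleThreadExecutor();

    // Called from the heartbeat thread: returns immediately.
    public Future<?> processCommandAsync(Runnable command) {
        return commandExecutor.submit(command);
    }

    // Helper for demonstration: submit and wait for completion.
    public boolean runAndWait(Runnable command, long timeoutMs) {
        try {
            processCommandAsync(command).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            return false;
        }
    }

    @Override
    public void close() {
        commandExecutor.shutdown();
    }
}
```

Commands are still executed in arrival order (single worker thread), which matters because some NameNode commands depend on the effects of earlier ones.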
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999378#comment-16999378 ] Hudson commented on HDFS-15012: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17773 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17773/]) HDFS-15012. NN fails to parse Edit logs after applying HDFS-13101. (shashikant: rev fdd96e46d1f89f0ecdb9b1836dc7fca9fbb954fd) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestRenameWithSnapshots.java > NN fails to parse Edit logs after applying HDFS-13101 > - > > Key: HDFS-15012 > URL: https://issues.apache.org/jira/browse/HDFS-15012 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Eric Lin >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch > > > After applying HDFS-13101, and deleting and creating large number of > snapshots, SNN exited with below error: > > {code:sh} > 2019-11-18 08:28:06,528 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, > snapshotName=distcp-3479-31-old, > RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc > CallId=1] > java.lang.AssertionError: Element already exists: > element=partition_isactive=true, DELETED=[partition_isactive=true] > at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193) > at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239) > at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240) > at > 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332) > at > org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:
[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot
[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999379#comment-16999379 ] Hudson commented on HDFS-13101: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17773 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17773/]) HDFS-15012. NN fails to parse Edit logs after applying HDFS-13101. (shashikant: rev fdd96e46d1f89f0ecdb9b1836dc7fca9fbb954fd) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestRenameWithSnapshots.java > Yet another fsimage corruption related to snapshot > -- > > Key: HDFS-13101 > URL: https://issues.apache.org/jira/browse/HDFS-13101 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Yongjun Zhang >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, > HDFS-13101.003.patch, HDFS-13101.004.patch, HDFS-13101.branch-2.001.patch, > HDFS-13101.branch-2.8.patch, HDFS-13101.corruption_repro.patch, > HDFS-13101.corruption_repro_simplified.patch > > > Lately we saw case similar to HDFS-9406, even though HDFS-9406 fix is > present, so it's likely another case not covered by the fix. We are currently > trying to collect good fsimage + editlogs to replay to reproduce it and > investigate. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.
[ https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997820#comment-16997820 ] Hudson commented on HDFS-14908: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17770 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17770/]) HDFS-14908. LeaseManager should check parent-child relationship when (inigoiri: rev 24080666e5e2214d4a362c889cd9aa617be5de81) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListOpenFiles.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > LeaseManager should check parent-child relationship when filter open files. > --- > > Key: HDFS-14908 > URL: https://issues.apache.org/jira/browse/HDFS-14908 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.1 >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, > HDFS-14908.003.patch, HDFS-14908.004.patch, HDFS-14908.005.patch, > HDFS-14908.006.patch, HDFS-14908.007.patch, HDFS-14908.008.patch, > HDFS-14908.009.patch, HDFS-14908.010.patch, HDFS-14908.TestV4.patch, > Test.java, TestV2.java, TestV3.java > > > Now when doing listOpenFiles(), LeaseManager only checks whether the filter > path is the prefix of the open files. We should check whether the filter path > is the parent/ancestor of the open files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
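The prefix-vs-ancestor distinction in HDFS-14908 is easy to see with a concrete case: "/dir1" is a string prefix of "/dir1file" but not its parent. A minimal sketch of a component-aware check (helper name is illustrative, not the LeaseManager API):

```java
// Checks whether filterPath is the same path as, or an ancestor of,
// openFilePath, by comparing with a trailing separator rather than a
// raw string prefix.
public class OpenFileFilterSketch {
    static boolean isAncestor(String filterPath, String openFilePath) {
        if (filterPath.equals(openFilePath)) {
            return true;
        }
        // "/dir1" must match "/dir1/..." but not "/dir1file".
        String prefix =
            filterPath.endsWith("/") ? filterPath : filterPath + "/";
        return openFilePath.startsWith(prefix);
    }
}
```

A raw `startsWith(filterPath)` check would wrongly report "/dir1file" as an open file under "/dir1", which is the bug the issue describes.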
[jira] [Commented] (HDFS-15048) Fix findbug in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997133#comment-16997133 ] Hudson commented on HDFS-15048: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17766 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17766/]) HDFS-15048. Fix findbug in DirectoryScanner. (iwasakims: rev dc6cf17b3405a5f03b75b1f7bf3b9e79663deaf1) * (edit) hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml > Fix findbug in DirectoryScanner > --- > > Key: HDFS-15048 > URL: https://issues.apache.org/jira/browse/HDFS-15048 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Masatake Iwasaki >Priority: Major > Attachments: HDFS-15048.001.patch > > > There is a findbug in DirectoryScanner. > {noformat} > Multithreaded correctness Warnings > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() calls > Thread.sleep() with a lock held > Bug type SWL_SLEEP_WITH_LOCK_HELD (click for details) > In class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner > In method org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile() > At DirectoryScanner.java:[line 441] > {noformat} > https://builds.apache.org/job/PreCommit-HDFS-Build/28498/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
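The SWL_SLEEP_WITH_LOCK_HELD pattern flagged above is `Thread.sleep()` inside a `synchronized` block; the general remedy is to sleep outside the lock (the committed patch instead suppresses the warning via findbugsExcludeFile.xml where the sleep is intentional). A sketch of the restructured shape, with illustrative names:

```java
// Sketch: do the locked work, then throttle without holding the lock.
public class SleepOutsideLockSketch {
    private final Object lock = new Object();
    int scans = 0;

    void reconcile(long throttleMs) throws InterruptedException {
        // Findbugs flags this shape:
        //   synchronized (lock) { scan(); Thread.sleep(throttleMs); }
        synchronized (lock) {
            scan(); // work that genuinely needs the lock
        }
        Thread.sleep(throttleMs); // sleep after releasing the lock
    }

    void scan() {
        scans++;
    }
}
```

Sleeping while holding a lock stalls every other thread contending for it for the whole sleep interval, which is why the detector treats it as a multithreaded-correctness warning.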
[jira] [Commented] (HDFS-15038) TestFsck testFsckListCorruptSnapshotFiles is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996532#comment-16996532 ] Hudson commented on HDFS-15038: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17762 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17762/]) HDFS-15038. TestFsck testFsckListCorruptSnapshotFiles is failing in (ayushsaxena: rev 7a8700754537353496b7546177a4706f3f1404cf) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java > TestFsck testFsckListCorruptSnapshotFiles is failing in trunk > - > > Key: HDFS-15038 > URL: https://issues.apache.org/jira/browse/HDFS-15038 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15038.001.patch, HDFS-15038.002.patch, > HDFS-15038.003.patch > > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28481/testReport/] > > [https://builds.apache.org/job/PreCommit-HDFS-Build/28482/testReport/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15053) RBF: Add permission check for safemode operation
[ https://issues.apache.org/jira/browse/HDFS-15053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996371#comment-16996371 ] Hudson commented on HDFS-15053: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17761 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17761/]) HDFS-15053. RBF: Add permission check for safemode operation. (ayushsaxena: rev 72aee114f8b1feae4a187cce0aa5a8d2ff55f416) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterAdminCLI.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterAdminServer.java > RBF: Add permission check for safemode operation > > > Key: HDFS-15053 > URL: https://issues.apache.org/jira/browse/HDFS-15053 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15053.001.patch, HDFS-15053.002.patch > > > Propose to add superuser permission check for safemode operation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15016) RBF: getDatanodeReport() should return the latest update
[ https://issues.apache.org/jira/browse/HDFS-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995828#comment-16995828 ] Hudson commented on HDFS-15016: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17760 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17760/]) HDFS-15016. RBF: getDatanodeReport() should return the latest update. (inigoiri: rev 7fe924b1c03a2fa45188027bdc0a36cb6c8b4ba4) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterNamenodeMonitoring.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/MockNamenode.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java > RBF: getDatanodeReport() should return the latest update > > > Key: HDFS-15016 > URL: https://issues.apache.org/jira/browse/HDFS-15016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15016.000.patch, HDFS-15016.001.patch, > HDFS-15016.002.patch, HDFS-15016.003.patch > > > Currently, when the Router calls getDatanodeReport() (or > getDatanodeStorageReport()) and the DN is in multiple clusters, it just takes > the one that comes first. It should consider the latest update. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
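The "return the latest update" merge from HDFS-15016 amounts to keeping, per DataNode, the report entry with the newest lastUpdate timestamp rather than the first one seen. A minimal sketch (the `DnReport` type and field names are illustrative, not the RBF classes):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: deduplicate reports for a DataNode seen in multiple
// subclusters, preferring the most recently updated entry.
public class LatestReportSketch {
    static class DnReport {
        final String datanodeId;
        final long lastUpdate;
        DnReport(String datanodeId, long lastUpdate) {
            this.datanodeId = datanodeId;
            this.lastUpdate = lastUpdate;
        }
    }

    static Collection<DnReport> mergeLatest(List<DnReport> reports) {
        Map<String, DnReport> byId = new HashMap<>();
        for (DnReport r : reports) {
            // Keep whichever entry for this DataNode is newer.
            byId.merge(r.datanodeId, r,
                (a, b) -> a.lastUpdate >= b.lastUpdate ? a : b);
        }
        return byId.values();
    }
}
```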
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994950#comment-16994950 ] Hudson commented on HDFS-15036: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17758 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17758/]) HDFS-15036. Active NameNode should not silently fail the image transfer. (cliang: rev 65c4660bcd897e139fc175ca438cff75ec0c6be8) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ImageServlet.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/StandbyCheckpointer.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-15036.001.patch, HDFS-15036.002.patch, > HDFS-15036.003.patch > > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15044) [Dynamometer] Show the line of audit log when parsing it unsuccessfully
[ https://issues.apache.org/jira/browse/HDFS-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994818#comment-16994818 ] Hudson commented on HDFS-15044: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17757 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17757/]) HDFS-15044. [Dynamometer] Show the line of audit log when parsing it (xkrogen: rev c210cede5ce143a0c12646d82d657863f0ec96b6) * (edit) hadoop-tools/hadoop-dynamometer/hadoop-dynamometer-workload/src/main/java/org/apache/hadoop/tools/dynamometer/workloadgenerator/audit/AuditLogDirectParser.java > [Dynamometer] Show the line of audit log when parsing it unsuccessfully > --- > > Key: HDFS-15044 > URL: https://issues.apache.org/jira/browse/HDFS-15044 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15047) Document the new decommission monitor (HDFS-14854)
[ https://issues.apache.org/jira/browse/HDFS-15047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994801#comment-16994801 ] Hudson commented on HDFS-15047: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17756 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17756/]) HDFS-15047. Document the new decommission monitor (HDFS-14854). (#1755) (github: rev bdd00f10b46c1c856433e2948906f36c70d3a0be) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDataNodeAdminGuide.md > Document the new decommission monitor (HDFS-14854) > -- > > Key: HDFS-15047 > URL: https://issues.apache.org/jira/browse/HDFS-15047 > Project: Hadoop HDFS > Issue Type: Task > Components: documentation >Affects Versions: 3.3.0 >Reporter: Wei-Chiu Chuang >Assignee: Masatake Iwasaki >Priority: Major > > We can document HDFS-14854, add it to > https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html > and mark it as an experimental feature. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994802#comment-16994802 ] Hudson commented on HDFS-14854: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17756 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17756/]) HDFS-15047. Document the new decommission monitor (HDFS-14854). (#1755) (github: rev bdd00f10b46c1c856433e2948906f36c70d3a0be) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDataNodeAdminGuide.md > Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: 012_to_013_changes.diff, > Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, HDFS-14854.002.patch, > HDFS-14854.003.patch, HDFS-14854.004.patch, HDFS-14854.005.patch, > HDFS-14854.006.patch, HDFS-14854.007.patch, HDFS-14854.008.patch, > HDFS-14854.009.patch, HDFS-14854.010.patch, HDFS-14854.011.patch, > HDFS-14854.012.patch, HDFS-14854.013.patch, HDFS-14854.014.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes floods the replication queue and under-replicated > blocks from a future node or disk failure may wait for a long time before they > are replicated. 
> * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicated, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled, giving the option of the existing implementation or the new one. > I will attach a pdf with some more details on the design and then a version 1 > patch shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15050) Optimize log information when DFSInputStream meet CannotObtainBlockLengthException
[ https://issues.apache.org/jira/browse/HDFS-15050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994506#comment-16994506 ] Hudson commented on HDFS-15050: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17755 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17755/]) HDFS-15050. Optimize log information when DFSInputStream meet (weichiu: rev 0e28cd8f63615ed2f1183f27efb5c2aaf6aa) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/CannotObtainBlockLengthException.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java > Optimize log information when DFSInputStream meet > CannotObtainBlockLengthException > -- > > Key: HDFS-15050 > URL: https://issues.apache.org/jira/browse/HDFS-15050 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-15050.001.patch > > > We could not easily identify which file is affected when DFSInputStream meets > CannotObtainBlockLengthException, as in the following exception log. Just > suggest logging the file path string when we meet CannotObtainBlockLengthException. 
> {code:java} > Caused by: java.io.IOException: Cannot obtain block length for > LocatedBlock{BP-***:blk_***_***; getBlockSize()=690504; corrupt=false; > offset=1811939328; > locs=[DatanodeInfoWithStorage[*:50010,DS-2bcadcc4-458a-45c6-a91b-8461bf7cdd71,DISK], > > DatanodeInfoWithStorage[*:50010,DS-8f2bb259-ecb2-4839-8769-4a0523360d58,DISK], > > DatanodeInfoWithStorage[*:50010,DS-69f4de6f-2428-42ff-9486-98c2544b1ada,DISK]]} > at > org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:402) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:345) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:280) > at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:272) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1664) > at > org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304) > at > org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:300) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:300) > at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:161) > at > org.apache.hadoop.fs.viewfs.ChRootedFileSystem.open(ChRootedFileSystem.java:266) > at > org.apache.hadoop.fs.viewfs.ViewFileSystem.open(ViewFileSystem.java:481) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:828) > at > org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:109) > at > org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) > at > org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65) > ... 16 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
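The improvement boils down to attaching the file path when the exception is raised or rethrown, so the log above would identify the affected file instead of only the block. A minimal sketch, with a hypothetical helper name (the actual patch changes CannotObtainBlockLengthException and DFSInputStream):

```java
import java.io.IOException;

// Sketch: wrap the low-level failure in an exception whose message
// carries the file path, preserving the original as the cause.
public class BlockLengthErrorSketch {
    static IOException withPath(String src, IOException cause) {
        return new IOException(
            "Cannot obtain block length for file " + src, cause);
    }
}
```

With the path in the message, an operator reading the stack trace in the issue description could go straight from the log to the problem file.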
[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994084#comment-16994084 ] Hudson commented on HDFS-14983: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17754 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17754/]) HDFS-14983. RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration (tasanuma: rev 93bb368094e48e752c0732d979fbcd24e432bfc1) * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/protocol/RefreshSuperUserGroupsConfigurationRequest.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/protocol/impl/pb/RefreshSuperUserGroupsConfigurationRequestPBImpl.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRefreshSuperUserGroupsConfiguration.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/proto/RouterProtocol.proto * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/protocol/impl/pb/RefreshSuperUserGroupsConfigurationResponsePBImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterAdminServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/proto/FederationProtocol.proto * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/protocolPB/RouterAdminProtocol.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/protocolPB/RouterAdminProtocolServerSideTranslatorPB.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md * (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/tools/federation/RouterAdmin.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/protocol/RefreshSuperUserGroupsConfigurationResponse.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/resolver/RouterGenericManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/protocolPB/RouterAdminProtocolTranslatorPB.java > RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option > --- > > Key: HDFS-14983 > URL: https://issues.apache.org/jira/browse/HDFS-14983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, > HDFS-14983.004.patch, HDFS-14983.005.patch, HDFS-14983.006.patch, > HDFS-14983.draft.001.patch > > > NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration > without restarting but DFSRouter cannot. It would be better for DFSRouter to > have such functionality to be compatible with NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993769#comment-16993769 ] Hudson commented on HDFS-15032: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17752 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17752/]) HDFS-15032. Properly handle InvocationTargetExceptions in the proxy (xkrogen: rev c174d50b30abc08a4642614fb35165e79792608b) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerService.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProxyCombiner.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithHANameNodes.java > Balancer crashes when it fails to contact an unavailable NN via > ObserverReadProxyProvider > - > > Key: HDFS-15032 > URL: https://issues.apache.org/jira/browse/HDFS-15032 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.10.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, > HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, > HDFS-15032.005.patch, debugger_with_tostring.png, > debugger_without_tostring.png > > > When trying to run the Balancer using ObserverReadProxyProvider (to allow it > to read from the Observer Node as described in HDFS-14979), if one of the NNs > isn't running, the Balancer will crash. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15045) DataStreamer#createBlockOutputStream() should log exception in warn.
[ https://issues.apache.org/jira/browse/HDFS-15045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993166#comment-16993166 ] Hudson commented on HDFS-15045: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17750 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17750/]) HDFS-15045. DataStreamer#createBlockOutputStream() should log exception (surendralilhore: rev c2e9783d5f236015f2ad826fcbad061e2118e454) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java > DataStreamer#createBlockOutputStream() should log exception in warn. > > > Key: HDFS-15045 > URL: https://issues.apache.org/jira/browse/HDFS-15045 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient >Affects Versions: 3.1.1 >Reporter: Surendra Singh Lilhore >Assignee: Ravuri Sushma sree >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15045.001.patch > > > {code:java} > } catch (IOException ie) { > if (!errorState.isRestartingNode()) { > LOG.info("Exception in createBlockOutputStream " + this, ie); > } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
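The quoted snippet logs at INFO without attaching the exception object. A minimal sketch of the corrected pattern, using `java.util.logging` for self-containment (the real DataStreamer code uses SLF4J, and the names below are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class StreamerLogging {
    private static final Logger LOG = Logger.getLogger(StreamerLogging.class.getName());

    // Sketch of the fix: log at WARN and pass the exception object itself,
    // so the stack trace is preserved in the log output.
    static void reportCreateFailure(String streamer, Exception ie, boolean restartingNode) {
        if (!restartingNode) {
            LOG.log(Level.WARNING, "Exception in createBlockOutputStream " + streamer, ie);
        }
    }
}
```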
[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation
[ https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993123#comment-16993123 ] Hudson commented on HDFS-14854: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17749 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17749/]) HDFS-14854. Create improved decommission monitor implementation. (weichiu: rev c93cb6790e0f1c64efd03d859f907a0522010894) * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStripedBackoffMonitor.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStriped.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatusWithBackoffMonitor.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithBackoffMonitor.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorInterface.java > Create improved decommission monitor implementation > --- > > Key: HDFS-14854 > URL: 
https://issues.apache.org/jira/browse/HDFS-14854 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: 012_to_013_changes.diff, > Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, HDFS-14854.002.patch, > HDFS-14854.003.patch, HDFS-14854.004.patch, HDFS-14854.005.patch, > HDFS-14854.006.patch, HDFS-14854.007.patch, HDFS-14854.008.patch, > HDFS-14854.009.patch, HDFS-14854.010.patch, HDFS-14854.011.patch, > HDFS-14854.012.patch, HDFS-14854.013.patch, HDFS-14854.014.patch > > > In HDFS-13157, we discovered a series of problems with the current > decommission monitor implementation, such as: > * Blocks are replicated sequentially disk by disk and node by node, and > hence the load is not spread well across the cluster > * Adding a node for decommission can cause the namenode write lock to be > held for a long time. > * Decommissioning nodes flood the replication queue, and under-replicated > blocks from a future node or disk failure may wait for a long time before they > are replicated. > * Blocks pending replication are checked many times under a write lock > before they are sufficiently replicated, wasting resources > In this Jira I propose to create a new implementation of the decommission > monitor that resolves these issues. As it will be difficult to prove one > implementation is better than another, the new implementation can be enabled > or disabled, giving the option of the existing implementation or the new one. > I will attach a pdf with some more details on the design and then a version 1 > patch shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
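The enable-or-disable design described above hinges on making the monitor pluggable behind an interface (the commit adds `DatanodeAdminMonitorInterface` plus default and backoff implementations). A minimal sketch of that shape, with illustrative names and reflection-based selection; the real config key lives in `DFSConfigKeys`:

```java
// Hypothetical sketch of the pluggable-monitor shape: the monitor is an
// interface, and the concrete class is chosen from configuration so the
// old and new implementations can be swapped without code changes.
interface DatanodeAdminMonitor extends Runnable {
    int numNodesChecked();
}

class MonitorFactory {
    // The class name would come from an hdfs-site.xml property in practice.
    static DatanodeAdminMonitor create(String implClassName) throws Exception {
        Class<?> clazz = Class.forName(implClassName);
        return (DatanodeAdminMonitor) clazz.getDeclaredConstructor().newInstance();
    }
}

// Trivial implementation standing in for the default or backoff monitor.
class NoopMonitor implements DatanodeAdminMonitor {
    public void run() {}
    public int numNodesChecked() { return 0; }
}
```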
[jira] [Commented] (HDFS-15040) RBF: Secured Router should not run when SecretManager is not running
[ https://issues.apache.org/jira/browse/HDFS-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992263#comment-16992263 ] Hudson commented on HDFS-15040: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17746 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17746/]) HDFS-15040. RBF: Secured Router should not run when SecretManager is not (github: rev c4733377d0fa375a8d585f5cb1db79bf20ec6710) * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/MockNotRunningSecretManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/TestRouterSecurityManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/RouterSecurityManager.java > RBF: Secured Router should not run when SecretManager is not running > > > Key: HDFS-15040 > URL: https://issues.apache.org/jira/browse/HDFS-15040 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.3.0 > > > We have faced an issue where the Router keeps running while the SecretManager is not > running. HDFS-14835 is a similar fix which checks whether SecretManager is > null or not. But it didn't cover this case. So we also need to check the > running status. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
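The fail-fast check described above (null check from HDFS-14835 plus the new running-status check) can be sketched as follows. Names are illustrative; the real logic lives in `RouterSecurityManager`:

```java
// Minimal sketch: a secured Router should refuse to start unless the
// secret manager both exists (HDFS-14835) and is actually running (this fix).
class RouterSecurityCheck {
    interface SecretManager {
        boolean isRunning();
    }

    static void verifySecretManager(SecretManager sm) {
        if (sm == null || !sm.isRunning()) {
            throw new IllegalStateException(
                "Failed to create SecretManager: secret manager is not running");
        }
    }
}
```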
[jira] [Commented] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
[ https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992257#comment-16992257 ] Hudson commented on HDFS-15043: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17745 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17745/]) HDFS-15043. RBF: The detail of the Exception is not shown in (github: rev 9f098520517e3adfad0a2721284ccc19af3e6673) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/ZKDelegationTokenSecretManagerImpl.java > RBF: The detail of the Exception is not shown in > ZKDelegationTokenSecretManagerImpl > --- > > Key: HDFS-15043 > URL: https://issues.apache.org/jira/browse/HDFS-15043 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Fix For: 3.3.0 > > > In the constructor of ZKDTSMImpl, when IOException occurs in > super.startThreads(), the message of the exception is not logged. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
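The intent of the fix (do not lose the detail of the exception thrown by `super.startThreads()`) follows the standard cause-chaining pattern. A sketch with illustrative names, not the actual ZKDelegationTokenSecretManagerImpl code:

```java
import java.io.IOException;

// Sketch: when token-manager startup fails, rethrow with the original
// exception attached as the cause so its message and stack trace survive
// into the log, instead of being swallowed.
class ZkTokenManagerStartup {
    static void start(Runnable startThreads) throws IOException {
        try {
            startThreads.run();
        } catch (RuntimeException e) {
            // Chain the cause rather than discarding it.
            throw new IOException("Error starting threads for zkDelegationTokens", e);
        }
    }
}
```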
[jira] [Commented] (HDFS-14522) Allow compact property description in xml in httpfs
[ https://issues.apache.org/jira/browse/HDFS-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992105#comment-16992105 ] Hudson commented on HDFS-14522: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17744 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17744/]) HDFS-14522. Allow compact property description in xml in httpfs. (#1737) (github: rev 4dffd81bb75efaa5742d2246354ebdc86cbd1aab) * (add) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/resources/test-compact-format-property.xml * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/util/TestConfigurationUtils.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/util/ConfigurationUtils.java > Allow compact property description in xml in httpfs > --- > > Key: HDFS-14522 > URL: https://issues.apache.org/jira/browse/HDFS-14522 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Reporter: Akira Ajisaka >Assignee: Masatake Iwasaki >Priority: Major > > HADOOP-6964 allowed compact property description in Hadoop configuration, > however, it is not allowed in httpfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
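For context, the compact form from HADOOP-6964 expresses a property's name and value as XML attributes rather than child elements; this change teaches httpfs's `ConfigurationUtils` to accept it too. A sketch of the two equivalent spellings (the property name below is illustrative):

```xml
<configuration>
  <!-- Standard, verbose form -->
  <property>
    <name>httpfs.buffer.size</name>
    <value>4096</value>
  </property>

  <!-- Compact form: the same property expressed as attributes -->
  <property name="httpfs.buffer.size" value="4096"/>
</configuration>
```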
[jira] [Commented] (HDFS-15028) Keep the capacity of volume and reduce a system call
[ https://issues.apache.org/jira/browse/HDFS-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990237#comment-16990237 ] Hudson commented on HDFS-15028: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17736 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17736/]) HDFS-15028. Keep the capacity of volume and reduce a system call. (iwasakims: rev 11cd5b6e39adbf159891852f3482aebdde5459fb) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java > Keep the capacity of volume and reduce a system call > > > Key: HDFS-15028 > URL: https://issues.apache.org/jira/browse/HDFS-15028 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15028.patch, HDFS-15028.patch, HDFS-15028.patch, > HDFS-15028.patch, HDFS-15028.patch > > > The local volume does not change, so keep the first value of the capacity and > reuse it for each heartbeat. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
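The optimization amounts to memoizing the capacity probe so the filesystem system call happens once instead of on every heartbeat. A sketch with illustrative names; the real change is in `FsVolumeImpl` behind a new key in `DFSConfigKeys`:

```java
import java.util.function.LongSupplier;

// Sketch: the volume's capacity is fetched from the filesystem once (the
// expensive getUsableSpace()-style system call) and the cached value is
// reused on every subsequent heartbeat.
class CachedCapacityVolume {
    private final LongSupplier capacityProbe; // the expensive system call
    private long cachedCapacity = -1;

    CachedCapacityVolume(LongSupplier capacityProbe) {
        this.capacityProbe = capacityProbe;
    }

    long getCapacity() {
        if (cachedCapacity < 0) {
            cachedCapacity = capacityProbe.getAsLong(); // first call only
        }
        return cachedCapacity;
    }
}
```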
[jira] [Commented] (HDFS-14751) Synchronize on diffs in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990228#comment-16990228 ] Hudson commented on HDFS-14751: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17735 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17735/]) HDFS-14751. Synchronize on diffs in DirectoryScanner. Contributed by (weichiu: rev ecd461f940efcd8c75f4833cf09bc7a52cc0b559) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java > Synchronize on diffs in DirectoryScanner > > > Key: HDFS-14751 > URL: https://issues.apache.org/jira/browse/HDFS-14751 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14751.001.patch, HDFS-14751.002.patch, > HDFS-14751.003.patch, HDFS-14751.004.patch > > > {code:java} > [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 21.693 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency > [ERROR] > testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency) > Time elapsed: 7.572 s <<< ERROR! 
> java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153) > at > java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044) > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433) > at > org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202) > at > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > Ref:[https://builds.apache.org/job/PreCommit-HDFS-Build/27567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...
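The `ConcurrentModificationException` above comes from one thread rebuilding the diffs collection while `reconcile()` iterates it. The fix in `DirectoryScanner` guards both sides with the same lock; a self-contained sketch of that pattern (names illustrative, plain collections instead of Guava multimaps):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fix: writer and reader synchronize on the same object, so a
// concurrent add can no longer invalidate the iterator mid-walk.
class ScannerDiffs {
    private final List<String> diffs = new ArrayList<>();

    void addDiff(String blockId) {
        synchronized (diffs) {
            diffs.add(blockId);
        }
    }

    // Copy-and-clear under the lock, then process outside it.
    List<String> reconcile() {
        synchronized (diffs) {
            List<String> processed = new ArrayList<>(diffs);
            diffs.clear();
            return processed;
        }
    }
}
```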
[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990227#comment-16990227 ] Hudson commented on HDFS-14476: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17735 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17735/]) HDFS-14476. lock too long when fix inconsistent blocks between disk and (weichiu: rev 313b76f8e92643e3412a98dc73f83437729f3984) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java > lock too long when fix inconsistent blocks between disk and in-memory > - > > Key: HDFS-14476 > URL: https://issues.apache.org/jira/browse/HDFS-14476 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0, 2.7.0, 3.0.3 >Reporter: Sean Chow >Assignee: Sean Chow >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, > HDFS-14476.002.patch, HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, > datanode-with-patch-14476.png > > > When the DirectoryScanner has the results of differences between disk and > in-memory blocks, it will try to run {{checkAndUpdate}} to fix them. However, > {{FsDatasetImpl.checkAndUpdate}} is a synchronized call. > As I have about 6 million blocks on every datanode, and every 6 hours' scan > finds about 25000 abnormal blocks to fix, this leads to a long lock > hold on the FsDatasetImpl object. > Let's assume every block needs 10ms to fix (because of the latency of SAS disks); > that will cost 250 seconds to finish. That means all reads and writes will be > blocked for roughly 4 minutes for that datanode. > > {code:java} > 2019-05-06 08:06:51,704 INFO > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool > BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing > metadata files:23574, missing block files:23574, missing blocks in > memory:47625, mismatched blocks:0 > ... 
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Took 588402ms to process 1 commands from NN > {code} > Commands from the NN take a long time to process because threads are blocked, and > the namenode will see a long lastContact time for this datanode. > This may affect all HDFS versions. > *how to fix:* > Just as invalidate commands from the namenode are processed in batches of 1000, fixing > these abnormal blocks should be batched too, sleeping 2 seconds > between batches to allow normal reading/writing of blocks. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
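The batching-with-sleep proposal can be sketched as below. This is illustrative only (names are hypothetical); the batch size and sleep interval mirror the numbers in the description, and the pause is injected so the sketch is testable without real sleeps:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch: instead of repairing all ~25000 abnormal blocks under one lock
// hold, process them in fixed-size batches and pause between batches so
// normal reads and writes can acquire the dataset lock in between.
class BatchedBlockFixer {
    static final int BATCH_SIZE = 1000;
    static final long SLEEP_BETWEEN_BATCHES_MS = 2000;

    static int fixInBatches(List<String> abnormalBlocks,
                            Consumer<List<String>> fixBatchUnderLock,
                            Runnable pause) {
        int batches = 0;
        for (int i = 0; i < abnormalBlocks.size(); i += BATCH_SIZE) {
            int end = Math.min(i + BATCH_SIZE, abnormalBlocks.size());
            fixBatchUnderLock.accept(abnormalBlocks.subList(i, end));
            batches++;
            if (end < abnormalBlocks.size()) {
                pause.run(); // e.g. Thread.sleep(SLEEP_BETWEEN_BATCHES_MS)
            }
        }
        return batches;
    }
}
```

The lock is acquired per batch rather than once for the whole repair pass, bounding the worst-case block on reads and writes to one batch's duration.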
[jira] [Commented] (HDFS-14998) [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130
[ https://issues.apache.org/jira/browse/HDFS-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990028#comment-16990028 ] Hudson commented on HDFS-14998: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17733 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17733/]) HDFS-14998. [SBN read] Update Observer Namenode doc for ZKFC after (ayushsaxena: rev 705b172b95db345a99adf088fca83c67bd13a691) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ObserverNameNode.md > [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130 > - > > Key: HDFS-14998 > URL: https://issues.apache.org/jira/browse/HDFS-14998 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Minor > Attachments: HDFS-14998.001.patch, HDFS-14998.002.patch, > HDFS-14998.003.patch, HDFS-14998.004.patch, HDFS-14998.005.patch, > HDFS-14998.006.patch > > > After HDFS-14130, we should update observer namenode doc, observer namenode > can run with ZKFC running -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14869) Data loss in case of distcp using snapshot diff. Replication should include rename records if file was skipped in the previous iteration
[ https://issues.apache.org/jira/browse/HDFS-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989699#comment-16989699 ] Hudson commented on HDFS-14869: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17732 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17732/]) HDFS-14869 Copy renamed files which are not excluded anymore by filter (shashikant: rev fc97034b29243a0509633849de55aa734859) * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java * (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java > Data loss in case of distcp using snapshot diff. Replication should include > rename records if file was skipped in the previous iteration > > > Key: HDFS-14869 > URL: https://issues.apache.org/jira/browse/HDFS-14869 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Fix For: 3.1.4 > > > This issue arises when a directory or file is excluded by the exclusion filter > during distcp replication. Later on, if the directory is renamed to a > name which is not excluded by the filter, the snapshot diff reports only a > rename operation. The directory is never copied to the target even though it's > not excluded now. This also doesn't throw any error, so there is no way to > find the issue. > Steps to reproduce > * Create a directory in hdfs to copy using distcp. > * Include a staging folder in the directory. 
> {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -ls > /tmp/tocopy > Found 4 items > -rw-r--r-- 3 hdfs hdfs 16 2019-09-12 10:32 /tmp/tocopy/.b.txt > drwxr-xr-x - hdfs hdfs 0 2019-09-23 09:18 /tmp/tocopy/.staging > -rw-r--r-- 3 hdfs hdfs 12 2019-09-12 10:32 /tmp/tocopy/a.txt > -rw-r--r-- 3 hdfs hdfs 4 2019-09-20 08:23 /tmp/tocopy/foo.txt{code} > * The exclusion filter is set to exclude any staging directory > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ cat > /tmp/filter > .*\.Trash.* > .*\.staging.*{code} > * Do a copy using distcp snapshots, the staging directory is not replicated. > {code:java} > hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar > -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter > /tmp/tocopy/.snapshot/s1 /tmp/target > [hdfs@ctr-e141-1563959304486-33995-01-03 root]$ hadoop fs -ls /tmp/target > Found 3 items > -rw-r--r-- 3 hdfs hdfs 16 2019-09-24 06:56 /tmp/target/.b.txt > -rw-r--r-- 3 hdfs hdfs 12 2019-09-24 06:56 /tmp/target/a.txt > -rw-r--r-- 3 hdfs hdfs 4 2019-09-24 06:56 /tmp/target/foo.txt{code} > * Rename the staging directory to final > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -mv > /tmp/tocopy/.staging /tmp/tocopy/final{code} > * Do a copy using snapshot diff. > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hdfs > snapshotDiff /tmp/tocopy s1 s2 > Difference between > snapshot s1 and snapshot s2 under directory /tmp/tocopy: > M . > R ./.staging -> > ./final > {code} > * The diff report just has a rename record and the new final directory is > never copied. 
> {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop jar > hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true > -filters /tmp/filter -diff s1 s2 -update /tmp/tocopy /tmp/target > 19/09/24 07:05:32 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, overwrite=false, append=false, useDiff=true, > useRdiff=false, fromSnapshot=s1, toSnapshot=s2, skipCRC=false, blocking=true, > numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, > copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, > logPath=null, sourceFileListing=null, sourcePaths=[/tmp/tocopy], > targetPath=/tmp/target, filtersFile='/tmp/filter', blocksPerChunk=0, > copyBufferSize=8192, verboseLog=false, directWrite=false}, > sourcePaths=[/tmp/tocopy], targetPathExists=true, preserveRawXattrsfalse > 19/09/24 07:05:32 INFO client.RMProxy: Connecting to ResourceManager at > ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050 > 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History > server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128
[jira] [Commented] (HDFS-15023) [SBN read] ZKFC should check the state before joining the election
[ https://issues.apache.org/jira/browse/HDFS-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988825#comment-16988825 ] Hudson commented on HDFS-15023: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17725 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17725/]) HDFS-15023. [SBN read] ZKFC should check the state before joining the (ayushsaxena: rev 83a14559e594b0e918d04cafd8c7c6ac57715b22) * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSZKFailoverController.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java > [SBN read] ZKFC should check the state before joining the election > -- > > Key: HDFS-15023 > URL: https://issues.apache.org/jira/browse/HDFS-15023 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15023.001.patch, HDFS-15023.002.patch, > HDFS-15023.003.patch, HDFS-15023.004.patch, HDFS-15023.005.patch > > > As discussed in HDFS-14961, ZKFC should not join the election when its state is > observer. > Right now, when a namenode is an observer, it joins the election and would > become a standby. > The MonitorDaemon thread call chain is: > doHealthChecks -> enterState(State.SERVICE_HEALTHY) -> recheckElectability() > -> elector.joinElection(targetToData(localTarget)) -> joinElectionInternal -> > createLockNodeAsync > callback for zookeeper: > processResult -> becomeStandby -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
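The guard the patch adds can be sketched as a pre-election state check. Names below are illustrative; the real check spans `ZKFailoverController` and `ActiveStandbyElector`:

```java
// Sketch: before ZKFC joins the leader election it checks the local
// namenode's HA state, and an OBSERVER never enters the election, so the
// election callback can no longer flip it to STANDBY.
class ElectionGate {
    enum HAState { ACTIVE, STANDBY, OBSERVER }

    private boolean inElection = false;

    boolean joinElectionIfEligible(HAState localState) {
        if (localState == HAState.OBSERVER) {
            return false; // observers stay out of the election entirely
        }
        inElection = true;
        return true;
    }

    boolean isInElection() {
        return inElection;
    }
}
```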
[jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
[ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987723#comment-16987723 ] Hudson commented on HDFS-13811: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17720 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17720/]) HDFS-13811. RBF: Race condition between router admin quota update and (yqlin: rev 47fdae79041ba2bb036ef7723a93ade5b1ac3619) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterQuota.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaUpdateService.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MountTableStoreImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/Router.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/MountTableStore.java > RBF: Race condition between router admin quota update and periodic quota > update service > --- > > Key: HDFS-13811 > URL: https://issues.apache.org/jira/browse/HDFS-13811 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.1.0 >Reporter: Dibyendu Karmakar >Assignee: Jinglun >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch, > HDFS-13811.001.patch, HDFS-13811.002.patch, HDFS-13811.003.patch, > HDFS-13811.004.patch, HDFS-13811.005.patch, HDFS-13811.006.patch, > HDFS-13811.007.patch > > > If we try to update the quota of an existing mount entry while the > periodic quota update service is running on the same mount entry, it > leaves the mount table in an _inconsistent state_. > 
Here the transactions are: > A - Quota update service fetches mount table entries. > B - Quota update service updates the mount table with current usage. > A' - User updates the quota using the admin cmd. > With the transaction sequence [ A A' B ], the > quota update service writes the old quota value back to the mount table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
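The [ A A' B ] interleaving is a classic lost update: the periodic service reads (A), the admin writes (A'), then the service writes back state derived from its stale read (B). Serializing the admin path and the service's read-modify-write on one lock removes the window. An illustrative model only, with hypothetical names, not the RouterQuotaManager code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the mount-table quota race: periodic usage updates
// and admin quota changes are serialized on one lock, and the periodic pass
// only touches the usage field, so it can never write a stale quota back
// over an admin's new value.
class MountTableQuotas {
    static final class Entry {
        long quota;
        long usage;
    }

    private final Object writeLock = new Object();
    private final Map<String, Entry> table = new ConcurrentHashMap<>();

    void addMount(String path, long quota) {
        Entry e = new Entry();
        e.quota = quota;
        table.put(path, e);
    }

    // Admin path (A'): takes the lock, so it cannot interleave with B below.
    void adminSetQuota(String path, long quota) {
        synchronized (writeLock) {
            table.get(path).quota = quota;
        }
    }

    // Periodic service: read (A) and write-back (B) happen under one lock
    // instead of reading first and writing back later with stale data.
    void periodicUsageUpdate(String path, long measuredUsage) {
        synchronized (writeLock) {
            table.get(path).usage = measuredUsage; // quota left untouched
        }
    }

    long quotaOf(String path) {
        return table.get(path).quota;
    }
}
```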