[jira] [Assigned] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki reassigned HDFS-4953:
--------------------------------------
    Assignee: Colin McCabe  (was: Masatake Iwasaki)

> enable HDFS local reads via mmap
> --------------------------------
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin McCabe
>            Assignee: Colin McCabe
>            Priority: Major
>             Fix For: HDFS-4949
>         Attachments: HDFS-4953.001.patch, HDFS-4953.002.patch, HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch, benchmark.png
>
> Currently, the short-circuit local read pathway allows HDFS clients to access files directly without going through the DataNode. However, all of these reads involve a copy at the operating system level, since they rely on the read() / pread() family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap, which would enable truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when checksums are disabled. Later, we can use the DataNode's cache awareness to perform zero-copy reads only when we know the checksum has already been verified.
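[Editor's note] To make the read()-vs-mmap distinction in the description concrete, here is a minimal, self-contained Java sketch. This is illustrative only, not the HDFS-4953 API; the block file path is hypothetical.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapReadSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical local block replica path, for illustration only.
    try (FileChannel ch = FileChannel.open(
        Paths.get("/tmp/blk_1234"), StandardOpenOption.READ)) {
      // read(): the kernel copies the data into this user-space buffer.
      ByteBuffer copy = ByteBuffer.allocate(4096);
      ch.read(copy);

      // mmap: the returned buffer is backed directly by the page cache,
      // so no extra user-space copy is made ("zero-copy"); pages fault
      // in lazily as they are touched.
      MappedByteBuffer mapped =
          ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      if (ch.size() > 0) {
        byte first = mapped.get(0);
      }
    }
  }
}
{code}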
[jira] [Assigned] (HDFS-4953) enable HDFS local reads via mmap
[ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki reassigned HDFS-4953:
--------------------------------------
    Assignee: Masatake Iwasaki  (was: Colin McCabe)
[jira] [Resolved] (HDFS-15436) Default mount table name used by ViewFileSystem should be configurable
[ https://issues.apache.org/jira/browse/HDFS-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Virajith Jalaparti resolved HDFS-15436.
---------------------------------------
    Resolution: Fixed

> Default mount table name used by ViewFileSystem should be configurable
> -----------------------------------------------------------------------
>                 Key: HDFS-15436
>                 URL: https://issues.apache.org/jira/browse/HDFS-15436
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: viewfs, viewfsOverloadScheme
>            Reporter: Virajith Jalaparti
>            Assignee: Virajith Jalaparti
>            Priority: Major
>
> Currently, if no authority is provided and the scheme of the Path doesn't match the scheme of {{fs.defaultFS}}, the mount table used by ViewFileSystem to resolve this path is {{default}}.
> This breaks access to paths like {{hdfs:///foo/bar}} (without any authority) when the following configurations are used:
> (1) {{fs.defaultFS}} = {{viewfs://clustername/}}
> (2) {{fs.hdfs.impl}} = {{org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme}}
> This JIRA proposes to add a new configuration, {{fs.viewfs.mounttable.default.name.key}}, which is used to get the name of the cluster/mount table when the authority is missing in cases like the above. If not set, the string {{default}} will be used, as today.
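[Editor's note] A minimal sketch of the setup the description lays out. The property values come from the description itself; setting the new key to "clustername" is an assumption for illustration, and the snippet only runs against a real cluster.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsDefaultMountTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "viewfs://clustername/");
    conf.set("fs.hdfs.impl",
        "org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme");
    // Proposed key from this JIRA: names the mount table to use when a
    // path such as hdfs:///foo/bar carries no authority (assumed value).
    conf.set("fs.viewfs.mounttable.default.name.key", "clustername");

    // Resolves via the "clustername" mount table rather than the
    // hard-coded "default" table.
    FileSystem fs = new Path("hdfs:///foo/bar").getFileSystem(conf);
  }
}
{code}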
[jira] [Commented] (HDFS-15436) Default mount table name used by ViewFileSystem should be configurable
[ https://issues.apache.org/jira/browse/HDFS-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146627#comment-17146627 ]

Hudson commented on HDFS-15436:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18385 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18385/])
HDFS-15436. Default mount table name used by ViewFileSystem should be (github: rev bed0a3a37404e9defda13a5bffe5609e72466e46)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemOverloadSchemeWithHdfsScheme.java
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFsWithAuthorityLocalFs.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFsOverloadScheme.md
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/ViewFsBaseTest.java
* (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/ViewFsTestSetup.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/Constants.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemHdfs.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ConfigUtil.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java
[jira] [Commented] (HDFS-15420) approx scheduled blocks not resetting over time
[ https://issues.apache.org/jira/browse/HDFS-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146614#comment-17146614 ]

Max Mizikar commented on HDFS-15420:
------------------------------------
Is there a stat or a log I can look for to see if any reconstructions have timed out? I'm not familiar with this metric.

> approx scheduled blocks not resetting over time
> -----------------------------------------------
>                 Key: HDFS-15420
>                 URL: https://issues.apache.org/jira/browse/HDFS-15420
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 2.6.0, 3.0.0
>         Environment: Our 2.6.0 environment is a 3 node cluster running cdh5.15.0. Our 3.0.0 environment is a 4 node cluster running cdh6.3.0.
>            Reporter: Max Mizikar
>            Priority: Minor
>         Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 2020-06-18 09-31-15.png
>
> We have been experiencing large amounts of scheduled blocks that never get cleared out. This prevents blocks from being placed even when there is plenty of space on the system.
> Here is an example of the block growth over 24 hours on one of our systems running 2.6.0:
> !Screenshot from 2020-06-18 09-29-57.png!
> Here is an example of the block growth over 24 hours on one of our systems running 3.0.0:
> !Screenshot from 2020-06-18 09-31-15.png!
> https://issues.apache.org/jira/browse/HDFS-1172 appears to be the main issue we were having on 2.6.0, so the growth has decreased since upgrading to 3.0.0. However, there still appears to be systemic growth in scheduled blocks over time, and our systems still need to restart the namenode on occasion to reset this count. I have not determined what is causing the leaked blocks in 3.0.0.
> Looking into the issue, I discovered that the intention is for scheduled blocks to slowly go back down to 0 after errors cause blocks to be leaked.
> {code}
>   /** Increment the number of blocks scheduled. */
>   void incrementBlocksScheduled(StorageType t) {
>     currApproxBlocksScheduled.add(t, 1);
>   }
>
>   /** Decrement the number of blocks scheduled. */
>   void decrementBlocksScheduled(StorageType t) {
>     if (prevApproxBlocksScheduled.get(t) > 0) {
>       prevApproxBlocksScheduled.subtract(t, 1);
>     } else if (currApproxBlocksScheduled.get(t) > 0) {
>       currApproxBlocksScheduled.subtract(t, 1);
>     }
>     // its ok if both counters are zero.
>   }
>
>   /** Adjusts curr and prev number of blocks scheduled every few minutes. */
>   private void rollBlocksScheduled(long now) {
>     if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
>       prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
>       currApproxBlocksScheduled.reset();
>       lastBlocksScheduledRollTime = now;
>     }
>   }
> {code}
> However, this code does not do what is intended if the system has a constant flow of written blocks. If blocks make it into prevApproxBlocksScheduled, the next scheduled block increments currApproxBlocksScheduled, and when it completes, it decrements prevApproxBlocksScheduled, preventing the leaked block from being removed from the approx count. So, for errors to be corrected, we have to not write any data for the roll period of 10 minutes. The number of blocks we write per 10 minutes is quite high, which allows the error on the approx counts to grow to very large numbers.
> The comments in the ticket for the original implementation, https://issues.apache.org/jira/browse/HADOOP-3707, suggest this issue was known. However, it's not clear to me if the severity of it was known at the time.
> > So if there are some blocks that are not reported back by the datanode, they will eventually get adjusted (usually 10 min; bit longer if datanode is continuously receiving blocks).
> The comments suggest it will eventually get cleared out, but in our case, it never gets cleared out.
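[Editor's note] A minimal, self-contained simulation (hypothetical code, not the HDFS implementation; reduced to a single storage type) of why a leaked increment survives the roll when writes never pause:

{code:java}
public class ScheduledCounterLeakDemo {
  static long curr = 0, prev = 0;

  static void increment() { curr++; }

  static void decrement() {
    // Mirrors decrementBlocksScheduled(): drain prev before curr.
    if (prev > 0) prev--;
    else if (curr > 0) curr--;
  }

  static void roll() { prev = curr; curr = 0; }

  public static void main(String[] args) {
    increment();            // a block is scheduled but its IBR is lost: leaked
    for (int i = 0; i < 5; i++) {
      roll();               // the leaked unit moves to prev...
      increment();          // ...but a new write bumps curr,
      decrement();          // and its completion drains prev instead,
    }                       // so one phantom unit is carried forward forever
    // Prints curr=1 prev=0: the approx count stays one above reality.
    System.out.println("curr=" + curr + " prev=" + prev);
  }
}
{code}

With no writes at all for two roll periods the phantom unit would drain to zero, which is the "eventually get adjusted" behavior quoted above; under a constant write load it never does.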
[jira] [Commented] (HDFS-15434) RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache
[ https://issues.apache.org/jira/browse/HDFS-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146593#comment-17146593 ]

Íñigo Goiri commented on HDFS-15434:
------------------------------------
I have never seen this error, no. I guess refining the removalListener would help.

> RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache
> ------------------------------------------------------------------------------------------
>                 Key: HDFS-15434
>                 URL: https://issues.apache.org/jira/browse/HDFS-15434
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: hemanthboyina
>            Priority: Major
>
> {code:java}
> org.apache.hadoop.ipc.RemoteException: java.lang.AssertionError
>   at com.google.common.cache.LocalCache$Segment.evictEntries(LocalCache.java:2698)
>   at com.google.common.cache.LocalCache$Segment.storeLoadedValue(LocalCache.java:3166)
>   at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2386)
>   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2351)
>   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
>   at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
>   at org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:382)
>   at org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver.getDestinationForPath(MultipleDestinationMountTableResolver.java:87)
>   at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1406)
>   at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1389)
>   at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getFileInfo(RouterClientProtocol.java:741)
>   at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:763)
> {code}
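[Editor's note] For context on the removalListener remark, a hedged sketch using Guava's cache API. This is a hypothetical illustration of the kind of refinement meant (reacting only to real evictions, not explicit invalidations or replacements), not the actual MountTableResolver code:

{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalCause;
import com.google.common.cache.RemovalListener;

public class RemovalListenerSketch {
  public static void main(String[] args) {
    // Hypothetical path-to-location cache like the resolver's localCache.
    RemovalListener<String, String> onRemoval = notification -> {
      if (notification.getCause() == RemovalCause.SIZE) {
        // React only to size-based eviction; ignore EXPLICIT/REPLACED.
        System.out.println("evicted: " + notification.getKey());
      }
    };
    Cache<String, String> locationCache = CacheBuilder.newBuilder()
        .maximumSize(10_000)
        .removalListener(onRemoval)
        .build();
    locationCache.put("/foo/bar", "hdfs://ns1/foo/bar");
  }
}
{code}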
[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode
[ https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146537#comment-17146537 ]

Konstantin Shvachko commented on HDFS-15421:
--------------------------------------------
Correct, {{addBlockCollection()}} takes care of (3). +1 on v04 patch.

> IBR leak causes standby NN to be stuck in safe mode
> ---------------------------------------------------
>                 Key: HDFS-15421
>                 URL: https://issues.apache.org/jira/browse/HDFS-15421
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>            Assignee: Akira Ajisaka
>            Priority: Blocker
>              Labels: release-blocker
>         Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, HDFS-15421.002.patch, HDFS-15421.003.patch, HDFS-15421.004.patch, HDFS-15421.005.patch, HDFS-15421.006.patch, HDFS-15421.007.patch
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain situations. This makes the last set of incremental block reports from append appear to be "from the future", which causes them to be simply re-queued to the pending DN message queue rather than processed to complete the block. The last set of IBRs will leak and never be cleaned up until the NN transitions to active. The size of {{pendingDNMessages}} constantly grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be able to come out of safe mode on its own.
[jira] [Commented] (HDFS-15434) RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache
[ https://issues.apache.org/jira/browse/HDFS-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146447#comment-17146447 ]

Brahma Reddy Battula commented on HDFS-15434:
---------------------------------------------
[~inigoiri] and [~crh], did you come across this scenario?
[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode
[ https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146211#comment-17146211 ]

Takanobu Asanuma commented on HDFS-15421:
-----------------------------------------
Thanks for your explanation, [~shv]. After thinking about it again, my concern is already covered by HDFS-14941. Surely the Standby NN should not update the global GS in OP_REASSIGN_LEASE. Sorry for confusing you, [~aajisaka].
I agree that HDFS-14941 and HDFS-15421.004.patch cover all the cases. I'm +1 on [^HDFS-15421.004.patch].
[jira] [Commented] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit
[ https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146129#comment-17146129 ]

Akira Ajisaka commented on HDFS-15441:
--------------------------------------
Please ignore this. Sorry.

> The default value of dfs.datanode.outliers.report.interval has no time unit
> ----------------------------------------------------------------------------
>                 Key: HDFS-15441
>                 URL: https://issues.apache.org/jira/browse/HDFS-15441
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Akira Ajisaka
>            Priority: Minor
>              Labels: newbie
>
> {noformat}
> 2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
> {noformat}
> The default value in hdfs-default.xml should be "30m".
[jira] [Commented] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.
[ https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146128#comment-17146128 ]

Hudson commented on HDFS-15429:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18380 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18380/])
HDFS-15429. mkdirs should work when parent dir is an internalDir and (github: rev d5e1bb6155496cf9d82e121dd1b65d0072312197)
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFsLinkFallback.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemLinkFallback.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java

> mkdirs should work when parent dir is internalDir and fallback configured.
> ---------------------------------------------------------------------------
>                 Key: HDFS-15429
>                 URL: https://issues.apache.org/jira/browse/HDFS-15429
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.2.1
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>             Fix For: 3.4.0
>
> mkdir will not work if the parent dir is an internal mount dir (a non-leaf dir in a mount path) and a fallback is configured.
> Since the fallback is available, and if the same tree structure is available in the fallback, we should be able to mkdir in the fallback.
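[Editor's note] A hedged sketch of the mount-table shape the description refers to. The mount-table name, NameNode URIs, and paths are hypothetical, and the snippet only runs against a real cluster; it assumes the standard ViewFs link/linkFallback properties:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsFallbackMkdirSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "viewfs://cluster/");
    // Hypothetical mount table: /user/data is a link, which makes /user
    // itself an internal (non-leaf) dir of the mount tree.
    conf.set("fs.viewfs.mounttable.cluster.link./user/data",
        "hdfs://nn1/user/data");
    // Fallback target for paths not covered by an explicit link.
    conf.set("fs.viewfs.mounttable.cluster.linkFallback", "hdfs://nn1/");

    FileSystem vfs = FileSystem.get(conf);
    // The parent /user is an internal dir; with this fix the mkdir is
    // delegated to the fallback, creating hdfs://nn1/user/newdir.
    vfs.mkdirs(new Path("/user/newdir"));
  }
}
{code}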
[jira] [Resolved] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit
[ https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka resolved HDFS-15441.
----------------------------------
    Resolution: Invalid
[jira] [Updated] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit
[ https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15441:
---------------------------------
    Description:
{noformat}
2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
{noformat}
The default value in hdfs-default.xml should be "30m".

  was:
{noformat}
2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
{noformat}
The default value should be "30m".
[jira] [Updated] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit
[ https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15441:
---------------------------------
    Labels: newbie  (was: )
[jira] [Created] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit
Akira Ajisaka created HDFS-15441:
------------------------------------
             Summary: The default value of dfs.datanode.outliers.report.interval has no time unit
                 Key: HDFS-15441
                 URL: https://issues.apache.org/jira/browse/HDFS-15441
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
            Reporter: Akira Ajisaka

{noformat}
2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
{noformat}
The default value should be "30m".
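[Editor's note] For context, the warning comes from Hadoop's time-duration parsing: a value with no unit suffix falls back to the caller's default unit and logs the message above. A minimal sketch assuming the standard {{Configuration.getTimeDuration}} API; the numeric value here is illustrative:

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeUnitWarningDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // A bare number carries no unit, so reading it logs the
    // "No unit for ... assuming MILLISECONDS" warning seen above.
    conf.set("dfs.datanode.outliers.report.interval", "1800000");
    long ms = conf.getTimeDuration(
        "dfs.datanode.outliers.report.interval",
        1800000L, TimeUnit.MILLISECONDS);

    // With an explicit unit suffix ("30m") no warning is emitted.
    conf.set("dfs.datanode.outliers.report.interval", "30m");
    long ms2 = conf.getTimeDuration(
        "dfs.datanode.outliers.report.interval",
        1800000L, TimeUnit.MILLISECONDS); // 1,800,000 ms

    System.out.println(ms + " " + ms2);
  }
}
{code}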
[jira] [Updated] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.
[ https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-15429:
---------------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Appreciate your timely reviews, [~ayushtkn]. Thanks a lot. I have just committed it to trunk.
[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode
[ https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146113#comment-17146113 ]

Akira Ajisaka commented on HDFS-15421:
--------------------------------------
Thanks [~shv] for your comment.
{quote}
2. I think adding applyImpendingGenerationStamp() in OP_REASSIGN_LEASE is incorrect as it restores the race condition of HDFS-14941.
{quote}
Agreed. We should not update the SNN global genstamp in {{OP_REASSIGN_LEASE}}. The global genstamp will be updated when tailing {{OP_CLOSE}} or other operations after {{OP_REASSIGN_LEASE}}.
{quote}
3. Found one more place, FSEditLogLoader.addNewBlock(), where we need to add setGenerationStampIfGreater(). addNewBlock() adds a block with a new genStamp.
{quote}
Already covered in HDFS-14941: https://github.com/apache/hadoop/blob/dd900259c421d6edd0b89a535a1fe08ada91735f/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4742
Now I'm +1 for the 004 patch.
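[Editor's note] A hedged sketch of the guard being discussed. This is a hypothetical, simplified illustration of a "set if greater" generation-stamp update, not the actual BlockManager code: the standby only ever advances its global genstamp, so replaying older ops cannot make queued IBRs look like they came "from the future".

{code:java}
class GenerationStampTracker {
  private long generationStamp;

  // Advance the stamp monotonically; a stale (smaller) value is ignored.
  synchronized void setGenerationStampIfGreater(long newGs) {
    if (newGs > generationStamp) {
      generationStamp = newGs;
    }
  }

  synchronized long getGenerationStamp() {
    return generationStamp;
  }
}
{code}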
[jira] [Commented] (HDFS-15105) Standby NN exits and fails to restart due to edit log corruption
[ https://issues.apache.org/jira/browse/HDFS-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146036#comment-17146036 ]

Xiaoqiao He commented on HDFS-15105:
------------------------------------
[~Tao Yang] Do you have the async editlog feature enabled? Not sure if it is related to HDFS-15175; please refer to that ticket.

> Standby NN exits and fails to restart due to edit log corruption
> -----------------------------------------------------------------
>                 Key: HDFS-15105
>                 URL: https://issues.apache.org/jira/browse/HDFS-15105
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Tao Yang
>            Priority: Critical
>
> We found an issue where the Standby NN exited and failed to restart until we resolved the edit log corruption.
> Error logs:
> {noformat}
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_74288647857_73526148211 with blk_74288647857_73526377369 as block # 15/17 of /maindump/mainv10/dump_online/lasttable/20200105015500/part-319
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1019)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:431)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:885)
>   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:866)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:234)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:342)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:295)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:312)
>   at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:455)
>   at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:308)
> {noformat}
> Related edit log transactions of the same file:
> {noformat}
> 1. TXID=444341628498 time=1578251449632
>    OP_UPDATE_BLOCKS
>    blocks: ... blk_74288647857_73526148211 blk_74454090866_73526215536
> 2. TXID=444342382774 time=1578251520740
>    OP_REASSIGN_LEASE
> 3. TXID=444342401216 time=1578251522779
>    OP_CLOSE
>    blocks: ... blk_74288647857_73526377369 blk_74454090866_73526374095
> 4. TXID=444342401394
>    OP_SET_GENSTAMP_V2
>    generate stamp: 73526377369
> 5. TXID=444342401395 time=1578251522835
>    OP_TRUNCATE
> 6. TXID=444342402176 time=1578251523246
>    OP_CLOSE
>    blocks: ... blk_74288647857_73526377369
> {noformat}
> According to the edit logs, it's weird to see that stamp 73526377369 was generated in transaction 4 but already used in transaction 3, and that in transaction 3 only the last block should have changed, yet the last two blocks both changed.
> This problem might be produced in a complex scenario where a truncate operation immediately followed a recover-lease operation on the same file. A suspicious point is that, between the creation of transaction 3 and its being written out, the stamp of the second-to-last block was updated while committing the block synchronization caused by the truncate operation.
> Related calling stack is as follows:
> {noformat}
> NameNodeRpcServer#commitBlockSynchronization
>   FSNamesystem#commitBlockSynchronization
>     // update last block
>     if (!copyTruncate) {
>       storedBlock.setGenerationStamp(newgenerationstamp); // updated the stamp of the second-to-last block in transaction 3 before it was written
>       storedBlock.setNumBytes(newlength);
>     }
> {noformat}
> Any comments are welcome. Thanks.
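[Editor's note] A hedged illustration of the reporter's hypothesis. This is a hypothetical simplification, not the actual FSNamesystem code path: it shows why an edit-log op that holds live block references, rather than snapshots, can end up serializing a generation stamp that was bumped in place after the op was created, matching the mismatch observed in transaction 3 above.

{code:java}
class Block {
  long id;
  long genStamp;
  Block(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
}

class CloseOp {
  final Block[] blocks;                    // shared references, not copies
  CloseOp(Block[] blocks) { this.blocks = blocks; }
}

public class EditLogRaceSketch {
  public static void main(String[] args) {
    Block b = new Block(74288647857L, 73526148211L);
    CloseOp op = new CloseOp(new Block[] { b }); // op built for tx 3

    // A concurrent truncate commits block synchronization and bumps the
    // stamp in place before the op is written out:
    b.genStamp = 73526377369L;

    // The serialized op now carries a stamp that is only "generated"
    // later (OP_SET_GENSTAMP_V2 in tx 4).
    System.out.println("logged genstamp: " + op.blocks[0].genStamp);
  }
}
{code}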