[jira] [Assigned] (HDFS-4953) enable HDFS local reads via mmap

2020-06-26 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-4953:
--

Assignee: Colin McCabe  (was: Masatake Iwasaki)

> enable HDFS local reads via mmap
> 
>
> Key: HDFS-4953
> URL: https://issues.apache.org/jira/browse/HDFS-4953
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.3.0
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: HDFS-4949
>
> Attachments: HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch, benchmark.png
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the 
> read() / pread() / etc. family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when 
> checksums are disabled.  Later, we can use the DataNode's cache awareness to 
> only perform zero-copy reads when we know that the checksum has already been 
> verified.
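
As a rough illustration of the zero-copy idea (this is not the HDFS client code), the
sketch below maps a hypothetical local block file with Java's FileChannel.map() and
reads it through the mapping instead of copying it into a user buffer with read();
the file path and class name are made up for the example.

{code:java}
// Minimal sketch, not the HDFS API: reading a local block file through an
// mmap'ed buffer avoids the extra user-space copy that read()/pread() incur.
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapReadSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical local block file; a real client would learn the path from
    // the DataNode via the short-circuit read protocol.
    try (RandomAccessFile raf = new RandomAccessFile("/tmp/blk_12345", "r");
         FileChannel ch = raf.getChannel()) {
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      long sum = 0;
      while (buf.hasRemaining()) {
        sum += buf.get();   // bytes come straight from the page-cache mapping
      }
      System.out.println("byte sum: " + sum);
    }
  }
}
{code}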






[jira] [Assigned] (HDFS-4953) enable HDFS local reads via mmap

2020-06-26 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki reassigned HDFS-4953:
--

Assignee: Masatake Iwasaki  (was: Colin McCabe)

> enable HDFS local reads via mmap
> 
>
> Key: HDFS-4953
> URL: https://issues.apache.org/jira/browse/HDFS-4953
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.3.0
>Reporter: Colin McCabe
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: HDFS-4949
>
> Attachments: HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch, benchmark.png
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the 
> read() / pread() / etc. family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when 
> checksums are disabled.  Later, we can use the DataNode's cache awareness to 
> only perform zero-copy reads when we know that the checksum has already been 
> verified.






[jira] [Resolved] (HDFS-15436) Default mount table name used by ViewFileSystem should be configurable

2020-06-26 Thread Virajith Jalaparti (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti resolved HDFS-15436.
---
Resolution: Fixed

> Default mount table name used by ViewFileSystem should be configurable
> --
>
> Key: HDFS-15436
> URL: https://issues.apache.org/jira/browse/HDFS-15436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs, viewfsOverloadScheme
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
>
> Currently, if no authority is provided and the scheme of the Path does not 
> match the scheme of {{fs.defaultFS}}, the mount table used by ViewFileSystem 
> to resolve the path is {{default}}. 
> This breaks access to paths like {{hdfs:///foo/bar}} (without any authority) 
> when the following configurations are used:
> (1) {{fs.defaultFS}} = {{viewfs://clustername/}} 
> (2) {{fs.hdfs.impl = 
> org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme}}
> This JIRA proposes to add a new configuration, 
> {{fs.viewfs.mounttable.default.name.key}}, which is used to get the name of 
> the cluster/mount table when the authority is missing in cases like the 
> above. If not set, the string {{default}} will be used, as it is today.
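
A minimal sketch of how a client might set the proposed key, assuming the key string
from the description above and the standard Configuration API; the cluster name is a
placeholder.

{code:java}
// Sketch only: configuring the proposed default-mount-table name on a client.
// The key string follows this JIRA's description; "clustername" is a placeholder.
import org.apache.hadoop.conf.Configuration;

public class ViewFsDefaultMountTableSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "viewfs://clustername/");
    conf.set("fs.hdfs.impl",
        "org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme");
    // With this set, an authority-less path such as hdfs:///foo/bar would be
    // resolved against the "clustername" mount table instead of "default".
    conf.set("fs.viewfs.mounttable.default.name.key", "clustername");
  }
}
{code}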






[jira] [Commented] (HDFS-15436) Default mount table name used by ViewFileSystem should be configurable

2020-06-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146627#comment-17146627
 ] 

Hudson commented on HDFS-15436:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18385 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18385/])
HDFS-15436. Default mount table name used by ViewFileSystem should be (github: 
rev bed0a3a37404e9defda13a5bffe5609e72466e46)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemOverloadSchemeWithHdfsScheme.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFsWithAuthorityLocalFs.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ViewFsOverloadScheme.md
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/ViewFsBaseTest.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/ViewFsTestSetup.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/Constants.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemHdfs.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ConfigUtil.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java


> Default mount table name used by ViewFileSystem should be configurable
> --
>
> Key: HDFS-15436
> URL: https://issues.apache.org/jira/browse/HDFS-15436
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs, viewfsOverloadScheme
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
>Priority: Major
>
> Currently, if no authority is provided and the scheme of the Path does not 
> match the scheme of {{fs.defaultFS}}, the mount table used by ViewFileSystem 
> to resolve the path is {{default}}. 
> This breaks access to paths like {{hdfs:///foo/bar}} (without any authority) 
> when the following configurations are used:
> (1) {{fs.defaultFS}} = {{viewfs://clustername/}} 
> (2) {{fs.hdfs.impl = 
> org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme}}
> This JIRA proposes to add a new configuration, 
> {{fs.viewfs.mounttable.default.name.key}}, which is used to get the name of 
> the cluster/mount table when the authority is missing in cases like the 
> above. If not set, the string {{default}} will be used, as it is today.






[jira] [Commented] (HDFS-15420) approx scheduled blocks not resetting over time

2020-06-26 Thread Max Mizikar (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146614#comment-17146614
 ] 

Max Mizikar commented on HDFS-15420:


Is there a stat or a log I can look for to see if any reconstructions have 
timed out? I'm not familiar with this metric.

> approx scheduled blocks not resetting over time
> --
>
> Key: HDFS-15420
> URL: https://issues.apache.org/jira/browse/HDFS-15420
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement
>Affects Versions: 2.6.0, 3.0.0
> Environment: Our 2.6.0 environment is a 3 node cluster running 
> cdh5.15.0.
> Our 3.0.0 environment is a 4 node cluster running cdh6.3.0.
>Reporter: Max Mizikar
>Priority: Minor
> Attachments: Screenshot from 2020-06-18 09-29-57.png, Screenshot from 
> 2020-06-18 09-31-15.png
>
>
> We have been experiencing large numbers of scheduled blocks that never get 
> cleared out. This is preventing blocks from being placed even when there is 
> plenty of space on the system.
> Here is an example of the block growth over 24 hours on one of our systems 
> running 2.6.0
>  !Screenshot from 2020-06-18 09-29-57.png! 
> Here is an example of the block growth over 24 hours on one of our systems 
> running 3.0.0
>  !Screenshot from 2020-06-18 09-31-15.png! 
> https://issues.apache.org/jira/browse/HDFS-1172 appears to be the main issue 
> we were having on 2.6.0, so the growth has decreased since upgrading to 3.0.0. 
> However, there still appears to be a systemic growth in scheduled blocks over 
> time, and our systems still need the namenode restarted on occasion to reset 
> this count. I have not determined what is causing the leaked blocks in 3.0.0.
> Looking into the issue, I discovered that the intention is for scheduled 
> blocks to slowly go back down to 0 after errors cause blocks to be leaked.
> {code}
>   /** Increment the number of blocks scheduled. */
>   void incrementBlocksScheduled(StorageType t) {
> currApproxBlocksScheduled.add(t, 1);
>   }
>   
>   /** Decrement the number of blocks scheduled. */
>   void decrementBlocksScheduled(StorageType t) {
> if (prevApproxBlocksScheduled.get(t) > 0) {
>   prevApproxBlocksScheduled.subtract(t, 1);
> } else if (currApproxBlocksScheduled.get(t) > 0) {
>   currApproxBlocksScheduled.subtract(t, 1);
> } 
> // its ok if both counters are zero.
>   }
>   
>   /** Adjusts curr and prev number of blocks scheduled every few minutes. */
>   private void rollBlocksScheduled(long now) {
> if (now - lastBlocksScheduledRollTime > BLOCKS_SCHEDULED_ROLL_INTERVAL) {
>   prevApproxBlocksScheduled.set(currApproxBlocksScheduled);
>   currApproxBlocksScheduled.reset();
>   lastBlocksScheduledRollTime = now;
> }
>   }
> {code}
> However, this code does not do what is intended if the system has a constant 
> flow of written blocks. If blocks make it into prevApproxBlocksScheduled, the 
> next scheduled block increments currApproxBlocksScheduled, and when it 
> completes, it decrements prevApproxBlocksScheduled, preventing the leaked 
> block from being removed from the approx count. So, for errors to be corrected, 
> we would have to write no data for the full 10-minute roll period. The number 
> of blocks we write per 10 minutes is quite high. This allows the error in the 
> approx counts to grow to very large numbers.
> The comments in the ticket for the original implementation suggest this 
> issue was known: https://issues.apache.org/jira/browse/HADOOP-3707. However, 
> it's not clear to me if the severity of it was known at the time.
> > So if there are some blocks that are not reported back by the datanode, 
> > they will eventually get adjusted (usually 10 min; bit longer if datanode 
> > is continuously receiving blocks).
> The comments suggest it will eventually get cleared out, but in our case, it 
> never gets cleared out.
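
To make the failure mode described above concrete, here is a small self-contained
simulation (not the NameNode code) of the curr/prev counter scheme quoted in the
description: one scheduled block is leaked, a steady write load continues across
roll intervals, and the approximate total never drains back to zero until a full
interval passes with no writes.

{code:java}
// Self-contained simulation of the two-counter scheme for a single storage type.
// Class and method names are illustrative; only the counter logic mirrors the
// quoted DatanodeDescriptor code.
public class ScheduledBlocksLeakSim {
  static long curr = 0, prev = 0;

  static void increment() { curr++; }

  static void decrement() {
    if (prev > 0) {
      prev--;
    } else if (curr > 0) {
      curr--;
    }
    // ok if both counters are zero
  }

  static void roll() { prev = curr; curr = 0; }   // runs every ~10 minutes

  public static void main(String[] args) {
    increment();   // one leaked block: scheduled, but its completion never arrives
    for (int interval = 0; interval < 5; interval++) {
      for (int i = 0; i < 100; i++) {   // steady write load within the interval
        increment();
        decrement();
      }
      System.out.println("interval " + interval + ": approx scheduled = " + (curr + prev));
      roll();
    }
    roll();   // one full interval with no writes finally drains the leak
    System.out.println("after an idle interval: approx scheduled = " + (curr + prev));
  }
}
{code}

Under the continuous load the printed total stays pinned at 1, matching the
observation that the count only resets after a quiet roll period.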






[jira] [Commented] (HDFS-15434) RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache

2020-06-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146593#comment-17146593
 ] 

Íñigo Goiri commented on HDFS-15434:


I have never seen this error, no.
I guess refining the removalListener would help.
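
For anyone following along, a minimal sketch of a Guava cache with a removal listener
(simplified String types, not the actual MountTableResolver fields); logging the
removal cause is one way such a listener could be refined to narrow down
eviction-time failures like the assertion in the stack trace below.

{code:java}
// Illustration only: a Guava cache whose removal listener reports why each
// entry left the cache. Key/value types are simplified to String.
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

public class LocationCacheSketch {
  public static void main(String[] args) {
    RemovalListener<String, String> listener =
        new RemovalListener<String, String>() {
          @Override
          public void onRemoval(RemovalNotification<String, String> n) {
            // getCause() distinguishes SIZE eviction from explicit invalidation
            System.out.println("removed " + n.getKey() + " cause=" + n.getCause());
          }
        };
    Cache<String, String> locationCache = CacheBuilder.newBuilder()
        .maximumSize(100)
        .removalListener(listener)
        .build();
    locationCache.put("/user/foo", "ns0 -> /user/foo");
    locationCache.invalidate("/user/foo");   // listener fires with cause EXPLICIT
  }
}
{code}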

> RBF: MountTableResolver#getDestinationForPath failing with AssertionError 
> from localCache
> -
>
> Key: HDFS-15434
> URL: https://issues.apache.org/jira/browse/HDFS-15434
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
>
> {code:java}
> org.apache.hadoop.ipc.RemoteException: java.lang.AssertionError
> at com.google.common.cache.LocalCache$Segment.evictEntries(LocalCache.java:2698)
> at com.google.common.cache.LocalCache$Segment.storeLoadedValue(LocalCache.java:3166)
> at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2386)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2351)
> at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
> at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
> at org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:382)
> at org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver.getDestinationForPath(MultipleDestinationMountTableResolver.java:87)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1406)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1389)
> at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getFileInfo(RouterClientProtocol.java:741)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:763)
>  {code}






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-26 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146537#comment-17146537
 ] 

Konstantin Shvachko commented on HDFS-15421:


Correct, {{addBlockCollection()}} takes care of (3).
+1 on v04 patch.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch, HDFS-15421.003.patch, HDFS-15421.004.patch, 
> HDFS-15421.005.patch, HDFS-15421.006.patch, HDFS-15421.007.patch
>
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations.  This makes the last set of incremental block reports from an append 
> appear to be "from the future", which causes them to be simply re-queued to the 
> pending DN message queue, rather than processed to complete the block.  The last 
> set of IBRs will leak and never be cleaned until the namenode transitions to 
> active.  The size of {{pendingDNMessages}} constantly grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15434) RBF: MountTableResolver#getDestinationForPath failing with AssertionError from localCache

2020-06-26 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146447#comment-17146447
 ] 

Brahma Reddy Battula commented on HDFS-15434:
-

[~inigoiri] and [~crh], did you come across this scenario?

> RBF: MountTableResolver#getDestinationForPath failing with AssertionError 
> from localCache
> -
>
> Key: HDFS-15434
> URL: https://issues.apache.org/jira/browse/HDFS-15434
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
>
> {code:java}
> org.apache.hadoop.ipc.RemoteException: java.lang.AssertionError
> at com.google.common.cache.LocalCache$Segment.evictEntries(LocalCache.java:2698)
> at com.google.common.cache.LocalCache$Segment.storeLoadedValue(LocalCache.java:3166)
> at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2386)
> at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2351)
> at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
> at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
> at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
> at org.apache.hadoop.hdfs.server.federation.resolver.MountTableResolver.getDestinationForPath(MountTableResolver.java:382)
> at org.apache.hadoop.hdfs.server.federation.resolver.MultipleDestinationMountTableResolver.getDestinationForPath(MultipleDestinationMountTableResolver.java:87)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1406)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getLocationsForPath(RouterRpcServer.java:1389)
> at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getFileInfo(RouterClientProtocol.java:741)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:763)
>  {code}






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-26 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146211#comment-17146211
 ] 

Takanobu Asanuma commented on HDFS-15421:
-

Thanks for your explanation, [~shv].

After thinking about it again, my concern is already covered by HDFS-14941. 
Surely Standby NN should not update the global GS in OP_REASSIGN_LEASE. Sorry 
for confusing you, [~aajisaka].

I agree that HDFS-14941 and HDFS-15421.004.patch cover all the cases. I'm +1 on 
[^HDFS-15421.004.patch].

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch, HDFS-15421.003.patch, HDFS-15421.004.patch, 
> HDFS-15421.005.patch, HDFS-15421.006.patch, HDFS-15421.007.patch
>
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations.  This makes the last set of incremental block reports from an append 
> appear to be "from the future", which causes them to be simply re-queued to the 
> pending DN message queue, rather than processed to complete the block.  The last 
> set of IBRs will leak and never be cleaned until the namenode transitions to 
> active.  The size of {{pendingDNMessages}} constantly grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit

2020-06-26 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146129#comment-17146129
 ] 

Akira Ajisaka commented on HDFS-15441:
--

Please ignore this. Sorry.

> The default value of dfs.datanode.outliers.report.interval has no time unit
> ---
>
> Key: HDFS-15441
> URL: https://issues.apache.org/jira/browse/HDFS-15441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Akira Ajisaka
>Priority: Minor
>  Labels: newbie
>
> {noformat}
> 2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit 
> for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
> {noformat}
> The default value in hdfs-default.xml should be "30m".
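
For context, a minimal sketch (not the DataNode code) of how Hadoop's
{{Configuration.getTimeDuration()}} consumes a value with a time-unit suffix such as
"30m"; the fallback default passed below is a placeholder, not the property's actual
default.

{code:java}
// Sketch only: a value with a time-unit suffix ("30m") is parsed directly by
// getTimeDuration(), so no "No unit ... assuming MILLISECONDS" warning is logged.
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class OutlierIntervalSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.datanode.outliers.report.interval", "30m");
    // The fallback default (0) here is a placeholder for illustration.
    long millis = conf.getTimeDuration(
        "dfs.datanode.outliers.report.interval", 0L, TimeUnit.MILLISECONDS);
    System.out.println("outlier report interval = " + millis + " ms");
  }
}
{code}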






[jira] [Commented] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.

2020-06-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146128#comment-17146128
 ] 

Hudson commented on HDFS-15429:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18380 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18380/])
HDFS-15429. mkdirs should work when parent dir is an internalDir and (github: 
rev d5e1bb6155496cf9d82e121dd1b65d0072312197)
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFsLinkFallback.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemLinkFallback.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java


> mkdirs should work when parent dir is internalDir and fallback configured.
> --
>
> Key: HDFS-15429
> URL: https://issues.apache.org/jira/browse/HDFS-15429
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.4.0
>
>
> mkdir will not work if the parent dir is an internal mount dir (a non-leaf dir 
> in the mount path) and a fallback is configured.
> Since the fallback is available, and if the same tree structure is available in 
> the fallback, we should be able to mkdir in the fallback.
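
To make "fallback configured" concrete, here is a hedged sketch of a client-side
mount table with a fallback link; the mount-table name, target URIs, and the exact
link/linkFallback property suffixes are assumptions for illustration, not taken from
the patch.

{code:java}
// Sketch only: a viewfs mount table with a fallback link. Property names and
// targets below are illustrative assumptions, not copied from the patch.
import org.apache.hadoop.conf.Configuration;

public class ViewFsFallbackSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "viewfs://clustername/");
    // Explicit mount point: /user is an internal (non-leaf) dir in the mount path.
    conf.set("fs.viewfs.mounttable.clustername.link./user/data",
        "hdfs://nn1:8020/user/data");
    // Fallback file system used for paths not covered by explicit links.
    conf.set("fs.viewfs.mounttable.clustername.linkFallback",
        "hdfs://nn2:8020/");
    // With this JIRA, mkdirs("/user/newdir") should succeed by creating the
    // directory in the fallback, since /user exists there as well.
  }
}
{code}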






[jira] [Resolved] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit

2020-06-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved HDFS-15441.
--
Resolution: Invalid

> The default value of dfs.datanode.outliers.report.interval has no time unit
> ---
>
> Key: HDFS-15441
> URL: https://issues.apache.org/jira/browse/HDFS-15441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Akira Ajisaka
>Priority: Minor
>  Labels: newbie
>
> {noformat}
> 2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit 
> for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
> {noformat}
> The default value in hdfs-default.xml should be "30m".






[jira] [Updated] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit

2020-06-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15441:
-
Description: 
{noformat}
2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit for 
dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
{noformat}
The default value in hdfs-default.xml should be "30m".

  was:
{noformat}
2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit for 
dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
{noformat}
The default value should be "30m".


> The default value of dfs.datanode.outliers.report.interval has no time unit
> ---
>
> Key: HDFS-15441
> URL: https://issues.apache.org/jira/browse/HDFS-15441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Akira Ajisaka
>Priority: Minor
>  Labels: newbie
>
> {noformat}
> 2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit 
> for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
> {noformat}
> The default value in hdfs-default.xml should be "30m".






[jira] [Updated] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit

2020-06-26 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15441:
-
Labels: newbie  (was: )

> The default value of dfs.datanode.outliers.report.interval has no time unit
> ---
>
> Key: HDFS-15441
> URL: https://issues.apache.org/jira/browse/HDFS-15441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Akira Ajisaka
>Priority: Minor
>  Labels: newbie
>
> {noformat}
> 2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit 
> for dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
> {noformat}
> The default value should be "30m".






[jira] [Created] (HDFS-15441) The default value of dfs.datanode.outliers.report.interval has no time unit

2020-06-26 Thread Akira Ajisaka (Jira)
Akira Ajisaka created HDFS-15441:


 Summary: The default value of 
dfs.datanode.outliers.report.interval has no time unit
 Key: HDFS-15441
 URL: https://issues.apache.org/jira/browse/HDFS-15441
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Akira Ajisaka


{noformat}
2020-06-25 13:03:10,619 WARN org.apache.hadoop.conf.Configuration: No unit for 
dfs.datanode.outliers.report.interval(180) assuming MILLISECONDS
{noformat}
The default value should be "30m".






[jira] [Updated] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.

2020-06-26 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15429:
---
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Appreciate your timely reviews, [~ayushtkn]. Thanks a lot.

I have just committed it to trunk.

> mkdirs should work when parent dir is internalDir and fallback configured.
> --
>
> Key: HDFS-15429
> URL: https://issues.apache.org/jira/browse/HDFS-15429
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.4.0
>
>
> mkdir will not work if the parent dir is an internal mount dir (a non-leaf dir 
> in the mount path) and a fallback is configured.
> Since the fallback is available, and if the same tree structure is available in 
> the fallback, we should be able to mkdir in the fallback.






[jira] [Commented] (HDFS-15421) IBR leak causes standby NN to be stuck in safe mode

2020-06-26 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146113#comment-17146113
 ] 

Akira Ajisaka commented on HDFS-15421:
--

Thanks [~shv] for your comment.
{quote}
2. I think adding applyImpendingGenerationStamp() in OP_REASSIGN_LEASE is 
incorrect as it restores the race condition of HDFS-14941. 
{quote}
Agreed. We should not update the SNN global genstamp in {{OP_REASSIGN_LEASE}}. 
The global genstamp will be updated when tailing {{OP_CLOSE}} or other 
operations after {{OP_REASSIGN_LEASE}}.
{quote}
3. Found one more place, FSEditLogLoader.addNewBlock(), where we need to add 
setGenerationStampIfGreater(). addNewBlock() adds a block with a new genStamp.
{quote}
Already covered in HDFS-14941: 
https://github.com/apache/hadoop/blob/dd900259c421d6edd0b89a535a1fe08ada91735f/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4742

Now I'm +1 for 004 patch.

> IBR leak causes standby NN to be stuck in safe mode
> ---
>
> Key: HDFS-15421
> URL: https://issues.apache.org/jira/browse/HDFS-15421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15421-000.patch, HDFS-15421-001.patch, 
> HDFS-15421.002.patch, HDFS-15421.003.patch, HDFS-15421.004.patch, 
> HDFS-15421.005.patch, HDFS-15421.006.patch, HDFS-15421.007.patch
>
>
> After HDFS-14941, the update of the global gen stamp is delayed in certain 
> situations.  This makes the last set of incremental block reports from an append 
> appear to be "from the future", which causes them to be simply re-queued to the 
> pending DN message queue, rather than processed to complete the block.  The last 
> set of IBRs will leak and never be cleaned until the namenode transitions to 
> active.  The size of {{pendingDNMessages}} constantly grows until then.
> If a leak happens while in a startup safe mode, the namenode will never be 
> able to come out of safe mode on its own.






[jira] [Commented] (HDFS-15105) Standby NN exits and fails to restart due to edit log corruption

2020-06-26 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146036#comment-17146036
 ] 

Xiaoqiao He commented on HDFS-15105:


[~Tao Yang] Do you have the async editlog feature enabled? I am not sure if it is 
related to HDFS-15175; please refer to that ticket.

> Standby NN exits and fails to restart due to edit log corruption
> 
>
> Key: HDFS-15105
> URL: https://issues.apache.org/jira/browse/HDFS-15105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Tao Yang
>Priority: Critical
>
> We found an issue where the Standby NN exited and failed to restart until we 
> resolved the edit log corruption.
>  Error logs:
> {noformat}
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to 
> replace block blk_74288647857_73526148211 with blk_74288647857_73526377369 as 
> block # 15/17 of 
> /maindump/mainv10/dump_online/lasttable/20200105015500/part-319
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1019)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:431)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:885)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:866)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:234)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:342)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:295)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:312)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:455)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:308)
> {noformat}
> Related edit log transactions of the same file:
> {noformat}
> 1. TXID=444341628498  time=1578251449632
> OP_UPDATE_BLOCKS
> blocks: ... blk_74288647857_73526148211   blk_74454090866_73526215536
> 2. TXID=444342382774   time=1578251520740
> OP_REASSIGN_LEASE
> 3. TXID=444342401216  time=1578251522779
> OP_CLOSE
> blocks: ... blk_74288647857_73526377369   blk_74454090866_73526374095
> 4. TXID=444342401394
> OP_SET_GENSTAMP_V2 
> generate stamp: 73526377369
> 5. TXID=444342401395  time=1578251522835
> OP_TRUNCATE
> 6. TXID=444342402176  time=1578251523246
> OP_CLOSE
> blocks: ... blk_74288647857_73526377369 
> {noformat}
> According to the edit logs, it's weird to see that the stamp (73526377369) was 
> generated in transaction 4 but already used in transaction 3, and that for 
> transaction 3 only the last block should have changed, but in fact the last 
> two blocks both changed.
> This problem might be produced in a complex scenario where a truncate operation 
> immediately follows the recover-lease operation on the same file. A suspicious 
> point is that, between the creation of transaction 3 and its being written, the 
> stamp of the second-to-last block was updated while committing the block 
> synchronization caused by the truncate operation.
> Related calling stack is as follows: 
> {noformat}
> NameNodeRpcServer#commitBlockSynchronization
>   FSNamesystem#commitBlockSynchronization
>     // update last block
>     if (!copyTruncate) {
>       // updated the stamp of the second-to-last block in transaction 3
>       // before it was written
>       storedBlock.setGenerationStamp(newgenerationstamp);
>       storedBlock.setNumBytes(newlength);
>     }
> {noformat}
> Any comments are welcome. Thanks.


