[jira] [Commented] (HDFS-14993) checkDiskError doesn't work during datanode startup
[ https://issues.apache.org/jira/browse/HDFS-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990370#comment-16990370 ]

Yang Yun commented on HDFS-14993:
---------------------------------

Thanks [~ayushtkn] and [~weichiu] for the review. Changed according to the comments.

> checkDiskError doesn't work during datanode startup
> ---------------------------------------------------
>
> Key: HDFS-14993
> URL: https://issues.apache.org/jira/browse/HDFS-14993
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Major
> Attachments: HDFS-14993.patch, HDFS-14993.patch, HDFS-14993.patch
>
>
> The function checkDiskError() is called before addBlockPool, but the bpSlices list is still empty at that point, so the function check() in FsVolumeImpl.java does nothing.
>   @Override
>   public VolumeCheckResult check(VolumeCheckContext ignored)
>       throws DiskErrorException {
>     // TODO:FEDERATION valid synchronization
>     for (BlockPoolSlice s : bpSlices.values()) {
>       s.checkDirs();
>     }
>     return VolumeCheckResult.HEALTHY;
>   }

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
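The behaviour described in the issue can be reproduced with a minimal sketch. The classes below (`SliceStub`, `VolumeStub`, and the local `VolumeCheckResult` and `DiskErrorException`) are hypothetical stand-ins written only for this illustration; they are not the Hadoop classes, and only the shape of `check()` follows the snippet quoted above. With an empty `bpSlices` map the loop body never runs, so the volume is reported healthy no matter what state the disk is in.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; not the real Hadoop types.
enum VolumeCheckResult { HEALTHY, FAILED }

class DiskErrorException extends Exception {
  DiskErrorException(String msg) { super(msg); }
}

class SliceStub {
  private final boolean broken;
  SliceStub(boolean broken) { this.broken = broken; }

  void checkDirs() throws DiskErrorException {
    if (broken) {
      throw new DiskErrorException("directory is not usable");
    }
  }
}

class VolumeStub {
  final Map<String, SliceStub> bpSlices = new ConcurrentHashMap<>();

  // Same shape as the check() quoted in the issue description.
  VolumeCheckResult check() throws DiskErrorException {
    for (SliceStub s : bpSlices.values()) {
      s.checkDirs();
    }
    return VolumeCheckResult.HEALTHY;
  }
}

class EmptySlicesDemo {
  public static void main(String[] args) throws Exception {
    VolumeStub volume = new VolumeStub();
    // No block pool has been added yet, so nothing is checked.
    System.out.println(volume.check()); // prints HEALTHY

    // Once a (broken) slice is registered, the same call detects the problem.
    volume.bpSlices.put("BP-1", new SliceStub(true));
    try {
      volume.check();
    } catch (DiskErrorException e) {
      System.out.println("detected: " + e.getMessage());
    }
  }
}
```

The sketch only shows why the early call is a no-op; how the attached patch changes the startup order or the check itself is not stated in this message.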
[jira] [Updated] (HDFS-14993) checkDiskError doesn't work during datanode startup
[ https://issues.apache.org/jira/browse/HDFS-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yun updated HDFS-14993: Attachment: HDFS-14993.patch Status: Patch Available (was: Open) > checkDiskError doesn't work during datanode startup > --- > > Key: HDFS-14993 > URL: https://issues.apache.org/jira/browse/HDFS-14993 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Major > Attachments: HDFS-14993.patch, HDFS-14993.patch, HDFS-14993.patch > > > the function checkDiskError() is called before addBlockPool, but list > bpSlices is empty this time. So the function check() in FsVolumeImpl.java > does nothing. > @Override > public VolumeCheckResult check(VolumeCheckContext ignored) > throws DiskErrorException { > // TODO:FEDERATION valid synchronization > for (BlockPoolSlice s : bpSlices.values()) { > s.checkDirs(); > } > return VolumeCheckResult.HEALTHY; > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990369#comment-16990369 ]

Xieming Li commented on HDFS-14983:
-----------------------------------

Hi [~elgoiri], thank you for your prompt response.
I have fixed almost all the issues pointed out.
{quote}Can we avoid the SuppressWarnings in TestRouterRefreshSuperUserGroupsConfiguration?
{quote}
I have deleted the SuppressWarnings from that file, but this produces a LineLength Checkstyle error, since the "\{@link org.apache.hadoop.hdfs.server.federation.router.RouterAdminServer#refreshSuperUserGroupsConfiguration}" in the javadoc cannot be broken into two lines.

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---------------------------------------------------------------------------
>
> Key: HDFS-14983
> URL: https://issues.apache.org/jira/browse/HDFS-14983
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: rbf
> Reporter: Akira Ajisaka
> Assignee: Xieming Li
> Priority: Minor
> Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, HDFS-14983.draft.001.patch
>
>
> NameNode can update the proxyuser config with -refreshSuperUserGroupsConfiguration without restarting, but DFSRouter cannot. It would be better for DFSRouter to have such functionality to be compatible with NameNode.
[jira] [Updated] (HDFS-14993) checkDiskError doesn't work during datanode startup
[ https://issues.apache.org/jira/browse/HDFS-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yun updated HDFS-14993: Status: Open (was: Patch Available) > checkDiskError doesn't work during datanode startup > --- > > Key: HDFS-14993 > URL: https://issues.apache.org/jira/browse/HDFS-14993 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Major > Attachments: HDFS-14993.patch, HDFS-14993.patch > > > the function checkDiskError() is called before addBlockPool, but list > bpSlices is empty this time. So the function check() in FsVolumeImpl.java > does nothing. > @Override > public VolumeCheckResult check(VolumeCheckContext ignored) > throws DiskErrorException { > // TODO:FEDERATION valid synchronization > for (BlockPoolSlice s : bpSlices.values()) { > s.checkDirs(); > } > return VolumeCheckResult.HEALTHY; > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14983: -- Status: Open (was: Patch Available) > RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option > --- > > Key: HDFS-14983 > URL: https://issues.apache.org/jira/browse/HDFS-14983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Minor > Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, > HDFS-14983.draft.001.patch > > > NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration > without restarting but DFSRouter cannot. It would be better for DFSRouter to > have such functionality to be compatible with NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14983: -- Attachment: HDFS-14983.003.patch Status: Patch Available (was: Open) > RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option > --- > > Key: HDFS-14983 > URL: https://issues.apache.org/jira/browse/HDFS-14983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Minor > Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, > HDFS-14983.draft.001.patch > > > NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration > without restarting but DFSRouter cannot. It would be better for DFSRouter to > have such functionality to be compatible with NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retreiving encryption keys.
[ https://issues.apache.org/jira/browse/HDFS-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990275#comment-16990275 ]

Konstantin Shvachko commented on HDFS-15037:
--------------------------------------------

Ah, OK. Good catch, Wei-Chiu. So maybe {{dirLock}}-only locking in these methods was introduced by mistake, and we should fix it in a different issue? [~xiaochen], maybe you could give some background here.
In any case, it would be good to make KMS calls outside of the namesystem lock.

> Encryption Zone operations should not block other RPC calls while retreiving encryption keys.
> ---------------------------------------------------------------------------------------------
>
> Key: HDFS-15037
> URL: https://issues.apache.org/jira/browse/HDFS-15037
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: encryption, namenode
> Affects Versions: 2.10.0
> Reporter: Konstantin Shvachko
> Priority: Major
>
>
> I believe the intention was to avoid blocking other operations while retrieving keys by holding only {{FSDirectory.dirLock}}. But in reality all other operations first enter {{FSNamesystemLock}} and then {{dirLock}}, so they are all blocked waiting for the key.
> We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on the NameNode when encryption operations are intermixed with regular workloads.
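The direction suggested in the comment (make the KMS call outside of the namesystem lock) can be illustrated with a minimal sketch. Everything here is an assumption made for the illustration: `NamespaceSketch`, `createWithKey` and the `Supplier` standing in for the key fetch are hypothetical names, and a single `ReentrantReadWriteLock` stands in for the namesystem lock. It is not the NameNode code and not a proposed patch.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical model: one lock stands in for the namesystem lock.
class NamespaceSketch {
  private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
  private String lastKey;

  // The slow remote fetch runs before the lock is taken, so other
  // operations waiting on the lock are not delayed by it.
  String createWithKey(Supplier<String> slowKeyFetch) {
    String key = slowKeyFetch.get(); // no lock held here
    namesystemLock.writeLock().lock();
    try {
      lastKey = key; // short in-memory update only
      return lastKey;
    } finally {
      namesystemLock.writeLock().unlock();
    }
  }

  boolean isWriteLocked() {
    return namesystemLock.isWriteLocked();
  }
}

class KeyFetchDemo {
  public static void main(String[] args) {
    NamespaceSketch ns = new NamespaceSketch();
    String key = ns.createWithKey(() -> {
      // Observed from inside the fetch: the lock is free at this point.
      System.out.println("lock held during fetch: " + ns.isWriteLocked());
      return "edek-1";
    });
    System.out.println("stored key: " + key);
  }
}
```

A real change would also have to re-validate namespace state after taking the lock, since it may have changed while the key was being fetched; the sketch leaves that out.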
[jira] [Updated] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retreiving encryption keys.
[ https://issues.apache.org/jira/browse/HDFS-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15037: --- Summary: Encryption Zone operations should not block other RPC calls while retreiving encryption keys. (was: Encryption Zone operations should not block other RPC calls while retreivingencryption keys.) > Encryption Zone operations should not block other RPC calls while retreiving > encryption keys. > - > > Key: HDFS-15037 > URL: https://issues.apache.org/jira/browse/HDFS-15037 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Priority: Major > > I believe it was an intention to avoid blocking other operations while > retrieving keys with holding {{FSDirectory.dirLock}}. But in reality all > other operations enter first {{FSNamesystemLock}} then {{dirLock}}. So they > are all blocked waiting for the key. > We see substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on > NameNode when encryption operations are intermixed with regular workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990272#comment-16990272 ]

Konstantin Shvachko commented on HDFS-15032:
--------------------------------------------

Erik, the patch looks good and the tests work as expected for me.
I did not understand what you are trying to achieve with the method {{toString()}}. It is a good thing to define it for class {{ProxyCombiner}}, since then I can see all the proxies it combines in the debugger. But I don't see why invoking {{toString()}} on, say, {{ClientProtocol}} should be diverted to {{ProxyCombiner.toString()}}. I just don't see what it is useful for, and it will do a string comparison for all other calls, hurting performance.

> Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer mover
> Affects Versions: 2.10.0
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, HDFS-15032.002.patch
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it to read from the Observer Node as described in HDFS-14979), if one of the NNs isn't running, the Balancer will crash.
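The performance concern in the comment can be made concrete with a small sketch built on `java.lang.reflect.Proxy`. The names `EchoProtocol`, `EchoImpl` and `NameCheckingHandler` are hypothetical and exist only for this illustration; this is not the `ProxyCombiner` code from the patch. The point it shows is that an invocation handler which special-cases `toString()` by method name has to run that name comparison on every call that passes through the proxy.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical protocol and implementation, for illustration only.
interface EchoProtocol {
  String echo(String msg);
}

class EchoImpl implements EchoProtocol {
  @Override
  public String echo(String msg) {
    return msg;
  }
}

class NameCheckingHandler implements InvocationHandler {
  private final Object target;
  int nameChecks = 0; // counts how often the name comparison runs

  NameCheckingHandler(Object target) {
    this.target = target;
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    nameChecks++; // paid on every call, not only on toString()
    if ("toString".equals(method.getName()) && method.getParameterCount() == 0) {
      return "CombinedProxy[" + target.getClass().getSimpleName() + "]";
    }
    return method.invoke(target, args);
  }
}

class ProxyToStringDemo {
  public static void main(String[] args) {
    NameCheckingHandler handler = new NameCheckingHandler(new EchoImpl());
    EchoProtocol p = (EchoProtocol) Proxy.newProxyInstance(
        EchoProtocol.class.getClassLoader(),
        new Class<?>[] {EchoProtocol.class},
        handler);
    System.out.println(p.echo("ping"));     // prints ping
    System.out.println(p.toString());       // prints CombinedProxy[EchoImpl]
    System.out.println(handler.nameChecks); // prints 2
  }
}
```

Whether one extra string comparison per RPC is measurable in practice is a separate question that this sketch does not answer.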
[jira] [Commented] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990267#comment-16990267 ] Wei-Chiu Chuang commented on HDFS-15017: +1 > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch > > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15028) Keep the capacity of volume and reduce a system call
[ https://issues.apache.org/jira/browse/HDFS-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Masatake Iwasaki updated HDFS-15028:
-----------------------------------
Fix Version/s: 3.3.0
Hadoop Flags: Reviewed
Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed. Thanks, [~hadoop_yangyun].

> Keep the capacity of volume and reduce a system call
> ----------------------------------------------------
>
> Key: HDFS-15028
> URL: https://issues.apache.org/jira/browse/HDFS-15028
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Fix For: 3.3.0
> Attachments: HDFS-15028.patch, HDFS-15028.patch, HDFS-15028.patch, HDFS-15028.patch, HDFS-15028.patch
>
>
> The local volume does not change, so keep the first value of the capacity and reuse it for each heartbeat.
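The idea in the description (read the capacity once, then reuse it for every heartbeat) amounts to a small cache in front of the expensive call. The sketch below is a generic illustration under that assumption; `CachedCapacity` is a hypothetical name, the `LongSupplier` stands in for whatever system call reports the capacity, and none of this is taken from the committed patch.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: remembers the first value returned by the source.
class CachedCapacity {
  private final LongSupplier source;
  private long cached = -1; // -1 means "not read yet"

  CachedCapacity(LongSupplier source) {
    this.source = source;
  }

  synchronized long get() {
    if (cached < 0) {
      cached = source.getAsLong(); // the only call that reaches the source
    }
    return cached;
  }
}

class CachedCapacityDemo {
  public static void main(String[] args) {
    int[] systemCalls = {0};
    CachedCapacity capacity = new CachedCapacity(() -> {
      systemCalls[0]++;
      return 4_000_000_000_000L; // pretend 4 TB volume
    });
    for (int heartbeat = 0; heartbeat < 3; heartbeat++) {
      capacity.get();
    }
    System.out.println("source calls: " + systemCalls[0]); // prints source calls: 1
  }
}
```

The trade-off is that a resized volume is not noticed until the cache is reset, which is acceptable only under the description's premise that the local volume does not change.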
[jira] [Commented] (HDFS-15031) Allow BootstrapStandby to download FSImage if the directory is already formatted
[ https://issues.apache.org/jira/browse/HDFS-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990257#comment-16990257 ] Íñigo Goiri commented on HDFS-15031: I think we can fix the checkstyle warnings. > Allow BootstrapStandby to download FSImage if the directory is already > formatted > > > Key: HDFS-15031 > URL: https://issues.apache.org/jira/browse/HDFS-15031 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Danny Becker >Assignee: Danny Becker >Priority: Minor > Attachments: HDFS-15031.000.patch, HDFS-15031.001.patch, > HDFS-15031.002.patch, HDFS-15031.003.patch, HDFS-15031.005.patch, > HDFS-15031.006.patch > > > Currently, BootstrapStandby will only download the latest FSImage if it has > formatted the local image directory. This can be an issue when there are out > of date FSImages on a Standby NameNode, as the non-interactive mode will not > format the image directory, and BootstrapStandby will return an error code. > The changes here simply allow BootstrapStandby to download the latest FSImage > to the image directory, without needing to format first. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990247#comment-16990247 ]

Hadoop QA commented on HDFS-15017:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 57s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2 Compile Tests ||
| +1 | mvninstall | 10m 23s | branch-2 passed |
| +1 | compile | 0m 53s | branch-2 passed with JDK v1.7.0_95 |
| +1 | compile | 0m 47s | branch-2 passed with JDK v1.8.0_222 |
| +1 | checkstyle | 0m 29s | branch-2 passed |
| +1 | mvnsite | 0m 54s | branch-2 passed |
| +1 | findbugs | 2m 4s | branch-2 passed |
| +1 | javadoc | 1m 8s | branch-2 passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 44s | branch-2 passed with JDK v1.8.0_222 |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 48s | the patch passed |
| +1 | compile | 0m 48s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 48s | the patch passed |
| +1 | compile | 0m 43s | the patch passed with JDK v1.8.0_222 |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 25s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 2 unchanged - 1 fixed = 2 total (was 3) |
| +1 | mvnsite | 0m 52s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 8s | the patch passed |
| +1 | javadoc | 1m 6s | the patch passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 40s | the patch passed with JDK v1.8.0_222 |
|| || || || Other Tests ||
| -1 | unit | 70m 45s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 99m 28s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:f555aa740b5 |
| JIRA Issue | HDFS-15017 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987788/HDFS-15017-branch-2.000.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux e830bfdd6c26 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HDFS-15028) Keep the capacity of volume and reduce a system call
[ https://issues.apache.org/jira/browse/HDFS-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990237#comment-16990237 ] Hudson commented on HDFS-15028: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17736 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17736/]) HDFS-15028. Keep the capacity of volume and reduce a system call. (iwasakims: rev 11cd5b6e39adbf159891852f3482aebdde5459fb) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java > Keep the capacity of volume and reduce a system call > > > Key: HDFS-15028 > URL: https://issues.apache.org/jira/browse/HDFS-15028 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15028.patch, HDFS-15028.patch, HDFS-15028.patch, > HDFS-15028.patch, HDFS-15028.patch > > > The local volume is not changed. so keep the first value of the capacity and > reuse for each heartbeat. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15028) Keep the capacity of volume and reduce a system call
[ https://issues.apache.org/jira/browse/HDFS-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990232#comment-16990232 ] Masatake Iwasaki commented on HDFS-15028: - +1. committing this. > Keep the capacity of volume and reduce a system call > > > Key: HDFS-15028 > URL: https://issues.apache.org/jira/browse/HDFS-15028 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15028.patch, HDFS-15028.patch, HDFS-15028.patch, > HDFS-15028.patch, HDFS-15028.patch > > > The local volume is not changed. so keep the first value of the capacity and > reuse for each heartbeat. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990229#comment-16990229 ] Konstantin Shvachko commented on HDFS-15017: +1 Thanks Chao for the patch. > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch > > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14751) Synchronize on diffs in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990228#comment-16990228 ]

Hudson commented on HDFS-14751:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17735 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17735/])
HDFS-14751. Synchronize on diffs in DirectoryScanner. Contributed by (weichiu: rev ecd461f940efcd8c75f4833cf09bc7a52cc0b559)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java

> Synchronize on diffs in DirectoryScanner
> ----------------------------------------
>
> Key: HDFS-14751
> URL: https://issues.apache.org/jira/browse/HDFS-14751
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Minor
> Fix For: 3.3.0
> Attachments: HDFS-14751.001.patch, HDFS-14751.002.patch, HDFS-14751.003.patch, HDFS-14751.004.patch
>
>
> {code:java}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.693 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
> [ERROR] testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency) Time elapsed: 7.572 s <<< ERROR!
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153)
>   at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044)
>   at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433)
>   at org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202)
>   at org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
> Ref:[https://builds.apache.org/job/PreCommit-HDFS-Build/27567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
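The stack trace above shows an `ArrayList` iterator failing inside `DirectoryScanner.reconcile` because the collection changed while it was being iterated. The issue title names the remedy: synchronize on the shared collection. The sketch below shows that general pattern only; `DiffsSketch`, `record` and `reconcile` are hypothetical stand-ins and do not reproduce the real `DirectoryScanner` fields or the committed change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a shared "diffs" collection.
class DiffsSketch {
  private final List<String> diffs = new ArrayList<>();

  // Writers take the same monitor as the reader below.
  void record(String blockId) {
    synchronized (diffs) {
      diffs.add(blockId);
    }
  }

  // The whole iteration happens under the monitor, so no writer can
  // modify the list between two calls to the iterator.
  int reconcile() {
    int handled = 0;
    synchronized (diffs) {
      for (String d : diffs) {
        handled++;
      }
    }
    return handled;
  }
}

class DiffsDemo {
  public static void main(String[] args) throws InterruptedException {
    DiffsSketch sketch = new DiffsSketch();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 20000; i++) {
        sketch.record("blk_" + i);
      }
    });
    writer.start();
    for (int i = 0; i < 50; i++) {
      sketch.reconcile(); // safe while the writer is still adding entries
    }
    writer.join();
    System.out.println(sketch.reconcile()); // prints 20000
  }
}
```

Holding the monitor for the full iteration keeps the reader consistent at the cost of blocking writers for that long, which matters when the per-entry work is slow.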
[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990227#comment-16990227 ]

Hudson commented on HDFS-14476:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17735 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17735/])
HDFS-14476. lock too long when fix inconsistent blocks between disk and (weichiu: rev 313b76f8e92643e3412a98dc73f83437729f3984)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DirectoryScanner.java

> lock too long when fix inconsistent blocks between disk and in-memory
> ---------------------------------------------------------------------
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.6.0, 2.7.0, 3.0.3
> Reporter: Sean Chow
> Assignee: Sean Chow
> Priority: Major
> Fix For: 3.3.0
> Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has found differences between on-disk and in-memory blocks, it runs {{checkAndUpdate}} to fix them. However, {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> I have about 6 million blocks on every datanode, and each 6-hour scan finds about 25000 abnormal blocks to fix. That leads to the lock on the FsDatasetImpl object being held for a long time.
> Let's assume every block needs 10 ms to fix (because of the latency of SAS disks); that costs 250 seconds to finish. That means all reads and writes will be blocked for about 4 minutes on that datanode.
>
> {code:java}
> 2019-05-06 08:06:51,704 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing metadata files:23574, missing block files:23574, missing blocks in memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 588402ms to process 1 commands from NN
> {code}
> It takes a long time to process commands from the NN because threads are blocked, and the namenode will see a long lastContact time for this datanode.
> This may affect all HDFS versions.
> *How to fix:*
> Just as the invalidate command from the namenode is processed with a batch size of 1000, fixing these abnormal blocks should be done in batches too, sleeping 2 seconds between batches to allow normal block reads and writes.
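The proposed fix (work in batches and pause between them) can be sketched as follows. With the numbers assumed in the description, 25000 blocks at 10 ms each is 250 s under one continuous lock hold; in batches of 1000 the longest single hold drops to about 10 s, with 24 pauses of 2 s in between. The class and parameter names below (`BatchedFixer`, `fixInBatches`, `datasetLock`) are hypothetical and chosen for this illustration; this is not the committed change to `DirectoryScanner`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: applies `fix` to every item, holding the lock
// only for one batch at a time and sleeping between batches.
class BatchedFixer {
  static <T> int fixInBatches(List<T> items, int batchSize, long pauseMillis,
                              Object datasetLock, Consumer<T> fix) {
    int batches = 0;
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      synchronized (datasetLock) {
        for (T item : items.subList(start, end)) {
          fix.accept(item);
        }
      }
      batches++;
      if (end < items.size()) {
        try {
          Thread.sleep(pauseMillis); // lock is free here for readers and writers
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    return batches;
  }
}

class BatchedFixerDemo {
  public static void main(String[] args) {
    List<Integer> abnormal = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
      abnormal.add(i);
    }
    int[] fixed = {0};
    int batches = BatchedFixer.fixInBatches(
        abnormal, 1000, 10L, new Object(), b -> fixed[0]++);
    System.out.println(batches + " batches, " + fixed[0] + " blocks");
    // prints 3 batches, 2500 blocks
  }
}
```

The total elapsed time grows by the sum of the pauses, so the batch size and pause length trade scan duration against how long other operations can be starved.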
[jira] [Updated] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-15005:
----------------------------------
Fix Version/s: 2.11.0, 2.10.1
Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks, [~csun]. Committed the patch to branch-2 and branch-2.10.

> Backport HDFS-12300 to branch-2
> -------------------------------
>
> Key: HDFS-15005
> URL: https://issues.apache.org/jira/browse/HDFS-15005
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Fix For: 2.10.1, 2.11.0
> Attachments: HDFS-15005-branch-2.000.patch, HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, HDFS-15005-branch-2.003.patch
>
>
> Having DT-related information in the audit log is very useful. This tracks the effort to backport HDFS-12300 to branch-2.
[jira] [Updated] (HDFS-14751) Synchronize on diffs in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14751: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~leosun08] > Synchronize on diffs in DirectoryScanner > > > Key: HDFS-14751 > URL: https://issues.apache.org/jira/browse/HDFS-14751 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14751.001.patch, HDFS-14751.002.patch, > HDFS-14751.003.patch, HDFS-14751.004.patch > > > {code:java} > [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 21.693 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency > [ERROR] > testGenerationStampInFuture(org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency) > Time elapsed: 7.572 s <<< ERROR! > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > com.google.common.collect.AbstractMapBasedMultimap$Itr.next(AbstractMapBasedMultimap.java:1153) > at > java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1044) > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:433) > at > org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils.runDirectoryScanner(DataNodeTestUtils.java:202) > at > org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:92) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > 
Ref:[https://builds.apache.org/job/PreCommit-HDFS-Build/27567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
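The stack trace in HDFS-14751 shows an ArrayList iterator failing in {{DirectoryScanner.reconcile}} because the list changed during iteration. The general remedy named in the issue title, synchronizing on the shared collection, can be sketched as below. This is an illustration under assumptions, not the committed patch: the class `SynchronizedDiffs` and its methods are hypothetical, and a plain ArrayList stands in for the scanner's diffs structure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: readers and writers share one monitor on the list. */
final class SynchronizedDiffs {
  private final List<Integer> diffs = new ArrayList<>();

  /** Writer side: every mutation takes the monitor of the list itself. */
  void addDiff(int blockId) {
    synchronized (diffs) {
      diffs.add(blockId);
    }
  }

  /** Reader side: the whole iteration runs under the same monitor. */
  int reconcile() {
    int seen = 0;
    synchronized (diffs) {
      for (Integer ignored : diffs) {
        seen++;
      }
    }
    return seen;
  }

  /** Runs a writer thread while this thread iterates; returns the final size. */
  static int runDemo(int count) {
    SynchronizedDiffs store = new SynchronizedDiffs();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < count; i++) {
        store.addDiff(i);
      }
    });
    writer.start();
    while (writer.isAlive()) {
      store.reconcile(); // would risk ConcurrentModificationException without the lock
    }
    try {
      writer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return store.reconcile();
  }
}
```

Without the two synchronized blocks, the iterating thread could observe a modification made by the writer between two calls to `next()`, which is what `checkForComodification` reports in the trace above.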
[jira] [Resolved] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-14476. Resolution: Fixed Push into trunk. > lock too long when fix inconsistent blocks between disk and in-memory > - > > Key: HDFS-14476 > URL: https://issues.apache.org/jira/browse/HDFS-14476 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0, 2.7.0, 3.0.3 >Reporter: Sean Chow >Assignee: Sean Chow >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, > HDFS-14476.002.patch, HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, > datanode-with-patch-14476.png
[jira] [Updated] (HDFS-14751) Synchronize on diffs in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14751: --- Fix Version/s: 3.3.0 > Synchronize on diffs in DirectoryScanner > > > Key: HDFS-14751 > URL: https://issues.apache.org/jira/browse/HDFS-14751 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14751.001.patch, HDFS-14751.002.patch, > HDFS-14751.003.patch, HDFS-14751.004.patch
[jira] [Updated] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14476: --- Fix Version/s: 3.3.0 > lock too long when fix inconsistent blocks between disk and in-memory > - > > Key: HDFS-14476 > URL: https://issues.apache.org/jira/browse/HDFS-14476 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0, 2.7.0, 3.0.3 >Reporter: Sean Chow >Assignee: Sean Chow >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476.00.patch, > HDFS-14476.002.patch, HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, > datanode-with-patch-14476.png
[jira] [Commented] (HDFS-14751) Synchronize on diffs in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-14751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990218#comment-16990218 ] Wei-Chiu Chuang commented on HDFS-14751: +1 I'll add version 03 on top of HDFS-14476. > Synchronize on diffs in DirectoryScanner > > > Key: HDFS-14751 > URL: https://issues.apache.org/jira/browse/HDFS-14751 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14751.001.patch, HDFS-14751.002.patch, > HDFS-14751.003.patch, HDFS-14751.004.patch
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990206#comment-16990206 ] Hadoop QA commented on HDFS-15012: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 53s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 141 unchanged - 0 fixed = 143 total (was 141) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 4s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 33s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 47s{color} | {color:red} The patch generated 1 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}187m 58s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | hadoop.hdfs.web.TestWebHDFSAcl | | | hadoop.hdfs.web.TestWebHDFSForHA | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.web.TestWebHDFS | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics | | | hadoop.hdfs.server.namenode.TestLeaseManager | | | hadoop.hdfs.TestFileChecksum | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | | | hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier | | | hadoop.hdfs.web.TestWebHDFSXAttr | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.web.TestWebHdfsTokens | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-15012 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987780/HDFS-15012.000.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux
[jira] [Commented] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retrieving encryption keys.
[ https://issues.apache.org/jira/browse/HDFS-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990204#comment-16990204 ] Wei-Chiu Chuang commented on HDFS-15037: Thanks [~shv] for reporting the issue. I believe these three calls were made to support reencrypt (HDFS-10899), which was added in Hadoop 3. Did you backport reencrypt to your branch? > Encryption Zone operations should not block other RPC calls while > retrieving encryption keys. > > > Key: HDFS-15037 > URL: https://issues.apache.org/jira/browse/HDFS-15037 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Priority: Major > > I believe the intention was to avoid blocking other operations while > retrieving keys with {{FSDirectory.dirLock}} held. But in reality all > other operations first acquire {{FSNamesystemLock}} and then {{dirLock}}, so they > are all blocked waiting for the key. > We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on the > NameNode when encryption operations are intermixed with regular workloads.
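The locking problem described in HDFS-15037 is that a slow key retrieval happens while a lock that every other operation needs is held. One general way to avoid that, shown here only as a sketch and not as the fix chosen for this issue, is to make the slow call before taking any lock and to acquire the locks in the usual order afterwards. Everything in the block is hypothetical: the class `KeyFetchOutsideLocks`, the two `ReentrantReadWriteLock` fields standing in for {{FSNamesystemLock}} and {{dirLock}}, and the `slowKeyProvider` supplier standing in for the key provider call.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical sketch: fetch the key first, then lock in a fixed order. */
final class KeyFetchOutsideLocks {
  // Stand-ins for FSNamesystemLock and FSDirectory.dirLock.
  private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
  private String zoneKey = "";

  String createZone(Supplier<String> slowKeyProvider) {
    // The slow call runs with no lock held, so other operations are not blocked.
    String key = slowKeyProvider.get();

    // Same order as every other operation: namesystem lock first, then dirLock.
    namesystemLock.writeLock().lock();
    try {
      dirLock.writeLock().lock();
      try {
        zoneKey = key;
        return zoneKey;
      } finally {
        dirLock.writeLock().unlock();
      }
    } finally {
      namesystemLock.writeLock().unlock();
    }
  }

  /** True if the calling thread currently holds either write lock. */
  boolean holdsAnyLock() {
    return namesystemLock.isWriteLockedByCurrentThread()
        || dirLock.isWriteLockedByCurrentThread();
  }
}
```

A real implementation would also have to re-validate, once the locks are held, any state that was read before the key was fetched; the sketch leaves that out.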
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Status: Patch Available (was: Open) > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch > > > Should remove redundant import. > Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it.
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Attachment: HDFS-15017-branch-2.000.patch > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch
[jira] [Commented] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990201#comment-16990201 ] Chao Sun commented on HDFS-15017: - Seems like a trivial change - the import was added by HDFS-7073 > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie > Attachments: HDFS-15017-branch-2.000.patch
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Attachment: (was: HDFS-15017-branch-2.000.patch) > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie
[jira] [Updated] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.
[ https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15017: Attachment: HDFS-15017-branch-2.000.patch > Remove redundant import of AtomicBoolean in NameNodeConnector. > -- > > Key: HDFS-15017 > URL: https://issues.apache.org/jira/browse/HDFS-15017 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover, hdfs >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > Labels: newbie
[jira] [Commented] (HDFS-14852) Remove of LowRedundancyBlocks do NOT remove the block from all queues
[ https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990192#comment-16990192 ] Wei-Chiu Chuang commented on HDFS-14852: [~ferhui] if the symptom you saw was "web UI reporting missing blocks but the file path was empty", it would have been HDFS-13999. But since you reported this issue on a Hadoop 3 cluster, that wouldn't be possible. I added ec as a component, but it looks like I was wrong: it doesn't seem to be ec related. Additionally, I would like to see a test added to cover the change inside BlockManager. The attached test code covers LowRedundancyBlocks, and I am concerned because BlockManager is a hugely complex piece of code. [~sodonnell] you've looked at LowRedundancyBlocks recently. What do you think about the change? > Remove of LowRedundancyBlocks do NOT remove the block from all queues > - > > Key: HDFS-14852 > URL: https://issues.apache.org/jira/browse/HDFS-14852 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch, > HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch, > HDFS-14852.005.patch, screenshot-1.png > > > LowRedundancyBlocks.java > {code:java} > // Some comments here > if(priLevel >= 0 && priLevel < LEVEL > && priorityQueues.get(priLevel).remove(block)) { > NameNode.blockStateChangeLog.debug( > "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}" > + " from priority queue {}", > block, priLevel); > decrementBlockStat(block, priLevel, oldExpectedReplicas); > return true; > } else { > // Try to remove the block from all queues if the block was > // not found in the queue for the given priority level. 
> for (int i = 0; i < LEVEL; i++) { > if (i != priLevel && priorityQueues.get(i).remove(block)) { > NameNode.blockStateChangeLog.debug( > "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" + > " {} from priority queue {}", block, i); > decrementBlockStat(block, i, oldExpectedReplicas); > return true; > } > } > } > return false; > } > {code} > The source code is above; the comment reads as follows: > {quote} > // Try to remove the block from all queues if the block was > // not found in the queue for the given priority level. > {quote} > However, the function "remove" does NOT remove the block from all queues: it returns as soon as > the block has been removed from one queue. > The function add in LowRedundancyBlocks.java is used in several places, so > one block may be in two or more queues. > We found that the corrupt blocks do not match the corrupt files on the NN web UI. Maybe it is > related to this. > Uploaded an initial patch.
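The behavior that the quoted comment promises, removing a block from every queue rather than stopping at the first hit, can be sketched as below. This is an illustration only and not the attached patch: the class `PriorityQueues` and its methods are hypothetical, plain `HashSet`s stand in for the real priority queues, and the logging and {{decrementBlockStat}} bookkeeping are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: remove a block from every priority queue. */
final class PriorityQueues<T> {
  private final List<Set<T>> queues = new ArrayList<>();

  PriorityQueues(int levels) {
    for (int i = 0; i < levels; i++) {
      queues.add(new HashSet<>());
    }
  }

  void add(T block, int level) {
    queues.get(level).add(block);
  }

  /** Visits all queues; returns true if the block was in at least one. */
  boolean removeFromAll(T block) {
    boolean removed = false;
    for (Set<T> queue : queues) {
      // No early return: |= evaluates remove() on every queue.
      removed |= queue.remove(block);
    }
    return removed;
  }

  boolean contains(T block) {
    for (Set<T> queue : queues) {
      if (queue.contains(block)) {
        return true;
      }
    }
    return false;
  }
}
```

For example, a block added at levels 0 and 2 is gone from both after a single `removeFromAll` call, whereas a loop that returns on the first successful removal would leave the copy at level 2 behind.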
[jira] [Commented] (HDFS-15031) Allow BootstrapStandby to download FSImage if the directory is already formatted
[ https://issues.apache.org/jira/browse/HDFS-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990189#comment-16990189 ] Hadoop QA commented on HDFS-15031: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 26 unchanged - 0 fixed = 28 total (was 26) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 25s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}150m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestRedudantBlocks | | | hadoop.hdfs.TestDeadNodeDetection | | | hadoop.hdfs.TestMultipleNNPortQOP | | | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-15031 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987778/HDFS-15031.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3f9021029e81 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 76bb297 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28475/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit |
[jira] [Updated] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retrieving encryption keys.
[ https://issues.apache.org/jira/browse/HDFS-15037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-15037:
---
    Description:
I believe the intention was to avoid blocking other operations during key retrieval by holding only {{FSDirectory.dirLock}}. In reality, though, all other operations acquire {{FSNamesystemLock}} first and then {{dirLock}}, so they are all blocked waiting for the key.
We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on the NameNode when encryption operations are intermixed with regular workloads.

  was:
I believe the intention was to avoid blocking other operations during key retrieval by holding only {{FSDirectory.dirLock}}. In reality, though, all other operations acquire {{FSNamesystemLock}} first and then {{dirLock}}, so they are all blocked waiting for the key.
We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on the NameNode when encryption operations are intermixed with regular workloads.

Here are the three methods that hold only {{FSDirectory.dirLock}} and not {{FSNamesystemLock}}:
* {{ReencryptionHandler.run()}}
* {{FSDirEncryptionZoneOp.getKeyNameForZone()}}
* {{EncryptionZoneManager.pauseForTestingAfterNthCheckpoint()}}

It looks to me like the code needs to be rearranged to use some other lock for key retrieval.

> Encryption Zone operations should not block other RPC calls while retrieving encryption keys.
> ---
>
>                 Key: HDFS-15037
>                 URL: https://issues.apache.org/jira/browse/HDFS-15037
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: encryption, namenode
>    Affects Versions: 2.10.0
>            Reporter: Konstantin Shvachko
>            Priority: Major
>
> I believe the intention was to avoid blocking other operations during key retrieval by holding only {{FSDirectory.dirLock}}. In reality, though, all other operations acquire {{FSNamesystemLock}} first and then {{dirLock}}, so they are all blocked waiting for the key.
> We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on the NameNode when encryption operations are intermixed with regular workloads.
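The lock-ordering effect described in HDFS-15037 can be illustrated with two plain `ReentrantLock`s. This is only a sketch under stated assumptions: `fsnLock` and `dirLock` stand in for `FSNamesystemLock` and `FSDirectory.dirLock`, the class and method names are hypothetical, and the real NameNode locks are read/write locks with more elaborate behaviour. The caller of `regularOperationDuringKeyFetch` plays the encryption operation that holds only `dirLock` while it waits for a key.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the two-lock ordering; not NameNode code. */
final class LockOrderSketch {
  private final ReentrantLock fsnLock = new ReentrantLock(); // stand-in for FSNamesystemLock
  private final ReentrantLock dirLock = new ReentrantLock(); // stand-in for FSDirectory.dirLock

  /**
   * A regular operation: takes fsnLock first, then dirLock.
   * Returns false if dirLock could not be obtained within the timeout.
   */
  boolean regularOperation(long timeoutMs) {
    fsnLock.lock();
    try {
      boolean gotDirLock;
      try {
        gotDirLock = dirLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      if (!gotDirLock) {
        return false; // stuck behind the key fetch while still holding fsnLock
      }
      dirLock.unlock();
      return true;
    } finally {
      fsnLock.unlock();
    }
  }

  /**
   * Holds only dirLock (as the encryption operation would while fetching a key)
   * and runs a regular operation on another thread in the meantime.
   */
  boolean regularOperationDuringKeyFetch(long timeoutMs) {
    dirLock.lock();
    try {
      final boolean[] completed = new boolean[1];
      Thread rpcHandler = new Thread(() -> completed[0] = regularOperation(timeoutMs));
      rpcHandler.start();
      try {
        rpcHandler.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return completed[0];
    } finally {
      dirLock.unlock();
    }
  }
}
```

In this model the regular operation waits on `dirLock` while it already holds `fsnLock`, so every other operation that needs `fsnLock` queues up behind it as well, even though the key fetch itself never touched `fsnLock`.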
[jira] [Created] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retrieving encryption keys.
Konstantin Shvachko created HDFS-15037:
---

             Summary: Encryption Zone operations should not block other RPC calls while retrieving encryption keys.
                 Key: HDFS-15037
                 URL: https://issues.apache.org/jira/browse/HDFS-15037
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: encryption, namenode
    Affects Versions: 2.10.0
            Reporter: Konstantin Shvachko


I believe the intention was to avoid blocking other operations during key retrieval by holding only {{FSDirectory.dirLock}}. In reality, though, all other operations acquire {{FSNamesystemLock}} first and then {{dirLock}}, so they are all blocked waiting for the key.
We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on the NameNode when encryption operations are intermixed with regular workloads.
[jira] [Commented] (HDFS-13616) Batch listing of multiple directories
[ https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990172#comment-16990172 ] Hadoop QA commented on HDFS-13616: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 12s{color} | {color:red} HDFS-13616 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13616 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28477/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Batch listing of multiple directories > - > > Key: HDFS-13616 > URL: https://issues.apache.org/jira/browse/HDFS-13616 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.2.0 >Reporter: Andrew Wang >Assignee: Chao Sun >Priority: Major > Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch, > HDFS-13616.002.patch > > > One of the dominant workloads for external metadata services is listing of > partition directories. This can end up being bottlenecked on RTT time when > partition directories contain a small number of files. This is fairly common, > since fine-grained partitioning is used for partition pruning by the query > engines. > A batched listing API that takes multiple paths amortizes the RTT cost. > Initial benchmarks show a 10-20x improvement in metadata loading performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
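The amortization argument in the HDFS-13616 description can be put into a back-of-the-envelope model. The sketch below is purely illustrative: the class, the method names, and any numbers fed into it are assumptions, it ignores response size and server-side batching limits, and it is not the attached `BenchmarkListFiles.java`.

```java
/** Hypothetical cost model for per-directory versus batched listing calls. */
final class BatchedListingCost {
  /** One RPC per directory: every directory pays a full round trip. */
  static double sequentialMillis(int dirs, double rttMs, double serverMsPerDir) {
    return dirs * (rttMs + serverMsPerDir);
  }

  /** One RPC per batch: the round trip is paid once per batch, the server work once per directory. */
  static double batchedMillis(int dirs, int batchSize, double rttMs, double serverMsPerDir) {
    int calls = (dirs + batchSize - 1) / batchSize; // ceiling division
    return calls * rttMs + dirs * serverMsPerDir;
  }

  /** Ratio of the two estimates; larger means batching helps more. */
  static double speedup(int dirs, int batchSize, double rttMs, double serverMsPerDir) {
    return sequentialMillis(dirs, rttMs, serverMsPerDir)
        / batchedMillis(dirs, batchSize, rttMs, serverMsPerDir);
  }
}
```

The model makes the qualitative point from the description: the fewer files each partition directory holds (small `serverMsPerDir` relative to `rttMs`), the more the round trips dominate and the more a batched call saves.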
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990155#comment-16990155 ]

Konstantin Shvachko commented on HDFS-15036:
---

This can happen during checkpointing or while preparing for a rolling upgrade. We observed it during a rolling upgrade, when the Standby was reporting:
_"Rollback image has been created. Proceed to upgrade daemons."_
while the Active still reported:
_"Rollback image has not been created."_

In the ANN logs I see that it started receiving the image:
{code:java}
2019-12-05 23:14:56,328 INFO org.apache.hadoop.hdfs.server.namenode.ImageServlet: ImageServlet allowing checkpointer: hdfs/active.namenode.com
{code}
But the ANN did not print anything related to the image transfer afterwards, and the transferred image is missing from its storage directory. The ANN log message comes from {{isValidRequestor()}}, which is called by {{ImageServlet.doPut()}}.

The SBN log indicates that the image was fully and successfully transferred to the ANN:
{code:java}
2019-12-05 23:22:29,526 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Sending fileName: /hdfs-storage-dir/current/fsimage_rollback_00773999609, fileSize: 1889021016. Sent total: 1889021016 bytes. Size of last segment intended to send: -1 bytes.
{code}
The SBN log message comes from {{TransferFsImage.copyFileToStream()}}.

Looking at the code in {{ImageServlet.doPut()}}, I see that one of the methods it calls is {{Util.receiveFile()}}. If an exception is thrown inside the while-loop that reads from the input (socket) stream and writes to the output (image file) stream, execution goes through a series of {{finally}} sections without the exception being caught, logged, or reported to the sender.

We should:
# Catch and log any exceptions occurring there
# Notify the SBN about the error, so that it can retry the transfer

> Active NameNode should not silently fail the image transfer
> ---
>
>                 Key: HDFS-15036
>                 URL: https://issues.apache.org/jira/browse/HDFS-15036
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.10.0
>            Reporter: Konstantin Shvachko
>            Assignee: Chao Sun
>            Priority: Major
>
> Image transfer from Standby NameNode to Active silently fails on Active, without any logging and not notifying the receiver side.
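The two proposed steps (log the failure, tell the sender) could look roughly like the following. This is a minimal sketch with stated assumptions: `ReceiveFileSketch`, `receive` and `failingStream` are hypothetical names, `java.util.logging` replaces whatever logger the NameNode uses, and returning an error string stands in for sending an error response back to the SBN. It is not the code of `Util.receiveFile()` or `ImageServlet.doPut()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical receive loop that logs failures and reports them to the caller. */
final class ReceiveFileSketch {
  private static final Logger LOG = Logger.getLogger(ReceiveFileSketch.class.getName());

  /**
   * Copies the input stream to the output stream.
   * Returns null on success, or an error message that the caller could send
   * back to the sender so that it can retry the transfer.
   */
  static String receive(InputStream in, OutputStream out) {
    byte[] buf = new byte[4096];
    long received = 0;
    try {
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
        received += n;
      }
      out.flush();
      return null;
    } catch (IOException e) {
      // Step 1: do not let the failure disappear silently.
      LOG.log(Level.SEVERE, "Image transfer failed after " + received + " bytes", e);
      // Step 2: hand the error to the caller instead of swallowing it.
      return "Image transfer failed: " + e.getMessage();
    }
  }

  /** Demo helper: a stream whose first read fails, to exercise the error path. */
  static InputStream failingStream(String message) {
    return new InputStream() {
      @Override
      public int read() throws IOException {
        throw new IOException(message);
      }
    };
  }
}
```

A servlet-style caller would translate a non-null return value into an error status for the sending side; how that status is encoded is left open here.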
[jira] [Assigned] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned HDFS-15036: --- Assignee: Chao Sun > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15036) Active NameNode should not silently fail the image transfer
Konstantin Shvachko created HDFS-15036: -- Summary: Active NameNode should not silently fail the image transfer Key: HDFS-15036 URL: https://issues.apache.org/jira/browse/HDFS-15036 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.10.0 Reporter: Konstantin Shvachko Image transfer from Standby NameNode to Active silently fails on Active, without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14993) checkDiskError doesn't work during datanode startup
[ https://issues.apache.org/jira/browse/HDFS-14993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990139#comment-16990139 ] Wei-Chiu Chuang commented on HDFS-14993: nit: I would really love to use slf4j to log messages rather than using System.out.println in the tests. Other than that lgtm > checkDiskError doesn't work during datanode startup > --- > > Key: HDFS-14993 > URL: https://issues.apache.org/jira/browse/HDFS-14993 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Major > Attachments: HDFS-14993.patch, HDFS-14993.patch > > > the function checkDiskError() is called before addBlockPool, but list > bpSlices is empty this time. So the function check() in FsVolumeImpl.java > does nothing. > @Override > public VolumeCheckResult check(VolumeCheckContext ignored) > throws DiskErrorException { > // TODO:FEDERATION valid synchronization > for (BlockPoolSlice s : bpSlices.values()) { > s.checkDirs(); > } > return VolumeCheckResult.HEALTHY; > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
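Why the quoted `check()` reports a healthy volume at startup can be shown with a toy model: a loop over an empty map performs no checks at all. The sketch below is hypothetical throughout. `VolumeCheckSketch`, `DirCheck` and `checkWithBaseDir` are illustration names, and checking a base directory first is only one conceivable direction, not a description of the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a volume whose health check iterates over block pool slices. */
final class VolumeCheckSketch {
  enum Result { HEALTHY, FAILED }

  /** Stand-in for a directory check such as BlockPoolSlice.checkDirs(). */
  interface DirCheck {
    boolean ok();
  }

  private final Map<String, DirCheck> bpSlices = new LinkedHashMap<>();
  private final DirCheck baseDir;

  VolumeCheckSketch(DirCheck baseDir) {
    this.baseDir = baseDir;
  }

  void addBlockPool(String bpid, DirCheck slice) {
    bpSlices.put(bpid, slice);
  }

  /** Mirrors the quoted method: with no slices registered, nothing is checked. */
  Result checkSlicesOnly() {
    for (DirCheck slice : bpSlices.values()) {
      if (!slice.ok()) {
        return Result.FAILED;
      }
    }
    return Result.HEALTHY;
  }

  /** Assumed variant: also verify the volume's own directory before the slices. */
  Result checkWithBaseDir() {
    if (!baseDir.ok()) {
      return Result.FAILED;
    }
    return checkSlicesOnly();
  }
}
```

With a broken base directory and no block pools added yet, `checkSlicesOnly()` still answers `HEALTHY`, which matches the startup behaviour described in the issue.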
[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990129#comment-16990129 ]

Íñigo Goiri commented on HDFS-14983:
---

Thanks [~risyomei], minor comments:
* Add a blank line between the imports and the javadocs (e.g., RefreshSuperUserGroupsConfigurationResponse, RefreshSuperUserGroupsConfigurationRequest,...).
* What is the {{address}} parameter in RouterAdmin#refreshSuperUserGroupsConfiguration()?
* I'm not sure there is a point in having both RouterAdmin#refreshSuperUserGroupsConfiguration() and RouterAdmin#refreshSuperUserGroupsExecutor(); we could have a single method.
* Can we avoid the SuppressWarnings in TestRouterRefreshSuperUserGroupsConfiguration?
* For TestRouterRefreshSuperUserGroupsConfiguration#initializeClientConfig(), I think it is cleaner to return a new configuration instead of messing around with the internal one. Actually, I would try to get a full new client configuration.

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---
>
>                 Key: HDFS-14983
>                 URL: https://issues.apache.org/jira/browse/HDFS-14983
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: rbf
>            Reporter: Akira Ajisaka
>            Assignee: Xieming Li
>            Priority: Minor
>         Attachments: HDFS-14983.002.patch, HDFS-14983.draft.001.patch
>
>
> NameNode can update the proxyuser config via -refreshSuperUserGroupsConfiguration without restarting, but DFSRouter cannot. It would be better for DFSRouter to have such functionality to be compatible with NameNode.
[jira] [Commented] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990121#comment-16990121 ] Hadoop QA commented on HDFS-15005: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 11s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Patch 
Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 9 new + 235 unchanged - 1 fixed = 244 total (was 236) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 12s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}113m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | | | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys | | | hadoop.hdfs.TestSecureEncryptionZoneWithKMS | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:f555aa740b5 | | JIRA Issue | HDFS-15005 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987774/HDFS-15005-branch-2.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f4a71bf356af 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-15012: --- Status: Patch Available (was: Open) Patch v0 adds a unit test and fix to address the issue. > NN fails to parse Edit logs after applying HDFS-13101 > - > > Key: HDFS-15012 > URL: https://issues.apache.org/jira/browse/HDFS-15012 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Eric Lin >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-15012.000.patch > > > After applying HDFS-13101, and deleting and creating large number of > snapshots, SNN exited with below error: > > {code:sh} > 2019-11-18 08:28:06,528 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, > snapshotName=distcp-3479-31-old, > RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc > CallId=1] > java.lang.AssertionError: Element already exists: > element=partition_isactive=true, DELETED=[partition_isactive=true] > at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193) > at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239) > at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > 
org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332) > at > org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {code} > We confirmed that fsimage and edit files were NOT corrupted, as reverting > HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken > and failed to parse edit log files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail:
[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-15012: --- Attachment: HDFS-15012.000.patch > NN fails to parse Edit logs after applying HDFS-13101 > - > > Key: HDFS-15012 > URL: https://issues.apache.org/jira/browse/HDFS-15012 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Eric Lin >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-15012.000.patch > > > After applying HDFS-13101, and deleting and creating large number of > snapshots, SNN exited with below error: > > {code:sh} > 2019-11-18 08:28:06,528 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, > snapshotName=distcp-3479-31-old, > RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc > CallId=1] > java.lang.AssertionError: Element already exists: > element=partition_isactive=true, DELETED=[partition_isactive=true] > at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193) > at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239) > at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332) > at > 
org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {code} > We confirmed that fsimage and edit files were NOT corrupted, as reverting > HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken > and failed to parse edit log files. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Commented] (HDFS-6874) Add GETFILEBLOCKLOCATIONS operation to HttpFS
[ https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990092#comment-16990092 ]

hemanthboyina commented on HDFS-6874:
---

I have implemented getfileblocklocations and it works fine with HttpFS, but there is an issue with httpfswithWebHdfs: on getfileblocklocations, WebHDFS tries to access getblocklocations in HttpFS, which doesn't exist.
I think we need to implement getblocklocations in HttpFS and have it call getfileblocklocations. Please correct me if I am wrong.

> Add GETFILEBLOCKLOCATIONS operation to HttpFS
> ---
>
>                 Key: HDFS-6874
>                 URL: https://issues.apache.org/jira/browse/HDFS-6874
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: httpfs
>    Affects Versions: 2.4.1, 2.7.3
>            Reporter: Gao Zhong Liang
>            Assignee: Weiwei Yang
>            Priority: Major
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.04.patch, HDFS-6874.05.patch, HDFS-6874.06.patch, HDFS-6874.07.patch, HDFS-6874.08.patch, HDFS-6874.09.patch, HDFS-6874.10.patch, HDFS-6874.patch
>
>
> The GETFILEBLOCKLOCATIONS operation is missing in HttpFS, although it is already supported in WebHDFS. For a GETFILEBLOCKLOCATIONS request, org.apache.hadoop.fs.http.server.HttpFSServer so far returns BAD_REQUEST:
> ...
> case GETFILEBLOCKLOCATIONS: {
>   response = Response.status(Response.Status.BAD_REQUEST).build();
>   break;
> }
>
[jira] [Commented] (HDFS-14963) Add HDFS Client machine caching active namenode index mechanism.
[ https://issues.apache.org/jira/browse/HDFS-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990087#comment-16990087 ] Chao Sun commented on HDFS-14963: - It seems this and HDFS-15024 are solving very similar problems, and the solution there could be much simpler. Should we pursue that approach instead? I also tend to echo [~shv]'s point, and I am not sure that having clients write to a local file is a good idea. > Add HDFS Client machine caching active namenode index mechanism. > > > Key: HDFS-14963 > URL: https://issues.apache.org/jira/browse/HDFS-14963 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.1.3 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Labels: multi-sbnn > > In a multi-NameNode scenario, a new HDFS client always starts its RPC calls at the 1st NameNode, polls the NameNodes in order, and only then determines the current Active NameNode. > This brings at least two problems: > # Extra failover overhead, especially when clients are created frequently. > # Unnecessary log output. Suppose there are 3 NNs and the 3rd is the ANN. A client that starts its RPC with the 1st NN is silent when it fails over from the 1st NN to the 2nd NN, but when it fails over from the 2nd NN to the 3rd NN it prints unnecessary logs, which in some scenarios become very numerous: > {code:java} > 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category READ is not supported in state standby. Visit > https://s.apache.org/sbnn-error > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459) > ...{code} > We can introduce a solution for this problem: on the client machine, for every HDFS cluster, cache its current Active NameNode index in a separate cache file named after its URI. *Note these cache files are shared by all hdfs client processes on this machine*. > For example, suppose there are hdfs://ns1 and hdfs://ns2, and the client machine cache file directory is /tmp, then: > # the cache file for the ns1 cluster is /tmp/ns1 > # the cache file for the ns2 cluster is /tmp/ns2 > And then: > # When a client starts, it reads the current Active NameNode index from the cache file that corresponds to the target HDFS URI, and then makes its RPC call directly to the right ANN. > # After each client failover, it needs to write the latest Active NameNode index to the cache file that corresponds to the target HDFS URI.
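Editorial sketch of the cache-file idea described in HDFS-14963 above. It is only a rough illustration under stated assumptions, not code from any patch on the issue: the class name ActiveNnIndexCache, the file format (a single decimal integer), and the fallback to index 0 are all hypothetical. It uses only java.nio.file from the JDK.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper; not part of any HDFS-14963 patch. */
final class ActiveNnIndexCache {
  private final Path cacheDir;

  ActiveNnIndexCache(Path cacheDir) {
    this.cacheDir = cacheDir;
  }

  /** Maps a nameservice such as "ns1" to its cache file, e.g. /tmp/ns1. */
  private Path fileFor(String nameservice) {
    return cacheDir.resolve(nameservice);
  }

  /** Returns the cached index, or 0 (start at the first NN) if the file is missing or unreadable. */
  int read(String nameservice) {
    try {
      byte[] raw = Files.readAllBytes(fileFor(nameservice));
      int index = Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
      return index >= 0 ? index : 0;
    } catch (IOException | NumberFormatException e) {
      return 0;
    }
  }

  /**
   * Writes the index to a temporary file in the same directory and then
   * renames it over the cache file, so that other client processes are
   * less likely to read a half-written value.
   */
  void write(String nameservice, int index) throws IOException {
    Path tmp = Files.createTempFile(cacheDir, nameservice, ".tmp");
    Files.write(tmp, Integer.toString(index).getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, fileFor(nameservice), StandardCopyOption.REPLACE_EXISTING);
  }
}
```

A client would call read("ns1") before its first RPC and write("ns1", newIndex) after each failover. Because the files are shared by every client process on the machine, concurrent writers can still overwrite each other; the sketch simply lets the last writer win, which is one of the concerns raised in the comment above.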
[jira] [Updated] (HDFS-15031) Allow BootstrapStandby to download FSImage if the directory is already formatted
[ https://issues.apache.org/jira/browse/HDFS-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Becker updated HDFS-15031: Attachment: HDFS-15031.006.patch > Allow BootstrapStandby to download FSImage if the directory is already > formatted > > > Key: HDFS-15031 > URL: https://issues.apache.org/jira/browse/HDFS-15031 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Danny Becker >Assignee: Danny Becker >Priority: Minor > Attachments: HDFS-15031.000.patch, HDFS-15031.001.patch, > HDFS-15031.002.patch, HDFS-15031.003.patch, HDFS-15031.005.patch, > HDFS-15031.006.patch > > > Currently, BootstrapStandby will only download the latest FSImage if it has > formatted the local image directory. This can be an issue when there are out > of date FSImages on a Standby NameNode, as the non-interactive mode will not > format the image directory, and BootstrapStandby will return an error code. > The changes here simply allow BootstrapStandby to download the latest FSImage > to the image directory, without needing to format first. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.
[ https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990076#comment-16990076 ] hemanthboyina commented on HDFS-14908: -- Thanks for the ping [~elgoiri]. Either unifying the methods or using DFSUtil.isParentEntry would be fine with me. > LeaseManager should check parent-child relationship when filter open files. > --- > > Key: HDFS-14908 > URL: https://issues.apache.org/jira/browse/HDFS-14908 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.1 >Reporter: Jinglun >Assignee: Jinglun >Priority: Minor > Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, > HDFS-14908.003.patch, HDFS-14908.004.patch, HDFS-14908.005.patch, > HDFS-14908.006.patch, HDFS-14908.007.patch, HDFS-14908.008.patch, > HDFS-14908.TestV4.patch, Test.java, TestV2.java, TestV3.java > > > Now when doing listOpenFiles(), LeaseManager only checks whether the filter > path is the prefix of the open files. We should check whether the filter path > is the parent/ancestor of the open files.
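Editorial sketch of the difference between the prefix check and the parent/ancestor check described in HDFS-14908 above. The class and method names are hypothetical and the code works on plain strings; it is not the implementation of DFSUtil.isParentEntry or of any attached patch.

```java
/** Hypothetical illustration; not code from LeaseManager or DFSUtil. */
final class PathFilterSketch {

  /** Plain string-prefix test, the behaviour the issue describes as too permissive. */
  static boolean isPrefix(String filter, String openFile) {
    return openFile.startsWith(filter);
  }

  /** True if filter is the same path as openFile or one of its ancestor directories. */
  static boolean isAncestorOrSelf(String filter, String openFile) {
    if ("/".equals(filter)) {
      return openFile.startsWith("/");
    }
    // Ignore one trailing slash so "/a/b/" behaves like "/a/b".
    String f = filter.endsWith("/") ? filter.substring(0, filter.length() - 1) : filter;
    if (!openFile.startsWith(f)) {
      return false;
    }
    // The match must end exactly at a path-component boundary.
    return openFile.length() == f.length() || openFile.charAt(f.length()) == '/';
  }

  public static void main(String[] args) {
    String filter = "/a/b";
    for (String file : new String[] {"/a/b/file", "/a/bc/file", "/a/b"}) {
      System.out.println(file + " prefix=" + isPrefix(filter, file)
          + " ancestor=" + isAncestorOrSelf(filter, file));
    }
  }
}
```

With the filter /a/b, the open file /a/bc/file passes the prefix test although it is not under /a/b; the component-boundary test rejects it.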
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990059#comment-16990059 ] Chao Sun commented on HDFS-15024: - {quote} Chao Sun I think the msync case is just a case, maybe the current problem is a common problem for Support more than 2 NameNodes? {quote} Yes, you are correct. This is a more general problem for the multi-SBN feature, but I think we could optimize {{msync}} specifically to avoid the retry backoff. Regarding patch v1, it seems to handle only the first few retries; later on, when {{times}} gradually increments beyond {{numNameNodes - 1}}, it will still do exponential backoff on all the SBNs. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When the Observer NameNode (ONN) is enabled, the client configuration lists three NameNodes, for example: > <property> > <name>dfs.ha.namenodes.ns1</name> > <value>nn2,nn3,nn1</value> > </property> > Currently: > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an HDFS operation such as > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > the msync call needs to reach nn1. > The client actually connects to nn2 first, and a failover is required. > It then connects to nn3, which does not meet the requirements either, so another failover is needed, but this time the client sleeps for a period before failing over. > Only after that sleep does the request finally succeed against nn1. > In FailoverOnNetworkExceptionRetry, getFailoverOrRetrySleepTime by default calculates a sleep time once more than one failover has been performed. > I think it is more reasonable to use the number of NameNodes as the condition for calculating the sleep time. > That is, in the current test, the failover after connecting to nn3 should not sleep but should connect directly to the next NameNode. > See client_error.log for details
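Editorial sketch of one way the number of NameNodes could gate the sleep time, as discussed in HDFS-15024 above. This is an assumption-laden illustration, not the code in HDFS-15024.001.patch and not Hadoop's getFailoverOrRetrySleepTime: the class name, the parameters, and the deterministic (jitter-free) backoff are all hypothetical. The idea is to sleep only when a complete pass over all configured NameNodes has failed, so that later passes are also covered, which is the gap pointed out in the review comment.

```java
/** Hypothetical sketch; not the code from HDFS-15024.001.patch or from Hadoop itself. */
final class FailoverSleepSketch {

  /**
   * @param failoversSoFar failovers already performed before this one
   * @param numNameNodes   NameNodes configured for the nameservice
   * @param baseMillis     delay after the first complete pass
   * @param maxMillis      upper bound on the delay
   * @return how long to sleep before the next failover, in milliseconds
   */
  static long sleepMillis(int failoversSoFar, int numNameNodes,
                          long baseMillis, long maxMillis) {
    if (failoversSoFar < 0 || numNameNodes <= 0) {
      throw new IllegalArgumentException("invalid failover or NameNode count");
    }
    int attempts = failoversSoFar + 1;        // NameNodes tried so far
    if (attempts % numNameNodes != 0) {
      return 0L;                              // untried NameNodes remain in this pass
    }
    int fullPasses = attempts / numNameNodes; // at least 1 here
    int shift = Math.min(fullPasses - 1, 20); // cap the exponent to keep the product small
    return Math.min(maxMillis, baseMillis * (1L << shift));
  }

  public static void main(String[] args) {
    // Three NameNodes, as in the nn2,nn3,nn1 example from the issue.
    for (int i = 0; i < 7; i++) {
      System.out.println("failoversSoFar=" + i + " sleepMillis="
          + sleepMillis(i, 3, 500L, 15_000L));
    }
  }
}
```

In the three-NameNode example, the failovers from nn2 to nn3 and from nn3 to nn1 would not sleep under this scheme; a delay applies only once all three have been tried, and it doubles with each further complete pass up to the cap.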
[jira] [Commented] (HDFS-14998) [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130
[ https://issues.apache.org/jira/browse/HDFS-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990028#comment-16990028 ] Hudson commented on HDFS-14998: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17733 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17733/]) HDFS-14998. [SBN read] Update Observer Namenode doc for ZKFC after (ayushsaxena: rev 705b172b95db345a99adf088fca83c67bd13a691) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/ObserverNameNode.md > [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130 > - > > Key: HDFS-14998 > URL: https://issues.apache.org/jira/browse/HDFS-14998 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Minor > Attachments: HDFS-14998.001.patch, HDFS-14998.002.patch, > HDFS-14998.003.patch, HDFS-14998.004.patch, HDFS-14998.005.patch, > HDFS-14998.006.patch > > > After HDFS-14130, we should update observer namenode doc, observer namenode > can run with ZKFC running -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-15005: Attachment: HDFS-15005-branch-2.003.patch > Backport HDFS-12300 to branch-2 > --- > > Key: HDFS-15005 > URL: https://issues.apache.org/jira/browse/HDFS-15005 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-15005-branch-2.000.patch, > HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, > HDFS-15005-branch-2.003.patch > > > Having DT related information is very useful in audit log. This tracks effort > to backport HDFS-12300 to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990023#comment-16990023 ] Chao Sun commented on HDFS-15005: - Rebased to the latest branch-2. [~weichiu] pls take a look. > Backport HDFS-12300 to branch-2 > --- > > Key: HDFS-15005 > URL: https://issues.apache.org/jira/browse/HDFS-15005 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-15005-branch-2.000.patch, > HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, > HDFS-15005-branch-2.003.patch > > > Having DT related information is very useful in audit log. This tracks effort > to backport HDFS-12300 to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14998) [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130
[ https://issues.apache.org/jira/browse/HDFS-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14998: Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanx [~ferhui] for the contribution and [~csun] for the review!!! > [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130 > - > > Key: HDFS-14998 > URL: https://issues.apache.org/jira/browse/HDFS-14998 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Minor > Attachments: HDFS-14998.001.patch, HDFS-14998.002.patch, > HDFS-14998.003.patch, HDFS-14998.004.patch, HDFS-14998.005.patch, > HDFS-14998.006.patch > > > After HDFS-14130, we should update observer namenode doc, observer namenode > can run with ZKFC running -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14998) [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130
[ https://issues.apache.org/jira/browse/HDFS-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990010#comment-16990010 ] Ayush Saxena commented on HDFS-14998: - +1, Committing Shortly. > [SBN read] Update Observer Namenode doc for ZKFC after HDFS-14130 > - > > Key: HDFS-14998 > URL: https://issues.apache.org/jira/browse/HDFS-14998 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Minor > Attachments: HDFS-14998.001.patch, HDFS-14998.002.patch, > HDFS-14998.003.patch, HDFS-14998.004.patch, HDFS-14998.005.patch, > HDFS-14998.006.patch > > > After HDFS-14130, we should update observer namenode doc, observer namenode > can run with ZKFC running -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989926#comment-16989926 ] Erik Krogen commented on HDFS-15032: [~shv] can you take a look at the v2 patch when you have a chance? I don't think the test failures are related. > Balancer crashes when it fails to contact an unavailable NN via > ObserverReadProxyProvider > - > > Key: HDFS-15032 > URL: https://issues.apache.org/jira/browse/HDFS-15032 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.10.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, > HDFS-15032.002.patch > > > When trying to run the Balancer using ObserverReadProxyProvider (to allow it > to read from the Observer Node as described in HDFS-14979), if one of the NNs > isn't running, the Balancer will crash. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989817#comment-16989817 ] Hadoop QA commented on HDFS-14983: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 45s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 3m 45s{color} | {color:red} hadoop-hdfs-project generated 3 new + 16 unchanged - 3 fixed = 19 total (was 19) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}113m 20s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 15s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}202m 19s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14983 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987702/HDFS-14983.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux 0e12d2d5e49c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64
[jira] [Commented] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989774#comment-16989774 ] Hadoop QA commented on HDFS-14740: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 53s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 16m 33s{color} | {color:red} root generated 3 new + 23 unchanged - 3 fixed = 26 total (was 26) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 38s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}113m 10s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 54s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}237m 5s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.TestFileCorruption | | | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy | | | hadoop.tools.TestHdfsConfigFields | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14740 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12987690/HDFS-14740.007.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient
[jira] [Created] (HDFS-15035) Fix Rename API in BasicOzoneFileSystem
Ayush Saxena created HDFS-15035:
---

             Summary: Fix Rename API in BasicOzoneFileSystem
                 Key: HDFS-15035
                 URL: https://issues.apache.org/jira/browse/HDFS-15035
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Ayush Saxena
            Assignee: Ayush Saxena

In the Rename API:

1. This doesn't work if one of the paths contains the URI and the other doesn't.
{code:java}
if (src.equals(dst)) {
  return true;
}
{code}
2. This check is supposed to be done only for directories, but it is done for files too. It can be moved after getting the FileStatus and checking the type.
{code:java}
// Some comments here
public String getFoo()
{
    return foo;
}
{code}
3. This doesn't work either (similar to 1.)
{code:java}
if (srcStatus.isDirectory()) {
  if (dst.toString().startsWith(src.toString() + OZONE_URI_DELIMITER)) {
    LOG.trace("Cannot rename a directory to a subdirectory of self");
    return false;
  }
{code}
4. Rename even succeeds if the URI provided belongs to a different FileSystem. In general, HDFS and other file systems throw IllegalArgumentException if the path doesn't belong to the same FS.
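Editorial sketch of points 1, 3 and 4 from HDFS-15035 above: qualify both paths against the file system URI before comparing them, and reject paths that belong to another file system. It is a standalone illustration built on java.net.URI, not the fix committed for BasicOzoneFileSystem. The class name, the DELIMITER constant (standing in for OZONE_URI_DELIMITER), the exception message, and the URI o3fs://bucket.volume/ used in main are all assumptions.

```java
import java.net.URI;
import java.util.Objects;

/** Hypothetical illustration; not code from BasicOzoneFileSystem. */
final class RenameCheckSketch {
  static final String DELIMITER = "/"; // stand-in for OZONE_URI_DELIMITER

  /** Resolves a possibly scheme-less path against fsUri and rejects foreign file systems. */
  static URI qualify(URI fsUri, String path) {
    URI u = URI.create(path);
    if (u.getScheme() == null) {
      return fsUri.resolve(u.getPath()).normalize();
    }
    if (!u.getScheme().equalsIgnoreCase(fsUri.getScheme())
        || !Objects.equals(u.getAuthority(), fsUri.getAuthority())) {
      throw new IllegalArgumentException("Path " + path + " does not belong to " + fsUri);
    }
    return u.normalize();
  }

  /** Point 1: compare fully qualified forms, not the raw arguments. */
  static boolean isSamePath(URI fsUri, String src, String dst) {
    return qualify(fsUri, src).equals(qualify(fsUri, dst));
  }

  /** Point 3: subdirectory-of-self test on the qualified path components. */
  static boolean isSubdirectoryOfSelf(URI fsUri, String src, String dst) {
    String s = qualify(fsUri, src).getPath();
    String d = qualify(fsUri, dst).getPath();
    return d.startsWith(s.endsWith(DELIMITER) ? s : s + DELIMITER);
  }

  public static void main(String[] args) {
    URI fs = URI.create("o3fs://bucket.volume/"); // illustrative value only
    System.out.println(isSamePath(fs, "/dir/a", "o3fs://bucket.volume/dir/a"));
    System.out.println(isSubdirectoryOfSelf(fs, "o3fs://bucket.volume/dir", "/dir/sub"));
  }
}
```

Point 2 of the issue (apply the subdirectory check only to directories) would still need the FileStatus of the source, which this standalone sketch does not model.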
[jira] [Updated] (HDFS-14668) Support Fuse with Users from multiple Security Realms
[ https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-14668: Description: UPDATE: See [this|https://issues.apache.org/jira/browse/HDFS-14668?focusedCommentId=16979466=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16979466] comment for the complete description of what is happening here. Users from non-default krb5 domain can't use hadoop-fuse. There are 2 Realms with kdc. -one realm is for human users (USERS.COM.US) -the other is for service principals. (SERVICE.COM.US) Cross realm trust is setup. In krb5.conf the default domain is set to SERVICE.COM.US Users within USERS.COM.US Realm are not able to put any files to Fuse mounted location The client shows: cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: Input/output error was: UPDATE: See this comment for the complete description of what is happening here. Users from non-default krb5 domain can't use hadoop-fuse. There are 2 Realms with kdc. -one realm is for human users (USERS.COM.US) -the other is for service principals. (SERVICE.COM.US) Cross realm trust is setup. In krb5.conf the default domain is set to SERVICE.COM.US Users within USERS.COM.US Realm are not able to put any files to Fuse mounted location The client shows: cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: Input/output error > Support Fuse with Users from multiple Security Realms > - > > Key: HDFS-14668 > URL: https://issues.apache.org/jira/browse/HDFS-14668 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Affects Versions: 3.1.0, 3.0.3 >Reporter: Sailesh Patel >Assignee: Istvan Fajth >Priority: Critical > Labels: regression > > UPDATE: > See > [this|https://issues.apache.org/jira/browse/HDFS-14668?focusedCommentId=16979466=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16979466] > comment for the complete description of what is happening here. 
> Users from non-default krb5 domain can't use hadoop-fuse. > There are 2 Realms with kdc. > -one realm is for human users (USERS.COM.US) > -the other is for service principals. (SERVICE.COM.US) > Cross realm trust is setup. > In krb5.conf the default domain is set to SERVICE.COM.US > Users within USERS.COM.US Realm are not able to put any files to Fuse mounted > location > The client shows: > cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: > Input/output error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14668) Support Fuse with Users from multiple Security Realms
[ https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-14668: Description: UPDATE: See this comment for the complete description of what is happening here. Users from non-default krb5 domain can't use hadoop-fuse. There are 2 Realms with kdc. -one realm is for human users (USERS.COM.US) -the other is for service principals. (SERVICE.COM.US) Cross realm trust is setup. In krb5.conf the default domain is set to SERVICE.COM.US Users within USERS.COM.US Realm are not able to put any files to Fuse mounted location The client shows: cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: Input/output error was: Users from non-default krb5 domain can't use hadoop-fuse. There are 2 Realms with kdc. -one realm is for human users (USERS.COM.US) -the other is for service principals. (SERVICE.COM.US) Cross realm trust is setup. In krb5.conf the default domain is set to SERVICE.COM.US Users within USERS.COM.US Realm are not able to put any files to Fuse mounted location The client shows: cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: Input/output error > Support Fuse with Users from multiple Security Realms > - > > Key: HDFS-14668 > URL: https://issues.apache.org/jira/browse/HDFS-14668 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Affects Versions: 3.1.0, 3.0.3 >Reporter: Sailesh Patel >Assignee: Istvan Fajth >Priority: Critical > Labels: regression > > UPDATE: > See this comment for the complete description of what is happening here. > Users from non-default krb5 domain can't use hadoop-fuse. > There are 2 Realms with kdc. > -one realm is for human users (USERS.COM.US) > -the other is for service principals. (SERVICE.COM.US) > Cross realm trust is setup. 
> In krb5.conf the default domain is set to SERVICE.COM.US > Users within USERS.COM.US Realm are not able to put any files to Fuse mounted > location > The client shows: > cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: > Input/output error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14668) Support Fuse with Users from multiple Security Realms
[ https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-14668: Labels: regression (was: ) > Support Fuse with Users from multiple Security Realms > - > > Key: HDFS-14668 > URL: https://issues.apache.org/jira/browse/HDFS-14668 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Affects Versions: 3.1.0, 3.0.3 >Reporter: Sailesh Patel >Assignee: Istvan Fajth >Priority: Critical > Labels: regression > > Users from non-default krb5 domain can't use hadoop-fuse. > There are 2 Realms with kdc. > -one realm is for human users (USERS.COM.US) > -the other is for service principals. (SERVICE.COM.US) > Cross realm trust is setup. > In krb5.conf the default domain is set to SERVICE.COM.US > Users within USERS.COM.US Realm are not able to put any files to Fuse mounted > location > The client shows: > cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: > Input/output error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14668) Support Fuse with Users from multiple Security Realms
[ https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-14668: Priority: Critical (was: Minor) > Support Fuse with Users from multiple Security Realms > - > > Key: HDFS-14668 > URL: https://issues.apache.org/jira/browse/HDFS-14668 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Reporter: Sailesh Patel >Assignee: Istvan Fajth >Priority: Critical > > Users from non-default krb5 domain can't use hadoop-fuse. > There are 2 Realms with kdc. > -one realm is for human users (USERS.COM.US) > -the other is for service principals. (SERVICE.COM.US) > Cross realm trust is setup. > In krb5.conf the default domain is set to SERVICE.COM.US > Users within USERS.COM.US Realm are not able to put any files to Fuse mounted > location > The client shows: > cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: > Input/output error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14668) Support Fuse with Users from multiple Security Realms
[ https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-14668: Affects Version/s: 3.1.0 3.0.3 > Support Fuse with Users from multiple Security Realms > - > > Key: HDFS-14668 > URL: https://issues.apache.org/jira/browse/HDFS-14668 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Affects Versions: 3.1.0, 3.0.3 >Reporter: Sailesh Patel >Assignee: Istvan Fajth >Priority: Critical > > Users from non-default krb5 domain can't use hadoop-fuse. > There are 2 Realms with kdc. > -one realm is for human users (USERS.COM.US) > -the other is for service principals. (SERVICE.COM.US) > Cross realm trust is setup. > In krb5.conf the default domain is set to SERVICE.COM.US > Users within USERS.COM.US Realm are not able to put any files to Fuse mounted > location > The client shows: > cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: > Input/output error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14668) Support Fuse with Users from multiple Security Realms
[ https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Fajth updated HDFS-14668: Issue Type: Bug (was: Improvement) > Support Fuse with Users from multiple Security Realms > - > > Key: HDFS-14668 > URL: https://issues.apache.org/jira/browse/HDFS-14668 > Project: Hadoop HDFS > Issue Type: Bug > Components: fuse-dfs >Reporter: Sailesh Patel >Assignee: Istvan Fajth >Priority: Minor > > Users from non-default krb5 domain can't use hadoop-fuse. > There are 2 Realms with kdc. > -one realm is for human users (USERS.COM.US) > -the other is for service principals. (SERVICE.COM.US) > Cross realm trust is setup. > In krb5.conf the default domain is set to SERVICE.COM.US > Users within USERS.COM.US Realm are not able to put any files to Fuse mounted > location > The client shows: > cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: > Input/output error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14668) Support Fuse with Users from multiple Security Realms
[ https://issues.apache.org/jira/browse/HDFS-14668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989730#comment-16989730 ] Istvan Fajth commented on HDFS-14668:
After a couple of days of thinking and a few hours of testing, I decided to come up with the given PR. The main reasons I chose this solution are the following:
- the affected UGI API calls are public and may be used in other projects, where the necessary tunings might already have happened.
- there does not seem to be a good way of deciding whether the given username is a valid principal name, and we cannot implement FUSE-specific solutions in the UGI code.
- I am not familiar enough with how other projects use the UGI, and this phenomenon might cause problems there as well. I am not sure why it was necessary to always add the username as a principal from the UGI, and it is not clear whether this scenario was considered at that time. Without [~daryn] I think we might never get this information, so removing the newly added behaviour does not seem to be a good option and could cause trouble in other areas.
- this change has the least effect on any other code that has been written.
The solution itself changes the connection builder setup: in a kerberized environment FUSE does not set the username, so the value is properly null on the Java level, and the Java Kerberos layer inside the UGI calls determines the principal's name from the ticket cache provided. In non-kerberized environments we still need to provide the username, because in that case permissions are checked against the OS user name, and we don't want to lose this inside the FUSE logic either.
While checking this, I came across the fact that inside FUSE we could check the HADOOP_USER_NAME environment variable and use its value if it is set, but we currently do not use it anywhere. I filed HDFS-15034 to track this improvement.
> Support Fuse with Users from multiple Security Realms > - > > Key: HDFS-14668 > URL: https://issues.apache.org/jira/browse/HDFS-14668 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fuse-dfs >Reporter: Sailesh Patel >Assignee: Istvan Fajth >Priority: Minor > > Users from non-default krb5 domain can't use hadoop-fuse. > There are 2 Realms with kdc. > -one realm is for human users (USERS.COM.US) > -the other is for service principals. (SERVICE.COM.US) > Cross realm trust is setup. > In krb5.conf the default domain is set to SERVICE.COM.US > Users within USERS.COM.US Realm are not able to put any files to Fuse mounted > location > The client shows: > cp: cannot create regular file ‘/hdfs_mount/tmp/hello_from_fuse.txt’: > Input/output error -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
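The username selection described in the comment above can be sketched as follows. This is only an illustration in Java: the actual fuse-dfs code is written in C, and the class name `FuseUserNameSketch` and the method `userNameForConnection` are hypothetical names that do not appear in the patch.

```java
// Illustrative sketch only; names are hypothetical and not taken from the patch.
final class FuseUserNameSketch {

    /**
     * Returns the user name to pass to the connection builder.
     * In a kerberized setup the name is left unset (null), so that the
     * principal can be derived from the ticket cache; otherwise the OS
     * user name is used, because permissions are checked against it.
     */
    static String userNameForConnection(boolean kerberosEnabled, String osUserName) {
        if (kerberosEnabled) {
            return null; // let the Kerberos layer pick the principal
        }
        return osUserName;
    }

    public static void main(String[] args) {
        System.out.println(userNameForConnection(true, "alice"));
        System.out.println(userNameForConnection(false, "alice"));
    }
}
```

The point of the design is that the decision stays on the FUSE side, so the public UGI behaviour that other projects may rely on is left untouched.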
[jira] [Commented] (HDFS-14869) Data loss in case of distcp using snapshot diff. Replication should include rename records if file was skipped in the previous iteration
[ https://issues.apache.org/jira/browse/HDFS-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989699#comment-16989699 ] Hudson commented on HDFS-14869: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17732 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17732/]) HDFS-14869 Copy renamed files which are not excluded anymore by filter (shashikant: rev fc97034b29243a0509633849de55aa734859) * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java * (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java * (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java > Data loss in case of distcp using snapshot diff. Replication should include > rename records if file was skipped in the previous iteration > > > Key: HDFS-14869 > URL: https://issues.apache.org/jira/browse/HDFS-14869 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Fix For: 3.1.4 > > > This issue arises when a directory or file is excluded by exclusion filter > during distcp replication. Later on if the directory is renamed later to a > name which is not excluded by the filter, the snapshot diff reports only a > rename operation. The directory is never copied to target even though its > not excluded now. This also doesn't throw any error so there is no way to > find the issue. > Steps to reproduce > * Create a directory in hdfs to copy using distcp. > * Include a staging folder in the directory. 
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -ls /tmp/tocopy
> Found 4 items
> -rw-r--r-- 3 hdfs hdfs 16 2019-09-12 10:32 /tmp/tocopy/.b.txt
> drwxr-xr-x - hdfs hdfs 0 2019-09-23 09:18 /tmp/tocopy/.staging
> -rw-r--r-- 3 hdfs hdfs 12 2019-09-12 10:32 /tmp/tocopy/a.txt
> -rw-r--r-- 3 hdfs hdfs 4 2019-09-20 08:23 /tmp/tocopy/foo.txt{code}
> * The exclusion filter is set to exclude any staging directory
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ cat /tmp/filter
> .*\.Trash.*
> .*\.staging.*{code}
> * Do a copy using distcp snapshots, the staging directory is not replicated.
> {code:java}
> hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter /tmp/tocopy/.snapshot/s1 /tmp/target
> [hdfs@ctr-e141-1563959304486-33995-01-03 root]$ hadoop fs -ls /tmp/target
> Found 3 items
> -rw-r--r-- 3 hdfs hdfs 16 2019-09-24 06:56 /tmp/target/.b.txt
> -rw-r--r-- 3 hdfs hdfs 12 2019-09-24 06:56 /tmp/target/a.txt
> -rw-r--r-- 3 hdfs hdfs 4 2019-09-24 06:56 /tmp/target/foo.txt{code}
> * Rename the staging directory to final
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -mv /tmp/tocopy/.staging /tmp/tocopy/final{code}
> * Do a copy using snapshot diff.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hdfs snapshotDiff /tmp/tocopy s1 s2
> Difference between snapshot s1 and snapshot s2 under directory /tmp/tocopy:
> M .
> R ./.staging -> ./final
> {code}
> * The diff report just has a rename record and the new final directory is never copied.
> {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop jar > hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true > -filters /tmp/filter -diff s1 s2 -update /tmp/tocopy /tmp/target > 19/09/24 07:05:32 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, overwrite=false, append=false, useDiff=true, > useRdiff=false, fromSnapshot=s1, toSnapshot=s2, skipCRC=false, blocking=true, > numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, > copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, > logPath=null, sourceFileListing=null, sourcePaths=[/tmp/tocopy], > targetPath=/tmp/target, filtersFile='/tmp/filter', blocksPerChunk=0, > copyBufferSize=8192, verboseLog=false, directWrite=false}, > sourcePaths=[/tmp/tocopy], targetPathExists=true, preserveRawXattrsfalse > 19/09/24 07:05:32 INFO client.RMProxy: Connecting to ResourceManager at > ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050 > 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History > server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200 >
[jira] [Resolved] (HDFS-14869) Data loss in case of distcp using snapshot diff. Replication should include rename records if file was skipped in the previous iteration
[ https://issues.apache.org/jira/browse/HDFS-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDFS-14869. Fix Version/s: 3.1.4 Resolution: Fixed Thanks [~aasha] for the contribution and [~ste...@apache.org] for the review. I have committed this. > Data loss in case of distcp using snapshot diff. Replication should include > rename records if file was skipped in the previous iteration > > > Key: HDFS-14869 > URL: https://issues.apache.org/jira/browse/HDFS-14869 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Fix For: 3.1.4 > > > This issue arises when a directory or file is excluded by exclusion filter > during distcp replication. Later on if the directory is renamed later to a > name which is not excluded by the filter, the snapshot diff reports only a > rename operation. The directory is never copied to target even though its > not excluded now. This also doesn't throw any error so there is no way to > find the issue. > Steps to reproduce > * Create a directory in hdfs to copy using distcp. > * Include a staging folder in the directory. > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -ls > /tmp/tocopy > Found 4 items > -rw-r--r-- 3 hdfs hdfs 16 2019-09-12 10:32 /tmp/tocopy/.b.txt > drwxr-xr-x - hdfs hdfs 0 2019-09-23 09:18 /tmp/tocopy/.staging > -rw-r--r-- 3 hdfs hdfs 12 2019-09-12 10:32 /tmp/tocopy/a.txt > -rw-r--r-- 3 hdfs hdfs 4 2019-09-20 08:23 /tmp/tocopy/foo.txt{code} > * The exclusion filter is set to exclude any staging directory > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ cat > /tmp/filter > .*\.Trash.* > .*\.staging.*{code} > * Do a copy using distcp snapshots, the staging directory is not replicated. 
> {code:java}
> hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter /tmp/tocopy/.snapshot/s1 /tmp/target
> [hdfs@ctr-e141-1563959304486-33995-01-03 root]$ hadoop fs -ls /tmp/target
> Found 3 items
> -rw-r--r-- 3 hdfs hdfs 16 2019-09-24 06:56 /tmp/target/.b.txt
> -rw-r--r-- 3 hdfs hdfs 12 2019-09-24 06:56 /tmp/target/a.txt
> -rw-r--r-- 3 hdfs hdfs 4 2019-09-24 06:56 /tmp/target/foo.txt{code}
> * Rename the staging directory to final
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -mv /tmp/tocopy/.staging /tmp/tocopy/final{code}
> * Do a copy using snapshot diff.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hdfs snapshotDiff /tmp/tocopy s1 s2
> Difference between snapshot s1 and snapshot s2 under directory /tmp/tocopy:
> M .
> R ./.staging -> ./final
> {code}
> * The diff report just has a rename record and the new final directory is never copied.
> {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop jar > hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true > -filters /tmp/filter -diff s1 s2 -update /tmp/tocopy /tmp/target > 19/09/24 07:05:32 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, overwrite=false, append=false, useDiff=true, > useRdiff=false, fromSnapshot=s1, toSnapshot=s2, skipCRC=false, blocking=true, > numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, > copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, > logPath=null, sourceFileListing=null, sourcePaths=[/tmp/tocopy], > targetPath=/tmp/target, filtersFile='/tmp/filter', blocksPerChunk=0, > copyBufferSize=8192, verboseLog=false, directWrite=false}, > sourcePaths=[/tmp/tocopy], targetPathExists=true, preserveRawXattrsfalse > 19/09/24 07:05:32 INFO client.RMProxy: Connecting to ResourceManager at > ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050 > 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History > server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200 > 19/09/24 07:05:33 INFO tools.DistCp: Number of paths in the copy list: 0 > 19/09/24 07:05:33 INFO client.RMProxy: Connecting to ResourceManager at > ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050 > 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History > server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200 > 19/09/24 07:05:33 INFO
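To see why the renamed directory stops being excluded, the two patterns from /tmp/filter can be tried against the old and new paths. The sketch below assumes each filter line is applied as a regular expression that must match the whole path string; it is a stand-alone illustration with a hypothetical class name (`FilterSketch`), not the DistCp implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Stand-alone illustration; assumes whole-path regex matching for each filter line.
final class FilterSketch {

    // The two patterns listed in /tmp/filter in the reproduction steps.
    static final List<Pattern> FILTERS = Arrays.asList(
            Pattern.compile(".*\\.Trash.*"),
            Pattern.compile(".*\\.staging.*"));

    /** Returns true if any filter pattern matches the full path. */
    static boolean isExcluded(String path) {
        for (Pattern p : FILTERS) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("/tmp/tocopy/.staging")); // true: skipped in the first copy
        System.out.println(isExcluded("/tmp/tocopy/final"));    // false: should be copied after the rename
    }
}
```

Because the snapshot diff only records the rename, a sync that replays the diff never sees the directory as new content, even though its new name passes the filter.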
[jira] [Created] (HDFS-15034) fuse-dfs does not respect HADOOP_USER_NAME envvar with simple auth
Istvan Fajth created HDFS-15034:
Summary: fuse-dfs does not respect HADOOP_USER_NAME envvar with simple auth
Key: HDFS-15034
URL: https://issues.apache.org/jira/browse/HDFS-15034
Project: Hadoop HDFS
Issue Type: Improvement
Components: fuse-dfs
Reporter: Istvan Fajth

In the fuse code, there is an explicit mapping from the context uid to the OS-level username with the help of the getpwuid() system call. As we already have a way to access the caller's environment (it is used to determine the Kerberos ticket cache path), we can respect the HADOOP_USER_NAME setting in a SIMPLE_AUTH-based environment, so that the host where the mount lives does not need to have all the users that are defined and used on HDFS.
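The proposed lookup order can be sketched as below. This is a hedged illustration in Java rather than the C code of fuse-dfs; `SimpleAuthUserSketch`, `effectiveUser`, and the idea of passing the caller's environment as a map are assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the proposal; the real fuse-dfs code is C and may differ.
final class SimpleAuthUserSketch {

    /**
     * Picks the HDFS user for a simple-auth mount: the caller's
     * HADOOP_USER_NAME if it is set and non-empty, otherwise the OS user
     * name resolved from the uid (passed in here as osUserName).
     */
    static String effectiveUser(Map<String, String> callerEnv, String osUserName) {
        String override = callerEnv.get("HADOOP_USER_NAME");
        if (override != null && !override.isEmpty()) {
            return override;
        }
        return osUserName;
    }

    public static void main(String[] args) {
        Map<String, String> env = Collections.singletonMap("HADOOP_USER_NAME", "hdfsuser");
        System.out.println(effectiveUser(env, "localuser"));
        System.out.println(effectiveUser(Collections.<String, String>emptyMap(), "localuser"));
    }
}
```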
[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989634#comment-16989634 ] Xieming Li commented on HDFS-14983: --- I have added documentation, javadoc, and a unit test. > RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option > --- > > Key: HDFS-14983 > URL: https://issues.apache.org/jira/browse/HDFS-14983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Minor > Attachments: HDFS-14983.002.patch, HDFS-14983.draft.001.patch > > > NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration > without restarting but DFSRouter cannot. It would be better for DFSRouter to > have such functionality to be compatible with NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14983: -- Attachment: HDFS-14983.002.patch Status: Patch Available (was: Open) > RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option > --- > > Key: HDFS-14983 > URL: https://issues.apache.org/jira/browse/HDFS-14983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Minor > Attachments: HDFS-14983.002.patch, HDFS-14983.draft.001.patch > > > NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration > without restarting but DFSRouter cannot. It would be better for DFSRouter to > have such functionality to be compatible with NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14983: -- Status: Open (was: Patch Available) > RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option > --- > > Key: HDFS-14983 > URL: https://issues.apache.org/jira/browse/HDFS-14983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Minor > Attachments: HDFS-14983.draft.001.patch > > > NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration > without restarting but DFSRouter cannot. It would be better for DFSRouter to > have such functionality to be compatible with NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989600#comment-16989600 ] Rakesh Radhakrishnan commented on HDFS-14740: - Thanks [~PhiloHe] for the updates. How about keeping the two pmem related configs with matching names like below : {{'dfs.datanode.pmem.cache.restore'}} and {{'dfs.datanode.pmem.cache.dirs'}} ? > Recover data blocks from persistent memory read cache during datanode restarts > -- > > Key: HDFS-14740 > URL: https://issues.apache.org/jira/browse/HDFS-14740 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, > HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch, > HDFS-14740.005.patch, HDFS-14740.006.patch, HDFS-14740.007.patch, > HDFS_Persistent_Read-Cache_Design-v1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.pdf, HDFS_Persistent_Read-Cache_Test-v2.pdf > > > In HDFS-13762, persistent memory (PM) is enabled in HDFS centralized cache > management. Even though PM can persist cache data, for simplifying the > initial implementation, the previous cache data will be cleaned up during > DataNode restarts. Here, we are proposing to improve HDFS PM cache by taking > advantage of PM's data persistence characteristic, i.e., recovering the > status for cached data, if any, when DataNode restarts, thus, cache warm up > time can be saved for user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989587#comment-16989587 ] Feilong He commented on HDFS-14740: --- [^HDFS-14740.007.patch] has been uploaded to change a property to 'dfs.datanode.cache.restore.enabled'. Comment is welcome! > Recover data blocks from persistent memory read cache during datanode restarts > -- > > Key: HDFS-14740 > URL: https://issues.apache.org/jira/browse/HDFS-14740 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, > HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch, > HDFS-14740.005.patch, HDFS-14740.006.patch, HDFS-14740.007.patch, > HDFS_Persistent_Read-Cache_Design-v1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.pdf, HDFS_Persistent_Read-Cache_Test-v2.pdf > > > In HDFS-13762, persistent memory (PM) is enabled in HDFS centralized cache > management. Even though PM can persist cache data, for simplifying the > initial implementation, the previous cache data will be cleaned up during > DataNode restarts. Here, we are proposing to improve HDFS PM cache by taking > advantage of PM's data persistence characteristic, i.e., recovering the > status for cached data, if any, when DataNode restarts, thus, cache warm up > time can be saved for user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He updated HDFS-14740: -- Attachment: HDFS-14740.007.patch > Recover data blocks from persistent memory read cache during datanode restarts > -- > > Key: HDFS-14740 > URL: https://issues.apache.org/jira/browse/HDFS-14740 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, > HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch, > HDFS-14740.005.patch, HDFS-14740.006.patch, HDFS-14740.007.patch, > HDFS_Persistent_Read-Cache_Design-v1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.pdf, HDFS_Persistent_Read-Cache_Test-v2.pdf > > > In HDFS-13762, persistent memory (PM) is enabled in HDFS centralized cache > management. Even though PM can persist cache data, for simplifying the > initial implementation, the previous cache data will be cleaned up during > DataNode restarts. Here, we are proposing to improve HDFS PM cache by taking > advantage of PM's data persistence characteristic, i.e., recovering the > status for cached data, if any, when DataNode restarts, thus, cache warm up > time can be saved for user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989573#comment-16989573 ] Feilong He commented on HDFS-14740:
Thanks [~rakeshr] so much for your comments. Sorry for the late reply.
# Yes, 'dfs.datanode.cache.persistence.enabled' looks a bit ambiguous to users. This property controls whether the cache on pmem should be restored after a DataNode restart, to avoid unnecessarily pulling data to pmem again. I prefer to use 'dfs.datanode.cache.restore.enabled'. If you have any other comments, please kindly let me know.
# I have conducted some tests on the case you mentioned. 1) In my test, a file is cached to pmem by HDFS with the above flag set to true. Then I shut down the cluster and set the flag to false. After restarting the cluster, I noted that the previous cache is dropped on pmem and the DataNode has to recache the block data to pmem, as we expected. 2) I also did another test. First, a file is cached to pmem by HDFS with the above flag set to false. Then I shut down the cluster and set the flag to true. During the DataNode restart, I can see that the previous cache is restored, as we expected. To sum up, the behavior in the two tests aligns with the purpose of this flag.
> Recover data blocks from persistent memory read cache during datanode restarts > -- > > Key: HDFS-14740 > URL: https://issues.apache.org/jira/browse/HDFS-14740 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, > HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch, > HDFS-14740.005.patch, HDFS-14740.006.patch, > HDFS_Persistent_Read-Cache_Design-v1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.pdf, HDFS_Persistent_Read-Cache_Test-v2.pdf > > > In HDFS-13762, persistent memory (PM) is enabled in HDFS centralized cache > management. Even though PM can persist cache data, for simplifying the > initial implementation, the previous cache data will be cleaned up during > DataNode restarts. Here, we are proposing to improve HDFS PM cache by taking > advantage of PM's data persistence characteristic, i.e., recovering the > status for cached data, if any, when DataNode restarts, thus, cache warm up > time can be saved for user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
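The behaviour observed in the two tests above depends only on the value of the flag at restart time, not on its value when the blocks were cached. A minimal sketch of that rule, using a hypothetical class `PmemCacheRestoreSketch` and plain block ids instead of the DataNode's real data structures:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the restart behaviour; not the DataNode implementation.
final class PmemCacheRestoreSketch {

    /**
     * Returns the set of cached block ids available right after a restart.
     * restoreEnabled stands for the value of the restore flag
     * (proposed name: dfs.datanode.cache.restore.enabled) at restart time.
     */
    static Set<Long> cacheAfterRestart(Set<Long> blocksOnPmem, boolean restoreEnabled) {
        if (restoreEnabled) {
            return new LinkedHashSet<>(blocksOnPmem); // previous cache is recovered
        }
        return Collections.emptySet(); // previous cache is dropped; blocks must be recached
    }

    public static void main(String[] args) {
        Set<Long> cached = new LinkedHashSet<>(Arrays.asList(1001L, 1002L));
        System.out.println(cacheAfterRestart(cached, false)); // test 1: flag switched to false
        System.out.println(cacheAfterRestart(cached, true));  // test 2: flag switched to true
    }
}
```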
[jira] [Assigned] (HDFS-14740) Recover data blocks from persistent memory read cache during datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feilong He reassigned HDFS-14740: - Assignee: Feilong He (was: Rui Mo) > Recover data blocks from persistent memory read cache during datanode restarts > -- > > Key: HDFS-14740 > URL: https://issues.apache.org/jira/browse/HDFS-14740 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching, datanode >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, > HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch, > HDFS-14740.005.patch, HDFS-14740.006.patch, > HDFS_Persistent_Read-Cache_Design-v1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.1.pdf, > HDFS_Persistent_Read-Cache_Test-v1.pdf, HDFS_Persistent_Read-Cache_Test-v2.pdf > > > In HDFS-13762, persistent memory (PM) is enabled in HDFS centralized cache > management. Even though PM can persist cache data, for simplifying the > initial implementation, the previous cache data will be cleaned up during > DataNode restarts. Here, we are proposing to improve HDFS PM cache by taking > advantage of PM's data persistence characteristic, i.e., recovering the > status for cached data, if any, when DataNode restarts, thus, cache warm up > time can be saved for user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org