[jira] [Commented] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277780#comment-17277780 ] Xiaoqiao He commented on HDFS-15792: [~prasad-acit] the following lambda expression needs to be changed to a general (non-lambda) expression for branch-2.10. {code:java} @Override public int decrementAndGetRefCount() { -return (refCount > 0) ? --refCount : 0; +return value.updateAndGet(i -> i > 0 ? i - 1 : i); } {code} > ClasscastException while loading FSImage > > > Key: HDFS-15792 > URL: https://issues.apache.org/jira/browse/HDFS-15792 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15792-branch-2.10.001.patch, HDFS-15792.001.patch, > HDFS-15792.002.patch, HDFS-15792.003.patch, HDFS-15792.004.patch, > HDFS-15792.005.patch, HDFS-15792.addendum.001.patch, > image-2021-01-27-12-00-34-846.png > > > FSImage loading has failed with ClasscastException - > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode. > This is a usage issue with HashMap in concurrent scenarios. > Same issue has been reported on Java & closed as usage issue. - > https://bugs.openjdk.java.net/browse/JDK-8173671 > 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading > INODE from fsiamge. | FSImageFormatProtobuf.java:442 > java.lang. 
> : java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode > at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1951) > at java.util.HashMap.treeifyBin(HashMap.java:772) > at java.util.HashMap.putVal(HashMap.java:644) > at java.util.HashMap.put(HashMap.java:612) > at > org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53) > at > org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391) > at > org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from > FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, > cpktTxId=00198227480) | FSImage.java:738 > java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node > cannot be cast to java.util.HashMap$TreeNode > at > org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1113) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:648) > at >
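The branch-2.10 rewrite requested above can be sketched as follows. This is a hedged illustration rather than the actual patch: it assumes, as the trunk diff suggests, that the count lives in an `AtomicInteger` field named `value`, and it replaces the Java 8 `updateAndGet` lambda with a compare-and-set retry loop that compiles on the older toolchain branch-2.10 targets. The `RefCounter` holder class is invented for the sketch; the real field sits inside ReferenceCountMap's reference-counted element.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder class for illustration only.
class RefCounter {
  private final AtomicInteger value = new AtomicInteger();

  int incrementAndGetRefCount() {
    return value.incrementAndGet();
  }

  // Java 7-compatible equivalent of
  //   value.updateAndGet(i -> i > 0 ? i - 1 : i):
  // read the current count, compute the decremented value (never going
  // below zero), and retry the CAS until no other thread raced us.
  int decrementAndGetRefCount() {
    for (;;) {
      int current = value.get();
      int next = current > 0 ? current - 1 : current;
      if (value.compareAndSet(current, next)) {
        return next;
      }
    }
  }
}
```

Both forms keep the count from dropping below zero; the explicit loop is essentially what `updateAndGet` performs internally on Java 8.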
[jira] [Commented] (HDFS-15815) if required storageType are unavailable log the failed reason during choosing Datanode
[ https://issues.apache.org/jira/browse/HDFS-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277743#comment-17277743 ] Hadoop QA commented on HDFS-15815: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 21s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 13s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 5s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 3s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 53s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/453/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 70 unchanged - 0 fixed = 71 total (was 70) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 36s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green}{color} |
[jira] [Work logged] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?focusedWorklogId=546884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546884 ] ASF GitHub Bot logged work on HDFS-15624: - Author: ASF GitHub Bot Created on: 03/Feb/21 06:47 Start Date: 03/Feb/21 06:47 Worklog Time Spent: 10m Work Description: liuml07 commented on pull request #2377: URL: https://github.com/apache/hadoop/pull/2377#issuecomment-772277416 Merged and resolved the JIRA. Thank you all! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546884) Time Spent: 9h 40m (was: 9.5h) > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Assignee: huangtianhua >Priority: Major > Labels: pull-request-available, release-blocker > Fix For: 3.4.0 > > Time Spent: 9h 40m > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
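The ordinal() hazard described in the issue summary can be shown in miniature. The enums below are illustrative stand-ins, not Hadoop's actual `StorageType`: inserting a constant anywhere but at the end shifts the `ordinal()` of every later constant, so a quota persisted by ordinal decodes as a different storage type after the upgrade.

```java
class OrdinalShiftDemo {
  // Hypothetical pre-upgrade enum.
  enum Before { RAM_DISK, SSD, DISK, ARCHIVE }

  // Hypothetical post-upgrade enum: a constant inserted mid-list
  // shifts the ordinals of SSD, DISK and ARCHIVE by one.
  enum After { RAM_DISK, NVDIMM, SSD, DISK, ARCHIVE }

  public static void main(String[] args) {
    int persisted = Before.SSD.ordinal();      // serialized as 1
    After decoded = After.values()[persisted]; // deserializes as NVDIMM
    System.out.println(Before.SSD + " -> " + decoded);
  }
}
```

A common remedy is to append new constants at the end of the enum, or to persist an explicit stable identifier instead of `ordinal()`.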
[jira] [Resolved] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu resolved HDFS-15624. -- Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Committed to trunk branch. Thank you [~huangtianhua] and su xu for your contribution. Thank you [~ayushtkn] and [~vinayakumarb] for your helpful review. > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Assignee: huangtianhua >Priority: Major > Labels: pull-request-available, release-blocker > Fix For: 3.4.0 > > Time Spent: 9.5h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?focusedWorklogId=546882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546882 ] ASF GitHub Bot logged work on HDFS-15624: - Author: ASF GitHub Bot Created on: 03/Feb/21 06:45 Start Date: 03/Feb/21 06:45 Worklog Time Spent: 10m Work Description: huangtianhua commented on pull request #2377: URL: https://github.com/apache/hadoop/pull/2377#issuecomment-772276331 @ayushtkn would you please to approve this, thanks very much. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546882) Time Spent: 9h 20m (was: 9h 10m) > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Assignee: YaYun Wang >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 9h 20m > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu reassigned HDFS-15624: Assignee: YaYun Wang > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Assignee: YaYun Wang >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 9h 10m > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?focusedWorklogId=546883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546883 ] ASF GitHub Bot logged work on HDFS-15624: - Author: ASF GitHub Bot Created on: 03/Feb/21 06:45 Start Date: 03/Feb/21 06:45 Worklog Time Spent: 10m Work Description: huangtianhua removed a comment on pull request #2377: URL: https://github.com/apache/hadoop/pull/2377#issuecomment-772276331 @ayushtkn would you please to approve this, thanks very much. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546883) Time Spent: 9.5h (was: 9h 20m) > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Assignee: huangtianhua >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 9.5h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu reassigned HDFS-15624: Assignee: huangtianhua (was: YaYun Wang) > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Assignee: huangtianhua >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 9h 20m > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?focusedWorklogId=546881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546881 ] ASF GitHub Bot logged work on HDFS-15624: - Author: ASF GitHub Bot Created on: 03/Feb/21 06:44 Start Date: 03/Feb/21 06:44 Worklog Time Spent: 10m Work Description: liuml07 merged pull request #2377: URL: https://github.com/apache/hadoop/pull/2377 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546881) Time Spent: 9h 10m (was: 9h) > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 9h 10m > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15812) after deleting data of hbase table hdfs size is not decreasing
[ https://issues.apache.org/jira/browse/HDFS-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277725#comment-17277725 ] Satya Gaurav commented on HDFS-15812: - [~surendralilhore] I have sent an email on u...@hadoop.apache.org > after deleting data of hbase table hdfs size is not decreasing > -- > > Key: HDFS-15812 > URL: https://issues.apache.org/jira/browse/HDFS-15812 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.0.2-alpha > Environment: HDP 3.1.4.0-315 > Hbase 2.0.2.3.1.4.0-315 >Reporter: Satya Gaurav >Priority: Major > > I am deleting the data from hbase table, it's deleting from hbase table but > the size of the hdfs directory is not reducing. Even I ran the major > compaction but after that also hdfs size didn't reduce. Any solution for this > issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop
[ https://issues.apache.org/jira/browse/HDFS-15624?focusedWorklogId=546874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546874 ] ASF GitHub Bot logged work on HDFS-15624: - Author: ASF GitHub Bot Created on: 03/Feb/21 06:12 Start Date: 03/Feb/21 06:12 Worklog Time Spent: 10m Work Description: huangtianhua commented on pull request #2377: URL: https://github.com/apache/hadoop/pull/2377#issuecomment-772261120 @liuml07 could you approve this? Thanks very much. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546874) Time Spent: 9h (was: 8h 50m) > Fix the SetQuotaByStorageTypeOp problem after updating hadoop > --- > > Key: HDFS-15624 > URL: https://issues.apache.org/jira/browse/HDFS-15624 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: YaYun Wang >Priority: Major > Labels: pull-request-available, release-blocker > Time Spent: 9h > Remaining Estimate: 0h > > HDFS-15025 adds a new storage Type NVDIMM, changes the ordinal() of the enum > of StorageType. And, setting the quota by storageType depends on the > ordinal(), therefore, it may cause the setting of quota to be invalid after > upgrade. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15812) after deleting data of hbase table hdfs size is not decreasing
[ https://issues.apache.org/jira/browse/HDFS-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277710#comment-17277710 ] Satya Gaurav commented on HDFS-15812: - [~surendralilhore] it's not moving into trash; even after 2 days the size is still the same. > after deleting data of hbase table hdfs size is not decreasing > -- > > Key: HDFS-15812 > URL: https://issues.apache.org/jira/browse/HDFS-15812 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.0.2-alpha > Environment: HDP 3.1.4.0-315 > Hbase 2.0.2.3.1.4.0-315 >Reporter: Satya Gaurav >Priority: Major > > I am deleting the data from hbase table, it's deleting from hbase table but > the size of the hdfs directory is not reducing. Even I ran the major > compaction but after that also hdfs size didn't reduce. Any solution for this > issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15812) after deleting data of hbase table hdfs size is not decreasing
[ https://issues.apache.org/jira/browse/HDFS-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277683#comment-17277683 ] Surendra Singh Lilhore commented on HDFS-15812: --- please send your query on [u...@hadoop.apache.org.|mailto:u...@hadoop.apache.org] > after deleting data of hbase table hdfs size is not decreasing > -- > > Key: HDFS-15812 > URL: https://issues.apache.org/jira/browse/HDFS-15812 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.0.2-alpha > Environment: HDP 3.1.4.0-315 > Hbase 2.0.2.3.1.4.0-315 >Reporter: Satya Gaurav >Priority: Major > > I am deleting the data from hbase table, it's deleting from hbase table but > the size of the hdfs directory is not reducing. Even I ran the major > compaction but after that also hdfs size didn't reduce. Any solution for this > issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15812) after deleting data of hbase table hdfs size is not decreasing
[ https://issues.apache.org/jira/browse/HDFS-15812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277682#comment-17277682 ] Surendra Singh Lilhore commented on HDFS-15812: --- [~satycse06], it will take time to delete data from hdfs if it is moved to trash. > after deleting data of hbase table hdfs size is not decreasing > -- > > Key: HDFS-15812 > URL: https://issues.apache.org/jira/browse/HDFS-15812 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.0.2-alpha > Environment: HDP 3.1.4.0-315 > Hbase 2.0.2.3.1.4.0-315 >Reporter: Satya Gaurav >Priority: Major > > I am deleting the data from hbase table, it's deleting from hbase table but > the size of the hdfs directory is not reducing. Even I ran the major > compaction but after that also hdfs size didn't reduce. Any solution for this > issue? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
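The point made above — that space is reclaimed only after trash is purged, not at delete time — can be sketched with a minimal, self-contained model. This is illustrative only: HDFS's real trash lives under each user's `.Trash` directory and is purged on a schedule controlled by `fs.trash.interval`; all names below are invented for the sketch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Toy trash-style deferred deletion: a "deleted" file is first moved
// aside, so it keeps occupying disk space until the trash is purged.
class TrashSketch {
  private final Path trashDir;

  TrashSketch(Path root) throws IOException {
    this.trashDir = root.resolve(".Trash");
    Files.createDirectories(trashDir);
  }

  // "Delete" = move into trash; no space is reclaimed yet.
  Path moveToTrash(Path file) throws IOException {
    Path target = trashDir.resolve(file.getFileName());
    return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
  }

  // Space is reclaimed only when the trash is purged later.
  void purge() throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(trashDir)) {
      for (Path p : entries) {
        Files.delete(p);
      }
    }
  }
}
```

In the real system the purge happens automatically once the configured trash interval elapses, which is why directory sizes can stay unchanged for days after a delete.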
[jira] [Comment Edited] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277680#comment-17277680 ] xuzq edited comment on HDFS-13609 at 2/3/21, 4:59 AM: -- Here is one example to illustrate what I think. We have 5 journals, jn1 ~ jn5, and the Active writes edits like: |Txid|SuccessWriteJournalId|FailedJournalId| |TxId1|jn2, jn3, jn4, jn5|jn1(write into cache, write disk failed)| |TxId2|jn2, jn3, jn4, jn5| | |TxId3|jn2, jn3, jn4, jn5| | |TxId4|jn2, jn3, jn4, jn5| | |TxId5|jn2, jn3, jn4, jn5| | When we attempt to fail over the standby to active, the standby needs to catch up all edits TxId1 ~ TxId5, starting from TxId1, and then transition to active. But before the failover, jn4 and jn5 are lagging, which causes responseCounts like (0(jn1), 5(jn2), 5(jn3)) in _editLogTailer.catchupDuringFailover()._ The Standby NameNode expects to get all edits TxId1 ~ TxId5, but only gets TxId1. TxId2 ~ TxId5 are not applied to the FSImage, and this causes the Standby NameNode to crash in _getFSImage().editLog.openForWrite()._ I think we should use responseCounts(2) ~ responseCounts(4) to ensure we can catch up all edits. But the last edit in responseCounts(2) ~ responseCounts(4) may still be being written by the active and may not yet be on a quorum of JNs. That would cause the Observer NameNode or Standby NameNode to tail edits that are not yet on a quorum. Or maybe we could write to disk first, then write to the cache, in the JournalNode. [~xkrogen] On this question, if you have some good ideas, please tell me, thanks. was (Author: xuzq_zander): Here is one example to illustrate what I think. We have 5 journals, jn1 ~ jn5, and the Active writes edits like: |Txid|SuccessWriteJournalId|FailedJournalId| |TxId1|jn2, jn3, jn4, jn5|jn1(write into cache, write disk failed)| |TxId2|jn2, jn3, jn4, jn5| | |TxId3|jn2, jn3, jn4, jn5| | |TxId4|jn2, jn3, jn4, jn5| | |TxId5|jn2, jn3, jn4, jn5| | When we attempt to fail over the standby to active, the standby needs to catch up all edits TxId1 ~ TxId5, starting from TxId1, and then transition to active. 
But before the failover, jn4 and jn5 are lagging, which causes responseCounts like (0(jn1), 5(jn2), 5(jn3)) in _editLogTailer.catchupDuringFailover()._ The Standby NameNode expects to get all edits TxId1 ~ TxId5, but only gets TxId1. TxId2 ~ TxId5 are not applied to the FSImage, and this causes the Standby NameNode to crash in _getFSImage().editLog.openForWrite()._ I think we should use responseCounts(2) ~ responseCounts(4) to ensure we can catch up all edits. But the last edit in responseCounts(2) ~ responseCounts(4) may still be being written by the active and may not yet be on a quorum of JNs. That would cause the Observer NameNode or Standby NameNode to tail edits that are not yet on a quorum. [~xkrogen] On this question, if you have some good ideas, please tell me, thanks. > [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via > RPC > - > > Key: HDFS-13609 > URL: https://issues.apache.org/jira/browse/HDFS-13609 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: HDFS-13609-HDFS-12943.000.patch, > HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, > HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch > > > See HDFS-13150 for the full design. > This JIRA is targeted at the NameNode-side changes to enable tailing > in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are > in the QuorumJournalManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
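One way to make the quorum rule debated above concrete is the sketch below. Names and shapes are illustrative, not Hadoop's actual `QuorumJournalManager`: with 2f+1 journals, a transaction is durable once f+1 of them have acknowledged it, so the highest txid that is safe to tail is the (f+1)-th largest of the per-journal responses — not the maximum, and not whatever a partial response set happens to contain.

```java
import java.util.Arrays;

class QuorumTailSketch {
  // responses[i] = highest txid journal i reports as durable
  // (0 if the journal has nothing, or did not respond in time).
  // Returns the highest txid guaranteed to be on a quorum.
  static long highestQuorumTxid(long[] responses) {
    long[] sorted = responses.clone();
    Arrays.sort(sorted);               // ascending
    int n = sorted.length;             // n = 2f + 1 journals
    int f = n / 2;
    // The (f+1)-th largest value: at least f+1 journals report >= it.
    return sorted[n - (f + 1)];
  }

  public static void main(String[] args) {
    // Roughly the scenario above: suppose jn1 reports only TxId1,
    // jn2/jn3 have all five, and jn4/jn5 are lagging with nothing yet.
    System.out.println(highestQuorumTxid(new long[] {1, 5, 5, 0, 0}));
  }
}
```

The comment's worry is visible here: if only three of five journals respond, using a single journal's answer can either miss durable edits or tail edits not yet on a quorum, while the (f+1)-th-largest rule stays safe in both directions.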
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277680#comment-17277680 ] xuzq commented on HDFS-13609: - Give one example to illustrate what I think. We have 5 journals, like jn1 ~ jn5. And Active write edits like: |Txid|SuccessWriteJournalId|FailedJournalId| |TxId1|jn2, jn3, jn4, jn5|jn1(write into cache, write disk failed)| |TxId2|jn2, jn3, jn4, jn5| | |TxId3|jn2, jn3, jn4, jn5| | |TxId4|jn2, jn3, jn4, jn5| | |TxId5|jn2, jn3, jn4, jn5| | When we attempt to failover standby to active, standby need to catchup all edits from TxId1 ~ TxId5 from TxId1, and change to active. But before to failover standby to active, jn4 and jn5 have some delay times, caused responseCounts like (0(jn1), 5(jn2), 5(jn3)) when _editLogTailer.catchupDuringFailover()._ Standby NameNode expect to get all edits from TxId1 ~ TxId5, but only get txId1. TxId2 ~ TxId5 don't applied into fsImage. And it will caused StandbyNameNode cashed when _getFSImage().editLog.openForWrite()._ I think we should use responseCounts(2) ~ responseCounts(4) to ensure can catchup all edits. But the last edit in responseCounts(2) ~ responseCounts(4) maybe is writing by active, maybe not on a quorum of JNs. It will cause Obsever NameNode or Standby NameNode tail UnQuorum edits. [~xkrogen] On this question, if you have some good ideas, please tell me, thanks. > [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via > RPC > - > > Key: HDFS-13609 > URL: https://issues.apache.org/jira/browse/HDFS-13609 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: HDFS-13609-HDFS-12943.000.patch, > HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, > HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch > > > See HDFS-13150 for the full design. 
> This JIRA is targeted at the NameNode-side changes to enable tailing > in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are > in the QuorumJournalManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
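The quorum-count arithmetic in the comment above can be sketched roughly as follows. This is an illustrative model only, not the actual QuorumJournalManager code: with responses sorted in descending order, the largest transaction count guaranteed to sit on a write quorum of JournalNodes is the quorum-th highest response, which collapses to the lagging journal's count in the scenario described.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch, not the actual QuorumJournalManager code: derive the
// highest txn count that is known to be durable from per-journal responses.
class DurableTxnSketch {
    static long durableTxnCount(List<Long> responseCounts, int writeQuorum) {
        List<Long> sorted = new ArrayList<>(responseCounts);
        sorted.sort(Collections.reverseOrder());
        // Only the (writeQuorum)-th highest response is guaranteed to be
        // persisted on a full write quorum of journals.
        return sorted.get(writeQuorum - 1);
    }

    public static void main(String[] args) {
        // Scenario from the comment: jn1 lags at 0, jn2/jn3 report 5,
        // jn4/jn5 respond too late, so only 3 responses arrive.
        System.out.println(durableTxnCount(List.of(0L, 5L, 5L), 3)); // prints 0
    }
}
```

With only three responses and a quorum of three, the "durable" count degrades to the single lagging journal's value, which is exactly why the Standby tails only TxId1 in the example.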
[jira] [Commented] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277670#comment-17277670 ] Hadoop QA commented on HDFS-15792: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 59s{color} | {color:red}{color} | {color:red} Docker failed to build yetus/hadoop:7257b17793d. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15792 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13019864/HDFS-15792-branch-2.10.001.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/455/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > ClasscastException while loading FSImage > > > Key: HDFS-15792 > URL: https://issues.apache.org/jira/browse/HDFS-15792 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15792-branch-2.10.001.patch, HDFS-15792.001.patch, > HDFS-15792.002.patch, HDFS-15792.003.patch, HDFS-15792.004.patch, > HDFS-15792.005.patch, HDFS-15792.addendum.001.patch, > image-2021-01-27-12-00-34-846.png > > > FSImage loading has failed with ClasscastException - > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode. > This is the usage issue with Hashmap in concurrent scenarios. > Same issue has been reported on Java & closed as usage issue. - > https://bugs.openjdk.java.net/browse/JDK-8173671 > 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading > INODE from fsiamge. 
| FSImageFormatProtobuf.java:442 > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode > at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1951) > at java.util.HashMap.treeifyBin(HashMap.java:772) > at java.util.HashMap.putVal(HashMap.java:644) > at java.util.HashMap.put(HashMap.java:612) > at > org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53) > at > org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391) > at > org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from > FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, > cpktTxId=00198227480) | FSImage.java:738 > java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node > cannot be cast to java.util.HashMap$TreeNode > at > org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955) > at >
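The root cause in the trace above is several FSImage loader threads mutating a plain HashMap inside ReferenceCountMap, which can corrupt the map's internal tree bins mid-treeify. A minimal sketch of the thread-safe direction such a fix takes (illustrative only, not the committed HDFS patch; the class name is hypothetical) combines ConcurrentHashMap with AtomicInteger counters:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch, not the actual HDFS ReferenceCountMap patch: a
// reference-count map that is safe to call from several loader threads.
class RefCountMapSketch<K> {
    private final ConcurrentHashMap<K, AtomicInteger> map = new ConcurrentHashMap<>();

    // Increment and return the reference count for key, creating it on demand.
    int incrementAndGetRefCount(K key) {
        return map.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();
    }

    // Decrement but never drop below zero.
    int decrementAndGetRefCount(K key) {
        AtomicInteger count = map.get(key);
        return count == null ? 0 : count.updateAndGet(i -> i > 0 ? i - 1 : i);
    }
}
```

Note that the lambda passed to {{updateAndGet}} requires Java 8+, which is why a pre-lambda form is needed on older branches such as branch-2.10.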
[jira] [Commented] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277654#comment-17277654 ] Hadoop QA commented on HDFS-15792: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 9m 58s{color} | {color:red}{color} | {color:red} Docker failed to build yetus/hadoop:7257b17793d. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15792 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13019864/HDFS-15792-branch-2.10.001.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/454/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15792: --- Status: Patch Available (was: Reopened)
[jira] [Reopened] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He reopened HDFS-15792:
[jira] [Updated] (HDFS-15815) if required storageType are unavailable log the failed reason during choosing Datanode
[ https://issues.apache.org/jira/browse/HDFS-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yun updated HDFS-15815: Attachment: HDFS-15815.001.patch Status: Patch Available (was: Open) > if required storageType are unavailable log the failed reason during > choosing Datanode > --- > > Key: HDFS-15815 > URL: https://issues.apache.org/jira/browse/HDFS-15815 > Project: Hadoop HDFS > Issue Type: Improvement > Components: block placement >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-15815.001.patch > > > For better debug, if required storageType are unavailable, log the failed > reason "NO_REQUIRED_STORAGE_TYPE" when choosing Datanode.
[jira] [Updated] (HDFS-15815) if required storageType are unavailable log the failed reason during choosing Datanode
[ https://issues.apache.org/jira/browse/HDFS-15815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yun updated HDFS-15815: Summary: if required storageType are unavailable log the failed reason during choosing Datanode (was: if required storageType are unavailable Log the failed reason when choosing Datanode)
[jira] [Created] (HDFS-15815) if required storageType are unavailable Log the failed reason when choosing Datanode
Yang Yun created HDFS-15815: --- Summary: if required storageType are unavailable Log the failed reason when choosing Datanode Key: HDFS-15815 URL: https://issues.apache.org/jira/browse/HDFS-15815 Project: Hadoop HDFS Issue Type: Improvement Components: block placement Reporter: Yang Yun Assignee: Yang Yun For better debug, if required storageType are unavailable, log the failed reason "NO_REQUIRED_STORAGE_TYPE" when choosing Datanode.
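The improvement described above (recording why each datanode was rejected, e.g. NO_REQUIRED_STORAGE_TYPE) can be sketched like this; the class and enum names here are hypothetical, not Hadoop's actual block-placement code:

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical, not Hadoop's actual
// BlockPlacementPolicy classes. Collect per-reason rejection counters so
// the placement-failure warning can explain why nodes were skipped.
class PlacementFailureSketch {
    enum Reason { NO_REQUIRED_STORAGE_TYPE, NODE_TOO_BUSY, NOT_ENOUGH_SPACE }

    private final Map<Reason, Integer> counts = new EnumMap<>(Reason.class);

    void record(Reason reason) {
        counts.merge(reason, 1, Integer::sum);
    }

    // Summary suitable for the warning logged when no datanode qualifies,
    // e.g. "{NO_REQUIRED_STORAGE_TYPE=3}".
    String summary() {
        return counts.toString();
    }
}
```

Surfacing the reason in the log turns an opaque "could not choose a datanode" failure into something an operator can act on (for example, by adding disks of the required storage type).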
[jira] [Comment Edited] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277640#comment-17277640 ] xuzq edited comment on HDFS-13609 at 2/3/21, 2:36 AM: -- Thanks [~xkrogen]. It is when onlyDurableTxns is true that we take responseCounts.get(0). In our production environment, one NameNode crashed when we failed it over to Active, and we caught an exception like:
{code:java}
2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@57d3ac44
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:111)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:5409)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:689)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240)
{code}
After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which made the check fail in _getFSImage().editLog.openForWrite()_. One journal went wrong while writing an edit to disk after writing it into its cache successfully. Since _onlyDurableTxns_ is true, we take _responseCounts.get(0)_, and that broken journal's response is _responseCounts.get(0)_, so _editLogTailer.catchupDuringFailover()_ could not catch up on all edits. The responses look like {{(0, 1000, 1000)}}.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.{quote}
So when the first-responding journal is broken, we may not be able to tail any edits:
* _editLogTailer.catchupDuringFailover()_ may fail to catch up on all edits, and the NN crashes when failing over to Active.
* The Observer NameNode may be unable to serve read RPCs.
[jira] [Work logged] (HDFS-15683) Allow configuring DISK/ARCHIVE capacity for individual volumes
[ https://issues.apache.org/jira/browse/HDFS-15683?focusedWorklogId=546818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546818 ]

ASF GitHub Bot logged work on HDFS-15683:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Feb/21 02:32
            Start Date: 03/Feb/21 02:32
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #2625:
URL: https://github.com/apache/hadoop/pull/2625#issuecomment-772171696

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 14m 22s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 22s | | trunk passed |
| +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 1m 18s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | checkstyle | 1m 20s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 26s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 7s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 1m 29s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +0 :ok: | spotbugs | 3m 6s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 3m 2s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 14s | | the patch passed |
| +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javac | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 3s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | javac | 1m 3s | | the patch passed |
| +1 :green_heart: | checkstyle | 1m 12s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 741 unchanged - 1 fixed = 741 total (was 742) |
| +1 :green_heart: | mvnsite | 1m 13s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | shadedclient | 12m 54s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 1m 25s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | findbugs | 3m 7s | | the patch passed |
|||| _ Other Tests _ |
| -1 :x: | unit | 197m 57s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2625/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 297m 55s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
| | hadoop.hdfs.server.namenode.TestFSEditLogLoader |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2625/7/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2625 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux b84ecd5c541c 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f37bf651993 |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| Multi-JDK versions |
[jira] [Comment Edited] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277640#comment-17277640 ] xuzq edited comment on HDFS-13609 at 2/3/21, 2:32 AM: -- Thanks [~xkrogen], It is when onlyDurableTxns is true that we get responseCounts.get(0). In our production environment, one nameNode is down when we failover it to active, and cached one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately. java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslator PB.java:111) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:54 09) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242) at java.base/java.security.AccessController.doPrivileged(AccessController.java:689) at java.base/javax.security.auth.Subject.doAs(Subject.java:423) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240) {code}

After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which made the check in _getFSImage().editLog.openForWrite()_ fail. One journal went bad while writing an edit to disk after the edit had already been written to its cache successfully. Because _onlyDurableTxns_ is true we take _responseCounts.get(0)_, and the bad journal's response is _responseCounts.get(0)_, so _editLogTailer.catchupDuringFailover()_ could not catch up on all edits. The responses looked like {{(0, 1000, 1000)}}.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.{quote}
This means we may not be able to tail any edits when the first-responding journal is bad.
* It may cause _editLogTailer.catchupDuringFailover()_ to miss edits, so the NN crashes when transitioning to active.
* It may prevent the Observer NameNode from serving read RPCs.

was (Author: xuzq_zander): Thanks [~xkrogen], It is when onlyDurableTxns is true that we get responseCounts.get(0). In our production environment, one nameNode is down when we failover it to active, and cached one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277637#comment-17277637 ] xuzq commented on HDFS-13609: - Thanks [~xkrogen] for the comment. It is when {{onlyDurableTxns}} is true that we get {{responseCounts.get(0)}} In our production environment, one nameNode is down when we failover it to active, and cache one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately. java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslator PB.java:111) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:54 09) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242) at java.base/java.security.AccessController.doPrivileged(AccessController.java:689) at java.base/javax.security.auth.Subject.doAs(Subject.java:423) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240) {code}

After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which made the check in _getFSImage().editLog.openForWrite()_ fail. When {{onlyDurableTxns}} is true we take _responseCounts.get(0)_, which kept _editLogTailer.catchupDuringFailover()_ from catching up on all edits: one journal went bad while writing an edit to disk after the edit had been written to its cache, and that journal's response is _responseCounts.get(0)_.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.{quote}
* It may cause *_editLogTailer.catchupDuringFailover()_ to miss edits* when _maxAllowedTxns = responseCounts.get(0) = 0_.
* It may also keep doTailEdits from tailing any edits.

> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via > RPC > - > > Key: HDFS-13609 > URL: https://issues.apache.org/jira/browse/HDFS-13609 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: HDFS-13609-HDFS-12943.000.patch, > HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, > HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch > > > See HDFS-13150 for the full design. > This JIRA is targeted at the NameNode-side changes to enable tailing > in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are > in the QuorumJournalManager.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
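The quorum-selection behavior discussed in the comments above can be sketched in a few lines. This is a hypothetical simplification for illustration only, not the actual QuorumJournalManager source: the class name `DurableTxnSketch`, the method `maxAllowedTxns`, and the list-of-acknowledged-txn-counts shape are all assumptions. It shows why, with `onlyDurableTxns` enabled, a single journal that persisted nothing caps the durable transaction count at 0, so nothing can be tailed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DurableTxnSketch {

    // Simplified stand-in for the quorum selection discussed above.
    // With onlyDurableTxns, only transactions acknowledged by EVERY
    // responding journal count as durable, so the smallest response wins;
    // otherwise the largest response is usable.
    static long maxAllowedTxns(List<Long> responseCounts, boolean onlyDurableTxns) {
        List<Long> sorted = new ArrayList<>(responseCounts);
        Collections.sort(sorted);
        return onlyDurableTxns
                ? sorted.get(0)                    // lowest txn count any responder reported
                : sorted.get(sorted.size() - 1);   // highest txn count any responder reported
    }

    public static void main(String[] args) {
        // Scenario from the comment: one journal persisted nothing after a
        // disk problem, while the other two acknowledged 1000 transactions.
        List<Long> responses = List.of(0L, 1000L, 1000L);
        System.out.println(maxAllowedTxns(responses, true));   // 0 -> nothing can be tailed
        System.out.println(maxAllowedTxns(responses, false));  // 1000
    }
}
```

Under this reading, the `(0, 1000, 1000)` response set from the comment yields a durable count of 0, which would explain both the failed catch-up during failover and the Observer being unable to serve reads.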
[jira] [Comment Edited] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277640#comment-17277640 ] xuzq edited comment on HDFS-13609 at 2/3/21, 2:29 AM: -- Thanks [~xkrogen], It is when onlyDurableTxns is true that we get responseCounts.get(0). In our production environment, one nameNode is down when we failover it to active, and cached one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately. java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslator PB.java:111) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:54 09) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242) at java.base/java.security.AccessController.doPrivileged(AccessController.java:689) at java.base/javax.security.auth.Subject.doAs(Subject.java:423) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240) {code}

After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which made the check in _getFSImage().editLog.openForWrite()_ fail. One journal went bad while writing an edit to disk after the edit had been written to its cache successfully. Because _onlyDurableTxns_ is true we take _responseCounts.get(0)_, and the bad journal's response is _responseCounts.get(0)_, so _editLogTailer.catchupDuringFailover()_ could not catch up on all edits. The responses looked like {{(0, 1000, 1000)}}.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.{quote}
This means we may not be able to tail any edits when the first-responding journal is bad.
* It may cause _editLogTailer.catchupDuringFailover()_ to miss edits, so the NN crashes when transitioning to active.
* It may prevent the Observer NameNode from serving read RPCs.

was (Author: xuzq_zander): Thanks [~xkrogen], It is when onlyDurableTxns is true that we get responseCounts.get(0). In our production environment, one nameNode is down when we failover it to active, and cached one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at
[jira] [Comment Edited] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277637#comment-17277637 ] xuzq edited comment on HDFS-13609 at 2/3/21, 2:18 AM: -- Thanks [~xkrogen] for the comment. It is when {{onlyDurableTxns}} is true that we get {{responseCounts.get(0)}} In our production environment, one nameNode is down when we failover it to active, and cache one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately. java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslator PB.java:111) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:54 09) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620) at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242) at java.base/java.security.AccessController.doPrivileged(AccessController.java:689) at java.base/javax.security.auth.Subject.doAs(Subject.java:423) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240) {code}

After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which made the check in _getFSImage().editLog.openForWrite()_ fail. When {{onlyDurableTxns}} is true we take _responseCounts.get(0)_, which kept _editLogTailer.catchupDuringFailover()_ from catching up on all edits: one journal went bad while writing an edit to disk after the edit had been written to its cache successfully, and that bad journal's response is _responseCounts.get(0)_.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.{quote}
* It may cause *_editLogTailer.catchupDuringFailover()_ to miss edits* when _maxAllowedTxns = responseCounts.get(0) = 0_.
* It may also keep doTailEdits from tailing any edits.

was (Author: xuzq_zander): Thanks [~xkrogen] for the comment. It is when {{onlyDurableTxns}} is true that we get {{responseCounts.get(0)}} In our production environment, one nameNode is down when we failover it to active, and cache one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658) at
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277640#comment-17277640 ] xuzq commented on HDFS-13609: - Thanks [~xkrogen], It is when onlyDurableTxns is true that we get responseCounts.get(0). In our production environment, one nameNode is down when we failover it to active, and cached one exception like: {code:java} 2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring N N shutdown. Shutting down immediately. java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs .server.namenode.RedundantEditLogInputStream@57d3ac44 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969) at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58) at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslator PB.java:111) at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:54 09) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242) at java.base/java.security.AccessController.doPrivileged(AccessController.java:689) at java.base/javax.security.auth.Subject.doAs(Subject.java:423) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240) {code}

After looking at the code, I think _editLogTailer.catchupDuringFailover()_ could not catch up on all edits, which made the check in _getFSImage().editLog.openForWrite()_ fail. One journal went bad while writing an edit to disk after the edit had been written to its cache successfully. Because _onlyDurableTxns_ is true we take _responseCounts.get(0)_, and the bad journal's response is _responseCounts.get(0)_, so _editLogTailer.catchupDuringFailover()_ could not catch up on all edits.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.{quote}
This means we may not be able to tail any edits when the first-responding journal is bad.
* It may cause _editLogTailer.catchupDuringFailover()_ to miss edits, so the NN crashes when transitioning to active.
* It may prevent the Observer NameNode from serving read RPCs.

> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via > RPC > - > > Key: HDFS-13609 > URL: https://issues.apache.org/jira/browse/HDFS-13609 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: HDFS-13609-HDFS-12943.000.patch, > HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, > HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch > > > See HDFS-13150 for the full design. > This JIRA is targeted at the NameNode-side changes to enable tailing > in-progress edits via the RPC mechanism added in HDFS-13608. 
Most changes are > in the QuorumJournalManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuzq updated HDFS-13609: Comment: was deleted (was: Thanks [~xkrogen] for the comment. It is when {{onlyDurableTxns}} is true that we get {{responseCounts.get(0)}}. In our production environment, one NameNode went down when we failed it over to active, and we caught an exception like:
{code:java}
2021-02-01 20:38:23,402 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode IPC Server handler 227 on 8022: Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 58504771317 when there is a stream available for read: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@57d3ac44
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:324)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1417)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1969)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:58)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1826)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1658)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:111)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:5409)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:620)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1125)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3246)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:3242)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:689)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3240)
{code}
After looking at the code, I think _editLogTailer.catchupDuringFailover()_ can't catch up all edits, so the check fails in _getFSImage().editLog.openForWrite()_. Because {{onlyDurableTxns}} is true, we take {{responseCounts.get(0)}}; one journal failed to write an edit to disk after writing it into the cache successfully, and that broken journal's response is {{responseCounts.get(0)}}, which is why {{editLogTailer.catchupDuringFailover()}} can't catch up all edits.
{quote}Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of.{quote}
* It may cause *_editLogTailer.catchupDuringFailover()_ to miss edits* when _maxAllowedTxns = responseCounts.get(0) = 0_.
* It may also cause doTailEdits to fail to tail any edits.)
> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via
> RPC
> -
>
> Key: HDFS-13609
> URL: https://issues.apache.org/jira/browse/HDFS-13609
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, namenode
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Fix For: HDFS-12943, 3.3.0
>
> Attachments: HDFS-13609-HDFS-12943.000.patch,
> HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch,
> HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch
>
>
> See HDFS-13150 for the full design.
> This JIRA is targeted at the NameNode-side changes to enable tailing
> in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are
> in the QuorumJournalManager.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
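The quorum selection debated in the deleted comment above can be sketched roughly as follows. This is an illustrative simplification, not the actual QuorumJournalManager code; the class name DurableTxnSelector and the method shape are assumptions. The idea: a transaction count is only trustworthy if at least a majority of journals report it, so when exactly a majority of journals responded, the minimum response is all you can rely on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DurableTxnSelector {
  /**
   * Pick the highest transaction count known to be durable on a majority of
   * journals, given the counts reported by the journals that responded.
   * When only a bare majority of journals responded, this degrades to the
   * minimum response -- the responseCounts.get(0) case discussed above.
   */
  public static long maxAllowedTxns(List<Long> responseCounts, int totalJournals) {
    int majority = totalJournals / 2 + 1;
    List<Long> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted);
    // The (size - majority)-th smallest count is reported by >= majority journals.
    int idx = sorted.size() - majority;
    return sorted.get(Math.max(idx, 0));
  }
}
```

With all 3 of 3 journals responding [0, 100, 100], a majority of 2 still covers txn count 100; but with the same counts coming from only 3 of 5 journals, the answer is forced down to 0, matching the "can't catch up all edits" symptom the comment describes when one journal lags.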
[jira] [Commented] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277629#comment-17277629 ] Hui Fei commented on HDFS-15798: [~sodonnell] Thanks for the comments, [~haiyang Hu] thanks for the update. +1 on [^HDFS-15798.003.patch]
> EC: Reconstruct task failed, and It would be XmitsInProgress of DN has
> negative number
> --
>
> Key: HDFS-15798
> URL: https://issues.apache.org/jira/browse/HDFS-15798
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: huhaiyang
> Assignee: huhaiyang
> Priority: Major
> Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch,
> HDFS-15798.003.patch
>
>
> The EC reconstruct task failed, and processErasureCodingTasks decremented
> XmitsInProgress by an abnormal value; this can leave the DN's XmitsInProgress
> negative, which affects how the NN chooses pending tasks based on the ratio
> between the lengths of the replication and erasure-coded block queues.
> {code:java}
> // 1. ErasureCodingWorker.java
> public void processErasureCodingTasks(
>     Collection<BlockECReconstructionInfo> ecTasks) {
>   for (BlockECReconstructionInfo reconInfo : ecTasks) {
>     int xmitsSubmitted = 0;
>     try {
>       ...
>       // It may throw IllegalArgumentException from task#stripedReader
>       // constructor.
>       final StripedBlockReconstructor task =
>           new StripedBlockReconstructor(this, stripedReconInfo);
>       if (task.hasValidTargets()) {
>         // See HDFS-12044. We increase xmitsInProgress even if the task is only
>         // enqueued, so that
>         // 1) NN will not send more tasks than what DN can execute and
>         // 2) DN will not throw away reconstruction tasks, and instead keeps
>         // an unbounded number of tasks in the executor's task queue.
>         xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1);
>         getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start increment
>         stripedReconstructionPool.submit(task);
>       } else {
>         LOG.warn("No missing internal block. Skip reconstruction for task:{}",
>             reconInfo);
>       }
>     } catch (Throwable e) {
>       getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed:
>       // decrement XmitsInProgress by the previously incremented value
>       LOG.warn("Failed to reconstruct striped block {}",
>           reconInfo.getExtendedBlock().getLocalBlock(), e);
>     }
>   }
> }
>
> // 2. StripedBlockReconstructor.java
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     ...
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     getDatanode().getMetrics().incrECFailedReconstructionTasks();
>   } finally {
>     float xmitWeight = getErasureCodingWorker().getXmitWeight();
>     // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1
>     // because if it is set to zero, we cannot measure the xmits submitted
>     int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1);
>     getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete decrement
>     ...
>   }
> }{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
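The accounting bug above boils down to incrementing and decrementing XmitsInProgress by values computed at different times. A minimal sketch of the fixed pattern (the XmitsTracker class is hypothetical, not the actual DataNode code): remember exactly what was added, and subtract exactly that amount on failure, so the counter can never go negative.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class XmitsTracker {
  private final AtomicInteger xmitsInProgress = new AtomicInteger();

  // Weight a task's xmits but never go below 1, so even cheap tasks count.
  public static int weightedXmits(int taskXmits, float xmitWeight) {
    return Math.max((int) (taskXmits * xmitWeight), 1);
  }

  // Increment by the weighted value; on a submission failure, decrement by
  // exactly the amount previously added (never by a freshly recomputed value).
  public int submit(int taskXmits, float xmitWeight, boolean submissionFails) {
    int submitted = weightedXmits(taskXmits, xmitWeight);
    xmitsInProgress.addAndGet(submitted);
    if (submissionFails) {
      xmitsInProgress.addAndGet(-submitted);
    }
    return xmitsInProgress.get();
  }
}
```

A failed submission therefore leaves the counter exactly where it started, instead of pushing it negative the way a mismatched recomputation can.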
[jira] [Commented] (HDFS-15799) Make DisallowedDatanodeException terse
[ https://issues.apache.org/jira/browse/HDFS-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277625#comment-17277625 ] Hadoop QA commented on HDFS-15799: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 6m 31s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 4s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 41s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 17s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 14s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 16s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 0s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | |
[jira] [Work logged] (HDFS-15795) EC: Wrong checksum when reconstruction was failed by exception
[ https://issues.apache.org/jira/browse/HDFS-15795?focusedWorklogId=546756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546756 ] ASF GitHub Bot logged work on HDFS-15795: - Author: ASF GitHub Bot Created on: 03/Feb/21 01:35 Start Date: 03/Feb/21 01:35 Worklog Time Spent: 10m Work Description: sodonnel merged pull request #2657: URL: https://github.com/apache/hadoop/pull/2657 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546756) Time Spent: 1h 50m (was: 1h 40m) > EC: Wrong checksum when reconstruction was failed by exception > -- > > Key: HDFS-15795 > URL: https://issues.apache.org/jira/browse/HDFS-15795 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ec, erasure-coding >Reporter: Yushi Hayasaka >Assignee: Yushi Hayasaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > If the reconstruction task fails on StripedBlockChecksumReconstructor with an > exception, the checksum becomes wrong because it is calculated from the > blocks excluding the failed one. > The exception is caught in an inappropriate way; as a result, the > failed block is not fetched again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546625 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 03/Feb/21 01:23 Start Date: 03/Feb/21 01:23 Worklog Time Spent: 10m Work Description: fengnanli commented on a change in pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#discussion_r568866912 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionPool.java ## @@ -252,19 +252,23 @@ public synchronized void addConnection(ConnectionContext conn) { */ public synchronized List<ConnectionContext> removeConnections(int num) { List<ConnectionContext> removed = new LinkedList<>(); -// Remove and close the last connection -List<ConnectionContext> tmpConnections = new ArrayList<>(); -for (int i=0; i this.minSize) { + int targetCount = Math.min(num, this.connections.size() - this.minSize); Review comment: I don't think it can be negative here, since the only place connections become fewer is in this function, at the swap part with the tmpConnections. The other place where this var gets assigned is in the creation part, and it can only increase the value. ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionContext.java ## @@ -57,6 +62,17 @@ public synchronized boolean isActive() { return this.numThreads > 0; } + /** + * Check if the connection is/was active recently. + * + * @return True if the connection is active or + * was active in the past period of time. + */ + public synchronized boolean isActiveRecently() { +return isActive() || Review comment: That can be removed, since the time-window calculation covers the active case. Updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546625) Time Spent: 3h 10m (was: 3h) > RBF: Improving Router Connection Management > --- > > Key: HDFS-15757 > URL: https://issues.apache.org/jira/browse/HDFS-15757 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ > Improving Router Connection Management_v3.pdf, RBF_ Router Connection > Management.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > We have seen high number of connections from Router to namenodes, leaving > namenodes unstable. > This ticket is trying to reduce connections through some changes. Please take > a look at the design and leave comments. > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
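The removeConnections() review thread above is about clamping how many connections may be closed so the pool never shrinks below its minimum size. That clamp can be sketched standalone (the ConnectionPruner class name is hypothetical; the actual patch works on the pool's connection list directly):

```java
public class ConnectionPruner {
  /**
   * How many connections can safely be removed: at most num, and never so
   * many that the pool drops below minSize. Clamping at zero also covers the
   * case the reviewers debate, where the pool is already at or below minSize.
   */
  public static int removableCount(int poolSize, int minSize, int num) {
    if (poolSize <= minSize) {
      return 0;
    }
    return Math.min(num, poolSize - minSize);
  }
}
```

The explicit `poolSize <= minSize` guard is what makes the "can it be negative?" question moot: even if another thread shrank the pool concurrently, the result is floored at zero.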
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546612=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546612 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 03/Feb/21 01:21 Start Date: 03/Feb/21 01:21 Worklog Time Spent: 10m Work Description: fengnanli commented on pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#issuecomment-771903717 > Thanks @fengnanli for your work here. Left some nit comments inline. > Sorry, I do not get why the change can reduce connections here after reviewing the changes; is it related to "Be greedy here to close as many connections as possible in one shot"? It will be helpful if we add some javadocs explicitly. Thanks. Thanks for the review @Hexiaoqiao. I put the reason behind this change in the design doc in the original JIRA ticket. In short, I did synchronous connection closing + better connection picking + greedy connection closing. I have seen a 50% reduction in the number of connections and better ProxyTime. It would be great if you can try it in your setup as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546612) Time Spent: 3h (was: 2h 50m) > RBF: Improving Router Connection Management > --- > > Key: HDFS-15757 > URL: https://issues.apache.org/jira/browse/HDFS-15757 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ > Improving Router Connection Management_v3.pdf, RBF_ Router Connection > Management.pdf > > Time Spent: 3h > Remaining Estimate: 0h > > We have seen high number of connections from Router to namenodes, leaving > namenodes unstable. > This ticket is trying to reduce connections through some changes. Please take > a look at the design and leave comments. > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15779) EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
[ https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277592#comment-17277592 ] Hui Fei commented on HDFS-15779: [~wanghongbing] Thanks for update, will commit later. > EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block > - > > Key: HDFS-15779 > URL: https://issues.apache.org/jira/browse/HDFS-15779 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Attachments: HDFS-15779.001.patch, HDFS-15779.002.patch > > > The NullPointerException in DN log as follows: > {code:java} > 2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY > //... > 2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Connection timed out > 2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Failed to reconstruct striped block: > BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2020-12-28 
15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving > BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 > src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50 > 010 > {code} > NPE occurs at `writer.getTargetBuffer()` in codes: > {code:java} > // StripedWriter#clearBuffers > void clearBuffers() { > for (StripedBlockWriter writer : writers) { > ByteBuffer targetBuffer = writer.getTargetBuffer(); > if (targetBuffer != null) { > targetBuffer.clear(); > } > } > } > {code} > So, why is the writer null? Let's track when the writer is initialized and > when reconstruct() is called, as follows: > {code:java} > // StripedBlockReconstructor#run > public void run() { > try { > initDecoderIfNecessary(); > getStripedReader().init(); > stripedWriter.init(); //① > reconstruct(); //② > stripedWriter.endTargetBlocks(); > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > // ...{code} > They are called at ① and ② above respectively. `stripedWriter.init()` -> > `initTargetStreams()`, as follows: > {code:java} > // StripedWriter#initTargetStreams > int initTargetStreams() { > int nSuccess = 0; > for (short i = 0; i < targets.length; i++) { > try { > writers[i] = createWriter(i); > nSuccess++; > targetsStatus[i] = true; > } catch (Throwable e) { > LOG.warn(e.getMessage()); > } > } > return nSuccess; > } > {code} > NPE occurs when createWriter() gets an exception and 0 < nSuccess < > targets.length. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
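The root cause above is that writers[i] stays null when createWriter(i) throws during initTargetStreams(), so any later sweep over the array must null-check the slot itself, not only the buffer it holds. A minimal sketch of that defensive sweep, using a plain ByteBuffer array to stand in for the writers' target buffers (illustrative only, not the actual StripedWriter fix):

```java
import java.nio.ByteBuffer;

public class SafeBufferClearer {
  /**
   * Clear every non-null buffer and report how many were cleared.
   * Null slots model writers whose construction failed, as in the
   * 0 < nSuccess < targets.length scenario described above.
   */
  public static int clearBuffers(ByteBuffer[] buffers) {
    int cleared = 0;
    for (ByteBuffer buf : buffers) {
      if (buf != null) {  // skip slots that never got initialized
        buf.clear();
        cleared++;
      }
    }
    return cleared;
  }
}
```

The same skip-null discipline has to hold everywhere the writers array is iterated, since partial initialization is an expected state, not an invariant violation.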
[jira] [Updated] (HDFS-15814) Make some parameters configurable for DataNodeDiskMetrics
[ https://issues.apache.org/jira/browse/HDFS-15814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-15814: --- External issue URL: https://github.com/apache/hadoop/pull/2676 > Make some parameters configurable for DataNodeDiskMetrics > - > > Key: HDFS-15814 > URL: https://issues.apache.org/jira/browse/HDFS-15814 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Priority: Major > > For ease of use, especially for small clusters, we can change some > parameters(MIN_OUTLIER_DETECTION_DISKS, SLOW_DISK_LOW_THRESHOLD_MS) > configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15814) Make some parameters configurable for DataNodeDiskMetrics
[ https://issues.apache.org/jira/browse/HDFS-15814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-15814: --- External issue URL: (was: https://github.com/apache/hadoop/pull/2676) > Make some parameters configurable for DataNodeDiskMetrics > - > > Key: HDFS-15814 > URL: https://issues.apache.org/jira/browse/HDFS-15814 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Priority: Major > > For ease of use, especially for small clusters, we can change some > parameters(MIN_OUTLIER_DETECTION_DISKS, SLOW_DISK_LOW_THRESHOLD_MS) > configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15814) Make some parameters configurable for DataNodeDiskMetrics
[ https://issues.apache.org/jira/browse/HDFS-15814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-15814: --- External issue URL: https://github.com/apache/hadoop/pull/2676 > Make some parameters configurable for DataNodeDiskMetrics > - > > Key: HDFS-15814 > URL: https://issues.apache.org/jira/browse/HDFS-15814 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Priority: Major > > For ease of use, especially for small clusters, we can change some > parameters(MIN_OUTLIER_DETECTION_DISKS, SLOW_DISK_LOW_THRESHOLD_MS) > configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15814) Make some parameters configurable for DataNodeDiskMetrics
[ https://issues.apache.org/jira/browse/HDFS-15814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-15814: --- External issue URL: (was: https://github.com/apache/hadoop/pull/2676) > Make some parameters configurable for DataNodeDiskMetrics > - > > Key: HDFS-15814 > URL: https://issues.apache.org/jira/browse/HDFS-15814 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Priority: Major > > For ease of use, especially for small clusters, we can change some > parameters(MIN_OUTLIER_DETECTION_DISKS, SLOW_DISK_LOW_THRESHOLD_MS) > configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15801) Backport HDFS-14582 to branch-2.10 (Failed to start DN with ArithmeticException when NULL checksum used)
[ https://issues.apache.org/jira/browse/HDFS-15801?focusedWorklogId=546491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546491 ] ASF GitHub Bot logged work on HDFS-15801: - Author: ASF GitHub Bot Created on: 03/Feb/21 01:10 Start Date: 03/Feb/21 01:10 Worklog Time Spent: 10m Work Description: jojochuang merged pull request #2659: URL: https://github.com/apache/hadoop/pull/2659 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546491) Time Spent: 1h (was: 50m) > Backport HDFS-14582 to branch-2.10 (Failed to start DN with > ArithmeticException when NULL checksum used) > > > Key: HDFS-15801 > URL: https://issues.apache.org/jira/browse/HDFS-15801 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Janus Chow >Assignee: Janus Chow >Priority: Major > Labels: pull-request-available > Fix For: 2.10.2 > > Time Spent: 1h > Remaining Estimate: 0h > > In HDFS-14582, the error message is more clear as follows: > {code:java} > Caused by: java.lang.ArithmeticException: / by zero > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.validateIntegrityAndSetLength(BlockPoolSlice.java:823) > at > {code} > But in branch-2.10.1, the exception message is omitted as follows: > {code:java} > 2021-01-29 14:20:30,694 INFO impl.FsDatasetImpl (FsVolumeList.java:run(204)) > - Caught exception while adding replicas from /mnt/disk/0/hdfs/data/current. > Will throw later. 
> java.io.IOException: Failed to start sub tasks to add replica in replica map > :java.lang.ArithmeticExceptionjava.io.IOException: Failed to start sub tasks > to add replica in replica map :java.lang.ArithmeticException at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:434) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:930) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:196) > {code} > The specific error message is omitted, making it harder to find the root > cause. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock warnings
[ https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=546495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546495 ] ASF GitHub Bot logged work on HDFS-15808: - Author: ASF GitHub Bot Created on: 03/Feb/21 01:10 Start Date: 03/Feb/21 01:10 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #2668: URL: https://github.com/apache/hadoop/pull/2668#issuecomment-771457006 Failed junit tests hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks Sorry. I didn't change those two unit tests, and they worked fine locally. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546495) Time Spent: 1h (was: 50m) > Add metrics for FSNamesystem read/write lock warnings > - > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15814) Make some parameters configurable for DataNodeDiskMetrics
tomscut created HDFS-15814: -- Summary: Make some parameters configurable for DataNodeDiskMetrics Key: HDFS-15814 URL: https://issues.apache.org/jira/browse/HDFS-15814 Project: Hadoop HDFS Issue Type: Wish Components: hdfs Reporter: tomscut For ease of use, especially for small clusters, we can change some parameters(MIN_OUTLIER_DETECTION_DISKS, SLOW_DISK_LOW_THRESHOLD_MS) configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546454 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 03/Feb/21 01:06 Start Date: 03/Feb/21 01:06 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#issuecomment-771379941 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 546454) Time Spent: 2h 50m (was: 2h 40m) > RBF: Improving Router Connection Management > --- > > Key: HDFS-15757 > URL: https://issues.apache.org/jira/browse/HDFS-15757 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Attachments: RBF_ Improving Router Connection Management_v2.pdf, RBF_ > Improving Router Connection Management_v3.pdf, RBF_ Router Connection > Management.pdf > > Time Spent: 2h 50m > Remaining Estimate: 0h > > We have seen high number of connections from Router to namenodes, leaving > namenodes unstable. > This ticket is trying to reduce connections through some changes. Please take > a look at the design and leave comments. > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546422=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546422 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 03/Feb/21 01:03 Start Date: 03/Feb/21 01:03 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#discussion_r568315552 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionContext.java ## @@ -42,7 +44,10 @@ private int numThreads = 0; /** If the connection is closed. */ private boolean closed = false; - + /** Last timestamp the connection was active. */ + private long lastActiveTs = 0; + /** The connection's active status would expire after this window. */ + private long activeWindow = TimeUnit.SECONDS.toMillis(30); Review comment: I agree, and it could be called ACTIVE_WINDOW_TIME. ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionContext.java ## @@ -57,6 +62,17 @@ public synchronized boolean isActive() { return this.numThreads > 0; } + /** + * Check if the connection is/was active recently. + * + * @return True if the connection is active or + * was active in the past period of time. + */ + public synchronized boolean isActiveRecently() { +return isActive() || Review comment: If we go timestamp-based, do we even need to check isActive(), or is the timestamp comparison enough?
Issue Time Tracking --- Worklog Id: (was: 546422) Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (HDFS-15811) completeFile should log final file size
[ https://issues.apache.org/jira/browse/HDFS-15811?focusedWorklogId=546376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546376 ] ASF GitHub Bot logged work on HDFS-15811: - Author: ASF GitHub Bot Created on: 03/Feb/21 00:58 Start Date: 03/Feb/21 00:58 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #2670: URL: https://github.com/apache/hadoop/pull/2670#discussion_r568348691 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java ## @@ -3146,23 +3148,30 @@ INodeFile checkLease(INodesInPath iip, String holder, long fileId) boolean completeFile(final String src, String holder, ExtendedBlock last, long fileId) throws IOException { +final String operationName = CMD_COMPLETE_FILE; boolean success = false; +FileStatus stat = null; checkOperation(OperationCategory.WRITE); final FSPermissionChecker pc = getPermissionChecker(); FSPermissionChecker.setOperationType(null); writeLock(); try { checkOperation(OperationCategory.WRITE); checkNameNodeSafeMode("Cannot complete file " + src); - success = FSDirWriteFileOp.completeFile(this, pc, src, holder, last, + INodesInPath iip = dir.resolvePath(pc, src, fileId); + success = FSDirWriteFileOp.completeFile(this, iip, src, holder, last, fileId); + if (success) { +stat = dir.getAuditFileInfo(iip); + } } finally { - writeUnlock("completeFile"); + writeUnlock(operationName); Review comment: why change this? 
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java ## @@ -8667,6 +8676,9 @@ public void logAuditEvent(boolean succeeded, String userName, } sb.append("\t").append("proto=") .append(Server.getProtocol()); +if (cmd.equals(CMD_COMPLETE_FILE) && status != null) { + sb.append("\t").append("fileSize=").append(status.getLen()); Review comment: we shouldn't only add a new field for this particular command, as it will probably break lots of applications parsing the audit log. See https://issues.apache.org/jira/browse/HDFS-9184 for some more context. Issue Time Tracking --- Worklog Id: (was: 546376) Time Spent: 0.5h (was: 20m) > completeFile should log final file size > --- > > Key: HDFS-15811 > URL: https://issues.apache.org/jira/browse/HDFS-15811 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Zehao Chen >Assignee: Zehao Chen >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Jobs, particularly hive queries by non-headless users, can create an > excessive number of files (many hundreds of thousands). A single user's query > can generate a sustained burst of 60-80% of all creates for tens of minutes > or more and impact overall cluster performance. Adding the file size to the > logline allows us to identify excessive tiny or large files.
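The reviewer's concern about parsers can be made concrete: tools that split audit lines on tabs and index columns positionally see a different column count for the one command that gained a field. A simplified illustration; the field layout below is invented, not the exact HDFS audit-log format:

```java
// Simplified audit-line builder; real HDFS audit lines carry more fields.
public class AuditLineExample {
    static String line(String cmd, Long fileSize) {
        StringBuilder sb = new StringBuilder();
        sb.append("allowed=true").append('\t')
          .append("cmd=").append(cmd).append('\t')
          .append("proto=rpc");
        // Appending a field for just one command makes that command's lines
        // have a different column count from every other command's lines.
        if (fileSize != null) {
            sb.append('\t').append("fileSize=").append(fileSize);
        }
        return sb.toString();
    }
}
```

Splitting both variants on tabs yields 3 columns for ordinary commands but 4 for completeFile, which is exactly what breaks positional parsers.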
[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock warnings
[ https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=546358=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546358 ] ASF GitHub Bot logged work on HDFS-15808: - Author: ASF GitHub Bot Created on: 03/Feb/21 00:56 Start Date: 03/Feb/21 00:56 Worklog Time Spent: 10m Work Description: tomscut edited a comment on pull request #2668: URL: https://github.com/apache/hadoop/pull/2668#issuecomment-771457006 Failed junit tests hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks Sorry. I didn't update those two unit tests, and they worked fine locally. Issue Time Tracking --- Worklog Id: (was: 546358) Time Spent: 50m (was: 40m) > Add metrics for FSNamesystem read/write lock warnings > - > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.
[jira] [Commented] (HDFS-13148) Unit test for EZ with KMS and Federation
[ https://issues.apache.org/jira/browse/HDFS-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277555#comment-17277555 ] Hadoop QA commented on HDFS-13148: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 50s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 58s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 49s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 46s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 29s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 31s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 31s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/451/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04.txt{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 generated 1 new + 594 unchanged - 0 fixed = 595 total (was 594) {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 18s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 18s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/451/artifact/out/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01.txt{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 generated 1 new + 578 unchanged - 0 fixed = 579 total (was 578) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite 
{color} | {color:green} 1m 27s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color}
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546266 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 02/Feb/21 21:24 Start Date: 02/Feb/21 21:24 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#issuecomment-771998442 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 6m 45s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 40m 40s | | trunk passed | | +1 :green_heart: | compile | 0m 50s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 0m 46s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 | | +1 :green_heart: | checkstyle | 0m 34s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 53s | | trunk passed | | +1 :green_heart: | shadedclient | 19m 42s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 0m 48s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 6s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 | | +0 :ok: | spotbugs | 1m 48s | | Used deprecated FindBugs config; considering switching to SpotBugs. 
| | +1 :green_heart: | findbugs | 1m 42s | | trunk passed | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 44s | | the patch passed | | +1 :green_heart: | compile | 0m 45s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 0m 45s | | the patch passed | | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 | | +1 :green_heart: | javac | 0m 36s | | the patch passed | | +1 :green_heart: | checkstyle | 0m 17s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 36s | | the patch passed | | +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. | | +1 :green_heart: | shadedclient | 17m 1s | | patch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 0m 39s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 | | +1 :green_heart: | findbugs | 1m 36s | | the patch passed | _ Other Tests _ | | +1 :green_heart: | unit | 20m 1s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 33s | | The patch does not generate ASF License warnings. 
| | | | 120m 39s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2651/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/2651 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b7d09608d928 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / f37bf651993 | | Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2651/6/testReport/ | | Max. process+thread count | 2389 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2651/6/console | | versions | git=2.25.1 maven=3.6.3 findbugs=4.0.6 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT
[jira] [Updated] (HDFS-15799) Make DisallowedDatanodeException terse
[ https://issues.apache.org/jira/browse/HDFS-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15799: --- Affects Version/s: 3.4.0 2.10.1 > Make DisallowedDatanodeException terse > -- > > Key: HDFS-15799 > URL: https://issues.apache.org/jira/browse/HDFS-15799 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.10.1, 3.4.0 >Reporter: Richard >Assignee: Richard >Priority: Minor > Attachments: HDFS-15799.001.patch > > > When org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException is > thrown back to a datanode, the namenode logs a full stack trace.
[jira] [Updated] (HDFS-15799) Make DisallowedDatanodeException terse
[ https://issues.apache.org/jira/browse/HDFS-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15799: --- Target Version/s: (was: 3.3.0)
[jira] [Updated] (HDFS-15799) Make DisallowedDatanodeException terse
[ https://issues.apache.org/jira/browse/HDFS-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15799: --- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277429#comment-17277429 ] Íñigo Goiri commented on HDFS-15757: Thank you for the updated document with the data. I think these results justify this improvement. I'm fine going forward with this.
[jira] [Commented] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277412#comment-17277412 ] Fengnan Li commented on HDFS-15757: --- [~elgoiri] [~hexiaoqiao] Addressed comments in the PR. More importantly, it would help if you could try this in your own setup, since this is essentially an optimization that only a metrics improvement can justify.
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546221=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546221 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 02/Feb/21 19:21 Start Date: 02/Feb/21 19:21 Worklog Time Spent: 10m Work Description: fengnanli commented on a change in pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#discussion_r568868167 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionContext.java ## @@ -57,6 +62,17 @@ public synchronized boolean isActive() { return this.numThreads > 0; } + /** + * Check if the connection is/was active recently. + * + * @return True if the connection is active or + * was active in the past period of time. + */ + public synchronized boolean isActiveRecently() { +return isActive() || Review comment: That can be removed, since the time-window calculation covers the active case. Updated. Issue Time Tracking --- Worklog Id: (was: 546221) Time Spent: 2h 20m (was: 2h 10m)
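The conclusion of the review thread above (the window comparison alone suffices, because a currently active connection keeps refreshing its timestamp) can be sketched as follows. Field names mirror the quoted diff; the rest is a minimal standalone illustration, not the actual ConnectionContext implementation:

```java
// Minimal sketch of the timestamp-window activity check discussed above.
public class ConnectionActivity {
    /** Last timestamp the connection was active. */
    private long lastActiveTs = 0;
    /** The connection's active status expires after this window. */
    private final long activeWindowMs;

    ConnectionActivity(long activeWindowMs) {
        this.activeWindowMs = activeWindowMs;
    }

    /** Called whenever a thread uses the connection. */
    synchronized void markActive(long nowMs) {
        lastActiveTs = nowMs;
    }

    // As concluded in the thread, no separate isActive() check is needed:
    // an in-use connection has just refreshed lastActiveTs, so the window
    // comparison covers the active case too.
    synchronized boolean isActiveRecently(long nowMs) {
        return nowMs - lastActiveTs <= activeWindowMs;
    }
}
```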
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546220 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 02/Feb/21 19:20 Start Date: 02/Feb/21 19:20 Worklog Time Spent: 10m Work Description: fengnanli commented on a change in pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#discussion_r568866912 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionPool.java ## @@ -252,19 +252,23 @@ public synchronized void addConnection(ConnectionContext conn) { */ public synchronized List removeConnections(int num) { List removed = new LinkedList<>(); - -// Remove and close the last connection -List tmpConnections = new ArrayList<>(); -for (int i=0; i this.minSize) { + int targetCount = Math.min(num, this.connections.size() - this.minSize); Review comment: I don't think it can be negative here, since the only place the connection count decreases is in this function, at the swap with tmpConnections. The other place this variable is assigned is the creation path, which can only increase it. Issue Time Tracking --- Worklog Id: (was: 546220) Time Spent: 2h 10m (was: 2h)
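The invariant under discussion (remove up to num connections, but never shrink the pool below minSize, so targetCount cannot go negative) can be sketched in isolation. ConnectionContext is reduced to a placeholder class here; this is an illustration of the logic, not the actual ConnectionPool code:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of "remove up to num connections while keeping at least minSize".
public class ConnectionPoolSketch {
    static class Conn {} // placeholder for ConnectionContext

    private final List<Conn> connections = new ArrayList<>();
    private final int minSize;

    ConnectionPoolSketch(int minSize) {
        this.minSize = minSize;
    }

    synchronized void add(Conn c) {
        connections.add(c);
    }

    synchronized List<Conn> removeConnections(int num) {
        List<Conn> removed = new LinkedList<>();
        if (connections.size() > minSize) {
            // targetCount >= 1 here: this branch requires size() > minSize,
            // and the pool only shrinks inside this method.
            int targetCount = Math.min(num, connections.size() - minSize);
            for (int i = 0; i < targetCount; i++) {
                // Greedily drop from the tail, as many as allowed in one shot.
                removed.add(connections.remove(connections.size() - 1));
            }
        }
        return removed;
    }

    synchronized int size() {
        return connections.size();
    }
}
```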
[jira] [Work logged] (HDFS-15757) RBF: Improving Router Connection Management
[ https://issues.apache.org/jira/browse/HDFS-15757?focusedWorklogId=546216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-546216 ] ASF GitHub Bot logged work on HDFS-15757: - Author: ASF GitHub Bot Created on: 02/Feb/21 19:14 Start Date: 02/Feb/21 19:14 Worklog Time Spent: 10m Work Description: fengnanli commented on pull request #2651: URL: https://github.com/apache/hadoop/pull/2651#issuecomment-771903717 > Thanks @fengnanli for your work here. Left some nit comments inline. > Sorry, I do not get why this change reduces connections after reviewing it; is it related to "Be greedy here to close as many connections as possible in one shot"? It would be helpful to add some javadocs explicitly. Thanks. Thanks for the review @Hexiaoqiao. I put the reasoning behind this change in the design doc on the original JIRA ticket. In short, it combines synchronous connection closing, better connection picking, and greedy connection closing. I have seen a 50% reduction in the number of connections and better ProxyTime. It would be great if you can try it in your setup as well. Issue Time Tracking --- Worklog Id: (was: 546216) Time Spent: 2h (was: 1h 50m)
[jira] [Commented] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming
[ https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277389#comment-17277389 ] Hadoop QA commented on HDFS-15813: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 6s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 28s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 23s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green}{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 0 new + 73 unchanged - 1 fixed = 73 total (was 74) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 38s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green}{color} |
[jira] [Updated] (HDFS-15799) Make DisallowedDatanodeException terse
[ https://issues.apache.org/jira/browse/HDFS-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard updated HDFS-15799: --- Attachment: HDFS-15799.001.patch > Make DisallowedDatanodeException terse > -- > > Key: HDFS-15799 > URL: https://issues.apache.org/jira/browse/HDFS-15799 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Richard >Assignee: Richard >Priority: Minor > Attachments: HDFS-15799.001.patch > > > When org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException is > thrown back to a datanode, the namenode logs a full stack trace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
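The description above is brief, so here is a hedged sketch of what making an exception "terse" usually means in this context: the server logs a registered exception class as a single line instead of a full stack trace. The class and method names below are illustrative, not the actual org.apache.hadoop.ipc.Server API.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch (not Hadoop code): exceptions registered as "terse"
// are formatted as a one-line message; everything else keeps its stack trace.
class TerseLogger {
  private final Set<Class<? extends Throwable>> terse = new HashSet<>();

  void addTerseException(Class<? extends Throwable> c) {
    terse.add(c);
  }

  String format(Throwable t) {
    if (terse.contains(t.getClass())) {
      return t.toString(); // one line: class name + message
    }
    StringBuilder sb = new StringBuilder(t.toString());
    for (StackTraceElement e : t.getStackTrace()) {
      sb.append("\n\tat ").append(e);
    }
    return sb.toString();
  }
}
```

The actual patch would register DisallowedDatanodeException with the RPC server's terse-exception list so the NameNode log is not flooded with stack traces for an expected condition.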
[jira] [Commented] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277354#comment-17277354 ] Hadoop QA commented on HDFS-15798: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 8s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 21s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 22s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 19s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 20s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 2s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | |
[jira] [Updated] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming
[ https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15813: --- Attachment: HDFS-15813.002.patch > DataStreamer: keep sending heartbeat packets while streaming > > > Key: HDFS-15813 > URL: https://issues.apache.org/jira/browse/HDFS-15813 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HDFS-15813.001.patch, HDFS-15813.002.patch > > > In response to [HDFS-5032], [~daryn] made a change to our internal code to > ensure that heartbeats continue during data streaming, even in the face of a > slow disk. > As [~kihwal] noted, absence of heartbeat during flush will be fixed in a > separate jira. It doesn't look like this change was ever pushed back to > apache, so I am providing it here.
[jira] [Commented] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming
[ https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277309#comment-17277309 ] Jim Brennan commented on HDFS-15813: Looks like I need to update the patch. > DataStreamer: keep sending heartbeat packets while streaming > > > Key: HDFS-15813 > URL: https://issues.apache.org/jira/browse/HDFS-15813 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HDFS-15813.001.patch > > > In response to [HDFS-5032], [~daryn] made a change to our internal code to > ensure that heartbeats continue during data streaming, even in the face of a > slow disk. > As [~kihwal] noted, absence of heartbeat during flush will be fixed in a > separate jira. It doesn't look like this change was ever pushed back to > apache, so I am providing it here.
[jira] [Commented] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming
[ https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277303#comment-17277303 ] Hadoop QA commented on HDFS-15813: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 11s{color} | {color:red}{color} | {color:red} HDFS-15813 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15813 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13019865/HDFS-15813.001.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/449/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > DataStreamer: keep sending heartbeat packets while streaming > > > Key: HDFS-15813 > URL: https://issues.apache.org/jira/browse/HDFS-15813 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HDFS-15813.001.patch > > > In response to [HDFS-5032], [~daryn] made a change to our internal code to > ensure that heartbeats continue during data streaming, even in the face of a > slow disk. > As [~kihwal] noted, absence of heartbeat during flush will be fixed in a > separate jira. It doesn't look like this change was ever pushed back to > apache, so I am providing it here.
[jira] [Updated] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming
[ https://issues.apache.org/jira/browse/HDFS-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15813: --- Attachment: HDFS-15813.001.patch Status: Patch Available (was: Open) Submitting patch - we have been running with this change in production for years. > DataStreamer: keep sending heartbeat packets while streaming > > > Key: HDFS-15813 > URL: https://issues.apache.org/jira/browse/HDFS-15813 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: HDFS-15813.001.patch > > > In response to [HDFS-5032], [~daryn] made a change to our internal code to > ensure that heartbeats continue during data streaming, even in the face of a > slow disk. > As [~kihwal] noted, absence of heartbeat during flush will be fixed in a > separate jira. It doesn't look like this change was ever pushed back to > apache, so I am providing it here.
[jira] [Created] (HDFS-15813) DataStreamer: keep sending heartbeat packets while streaming
Jim Brennan created HDFS-15813: -- Summary: DataStreamer: keep sending heartbeat packets while streaming Key: HDFS-15813 URL: https://issues.apache.org/jira/browse/HDFS-15813 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.4.0 Reporter: Jim Brennan Assignee: Jim Brennan In response to [HDFS-5032], [~daryn] made a change to our internal code to ensure that heartbeats continue during data streaming, even in the face of a slow disk. As [~kihwal] noted, absence of heartbeat during flush will be fixed in a separate jira. It doesn't look like this change was ever pushed back to apache, so I am providing it here.
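The general pattern behind the change described above can be sketched as a small policy object: track when the last packet of any kind was sent, and inject a heartbeat packet whenever the stream has been quiet too long, so a slow disk stalling the data path cannot starve heartbeats. This is a hedged, illustrative sketch — the class, method names, and the half-interval threshold are assumptions, not the actual DataStreamer code.

```java
// Illustrative sketch of a "keep heartbeats flowing while streaming" policy.
class HeartbeatPolicy {
  private final long intervalMs;   // configured socket/heartbeat interval
  private long lastPacketSentMs;   // last time ANY packet (data or heartbeat) went out

  HeartbeatPolicy(long intervalMs, long nowMs) {
    this.intervalMs = intervalMs;
    this.lastPacketSentMs = nowMs;
  }

  // Called whenever a data or heartbeat packet is actually written.
  void onPacketSent(long nowMs) {
    lastPacketSentMs = nowMs;
  }

  // True when the stream has been quiet long enough that a heartbeat
  // packet should be sent ahead of (or instead of) waiting on data.
  // Using half the interval is a common safety margin (an assumption here).
  boolean heartbeatDue(long nowMs) {
    return nowMs - lastPacketSentMs >= intervalMs / 2;
  }
}
```

A streaming loop would consult heartbeatDue() on every iteration — including while blocked behind a slow disk flush — rather than only when the data queue is empty.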
[jira] [Comment Edited] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277279#comment-17277279 ] Renukaprasad C edited comment on HDFS-15792 at 2/2/21, 4:51 PM: [~hexiaoqiao] I have added the patch for branch-2.10. Included checkstyle fixes as well. Please review. Thank you. was (Author: prasad-acit): [~hexiaoqiao] I have added the patch for branch-2.10. Included checkstyle issues as well. Please review. Thank you. > ClasscastException while loading FSImage > > > Key: HDFS-15792 > URL: https://issues.apache.org/jira/browse/HDFS-15792 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15792-branch-2.10.001.patch, HDFS-15792.001.patch, > HDFS-15792.002.patch, HDFS-15792.003.patch, HDFS-15792.004.patch, > HDFS-15792.005.patch, HDFS-15792.addendum.001.patch, > image-2021-01-27-12-00-34-846.png > > > FSImage loading has failed with ClasscastException - > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode. > This is the usage issue with Hashmap in concurrent scenarios. > Same issue has been reported on Java & closed as usage issue. - > https://bugs.openjdk.java.net/browse/JDK-8173671 > 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading > INODE from fsiamge. | FSImageFormatProtobuf.java:442 > java.lang. 
> : java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode > at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1951) > at java.util.HashMap.treeifyBin(HashMap.java:772) > at java.util.HashMap.putVal(HashMap.java:644) > at java.util.HashMap.put(HashMap.java:612) > at > org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53) > at > org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391) > at > org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from > FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, > cpktTxId=00198227480) | FSImage.java:738 > java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node > cannot be cast to java.util.HashMap$TreeNode > at > org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1113) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730) > at >
[jira] [Commented] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277279#comment-17277279 ] Renukaprasad C commented on HDFS-15792: --- [~hexiaoqiao] I have added the patch for branch-2.10. Included checkstyle issues as well. Please review. Thank you. > ClasscastException while loading FSImage > > > Key: HDFS-15792 > URL: https://issues.apache.org/jira/browse/HDFS-15792 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15792-branch-2.10.001.patch, HDFS-15792.001.patch, > HDFS-15792.002.patch, HDFS-15792.003.patch, HDFS-15792.004.patch, > HDFS-15792.005.patch, HDFS-15792.addendum.001.patch, > image-2021-01-27-12-00-34-846.png > > > FSImage loading has failed with ClasscastException - > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode. > This is the usage issue with Hashmap in concurrent scenarios. > Same issue has been reported on Java & closed as usage issue. - > https://bugs.openjdk.java.net/browse/JDK-8173671 > 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading > INODE from fsiamge. | FSImageFormatProtobuf.java:442 > java.lang. 
> : java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode > at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1951) > at java.util.HashMap.treeifyBin(HashMap.java:772) > at java.util.HashMap.putVal(HashMap.java:644) > at java.util.HashMap.put(HashMap.java:612) > at > org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53) > at > org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391) > at > org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from > FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, > cpktTxId=00198227480) | FSImage.java:738 > java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node > cannot be cast to java.util.HashMap$TreeNode > at > org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1113) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:648) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:710) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:953) >
[jira] [Updated] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-15792: -- Attachment: HDFS-15792-branch-2.10.001.patch > ClasscastException while loading FSImage > > > Key: HDFS-15792 > URL: https://issues.apache.org/jira/browse/HDFS-15792 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15792-branch-2.10.001.patch, HDFS-15792.001.patch, > HDFS-15792.002.patch, HDFS-15792.003.patch, HDFS-15792.004.patch, > HDFS-15792.005.patch, HDFS-15792.addendum.001.patch, > image-2021-01-27-12-00-34-846.png > > > FSImage loading has failed with ClasscastException - > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode. > This is the usage issue with Hashmap in concurrent scenarios. > Same issue has been reported on Java & closed as usage issue. - > https://bugs.openjdk.java.net/browse/JDK-8173671 > 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading > INODE from fsiamge. | FSImageFormatProtobuf.java:442 > java.lang. 
> : java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode > at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1951) > at java.util.HashMap.treeifyBin(HashMap.java:772) > at java.util.HashMap.putVal(HashMap.java:644) > at java.util.HashMap.put(HashMap.java:612) > at > org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53) > at > org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391) > at > org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from > FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, > cpktTxId=00198227480) | FSImage.java:738 > java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node > cannot be cast to java.util.HashMap$TreeNode > at > org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1113) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:648) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:710) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:953) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:926) > at >
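The stack trace above shows why the bug bites: multiple FSImage loader threads call ReferenceCountMap.put() concurrently on a plain HashMap, and a racing treeifyBin() leaves the bucket in a state where a Node is cast to TreeNode. A hedged sketch of the thread-safe shape of the fix is below — the class and method names are illustrative, not the actual Hadoop ReferenceCountMap, but the decrement mirrors the updateAndGet(i -> i > 0 ? i - 1 : i) pattern discussed for branch-2.10.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative reference-counted map: ConcurrentHashMap makes concurrent
// put()s safe (no shared-bucket treeify race), and AtomicInteger makes the
// per-entry count updates atomic.
class RefCountMap<E> {
  private final ConcurrentHashMap<E, AtomicInteger> map = new ConcurrentHashMap<>();

  // Atomically insert-or-increment; safe under concurrent loader threads.
  public E put(E entry) {
    map.computeIfAbsent(entry, k -> new AtomicInteger(0)).incrementAndGet();
    return entry;
  }

  // Decrement but never go below zero.
  public int decrementAndGetRefCount(E entry) {
    AtomicInteger c = map.get(entry);
    return c == null ? 0 : c.updateAndGet(i -> i > 0 ? i - 1 : i);
  }

  public int getRefCount(E entry) {
    AtomicInteger c = map.get(entry);
    return c == null ? 0 : c.get();
  }
}
```

With a plain HashMap, the same concurrent put() calls are exactly the "usage issue" JDK-8173671 closes as working-as-intended: HashMap is simply not safe for concurrent mutation.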
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277250#comment-17277250 ] Erik Krogen commented on HDFS-13609: Hi [~xuzq_zander], thanks for taking a look. {quote} when onlyDurableTxns is false, maxAllowedTxns = responseCounts.get(0) {quote} Correct me if I'm wrong but I think you have this backwards. If {{onlyDurableTxns}} is false, then {{maxAllowedTxns = highestTxnCount}} which is {{responseCounts.get(2)}}. It is when {{onlyDurableTxns}} is true that you get {{responseCounts.get(0)}}. In this case, we really do need to take the lowest of the returned values. Since we only got 3 responses, we can't make any assumptions about the other 2 JNs, so just assume they have 0 txns. We only want to take txns that have landed on a quorum of JNs (thus becoming durable). Thus since we only got 3 responses, we have to take the lowest txn that any of those responses are aware of. For example if we got back {{(5, 10, 20)}}, then only txns 1-5 are available on all 3 JNs we got responses from, so those are the only transactions we know are durable. Of course more _might_ be durable if they were persisted on the two JNs we didn't get responses from, but we don't know that. Let me know if that clears things up. > [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via > RPC > - > > Key: HDFS-13609 > URL: https://issues.apache.org/jira/browse/HDFS-13609 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: HDFS-13609-HDFS-12943.000.patch, > HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, > HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch > > > See HDFS-13150 for the full design. > This JIRA is targeted at the NameNode-side changes to enable tailing > in-progress edits via the RPC mechanism added in HDFS-13608. 
Most changes are > in the QuorumJournalManager.
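The quorum arithmetic in the comment above can be sketched generically: pad the JournalNodes that did not respond with 0 (we know nothing about them), sort the counts, and take the largest transaction count that at least a quorum of JNs are known to have reached. This is a hedged sketch of the reasoning, not the actual QuorumJournalManager code; the names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: compute the highest txn id known to be durable,
// i.e. present on at least a quorum (majority) of JournalNodes.
class DurableTxns {
  static long maxDurableTxns(List<Long> responses, int totalJns) {
    int quorum = totalJns / 2 + 1;
    List<Long> counts = new ArrayList<>(responses);
    // Non-responding JNs contribute nothing we can rely on: assume 0 txns.
    while (counts.size() < totalJns) {
      counts.add(0L);
    }
    Collections.sort(counts);
    // After sorting ascending, the entry at index (totalJns - quorum) is the
    // largest count that at least `quorum` JNs have reached or exceeded.
    return counts.get(totalJns - quorum);
  }
}
```

With the example from the comment — responses (5, 10, 20) from 3 of 5 JNs — this returns 5: only txns 1-5 are known to be on a quorum, since the two silent JNs are assumed to hold 0.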
[jira] [Created] (HDFS-15812) after deleting data of hbase table hdfs size is not decreasing
Satya Gaurav created HDFS-15812: --- Summary: after deleting data of hbase table hdfs size is not decreasing Key: HDFS-15812 URL: https://issues.apache.org/jira/browse/HDFS-15812 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 2.0.2-alpha Environment: HDP 3.1.4.0-315 Hbase 2.0.2.3.1.4.0-315 Reporter: Satya Gaurav I am deleting data from an HBase table; the deletes succeed in HBase, but the size of the underlying HDFS directory is not reducing. I even ran a major compaction, but the HDFS usage still did not drop. Any solution for this issue?
[jira] [Commented] (HDFS-15779) EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
[ https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277174#comment-17277174 ] Hadoop QA commented on HDFS-15779: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 24m 34s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 24s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 37s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 1s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 59s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 12s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 56s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} | |
[jira] [Commented] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277120#comment-17277120 ] huhaiyang commented on HDFS-15798: -- Upload v003 patch according to your suggestions. > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > The EC reconstruct task failed, and the decrementXmitsInProgress of > processErasureCodingTasks operation abnormal value ; > It would be XmitsInProgress of DN has negative number, it affects NN chooses > pending tasks based on the ratio between the lengths of replication and > erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. 
Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); >... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huhaiyang updated HDFS-15798: - Attachment: HDFS-15798.003.patch > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > The EC reconstruct task failed, and the decrementXmitsInProgress of > processErasureCodingTasks operation abnormal value ; > It would be XmitsInProgress of DN has negative number, it affects NN chooses > pending tasks based on the ratio between the lengths of replication and > erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. 
Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); >... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277075#comment-17277075 ] huhaiyang commented on HDFS-15798: -- [~ferhui] [~sodonnell] Thank you for your advice! That makes sense; I will submit a new patch later. > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch > > > The EC reconstruct task failed, and the decrementXmitsInProgress of > processErasureCodingTasks operation abnormal value ; > It would be XmitsInProgress of DN has negative number, it affects NN chooses > pending tasks based on the ratio between the lengths of replication and > erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. 
Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); >... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15792) ClasscastException while loading FSImage
[ https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277069#comment-17277069 ] Renukaprasad C commented on HDFS-15792: --- Thanks [~hexiaoqiao]. Sure, I will create a patch for branch-2.10 and submit it. > ClasscastException while loading FSImage > > > Key: HDFS-15792 > URL: https://issues.apache.org/jira/browse/HDFS-15792 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15792.001.patch, HDFS-15792.002.patch, > HDFS-15792.003.patch, HDFS-15792.004.patch, HDFS-15792.005.patch, > HDFS-15792.addendum.001.patch, image-2021-01-27-12-00-34-846.png > > > FSImage loading has failed with ClasscastException - > java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to > java.util.HashMap$TreeNode. > This is the usage issue with Hashmap in concurrent scenarios. > Same issue has been reported on Java & closed as usage issue. - > https://bugs.openjdk.java.net/browse/JDK-8173671 > 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading > INODE from fsiamge. | FSImageFormatProtobuf.java:442 > java.lang. 
> : java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode > at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835) > at java.util.HashMap$TreeNode.treeify(HashMap.java:1951) > at java.util.HashMap.treeifyBin(HashMap.java:772) > at java.util.HashMap.putVal(HashMap.java:644) > at java.util.HashMap.put(HashMap.java:612) > at > org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53) > at > org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391) > at > org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from > FSImageFile(file=/srv/BigData/namenode/current/fsimage_00198227480, > cpktTxId=00198227480) | FSImage.java:738 > java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node > cannot be cast to java.util.HashMap$TreeNode > at > org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68) > at > 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263) > at > org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1113) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:648) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:710) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:953) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:926) >
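The root cause in the trace above is a plain HashMap being mutated by the parallel inode-loading threads. A minimal sketch of a thread-safe reference-count map in the spirit of the fix, pairing ConcurrentHashMap with AtomicInteger counts; the class and method names here are illustrative, not the actual ReferenceCountMap API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative thread-safe reference-count map: concurrent loader threads can
// call put()/release() without corrupting the underlying table, which is what
// the unsynchronized HashMap in the stack trace above failed to guarantee.
class RefCountMap<E> {
    private final ConcurrentHashMap<E, AtomicInteger> map = new ConcurrentHashMap<>();

    // Increment the count for value, creating the entry atomically on first use.
    int put(E value) {
        return map.computeIfAbsent(value, v -> new AtomicInteger(0)).incrementAndGet();
    }

    // Decrement without ever going below zero, mirroring the
    // decrementAndGetRefCount() pattern discussed for branch-2.10.
    int release(E value) {
        AtomicInteger count = map.get(value);
        return count == null ? 0 : count.updateAndGet(i -> i > 0 ? i - 1 : 0);
    }
}
```

The atomic computeIfAbsent avoids the check-then-insert race that triggers HashMap treeification from two threads at once.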
[jira] [Commented] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277067#comment-17277067 ] Stephen O'Donnell commented on HDFS-15798: -- Yes I had wondered about that too. I think it makes sense to have: {code} ... stripedReconstructionPool.submit(task); xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); getDatanode().incrementXmitsInProcess(xmitsSubmitted); ... {code} That way, if we have some issue submitting the task the xmits will not get incremented at all. I think we can also drop the change in the test. [~haiyang Hu] Would you like to submit a new patch with these changes? > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch > > > The EC reconstruct task failed, and the decrementXmitsInProgress of > processErasureCodingTasks operation abnormal value ; > It would be XmitsInProgress of DN has negative number, it affects NN chooses > pending tasks based on the ratio between the lengths of replication and > erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. 
We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); >... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
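The reordering suggested above can be exercised in isolation: increment the xmits counter only after submit() returns, so a rejected submission leaves the counter untouched and the catch block subtracts zero. A hedged sketch; the executor, task, and counter are stand-ins for the DataNode internals, not the real API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the proposed ordering: count xmits only once the task has actually
// been accepted by the pool, so a failed submit() can never drive the counter
// negative via the catch-block decrement.
class XmitsDemo {
    static final AtomicInteger xmitsInProgress = new AtomicInteger();

    static void submitTask(ExecutorService pool, Runnable task, int xmits) {
        int xmitsSubmitted = 0;
        try {
            pool.submit(task);                    // may throw RejectedExecutionException
            xmitsSubmitted = Math.max(xmits, 1);  // mirror the Math.max(..., 1) floor
            xmitsInProgress.addAndGet(xmitsSubmitted);
        } catch (Throwable e) {
            // xmitsSubmitted is still 0 if submit() threw, so this is a no-op
            // decrement rather than the negative drift described in the issue.
            xmitsInProgress.addAndGet(-xmitsSubmitted);
        }
    }
}
```

With the original ordering, an exception thrown by submit() after the increment would leave the counter permanently inflated until the failed-task decrement pushed it below its true value.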
[jira] [Commented] (HDFS-15803) EC: Remove unnecessary method (getWeight) in StripedReconstructionInfo
[ https://issues.apache.org/jira/browse/HDFS-15803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277060#comment-17277060 ] Hui Fei commented on HDFS-15803: +1 [~haiyang Hu] Thanks for the report and fix, [~sodonnell] thanks for the review! Will commit tomorrow. > EC: Remove unnecessary method (getWeight) in StripedReconstructionInfo > --- > > Key: HDFS-15803 > URL: https://issues.apache.org/jira/browse/HDFS-15803 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Trivial > Attachments: HDFS-15803_001.patch > > > Removing the unused method from StripedReconstructionInfo > {code:java} > // StripedReconstructionInfo.java > /** > * Return the weight of this EC reconstruction task. > * > * DN uses it to coordinate with NN to adjust the speed of scheduling the > * reconstructions tasks to this DN. > * > * @return the weight of this reconstruction task. > * @see HDFS-12044 > */ > int getWeight() { > // See HDFS-12044. The weight of a RS(n, k) is calculated by the network > // connections it opens. > return sources.length + targets.length; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15795) EC: Wrong checksum when reconstruction was failed by exception
[ https://issues.apache.org/jira/browse/HDFS-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell resolved HDFS-15795. -- Resolution: Fixed > EC: Wrong checksum when reconstruction was failed by exception > -- > > Key: HDFS-15795 > URL: https://issues.apache.org/jira/browse/HDFS-15795 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ec, erasure-coding >Reporter: Yushi Hayasaka >Assignee: Yushi Hayasaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > If the reconstruction task is failed on StripedBlockChecksumReconstructor by > exception, the checksum becomes wrong one because it is calculated with > blocks except a failure one. > It is caused by catching exception with not appropriate way. As a result, the > failed block is not fetched again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15795) EC: Wrong checksum when reconstruction was failed by exception
[ https://issues.apache.org/jira/browse/HDFS-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277035#comment-17277035 ] Stephen O'Donnell commented on HDFS-15795: -- Committed to trunk on github and it cherry-picked cleanly down to 3.1. Thanks for the contribution [~yhaya]. This was a good find. > EC: Wrong checksum when reconstruction was failed by exception > -- > > Key: HDFS-15795 > URL: https://issues.apache.org/jira/browse/HDFS-15795 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ec, erasure-coding >Reporter: Yushi Hayasaka >Assignee: Yushi Hayasaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > If the reconstruction task is failed on StripedBlockChecksumReconstructor by > exception, the checksum becomes wrong one because it is calculated with > blocks except a failure one. > It is caused by catching exception with not appropriate way. As a result, the > failed block is not fetched again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15795) EC: Wrong checksum when reconstruction was failed by exception
[ https://issues.apache.org/jira/browse/HDFS-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-15795: - Fix Version/s: 3.2.3 3.1.5 3.4.0 3.3.1 > EC: Wrong checksum when reconstruction was failed by exception > -- > > Key: HDFS-15795 > URL: https://issues.apache.org/jira/browse/HDFS-15795 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ec, erasure-coding >Reporter: Yushi Hayasaka >Assignee: Yushi Hayasaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > If the reconstruction task is failed on StripedBlockChecksumReconstructor by > exception, the checksum becomes wrong one because it is calculated with > blocks except a failure one. > It is caused by catching exception with not appropriate way. As a result, the > failed block is not fetched again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15779) EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
[ https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277006#comment-17277006 ] Hongbing Wang commented on HDFS-15779: -- [~ferhui] Thanks for the guidance. Fix code style in [^HDFS-15779.002.patch]. > EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block > - > > Key: HDFS-15779 > URL: https://issues.apache.org/jira/browse/HDFS-15779 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Attachments: HDFS-15779.001.patch, HDFS-15779.002.patch > > > The NullPointerException in DN log as follows: > {code:java} > 2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY > //... > 2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Connection timed out > 2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Failed to reconstruct striped block: > BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at 
java.lang.Thread.run(Thread.java:745) > 2020-12-28 15:51:25,749 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving > BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139 > src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50 > 010 > {code} > NPE occurs at `writer.getTargetBuffer()` in codes: > {code:java} > // StripedWriter#clearBuffers > void clearBuffers() { > for (StripedBlockWriter writer : writers) { > ByteBuffer targetBuffer = writer.getTargetBuffer(); > if (targetBuffer != null) { > targetBuffer.clear(); > } > } > } > {code} > So, why is the writer null? Let's track when the writer is initialized and > when reconstruct() is called, as follows: > {code:java} > // StripedBlockReconstructor#run > public void run() { > try { > initDecoderIfNecessary(); > getStripedReader().init(); > stripedWriter.init(); //① > reconstruct(); //② > stripedWriter.endTargetBlocks(); > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > // ...{code} > They are called at ① and ② above respectively. `stripedWriter.init()` -> > `initTargetStreams()`, as follows: > {code:java} > // StripedWriter#initTargetStreams > int initTargetStreams() { > int nSuccess = 0; > for (short i = 0; i < targets.length; i++) { > try { > writers[i] = createWriter(i); > nSuccess++; > targetsStatus[i] = true; > } catch (Throwable e) { > LOG.warn(e.getMessage()); > } > } > return nSuccess; > } > {code} > NPE occurs when createWriter() gets an exception and 0 < nSuccess < > targets.length. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
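The failure mode analyzed above leaves some writers[i] slots null when createWriter() throws, so the cleanup loop must tolerate null entries. A minimal sketch of the null-safe loop, with plain ByteBuffers standing in for the real StripedBlockWriter targets:

```java
import java.nio.ByteBuffer;

// Null-safe version of the clearBuffers loop from the stack trace above: when
// writer creation failed for some targets (0 < nSuccess < targets.length),
// the corresponding array slots stay null and must be skipped.
class ClearBuffersDemo {
    // Clears every non-null buffer and reports how many were cleared.
    static int clearBuffers(ByteBuffer[] targetBuffers) {
        int cleared = 0;
        for (ByteBuffer buf : targetBuffers) {
            if (buf != null) {  // a partially initialized writer set leaves null slots
                buf.clear();
                cleared++;
            }
        }
        return cleared;
    }
}
```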
[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC
[ https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277005#comment-17277005 ] xuzq commented on HDFS-13609: - Hi [~xkrogen] and [~linyiqun], recently I have been studying *Consistent Reads from Standby Node*. {code:java} private void selectRpcInputStreams(Collection streams, long fromTxnId, boolean onlyDurableTxns) throws IOException { QuorumCall q = loggers.getJournaledEdits(fromTxnId, maxTxnsPerRpc); Map responseMap = loggers.waitForWriteQuorum(q, selectInputStreamsTimeoutMs, "selectRpcInputStreams"); assert responseMap.size() >= loggers.getMajoritySize() : "Quorum call returned without a majority"; List responseCounts = new ArrayList<>(); for (GetJournaledEditsResponseProto resp : responseMap.values()) { responseCounts.add(resp.getTxnCount()); } Collections.sort(responseCounts); int highestTxnCount = responseCounts.get(responseCounts.size() - 1); ... // Cancel any outstanding calls to JN's. q.cancelCalls(); int maxAllowedTxns = !onlyDurableTxns ? highestTxnCount : responseCounts.get(responseCounts.size() - loggers.getMajoritySize()); if (maxAllowedTxns == 0) { LOG.debug("No new edits available in logs; requested starting from " + "ID " + fromTxnId); return; } ... } {code} Something may be wrong in {code:java} int maxAllowedTxns = !onlyDurableTxns ? highestTxnCount : responseCounts.get(responseCounts.size() - loggers.getMajoritySize());{code} * Let's say we have 5 JournalNodes, so loggers.getMajoritySize() is 3. * _loggers.getJournaledEdits(fromTxnId, maxTxnsPerRpc)_ only needs a quorum of responses, so responseCounts.size() may be just 3. * When _onlyDurableTxns_ is true and only 3 responses arrive, _maxAllowedTxns_ = responseCounts.get(0), the smallest count among the three responders. * _responseCounts.get(0)_ may not reflect a true quorum result, and may even contain no transactions from _fromTxnId_ at all: ** for example, one JournalNode's disk may be failing, so its edits from _fromTxnId_ exist only in its cache. [~xkrogen] and [~linyiqun], if you have time, please take a look at this question, thanks. 
> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via > RPC > - > > Key: HDFS-13609 > URL: https://issues.apache.org/jira/browse/HDFS-13609 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha, namenode >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: HDFS-13609-HDFS-12943.000.patch, > HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch, > HDFS-13609-HDFS-12943.003.patch, HDFS-13609-HDFS-12943.004.patch > > > See HDFS-13150 for the full design. > This JIRA is targeted at the NameNode-side changes to enable tailing > in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are > in the QuorumJournalManager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
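The majority-size indexing that the comment above questions can be sketched outside Hadoop. The class and variable names below are hypothetical; only the arithmetic mirrors the `selectRpcInputStreams` snippet quoted in the comment:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch (not Hadoop code) of the indexing arithmetic in
// selectRpcInputStreams, using hypothetical txn counts from 3 of 5 JNs.
public class DurableTxnsSketch {
  // Mirrors: responseCounts.get(responseCounts.size() - majoritySize)
  static int maxAllowedTxns(List<Integer> responseCounts, int majoritySize,
      boolean onlyDurableTxns) {
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted);
    int highestTxnCount = sorted.get(sorted.size() - 1);
    return !onlyDurableTxns ? highestTxnCount
        : sorted.get(sorted.size() - majoritySize);
  }

  public static void main(String[] args) {
    // 5 JNs, majority = 3, but only 3 responses arrived before the quorum
    // wait returned; one JN with a bad disk reports 0 txns from fromTxnId.
    List<Integer> counts = List.of(0, 90, 100);
    int durable = maxAllowedTxns(counts, 3, true);
    // With size == majoritySize, the "durable" bound is the lowest count
    // (0 here), which is the situation the comment is questioning.
    System.out.println(durable);
  }
}
```

When all 5 responses arrive, `sorted.get(5 - 3)` is the count durably present on a majority; the concern above is only about the case where the quorum wait returns with exactly a bare majority of responses.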
[jira] [Updated] (HDFS-15779) EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block
[ https://issues.apache.org/jira/browse/HDFS-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongbing Wang updated HDFS-15779: - Attachment: HDFS-15779.002.patch > EC: fix NPE caused by StripedWriter.clearBuffers during reconstruct block > - > > Key: HDFS-15779 > URL: https://issues.apache.org/jira/browse/HDFS-15779 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Attachments: HDFS-15779.001.patch, HDFS-15779.002.patch > > > The NullPointerException in DN log as follows: > {code:java} > 2020-12-28 15:49:25,453 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY > //... > 2020-12-28 15:51:25,551 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Connection timed out > 2020-12-28 15:51:25,553 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Failed to reconstruct striped block: > BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036804064064_6311920695 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.clearBuffers(StripedWriter.java:299) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.clearBuffers(StripedBlockReconstructor.java:139) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:115) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2020-12-28 15:51:25,749 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: > Receiving
> BP-1922004198-10.83.xx.xx-1515033360950:blk_-9223372036799445643_6313197139
> src: /10.83.xxx.52:53198 dest: /10.83.xxx.52:50010
> {code}
> NPE occurs at `writer.getTargetBuffer()` in this code:
> {code:java}
> // StripedWriter#clearBuffers
> void clearBuffers() {
>   for (StripedBlockWriter writer : writers) {
>     ByteBuffer targetBuffer = writer.getTargetBuffer();
>     if (targetBuffer != null) {
>       targetBuffer.clear();
>     }
>   }
> }
> {code}
> So, why is the writer null? Let's track when the writer is initialized and
> when reconstruct() is called, as follows:
> {code:java}
> // StripedBlockReconstructor#run
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     getStripedReader().init();
>     stripedWriter.init(); //①
>     reconstruct(); //②
>     stripedWriter.endTargetBlocks();
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     // ...{code}
> They are called at ① and ② above, respectively. `stripedWriter.init()` ->
> `initTargetStreams()`, as follows:
> {code:java}
> // StripedWriter#initTargetStreams
> int initTargetStreams() {
>   int nSuccess = 0;
>   for (short i = 0; i < targets.length; i++) {
>     try {
>       writers[i] = createWriter(i);
>       nSuccess++;
>       targetsStatus[i] = true;
>     } catch (Throwable e) {
>       LOG.warn(e.getMessage());
>     }
>   }
>   return nSuccess;
> }
> {code}
> NPE occurs when createWriter() throws an exception and 0 < nSuccess <
> targets.length.
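The failure mode described above — `createWriter()` throwing so that some `writers[i]` stay null — can be sketched with a null guard in `clearBuffers()`. The classes below are simplified stand-ins, not the actual StripedWriter/StripedBlockWriter:

```java
import java.nio.ByteBuffer;

// Simplified sketch (not the Hadoop classes) of the failure mode:
// initTargetStreams() may leave some writers[i] null when createWriter(i)
// throws, so clearBuffers() must null-check each writer, not just its buffer.
public class ClearBuffersSketch {
  // Hypothetical stand-in for StripedBlockWriter.
  static class Writer {
    ByteBuffer targetBuffer = ByteBuffer.allocate(8);
  }

  // Mirrors StripedWriter#clearBuffers, with an added writer null-check.
  static void clearBuffers(Writer[] writers) {
    for (Writer writer : writers) {
      if (writer == null) {
        continue; // createWriter() failed for this target; nothing to clear
      }
      ByteBuffer targetBuffer = writer.targetBuffer;
      if (targetBuffer != null) {
        targetBuffer.clear();
      }
    }
  }

  public static void main(String[] args) {
    // 0 < nSuccess < targets.length: slot 1 failed and stayed null.
    Writer[] writers = { new Writer(), null, new Writer() };
    clearBuffers(writers); // no NPE with the guard in place
    System.out.println("cleared");
  }
}
```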
[jira] [Work logged] (HDFS-15795) EC: Wrong checksum when reconstruction was failed by exception
[ https://issues.apache.org/jira/browse/HDFS-15795?focusedWorklogId=545883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545883 ] ASF GitHub Bot logged work on HDFS-15795: - Author: ASF GitHub Bot Created on: 02/Feb/21 09:02 Start Date: 02/Feb/21 09:02 Worklog Time Spent: 10m Work Description: sodonnel merged pull request #2657: URL: https://github.com/apache/hadoop/pull/2657 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 545883) Time Spent: 1h 40m (was: 1.5h) > EC: Wrong checksum when reconstruction was failed by exception > -- > > Key: HDFS-15795 > URL: https://issues.apache.org/jira/browse/HDFS-15795 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ec, erasure-coding >Reporter: Yushi Hayasaka >Assignee: Yushi Hayasaka >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > If the reconstruction task fails on StripedBlockChecksumReconstructor with an > exception, the checksum becomes wrong because it is calculated from all the > blocks except the failed one. It is caused by catching the exception in an > inappropriate way. As a result, the failed block is not fetched again.
[jira] [Updated] (HDFS-15795) EC: Wrong checksum when reconstruction was failed by exception
[ https://issues.apache.org/jira/browse/HDFS-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-15795: - Summary: EC: Wrong checksum when reconstruction was failed by exception (was: EC: Returned wrong checksum when reconstruction was failed by exception) > EC: Wrong checksum when reconstruction was failed by exception > -- > > Key: HDFS-15795 > URL: https://issues.apache.org/jira/browse/HDFS-15795 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ec, erasure-coding >Reporter: Yushi Hayasaka >Assignee: Yushi Hayasaka >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > If the reconstruction task fails on StripedBlockChecksumReconstructor with an > exception, the checksum becomes wrong because it is calculated from all the > blocks except the failed one. It is caused by catching the exception in an > inappropriate way. As a result, the failed block is not fetched again.
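Why swallowing a per-block exception corrupts the aggregate checksum can be illustrated with a toy example. CRC32 here is only a stand-in for the real striped block-checksum pipeline, and all names are hypothetical:

```java
import java.util.zip.CRC32;

// Illustrative sketch (not the Hadoop implementation) of why swallowing a
// per-block exception yields a wrong aggregate checksum: the failed block's
// bytes are silently omitted instead of being re-fetched or failing the task.
public class SwallowedExceptionSketch {
  static long checksum(byte[][] blocks, boolean swallowErrors) {
    CRC32 crc = new CRC32();
    for (byte[] block : blocks) {
      try {
        if (block == null) { // stand-in for a read/reconstruction failure
          throw new IllegalStateException("block read failed");
        }
        crc.update(block);
      } catch (IllegalStateException e) {
        if (!swallowErrors) {
          throw e; // propagate so the caller can retry or fail the task
        }
        // swallowed: the checksum is silently computed without this block
      }
    }
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[][] healthy = { {1, 2}, {3, 4} };
    byte[][] oneFailed = { {1, 2}, null };
    // Swallowing the failure "succeeds" but covers fewer bytes, so the
    // resulting checksum no longer matches the healthy one.
    System.out.println(checksum(healthy, true));
    System.out.println(checksum(oneFailed, true));
  }
}
```

With `swallowErrors` false, the failure propagates and the caller can notice and re-fetch the block, which is the direction the fix above takes.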
[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock warnings
[ https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=545865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545865 ] ASF GitHub Bot logged work on HDFS-15808: - Author: ASF GitHub Bot Created on: 02/Feb/21 08:22 Start Date: 02/Feb/21 08:22 Worklog Time Spent: 10m Work Description: tomscut edited a comment on pull request #2668: URL: https://github.com/apache/hadoop/pull/2668#issuecomment-771457006 Failed junit tests: hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints, hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks. Sorry, I didn't update those two unit tests, and they work fine locally. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 545865) Time Spent: 40m (was: 0.5h) > Add metrics for FSNamesystem read/write lock warnings > - > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.
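The metric described in HDFS-15808 can be sketched roughly as follows. The names below (`recordWriteLockHeld`, `writeLockWarnings`) are hypothetical and are not taken from the actual patch; only the idea — count lock holds that exceed the warn threshold and expose the counter — comes from the issue description:

```java
import java.util.concurrent.atomic.LongAdder;

// Hedged sketch (not the HDFS-15808 patch) of a lock-warning counter:
// increment when a lock hold exceeds the warn threshold, so the rate of
// long holds can be monitored via JMX instead of only via log warnings.
public class LockWarningMetricsSketch {
  private final long warnThresholdMs;
  private final LongAdder writeLockWarnings = new LongAdder(); // JMX-exposed

  LockWarningMetricsSketch(long warnThresholdMs) {
    this.warnThresholdMs = warnThresholdMs;
  }

  // Called when the write lock is released, with the measured hold time.
  void recordWriteLockHeld(long heldMs) {
    if (heldMs > warnThresholdMs) {
      writeLockWarnings.increment();
    }
  }

  long getWriteLockWarnings() {
    return writeLockWarnings.sum();
  }

  public static void main(String[] args) {
    LockWarningMetricsSketch m = new LockWarningMetricsSketch(100);
    m.recordWriteLockHeld(50);   // under threshold, not counted
    m.recordWriteLockHeld(250);  // over threshold, counted
    System.out.println(m.getWriteLockWarnings()); // prints 1
  }
}
```

`LongAdder` keeps the release path cheap under contention, which matters for a counter bumped on every FSNamesystem lock release.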
[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock warnings
[ https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=545864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545864 ] ASF GitHub Bot logged work on HDFS-15808: - Author: ASF GitHub Bot Created on: 02/Feb/21 08:21 Start Date: 02/Feb/21 08:21 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #2668: URL: https://github.com/apache/hadoop/pull/2668#issuecomment-771457006 Failed junit tests: hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints, hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks. Sorry, I didn't change those two unit tests, and they work fine locally. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 545864) Time Spent: 0.5h (was: 20m) > Add metrics for FSNamesystem read/write lock warnings > - > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.