[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870797#comment-16870797 ]

Lisheng Sun commented on HDFS-14313:

[~jojochuang] Thanks for your comment. I added synchronized to FsDatasetImpl#deepCopyReplica.
{code:java}
@Override
public synchronized Set deepCopyReplica(String bpid) throws IOException {
  Set replicas = new HashSet<>(volumeMap.replicas(bpid) == null
      ? Collections.EMPTY_SET : volumeMap.replicas(bpid));
  return replicas;
}
{code}
I did not use FsDatasetImpl#datasetLock, because FsDatasetImpl#addBlockPool holds datasetLock while FsDatasetImpl#deepCopyReplica is called from another thread. Please continue to help review the code, and correct me if I am wrong. Thanks.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, performance
> Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, HDFS-14313.002.patch, HDFS-14313.003.patch
>
> There are two ways of DU/DF getting used space that are insufficient.
> # Running DU across lots of disks is very expensive and running all of the processes at the same time creates a noticeable IO spike.
> # Running DF is inaccurate when the disk sharing by multiple datanode or other servers.
> Getting hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory is very small and accurate.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
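The locking concern raised in this comment can be illustrated with a minimal, self-contained sketch. It assumes only what the comment describes: one thread holds a dataset-wide lock while it waits for a worker thread that needs the same lock. The names `LockHandoffSketch` and `workerCanAcquire` are hypothetical, and a plain `ReentrantLock` stands in for FsDatasetImpl#datasetLock; this is not code from the patch. The worker uses a timed `tryLock` so that the sketch terminates, whereas a blocking acquire in the same situation would wait for as long as the caller holds the lock.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockHandoffSketch {
  // Stand-in for a dataset-wide lock; not the real Hadoop class.
  static final ReentrantLock datasetLock = new ReentrantLock();

  /**
   * Holds datasetLock on the calling thread, then asks a worker thread to
   * take the same lock, giving up after timeoutMillis.
   * Returns true only if the worker managed to acquire the lock.
   */
  public static boolean workerCanAcquire(long timeoutMillis) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    datasetLock.lock();
    try {
      Future<Boolean> result = pool.submit(() -> {
        // Reentrancy does not help here: the worker is a different thread.
        boolean got = datasetLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (got) {
          datasetLock.unlock();
        }
        return got;
      });
      // The caller waits for the worker while still holding the lock.
      return result.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      datasetLock.unlock();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("worker acquired lock: " + workerCanAcquire(200));
  }
}
```

Guarding the method with `synchronized` instead makes the worker contend on the object's monitor rather than on the dataset lock, which appears to be the workaround described above. It only excludes other `synchronized` methods of the same object, so whether it protects the replica map adequately is a question for the review.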
[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14313: --- Attachment: HDFS-14313.003.patch > Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory > instead of df/du > > > Key: HDFS-14313 > URL: https://issues.apache.org/jira/browse/HDFS-14313 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, performance >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, > HDFS-14313.002.patch, HDFS-14313.003.patch > > > There are two ways of DU/DF getting used space that are insufficient. > # Running DU across lots of disks is very expensive and running all of the > processes at the same time creates a noticeable IO spike. > # Running DF is inaccurate when the disk sharing by multiple datanode or > other servers. > Getting hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory > is very small and accurate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
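The idea in the issue description, computing used space from replica metadata held in memory rather than by running du or df, can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `UsedSpaceSketch`, `ReplicaRecord`, and the two size fields are hypothetical stand-ins for the per-block-pool replica map and the ReplicaInfo entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UsedSpaceSketch {
  /** Hypothetical stand-in for a replica entry: block data plus metadata bytes. */
  public static final class ReplicaRecord {
    final long blockBytes;
    final long metaBytes;

    public ReplicaRecord(long blockBytes, long metaBytes) {
      this.blockBytes = blockBytes;
      this.metaBytes = metaBytes;
    }
  }

  // Block pool id -> replicas known for that pool (assumed layout).
  private final Map<String, List<ReplicaRecord>> replicasByPool = new HashMap<>();

  public synchronized void add(String bpid, ReplicaRecord replica) {
    replicasByPool.computeIfAbsent(bpid, k -> new ArrayList<>()).add(replica);
  }

  /** Sums the in-memory sizes for one block pool; no disk I/O is involved. */
  public synchronized long usedBytes(String bpid) {
    List<ReplicaRecord> replicas = replicasByPool.get(bpid);
    if (replicas == null) {
      return 0L;
    }
    long total = 0L;
    for (ReplicaRecord r : replicas) {
      total += r.blockBytes + r.metaBytes;
    }
    return total;
  }

  public static void main(String[] args) {
    UsedSpaceSketch sketch = new UsedSpaceSketch();
    sketch.add("BP-1", new ReplicaRecord(1024, 11));
    sketch.add("BP-1", new ReplicaRecord(2048, 19));
    System.out.println(sketch.usedBytes("BP-1"));
  }
}
```

The sum only covers space that the datanode itself tracks, which is why it is unaffected by other processes sharing the disk.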
[jira] [Updated] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12914: --- Fix Version/s: 3.0.4 > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, > HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected from an invalid lease becomes > active with _no blocks_. A replication storm ensues possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
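The problem in the issue description is that a single boolean carries two meanings: "the lease was rejected" and a statement about stale storages. A minimal sketch of that ambiguity, and of one way to make the outcomes distinguishable, is shown here. All names (`LeaseResultSketch`, `ReportOutcome`, `processConflated`, `processExplicit`) are hypothetical, and the enum is only an illustration; it is not claimed to be how the attached patches fix the bug.

```java
public class LeaseResultSketch {
  public enum ReportOutcome { PROCESSED_NO_STALE, PROCESSED_HAS_STALE, LEASE_REJECTED }

  /**
   * Conflated form: a rejected lease and a processed report with stale
   * storages both come back as false, so the caller cannot tell them apart.
   */
  public static boolean processConflated(boolean leaseValid, boolean hasStaleStorages) {
    if (!leaseValid) {
      return false; // report was never processed
    }
    return !hasStaleStorages;
  }

  /** Explicit form: the rejection is a separate outcome the caller can act on. */
  public static ReportOutcome processExplicit(boolean leaseValid, boolean hasStaleStorages) {
    if (!leaseValid) {
      return ReportOutcome.LEASE_REJECTED;
    }
    return hasStaleStorages
        ? ReportOutcome.PROCESSED_HAS_STALE
        : ReportOutcome.PROCESSED_NO_STALE;
  }

  public static void main(String[] args) {
    System.out.println(processConflated(false, false));
    System.out.println(processExplicit(false, false));
  }
}
```

With a distinct rejection outcome (or an exception), the caller could ask the datanode to resend its full block report instead of treating the node as active with no blocks.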
[jira] [Updated] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12914: --- Fix Version/s: 3.1.3 > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, > HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected from an invalid lease becomes > active with _no blocks_. A replication storm ensues possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14313: --- Comment: was deleted (was: [~jojochuang] Thanks for your comments. I add lock in FsDatasetImpl#deepCopyReplica. {code:java} @Override public Set deepCopyReplica(String bpid) throws IOException { try (AutoCloseableLock lock = datasetLock.acquire()) { Set replicas = new HashSet<>(volumeMap.replicas(bpid) == null ? Collections.EMPTY_SET : volumeMap.replicas(bpid)); return replicas; } } {code} Please continue to help review this issue. Thank you. ) > Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory > instead of df/du > > > Key: HDFS-14313 > URL: https://issues.apache.org/jira/browse/HDFS-14313 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, performance >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, > HDFS-14313.002.patch > > > There are two ways of DU/DF getting used space that are insufficient. > # Running DU across lots of disks is very expensive and running all of the > processes at the same time creates a noticeable IO spike. > # Running DF is inaccurate when the disk sharing by multiple datanode or > other servers. > Getting hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory > is very small and accurate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870775#comment-16870775 ] Wei-Chiu Chuang commented on HDFS-12914: +1 for the branch-3.1 002 patch. Other than the TestDiskBalancer test, failed tests don't reproduce for me. > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, > HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected from an invalid lease becomes > active with _no blocks_. A replication storm ensues possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870773#comment-16870773 ]

Lisheng Sun commented on HDFS-14313:

[~jojochuang] Thanks for your comments. I added a lock in FsDatasetImpl#deepCopyReplica.
{code:java}
@Override
public Set deepCopyReplica(String bpid) throws IOException {
  try (AutoCloseableLock lock = datasetLock.acquire()) {
    Set replicas = new HashSet<>(volumeMap.replicas(bpid) == null
        ? Collections.EMPTY_SET : volumeMap.replicas(bpid));
    return replicas;
  }
}
{code}
Please continue to help review this issue. Thank you.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, performance
> Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, HDFS-14313.002.patch
>
> There are two ways of DU/DF getting used space that are insufficient.
> # Running DU across lots of disks is very expensive and running all of the processes at the same time creates a noticeable IO spike.
> # Running DF is inaccurate when the disk sharing by multiple datanode or other servers.
> Getting hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory is very small and accurate.
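The try-with-resources locking pattern used in this comment can be shown in isolation. The sketch below is self-contained and uses a hypothetical `ScopedLock` class as a stand-in for the lock type in the snippet; `ScopedLockSketch` and `copyOf` are likewise illustrative names. It shows that the lock is released on every exit path from the block, including the early `return`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class ScopedLockSketch {
  /** Hypothetical stand-in for an auto-closeable lock wrapper. */
  public static final class ScopedLock implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();

    public ScopedLock acquire() {
      lock.lock();
      return this;
    }

    @Override
    public void close() {
      lock.unlock();
    }

    public boolean heldByCurrentThread() {
      return lock.isHeldByCurrentThread();
    }
  }

  public static final ScopedLock datasetLock = new ScopedLock();

  /** Copies the source set under the lock; a null source yields an empty copy. */
  public static Set<String> copyOf(Set<String> source) {
    try (ScopedLock held = datasetLock.acquire()) {
      // close() runs when this block exits, even via return or an exception.
      return new HashSet<>(source == null ? Collections.<String>emptySet() : source);
    }
  }

  public static void main(String[] args) {
    Set<String> copy = copyOf(Collections.singleton("blk_1"));
    System.out.println(copy + " lockHeld=" + datasetLock.heldByCurrentThread());
  }
}
```

The sketch reads the source reference once before copying. The snippet in the comment calls volumeMap.replicas(bpid) twice, which is harmless under the lock but could be simplified by storing the result in a local variable.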
[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du
[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14313: --- Attachment: HDFS-14313.002.patch > Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory > instead of df/du > > > Key: HDFS-14313 > URL: https://issues.apache.org/jira/browse/HDFS-14313 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, performance >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, > HDFS-14313.002.patch > > > There are two ways of DU/DF getting used space that are insufficient. > # Running DU across lots of disks is very expensive and running all of the > processes at the same time creates a noticeable IO spike. > # Running DF is inaccurate when the disk sharing by multiple datanode or > other servers. > Getting hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory > is very small and accurate. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870739#comment-16870739 ]

Hadoop QA commented on HDFS-14590:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 17s | trunk passed |
| +1 | mvnsite | 0m 19s | trunk passed |
| +1 | shadedclient | 31m 13s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 13m 47s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 47m 0s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14590 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972676/HDFS-14590.002.patch |
| Optional Tests | dupname asflicense mvnsite xml |
| uname | Linux ac0b78de9ef9 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 318 (vs. ulimit of 1) |
| modules | C: hadoop-project U: hadoop-project |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27047/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> [SBN Read] Add the document link to the top page
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: documentation
> Reporter: Takanobu Asanuma
> Assignee: Takanobu Asanuma
> Priority: Major
> Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch
[jira] [Commented] (HDFS-14593) RBF: RouterAdmin should be able to remove expired routers from Routers Information
[ https://issues.apache.org/jira/browse/HDFS-14593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870729#comment-16870729 ] Takanobu Asanuma commented on HDFS-14593: - Thanks for your comment, [~crh]. IIUC, currently users can't remove old routers, which are replaced or whose hostnames are changed, from RouterState(Router Information page). I propose that RouterAdmin can remove a router from RouterState when its status is EXPIRED. > RBF: RouterAdmin should be able to remove expired routers from Routers > Information > -- > > Key: HDFS-14593 > URL: https://issues.apache.org/jira/browse/HDFS-14593 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > > Currently, any router seems to exist in the Router Information eternally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870719#comment-16870719 ] Takanobu Asanuma commented on HDFS-14590: - Thanks for your review, [~ayushtkn]. I agree with you. Updated the patch. > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-14590: Attachment: HDFS-14590.002.patch > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870714#comment-16870714 ] Wei-Chiu Chuang edited comment on HDFS-12914 at 6/24/19 12:36 AM: -- Looks like HDFS-12487 breaks the TestDiskBalancer test. Filed HDFS-14599 for that. was (Author: jojochuang): Looks like HDFS-12487 breaks the getBlockToCopy test. Filed HDFS-14599 for that. > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, > HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected from an invalid lease becomes > active with _no blocks_. A replication storm ensues possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14599) HDFS-12487 breaks test TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty
Wei-Chiu Chuang created HDFS-14599:

Summary: HDFS-12487 breaks test TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty
Key: HDFS-14599
URL: https://issues.apache.org/jira/browse/HDFS-14599
Project: Hadoop HDFS
Issue Type: Bug
Components: diskbalancer
Affects Versions: 3.3.0, 3.2.1, 3.1.3
Reporter: Wei-Chiu Chuang

It looks like HDFS-12487 changes the error message expected by {{TestDiskBalancer#testDiskBalancerWithFedClusterWithOneNameServiceEmpty}}. The test expects error "There are no blocks in the blockPool" but after HDFS-12487, it returns error string "NextBlock call returned null.No valid block to copy."

Probably the simplest approach to fix it is to update the expected error string. Thoughts?

[~bharatviswa] you crafted the test in HDFS-13715. Should we update the expected error string, or revert HDFS-12487?
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870714#comment-16870714 ] Wei-Chiu Chuang edited comment on HDFS-12914 at 6/24/19 12:36 AM: -- Looks like HDFS-12487 breaks the getBlockToCopy test. Filed HDFS-14599 for that. was (Author: jojochuang): Looks like HDFS-12487 breaks the getBlockToCopy test. > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, > HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected from an invalid lease becomes > active with _no blocks_. A replication storm ensues possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870714#comment-16870714 ] Wei-Chiu Chuang commented on HDFS-12914: Looks like HDFS-12487 breaks the getBlockToCopy test. > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, > HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected from an invalid lease becomes > active with _no blocks_. A replication storm ensues possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870712#comment-16870712 ] Wei-Chiu Chuang commented on HDFS-12914: Filed HDFS-14598 for the findbugs warning. > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, > HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, > HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected from an invalid lease becomes > active with _no blocks_. A replication storm ensues possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs > next FBR is sent and/or forced. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14598) Findbugs warning caused by HDFS-12487
Wei-Chiu Chuang created HDFS-14598:

Summary: Findbugs warning caused by HDFS-12487
Key: HDFS-14598
URL: https://issues.apache.org/jira/browse/HDFS-14598
Project: Hadoop HDFS
Issue Type: Bug
Components: diskbalancer
Reporter: Wei-Chiu Chuang

https://builds.apache.org/job/PreCommit-HDFS-Build/27038/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
{noformat}
Redundant nullcheck of block, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.DiskBalancer$DiskBalancerMover.getBlockToCopy(FsVolumeSpi$BlockIterator, DiskBalancerWorkItem)
Bug type RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE
In class org.apache.hadoop.hdfs.server.datanode.DiskBalancer$DiskBalancerMover
In method org.apache.hadoop.hdfs.server.datanode.DiskBalancer$DiskBalancerMover.getBlockToCopy(FsVolumeSpi$BlockIterator, DiskBalancerWorkItem)
Value loaded from block
Return value of org.apache.hadoop.hdfs.server.datanode.fsdataset.FsVolumeSpi$BlockIterator.nextBlock() of type org.apache.hadoop.hdfs.protocol.ExtendedBlock
Redundant null check at DiskBalancer.java:[line 912]
{noformat}
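For readers unfamiliar with this warning category, the following sketch shows the general shape of code that a redundant-null-check detector flags, and one way to restructure it. It is a generic illustration with hypothetical names (`RedundantNullCheckSketch`, `describeFlagged`, `describeFixed`); the actual code at DiskBalancer.java line 912 is not reproduced in this message and may differ.

```java
public class RedundantNullCheckSketch {
  /**
   * Flagged shape: the value is dereferenced first, so the later null check
   * can never be false and the fallback branch is dead code.
   */
  public static String describeFlagged(String block) {
    int length = block.length(); // throws NullPointerException if block is null
    if (block != null) {         // redundant: block is known to be non-null here
      return "len=" + length;
    }
    return "none";
  }

  /** Restructured shape: check for null once, before any dereference. */
  public static String describeFixed(String block) {
    if (block == null) {
      return "none";
    }
    return "len=" + block.length();
  }

  public static void main(String[] args) {
    System.out.println(describeFixed(null));
    System.out.println(describeFixed("blk_1"));
  }
}
```

The warning itself is harmless at runtime, but it usually signals either dead code or a null check placed after the point where it was needed.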
[jira] [Commented] (HDFS-14586) Trash missing delete the folder which near timeout checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870692#comment-16870692 ]

maobaolong commented on HDFS-14586:

[~hexiaoqiao] I have another viewpoint: we need to keep the timestamp at which the checkpoint was generated so that it can be used for measurement, so he does the check at delete time. I think both approaches are correct, but this one preserves the timestamp.

> Trash missing delete the folder which near timeout checkpoint
>
> Key: HDFS-14586
> URL: https://issues.apache.org/jira/browse/HDFS-14586
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: hu yongfa
> Assignee: hu yongfa
> Priority: Major
> Attachments: HDFS-14586.001.patch
>
> When the trash checkpoint timeout arrives, trash deletes the old folder first and then creates a new checkpoint folder.
> Because the delete action may take a long time, such as 2 minutes, the new checkpoint folder is created late.
> At the next trash checkpoint timeout, trash skips deleting the new checkpoint folder, because the new checkpoint folder is younger than one checkpoint interval.
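The timing problem in the issue description can be checked with a small worked example. The sketch assumes a simplified model that is not taken from the Hadoop source: the emptier runs at multiples of a fixed interval, a checkpoint is deleted only when its age is at least one interval, and the checkpoint is stamped when the preceding delete finishes. `TrashCheckpointTimingSketch`, `isExpired`, and `runsUntilDeleted` are hypothetical names.

```java
public class TrashCheckpointTimingSketch {
  /** A checkpoint is eligible for deletion once it is at least one interval old. */
  public static boolean isExpired(long nowMs, long createdMs, long intervalMs) {
    return nowMs - createdMs >= intervalMs;
  }

  /**
   * The run at time 0 spends deleteDurationMs deleting, then creates the
   * checkpoint. Returns how many later runs pass before it is deleted.
   */
  public static long runsUntilDeleted(long intervalMs, long deleteDurationMs) {
    long createdMs = deleteDurationMs; // checkpoint is stamped after the delete
    long run = 1;
    while (!isExpired(run * intervalMs, createdMs, intervalMs)) {
      run++;
    }
    return run;
  }

  public static void main(String[] args) {
    long minute = 60_000L;
    // Instant delete: the checkpoint is removed at the very next run.
    System.out.println(runsUntilDeleted(60 * minute, 0));
    // Two-minute delete: at the next run the checkpoint is only 58 minutes
    // old, so it is skipped and survives until the run after that.
    System.out.println(runsUntilDeleted(60 * minute, 2 * minute));
  }
}
```

Under this model a delay of any length pushes the deletion out by a full extra interval, which matches the behaviour reported in the description.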
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870634#comment-16870634 ]

Lars Francke commented on HDFS-11242:
-
I'm afraid I still don't fully understand. Either way, [~reidchan], I think this is a good change and desperately needed. Are you willing to rebase the patch and bring it up to date? If you like, I can ask on the mailing list for review support before you spend time on this.

> Add refresh cluster network topology operation to dfs admin
> ---
>
> Key: HDFS-11242
> URL: https://issues.apache.org/jira/browse/HDFS-11242
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0-alpha1
> Reporter: Reid Chan
> Priority: Minor
> Attachments: HDFS-11242.002.patch, HDFS-11242.patch
>
>
> The network topology and the DNS-to-switch mapping are initialized when the namenode starts.
> If an admin wants to change the topology because new datanodes were added, the namenode(s) must be stopped and restarted; otherwise the newly added datanodes are all placed under /default-rack.
> This is a low-frequency operation, but it should be handled properly, so dfs admin should be responsible for it.
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870607#comment-16870607 ]

Hadoop QA commented on HDFS-14429:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 21m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || branch-2 Compile Tests ||
| +1 | mvninstall | 16m 32s | branch-2 passed |
| +1 | compile | 0m 50s | branch-2 passed with JDK v1.7.0_95 |
| +1 | compile | 0m 46s | branch-2 passed with JDK v1.8.0_212 |
| +1 | checkstyle | 0m 31s | branch-2 passed |
| +1 | mvnsite | 0m 57s | branch-2 passed |
| +1 | findbugs | 1m 52s | branch-2 passed |
| +1 | javadoc | 1m 7s | branch-2 passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 43s | branch-2 passed with JDK v1.8.0_212 |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 49s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 46s | the patch passed |
| +1 | compile | 0m 43s | the patch passed with JDK v1.8.0_212 |
| +1 | javac | 0m 43s | the patch passed |
| -0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 20 new + 159 unchanged - 0 fixed = 179 total (was 159) |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 51s | the patch passed |
| +1 | javadoc | 1m 1s | the patch passed with JDK v1.7.0_95 |
| +1 | javadoc | 0m 39s | the patch passed with JDK v1.8.0_212 |
|| || || || Other Tests ||
| -1 | unit | 71m 16s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 125m 42s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
| | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
| | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:da67579 |
| JIRA Issue | HDFS-14429 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972620/HDFS-14429.branch-2.01.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2b1a32e77838 3.13.0-153-generic #203-Ubuntu SMP
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870593#comment-16870593 ]

Hadoop QA commented on HDFS-14429:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 17s | trunk passed |
| +1 | compile | 1m 0s | trunk passed |
| +1 | checkstyle | 0m 44s | trunk passed |
| +1 | mvnsite | 1m 6s | trunk passed |
| +1 | shadedclient | 13m 33s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 1m 58s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 0m 48s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 59s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -0 | checkstyle | 0m 38s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 18 new + 135 unchanged - 0 fixed = 153 total (was 135) |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 44s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 5s | the patch passed |
| +1 | javadoc | 0m 47s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 106m 57s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 33s | The patch does not generate ASF License warnings. |
| | | 165m 42s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.diskbalancer.TestDiskBalancer |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14429 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972616/HDFS-14429.02.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 2d54115f24c2 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28ddb2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/27045/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/27045/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit |
[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yicong Cai updated HDFS-14429:
--
Attachment: HDFS-14429.branch-2.01.patch

> Block remain in COMMITTED but not COMPLETE cause by Decommission
>
>
> Key: HDFS-14429
> URL: https://issues.apache.org/jira/browse/HDFS-14429
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.9.2
> Reporter: Yicong Cai
> Assignee: Yicong Cai
> Priority: Major
> Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch, HDFS-14429.branch-2.01.patch
>
>
> In the following scenario, a block remains in the COMMITTED but not COMPLETE state and cannot be closed properly:
> # The client writes block bk1 to three datanodes (dn1/dn2/dn3).
> # bk1 is completely written to all three datanodes; each datanode succeeds in FinalizeBlock, adds the block to its IBR, and waits to report to the NameNode.
> # The client commits bk1 after receiving the ACK.
> # Before the DNs have sent the IBR, all three nodes dn1/dn2/dn3 enter Decommissioning.
> # The DNs send the IBR, but the block cannot be completed normally.
>
> This then leads to the following exception:
> {panel:title=Exception}
> 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= minimum = 1) in file xxx
> 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - IPC Server handler 499 on 8020, call Call#122552 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet: xxx
> at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
> {panel}
> This in turn causes the scenario described in HDFS-12747.
> The root cause is that addStoredBlock does not consider the case where the replicas are in Decommission.
> This problem needs to be fixed in the same way as HDFS-11499.
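One way to read the stated root cause is that the completion check counts only replicas on nodes in normal service, so three finalized replicas on decommissioning nodes count as zero. The sketch below models that reading with made-up types. It is not the NameNode's addStoredBlock logic, and whether the attached patches change the count in this way is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the completion check; types and names are hypothetical, not HDFS code. */
final class CommittedBlockDemo {
  enum AdminState { NORMAL, DECOMMISSIONING }

  /** Counts reported replicas, optionally including those on decommissioning nodes. */
  static int usableReplicas(List<AdminState> reporters, boolean countDecommissioning) {
    int count = 0;
    for (AdminState state : reporters) {
      if (state == AdminState.NORMAL
          || (countDecommissioning && state == AdminState.DECOMMISSIONING)) {
        count++;
      }
    }
    return count;
  }

  /** A committed block may move to COMPLETE once enough usable replicas are reported. */
  static boolean canComplete(List<AdminState> reporters, int minReplication,
                             boolean countDecommissioning) {
    return usableReplicas(reporters, countDecommissioning) >= minReplication;
  }

  public static void main(String[] args) {
    List<AdminState> dns = Arrays.asList(
        AdminState.DECOMMISSIONING, AdminState.DECOMMISSIONING, AdminState.DECOMMISSIONING);
    System.out.println("ignoring decommissioning replicas: " + canComplete(dns, 1, false));
    System.out.println("counting decommissioning replicas: " + canComplete(dns, 1, true));
  }
}
```

In this model the block stays COMMITTED in the first case even though all three datanodes hold a finalized replica, which matches the symptom in the log above.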
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870579#comment-16870579 ]

Yicong Cai commented on HDFS-14429:
---
Provided the branch-2 patch [^HDFS-14429.branch-2.01.patch] and the trunk patch [^HDFS-14429.02.patch]. [~jojochuang]
[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yicong Cai updated HDFS-14429: -- Target Version/s: 2.10.0, 3.3.0, 2.9.3 (was: 3.3.0, 2.9.3)
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870574#comment-16870574 ]

He Xiaoqiao commented on HDFS-11242:

Thanks [~larsfrancke] for raising your doubts and for the detailed comments. For the situation you described above, I think the proper way is:
a. Clear CachedDNSToSwitchMapping#cache completely, or clear just one host (which is not supported currently).
b. Fix the bug on the Consul side (sorry, I am not familiar with it).
c. Stop the DataNode, preferably wait long enough for the namenode to consider it dead (630s by default), and then start the DataNode again.
Please correct me if something is wrong. I want to point out that changing the network topology or the rack awareness script is very risky. We should change it cautiously, especially on a large cluster. FYI.
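As a side note on the 630s figure in step c: to my knowledge it follows from two defaults, a 5-minute heartbeat recheck interval (dfs.namenode.heartbeat.recheck-interval) and a 3-second heartbeat interval (dfs.heartbeat.interval), combined as 2 × recheck + 10 × heartbeat. The small calculation below assumes those defaults; check hdfs-default.xml for the release in use.

```java
import java.time.Duration;

/** Arithmetic behind the default dead-node timeout, assuming the usual HDFS defaults. */
final class DeadNodeIntervalDemo {

  /** Expiry = 2 * recheck interval + 10 * heartbeat interval. */
  static Duration expiry(Duration recheckInterval, Duration heartbeatInterval) {
    return recheckInterval.multipliedBy(2).plus(heartbeatInterval.multipliedBy(10));
  }

  public static void main(String[] args) {
    Duration timeout = expiry(Duration.ofMinutes(5), Duration.ofSeconds(3));
    System.out.println(timeout.getSeconds() + "s"); // 2 * 300s + 10 * 3s = 630s
  }
}
```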
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870566#comment-16870566 ] Lars Francke commented on HDFS-11242: - I think I'm getting a clearer picture now but (and there's always a chance I'm mistaken) I believe what you're suggesting does not work. Focusing on your second point: This is what's not currently possible! Because the script is only ever called _once_ and the result is then cached. Look at CachedDNSToSwitchMapping. So if you have a flexible rack aware script (which we do) but have a bug in it (e.g. reading racks from Consul but there's an error in the mapping) you have no way of fixing that error without restarting the NameNode because the bad result is cached. Regarding your first point: Thanks for the pointer. After a restart a fixed rack awareness script would be called anyway so this wouldn't make any difference. In short: I believe the patch & solution that [~reidchan] proposed is a very valid use-case.
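The caching behavior described in this comment can be illustrated with a toy resolver. This is a sketch under stated assumptions, not the CachedDNSToSwitchMapping implementation: the class name, the function standing in for the rack awareness script, and the `clear()` refresh hook are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Toy resolver that caches rack lookups until explicitly cleared. Not Hadoop code. */
final class CachingRackResolver {
  private final Function<String, String> script; // stands in for the rack awareness script
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  CachingRackResolver(Function<String, String> script) {
    this.script = script;
  }

  /** Calls the script only on the first lookup of a host; later lookups hit the cache. */
  String resolve(String host) {
    return cache.computeIfAbsent(host, script);
  }

  /** Hypothetical refresh hook: forget cached answers so the script is consulted again. */
  void clear() {
    cache.clear();
  }

  public static void main(String[] args) {
    Map<String, String> source = new HashMap<>();
    source.put("dn1", "/default-rack");            // buggy mapping at first lookup
    CachingRackResolver resolver = new CachingRackResolver(source::get);

    System.out.println(resolver.resolve("dn1"));   // script is called, answer is cached
    source.put("dn1", "/rack-a");                  // the mapping is fixed afterwards
    System.out.println(resolver.resolve("dn1"));   // still the stale cached answer
    resolver.clear();
    System.out.println(resolver.resolve("dn1"));   // script is called again
  }
}
```

Without something like `clear()`, the only way to drop the stale answer in this model is to recreate the resolver, which corresponds to restarting the NameNode in the discussion above.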
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870558#comment-16870558 ]

He Xiaoqiao commented on HDFS-12914:

I found that the failed unit tests also fail without the patch. Were there any recent changes to branch-3.1?

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12914-branch-2.001.patch, HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, HDFS-12914.009.patch, HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch, HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for conditions such as "unknown datanode", "not in pending set", "lease has expired", wrong lease id, etc. Lease rejection does not throw an exception. It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected because of an invalid lease becomes active with _no blocks_. A replication storm ensues, possibly causing DNs to temporarily go dead (HDFS-12645), leading to more FBR lease rejections on re-registration. The cluster will have many "missing blocks" until the DN's next FBR is sent and/or forced.
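The core of the description is that a single boolean carries two unrelated meanings. The sketch below models that with hypothetical method names and a simplified lease check; it is not the NameNode code, and the exception-throwing variant is only one possible way to separate the two cases, not necessarily what the attached patches do.

```java
/** Toy model of a boolean result that hides a rejected lease. Names are hypothetical. */
final class LeaseRejectionDemo {

  /** Raised by the strict variant so that callers can react to a rejected lease. */
  static final class InvalidLeaseException extends Exception {
    InvalidLeaseException(String message) {
      super(message);
    }
  }

  /** Simplified lease check: the presented id must match the expected one. */
  static boolean checkLease(long presentedId, long expectedId) {
    return presentedId == expectedId;
  }

  /** Shape from the description: a rejected lease and "stale storages" both yield false. */
  static boolean processReportQuiet(long presentedId, long expectedId, boolean storagesStale) {
    if (!checkLease(presentedId, expectedId)) {
      return false;          // report is dropped, but the caller cannot tell why
    }
    return !storagesStale;   // intended meaning of the result: "no stale storages"
  }

  /** Alternative shape: a rejected lease is reported separately from the result. */
  static boolean processReportStrict(long presentedId, long expectedId, boolean storagesStale)
      throws InvalidLeaseException {
    if (!checkLease(presentedId, expectedId)) {
      throw new InvalidLeaseException("lease " + presentedId + " is not valid");
    }
    return !storagesStale;
  }

  public static void main(String[] args) {
    System.out.println(processReportQuiet(1L, 2L, false)); // rejected lease
    System.out.println(processReportQuiet(2L, 2L, true));  // valid lease, stale storages
    try {
      processReportStrict(1L, 2L, false);
    } catch (InvalidLeaseException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In the quiet variant the first two calls print the same value, so a caller has no signal that the full block report was never processed.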
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870557#comment-16870557 ]

He Xiaoqiao commented on HDFS-11242:

Thanks [~larsfrancke] for your quick response. IIUC, when the namenode restarts or fails over, it checks whether the replicas of each block span enough racks according to the PlacementPolicy. {{BlockManager#postponedMisreplicatedBlocks}} may be one related data structure. Please double-check and correct me if something is wrong. As mentioned above, I prefer designing a flexible rack awareness script (refer to https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-common/RackAwareness.html) rather than having to specify the proper rack whenever new nodes are added. Thanks again.
[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yicong Cai updated HDFS-14429: -- Attachment: HDFS-14429.02.patch > Block remain in COMMITTED but not COMPLETE cause by Decommission > > > Key: HDFS-14429 > URL: https://issues.apache.org/jira/browse/HDFS-14429 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.2 >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Major > Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch > > > In the following scenario, the Block will remain in the COMMITTED but not > COMPLETE state and cannot be closed properly: > # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3). > # bk1 has been completely written to three data nodes, and the data node > succeeds FinalizeBlock, joins IBR and waits to report to NameNode. > # The client commits bk1 after receiving the ACK. > # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 > enter Decommissioning. > # The DN reports the IBR, but the block cannot be completed normally. 
> > Then it will lead to the following related exceptions: > {panel:title=Exception} > 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem > (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* > blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= > minimum = 1) in file xxx > 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - > IPC Server handler 499 on 8020, call Call#122552 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615 > org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not > replicated yet: xxx > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) > {panel} > And will cause the scenario described in HDFS-12747 > The root cause is that addStoredBlock does not consider the case where the > replications are in Decommission. 
> This problem needs to be fixed like HDFS-11499.
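The root cause named in HDFS-14429 can be illustrated with a simplified model. This is only a sketch, not the BlockManager code and not the attached patch: `CommittedBlockSketch`, `ReplicaState`, and both methods are hypothetical names. It assumes that "fixed like HDFS-11499" means counting decommissioning replicas when checking the minimum replication, which is a reading of the description rather than something the message confirms.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model; not the actual HDFS BlockManager code. */
class CommittedBlockSketch {
    enum ReplicaState { LIVE, DECOMMISSIONING }

    /** Counts only LIVE replicas, modelling the behaviour the issue describes. */
    static boolean canCompleteLiveOnly(List<ReplicaState> reported, int minReplication) {
        long live = reported.stream()
            .filter(s -> s == ReplicaState.LIVE)
            .count();
        return live >= minReplication;
    }

    /** Also counts DECOMMISSIONING replicas (assumed fix direction). */
    static boolean canCompleteUsable(List<ReplicaState> reported, int minReplication) {
        long usable = reported.stream()
            .filter(s -> s == ReplicaState.LIVE || s == ReplicaState.DECOMMISSIONING)
            .count();
        return usable >= minReplication;
    }

    public static void main(String[] args) {
        // dn1/dn2/dn3 all entered Decommissioning before their IBR arrived.
        List<ReplicaState> reported = Arrays.asList(
            ReplicaState.DECOMMISSIONING,
            ReplicaState.DECOMMISSIONING,
            ReplicaState.DECOMMISSIONING);
        System.out.println("live only: " + canCompleteLiveOnly(reported, 1));
        System.out.println("usable:    " + canCompleteUsable(reported, 1));
    }
}
```

With three decommissioning replicas and a minimum of 1, the live-only check never succeeds, which matches the "numNodes= 3 >= minimum = 1" log line appearing while the block stays COMMITTED.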
[jira] [Updated] (HDFS-14429) Block remain in COMMITTED but not COMPLETE cause by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yicong Cai updated HDFS-14429: -- Attachment: (was: HDFS-14429.02.patch) > Block remain in COMMITTED but not COMPLETE cause by Decommission > > > Key: HDFS-14429 > URL: https://issues.apache.org/jira/browse/HDFS-14429 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.2 >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Major > Attachments: HDFS-14429.01.patch > > > In the following scenario, the Block will remain in the COMMITTED but not > COMPLETE state and cannot be closed properly: > # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3). > # bk1 has been completely written to three data nodes, and the data node > succeeds FinalizeBlock, joins IBR and waits to report to NameNode. > # The client commits bk1 after receiving the ACK. > # When the DN has not been reported to the IBR, all three nodes dn1/dn2/dn3 > enter Decommissioning. > # The DN reports the IBR, but the block cannot be completed normally. 
> > Then it will lead to the following related exceptions: > {panel:title=Exception} > 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem > (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* > blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= > minimum = 1) in file xxx > 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - > IPC Server handler 499 on 8020, call Call#122552 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615 > org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not > replicated yet: xxx > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) > {panel} > And will cause the scenario described in HDFS-12747 > The root cause is that addStoredBlock does not consider the case where the > replications are in Decommission. 
> This problem needs to be fixed like HDFS-11499.
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870546#comment-16870546 ] Lars Francke commented on HDFS-11242: - Specifically: What [~reidchan] proposes can already be done today, but it requires a restart, so the same "risk" already exists. Does HDFS start moving data when the topology changes? I was always under the assumption that this only happens on a balance operation. > Add refresh cluster network topology operation to dfs admin > --- > > Key: HDFS-11242 > URL: https://issues.apache.org/jira/browse/HDFS-11242 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Reid Chan >Priority: Minor > Attachments: HDFS-11242.002.patch, HDFS-11242.patch > > > The network topology and dns to switch mapping are initialized at the start > of the namenode. > If admin wants to change the topology because of new datanodes added, he has > to stop and restart namenode(s), otherwise those new added datanodes are > squeezed under /default-rack. > It is a low frequency operation, but it should be operated appropriately, so > dfs admin should take the responsibility.
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870544#comment-16870544 ] Lars Francke commented on HDFS-11242: - [~hexiaoqiao] I don't understand. Can you be more specific? The use case I see is simple: you add new nodes but forget to specify the proper rack. The only way to fix that now is to restart the NameNodes, which is not a good experience. I haven't looked at the patch in detail, but all I need is a way to call reloadCachedMappings on the DNSToSwitchMapping. We could also change ScriptBasedMapping to not extend CachedDNSToSwitchMapping, or change CachedDNSToSwitchMapping to periodically invalidate all entries, or something like that. > Add refresh cluster network topology operation to dfs admin > --- > > Key: HDFS-11242 > URL: https://issues.apache.org/jira/browse/HDFS-11242 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Reid Chan >Priority: Minor > Attachments: HDFS-11242.002.patch, HDFS-11242.patch > > > The network topology and dns to switch mapping are initialized at the start > of the namenode. > If admin wants to change the topology because of new datanodes added, he has > to stop and restart namenode(s), otherwise those new added datanodes are > squeezed under /default-rack. > It is a low frequency operation, but it should be operated appropriately, so > dfs admin should take the responsibility.
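The caching behaviour discussed in this comment can be sketched in isolation. The class below is a hypothetical stand-in, not Hadoop's CachedDNSToSwitchMapping: `CachedRackResolverSketch`, its constructor argument, and the in-memory `topology` map are assumptions made for illustration. Only the method name `reloadCachedMappings` is taken from the comment. It shows why a corrected rack assignment stays invisible until the cache is dropped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a cached host-to-rack mapping; not Hadoop's classes. */
class CachedRackResolverSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> rawLookup;

    /** rawLookup plays the role of the underlying (for example script-based) resolver. */
    CachedRackResolverSketch(Function<String, String> rawLookup) {
        this.rawLookup = rawLookup;
    }

    /** Returns the cached rack, consulting the raw lookup only on a cache miss. */
    synchronized String resolve(String host) {
        return cache.computeIfAbsent(host, rawLookup);
    }

    /** Drops every cached entry so the next resolve() re-runs the raw lookup. */
    synchronized void reloadCachedMappings() {
        cache.clear();
    }

    public static void main(String[] args) {
        Map<String, String> topology = new HashMap<>();
        CachedRackResolverSketch resolver =
            new CachedRackResolverSketch(h -> topology.getOrDefault(h, "/default-rack"));
        System.out.println(resolver.resolve("dn4")); // rack was not specified yet
        topology.put("dn4", "/rack2");               // admin corrects the rack data
        System.out.println(resolver.resolve("dn4")); // cache still answers
        resolver.reloadCachedMappings();
        System.out.println(resolver.resolve("dn4")); // looked up again after reload
    }
}
```

In this sketch the reload only clears the cache; whether the NameNode would also need to re-place nodes in its topology tree is outside what the comment states.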
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870542#comment-16870542 ] He Xiaoqiao commented on HDFS-11242: It looks like a good idea in some ways, but I do not think we should do it: providing tools to refresh the network topology dynamically could bring a risk of huge data transfers and harm the stability of the cluster. IMO, we should design a more flexible rack-aware script and avoid changing the network topology. > Add refresh cluster network topology operation to dfs admin > --- > > Key: HDFS-11242 > URL: https://issues.apache.org/jira/browse/HDFS-11242 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Reid Chan >Priority: Minor > Attachments: HDFS-11242.002.patch, HDFS-11242.patch > > > The network topology and dns to switch mapping are initialized at the start > of the namenode. > If admin wants to change the topology because of new datanodes added, he has > to stop and restart namenode(s), otherwise those new added datanodes are > squeezed under /default-rack. > It is a low frequency operation, but it should be operated appropriately, so > dfs admin should take the responsibility.
[jira] [Commented] (HDFS-14577) RBF: FederationUtil#newInstance should allow constructor without context
[ https://issues.apache.org/jira/browse/HDFS-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870543#comment-16870543 ] Hadoop QA commented on HDFS-14577: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 52s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} HDFS-13891 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 23s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} HDFS-13891 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} HDFS-13891 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m 21s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 94m 28s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterWithSecureStartup | | | hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14577 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972008/HDFS-14577-HDFS-13891.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3e4b94d9553a 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-13891 / 02597b6 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/27044/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27044/testReport/ | | Max. process+thread count | 1585 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U:
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870532#comment-16870532 ] Reid Chan commented on HDFS-11242: -- Yes, but only if a committer or member can review or comment and, most importantly, is willing to have it in HDFS. Otherwise it will just go unnoticed and be set aside, as it was before. > Add refresh cluster network topology operation to dfs admin > --- > > Key: HDFS-11242 > URL: https://issues.apache.org/jira/browse/HDFS-11242 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Reid Chan >Priority: Minor > Attachments: HDFS-11242.002.patch, HDFS-11242.patch > > > The network topology and dns to switch mapping are initialized at the start > of the namenode. > If admin wants to change the topology because of new datanodes added, he has > to stop and restart namenode(s), otherwise those new added datanodes are > squeezed under /default-rack. > It is a low frequency operation, but it should be operated appropriately, so > dfs admin should take the responsibility.
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870486#comment-16870486 ] Hadoop QA commented on HDFS-11242: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-11242 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-11242 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843182/HDFS-11242.002.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27043/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add refresh cluster network topology operation to dfs admin > --- > > Key: HDFS-11242 > URL: https://issues.apache.org/jira/browse/HDFS-11242 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Reid Chan >Priority: Minor > Attachments: HDFS-11242.002.patch, HDFS-11242.patch > > > The network topology and dns to switch mapping are initialized at the start > of the namenode. > If admin wants to change the topology because of new datanodes added, he has > to stop and restart namenode(s), otherwise those new added datanodes are > squeezed under /default-rack. > It is a low frequency operation, but it should be operated appropriately, so > dfs admin should take the responsibility. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11242) Add refresh cluster network topology operation to dfs admin
[ https://issues.apache.org/jira/browse/HDFS-11242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870483#comment-16870483 ] Lars Francke commented on HDFS-11242: - This would be super useful to have. [~reidchan] are you by any chance looking to revive this patch? > Add refresh cluster network topology operation to dfs admin > --- > > Key: HDFS-11242 > URL: https://issues.apache.org/jira/browse/HDFS-11242 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Reid Chan >Priority: Minor > Attachments: HDFS-11242.002.patch, HDFS-11242.patch > > > The network topology and dns to switch mapping are initialized at the start > of the namenode. > If admin wants to change the topology because of new datanodes added, he has > to stop and restart namenode(s), otherwise those new added datanodes are > squeezed under /default-rack. > It is a low frequency operation, but it should be operated appropriately, so > dfs admin should take the responsibility.