[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305227#comment-17305227 ] Stephen O'Donnell commented on HDFS-15438: -- This was a clean cherry-pick down to 3.1 and the TestDiskBalancer tests ran fine, so I backported to the lower branches too. > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Assignee: AMC-team >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch, Screen Shot > 2020-09-03 at 4.33.53 PM.png > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198305#comment-17198305 ] Ayush Saxena commented on HDFS-15438: - Committed to trunk. Thanx [~AMC-team] for the contribution. > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Assignee: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch, Screen Shot > 2020-09-03 at 4.33.53 PM.png > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190978#comment-17190978 ] Ayush Saxena commented on HDFS-15438: - +1 > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch, Screen Shot > 2020-09-03 at 4.33.53 PM.png > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189950#comment-17189950 ] AMC-team commented on HDFS-15438: - [~ayushtkn] Sorry for the late reply, I check the related tests and they passed. !Screen Shot 2020-09-03 at 4.33.53 PM.png! Please let me know if I need to check more failed tests > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch, Screen Shot > 2020-09-03 at 4.33.53 PM.png > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181417#comment-17181417 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 59s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 55s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 50s{color} | {color:red} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings.
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167085#comment-17167085 ] Ayush Saxena commented on HDFS-15438: - Can you check the test failures, Couple of them seems related > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165320#comment-17165320 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 35s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 31s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 35s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}132m 34s{color} | {color:red} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}209m 33s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.tools.TestHdfsConfigFields | | | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks | | | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader | | | hadoop.hdfs.TestGetFileChecksum | | | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier | | | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29565/artifact/out/Dockerfile | | JIRA Issue | HDFS-15438 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname |
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164810#comment-17164810 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 35s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15438 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29564/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164714#comment-17164714 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 48s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15438 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29559/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164713#comment-17164713 ] AMC-team commented on HDFS-15438: - Thanks [~ayushtkn] for the feedback. I upload a patch to change the while loop condition and if condition to support value 0. What's more, IMHO, the current code logic may be more intuitive. Previously if we set dfs.disk.balancer.max.disk.errors to n, it can actually just tolerate n-1 errors. Now it can tolerate n errors, which is more consistent with the parameter's documentation: {quote}During a block move from a source to destination disk, we might encounter various errors. *This defines how many errors we can tolerate* before we declare a move between 2 disks (or a step) has failed. {quote} > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164690#comment-17164690 ] AMC-team commented on HDFS-15438: - I upload a new patch based on [~ayushtkn]'s suggestion. IMHO, the current logic may be better because previously if we set "dfs.disk.balancer.max.disk.errors" to n, it actually can just tolerate n-1 errors because of the while loop condition. Now it can tolerate n errors, which is more consistent with the documentation: {quote}During a block move from a source to destination disk, we might encounter various errors. This defines how many errors we can tolerate before we declare a move between 2 disks (or a step) has failed.{quote} > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164689#comment-17164689 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15438 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13008378/HDFS-15438.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29555/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161369#comment-17161369 ] Ayush Saxena commented on HDFS-15438: - Thanx [~AMC-team] for the report. even if you get away here, you may get stuck below at L926 : {code:java} if (item.getErrorCount() >= getMaxError(item)) { {code} error count and max error both shall be zero and this condition shall become true and ultimately you would land up setting an error. Instead of this : {code:java} + (item.getErrorCount() == 0 || item.getErrorCount() < getMaxError(item))) { {code} Shouldn't we just have : {code:java} item.getErrorCount() <= getMaxError(item) { {code} and even tweak the if condition at L926 to remove the '=' sign? cc [~aengineer] you wrote this up, any pointers. > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159774#comment-17159774 ] Hadoop QA commented on HDFS-15438: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 3m 23s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 44s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 50s{color} | {color:red} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}236m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.fs.viewfs.TestViewFileSystemLinkFallback | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped | | | hadoop.hdfs.tools.TestDFSZKFailoverController | | | hadoop.hdfs.server.datanode.TestBPOfferService | | | hadoop.hdfs.TestMultipleNNPortQOP | | | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks | | | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | | hadoop.hdfs.TestGetFileChecksum | | | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier | \\ \\ || Subsystem
[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
[ https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151394#comment-17151394 ] AMC-team commented on HDFS-15438: - I added a patch to change the while loop condition to adapt that max disk errors = 0 > Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy > -- > > Key: HDFS-15438 > URL: https://issues.apache.org/jira/browse/HDFS-15438 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Reporter: AMC-team >Priority: Major > Attachments: HDFS-15438.000.patch > > > In HDFS disk balancer, the config parameter > "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number > of errors we can ignore for a specific move between two disks before it is > abandoned. > The parameter can accept value that >= 0. And setting the value to 0 should > mean no error tolerance. However, setting the value to 0 will simply don't do > the block copy even there is no disk error occur because the while loop > condition *item.getErrorCount() < getMaxError(item)* will not satisfied. > {code:java} > // Gets the next block that we can copy > private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter, > DiskBalancerWorkItem item) { > while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) { > try { > ... //get the block > } catch (IOException e) { > item.incErrorCount(); > } >if (item.getErrorCount() >= getMaxError(item)) { > item.setErrMsg("Error count exceeded."); > LOG.info("Maximum error count exceeded. Error count: {} Max error:{} > ", > item.getErrorCount(), item.getMaxDiskErrors()); > } > {code} > *How to fix* > Change the while loop condition to support value 0. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org