[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2021-03-19 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305227#comment-17305227
 ] 

Stephen O'Donnell commented on HDFS-15438:
--

This was a clean cherry-pick down to 3.1 and the TestDiskBalancer tests ran 
fine, so I backported to the lower branches too.

> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Assignee: AMC-team
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch, Screen Shot 
> 2020-09-03 at 4.33.53 PM.png
>
>
> In HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" controls the maximum number of errors we 
> can ignore for a specific move between two disks before the move is 
> abandoned.
> The parameter accepts any value >= 0, and setting it to 0 should mean no 
> error tolerance. However, setting the value to 0 means the block copy is 
> never performed at all, even when no disk error occurs, because the while 
> loop condition *item.getErrorCount() < getMaxError(item)* is never satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>                                      DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... // get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>     if (item.getErrorCount() >= getMaxError(item)) {
>       item.setErrMsg("Error count exceeded.");
>       LOG.info("Maximum error count exceeded. Error count: {} Max error: {}",
>           item.getErrorCount(), item.getMaxDiskErrors());
>     }
> {code}
> *How to fix*
> Change the while loop condition so that the value 0 is supported.
>   
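A minimal boundary sketch (illustration only, not code from the patch) of why 
the guard above refuses to copy when the maximum is 0, while an inclusive 
guard would proceed:
{code:java}
// Hypothetical values: maxErrors = 0 and no errors have occurred yet.
int errorCount = 0;
int maxErrors = 0;
boolean oldGuard = errorCount < maxErrors;  // false -> copy loop never entered
boolean newGuard = errorCount <= maxErrors; // true  -> copy runs until the first error
{code}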






[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-09-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198305#comment-17198305
 ] 

Ayush Saxena commented on HDFS-15438:
-

Committed to trunk.

Thanx [~AMC-team]  for the contribution.







[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-09-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190978#comment-17190978
 ] 

Ayush Saxena commented on HDFS-15438:
-

+1







[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-09-03 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189950#comment-17189950
 ] 

AMC-team commented on HDFS-15438:
-

[~ayushtkn]

Sorry for the late reply. I checked the related tests and they passed.

!Screen Shot 2020-09-03 at 4.33.53 PM.png!

Please let me know if I need to check any more failed tests.







[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-08-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181417#comment-17181417
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
59s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed with JDK Private 
Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}124m 50s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings.

[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167085#comment-17167085
 ] 

Ayush Saxena commented on HDFS-15438:
-

Can you check the test failures? A couple of them seem related.








[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165320#comment-17165320
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 35s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
31s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}132m 34s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}209m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29565/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15438 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname |

[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164810#comment-17164810
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
35s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15438 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29564/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164714#comment-17164714
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
48s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15438 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008384/HDFS-15438.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29559/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164713#comment-17164713
 ] 

AMC-team commented on HDFS-15438:
-

Thanks [~ayushtkn] for the feedback. I uploaded a patch that changes the while 
loop condition and the if condition to support the value 0.

What's more, IMHO, the patched logic is more intuitive. Previously, if we set 
dfs.disk.balancer.max.disk.errors to n, it could actually tolerate only n-1 
errors. Now it tolerates n errors, which is more consistent with the 
parameter's documentation:
{quote}During a block move from a source to destination disk, we might 
encounter various errors. *This defines how many errors we can tolerate* before 
we declare a move between 2 disks (or a step) has failed.
{quote}
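To make the off-by-one concrete, here is a hypothetical worst-case simulation 
(illustration only, not patch code) that counts how many errors occur before 
each guard abandons the move:
{code:java}
// Every copy attempt throws in this simulation, so the error count grows by
// one per pass until the loop guard gives up.
static int errorsBeforeAbandon(int maxErrors, boolean patchedGuard) {
  int errorCount = 0;
  while (patchedGuard ? errorCount <= maxErrors : errorCount < maxErrors) {
    errorCount++; // item.incErrorCount() on IOException
  }
  return errorCount;
}
{code}
errorsBeforeAbandon(n, false) returns n: under the old guard the move stops 
retrying after its nth error, so only n-1 errors are actually survived. 
errorsBeforeAbandon(n, true) returns n+1: n errors are survived and the move 
is abandoned only on the next one, matching the documented "how many errors we 
can tolerate".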







[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164690#comment-17164690
 ] 

AMC-team commented on HDFS-15438:
-

I uploaded a new patch based on [~ayushtkn]'s suggestion.

IMHO, the new logic is also better in itself: previously, if we set 
"dfs.disk.balancer.max.disk.errors" to n, it could actually tolerate only n-1 
errors because of the while loop condition. Now it tolerates n errors, which 
is more consistent with the documentation:

{quote}During a block move from a source to destination disk, we might 
encounter various errors. This defines how many errors we can tolerate before 
we declare a move between 2 disks (or a step) has failed.{quote}







[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164689#comment-17164689
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 
25s{color} | {color:red} Docker failed to build yetus/hadoop:cce5a6f6094. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15438 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008378/HDFS-15438.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29555/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161369#comment-17161369
 ] 

Ayush Saxena commented on HDFS-15438:
-

Thanx [~AMC-team] for the report.

Even if you get past this point, you may get stuck below at L926:
{code:java}
  if (item.getErrorCount() >= getMaxError(item)) {
{code}
The error count and the max error will both be zero, so this condition becomes 
true and you would ultimately end up setting an error.

Instead of this:

{code:java}
+  (item.getErrorCount() == 0 || item.getErrorCount() < getMaxError(item))) {
{code}
shouldn't we just have:
{code:java}
  item.getErrorCount() <= getMaxError(item) {
{code}

and also tweak the if condition at L926 to remove the '=' sign?

cc [~aengineer], you wrote this up, any pointers?
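For reference, a minimal sketch of the loop with both suggested tweaks applied 
(assuming the surrounding method is otherwise unchanged; the exact committed 
form is whatever landed in HDFS-15438.001.patch):
{code:java}
// Gets the next block that we can copy. With '<=', a max-error value of 0
// still lets the copy loop run until the first error actually occurs.
private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
                                     DiskBalancerWorkItem item) {
  while (!iter.atEnd() && item.getErrorCount() <= getMaxError(item)) {
    try {
      ... // get the block
    } catch (IOException e) {
      item.incErrorCount();
    }
    // '>' instead of '>=': only flag the item once the tolerated error
    // count has genuinely been exceeded.
    if (item.getErrorCount() > getMaxError(item)) {
      item.setErrMsg("Error count exceeded.");
      LOG.info("Maximum error count exceeded. Error count: {} Max error: {}",
          item.getErrorCount(), item.getMaxDiskErrors());
    }
  }
  ...
}
{code}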







[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-17 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17159774#comment-17159774
 ] 

Hadoop QA commented on HDFS-15438:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 27m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
23s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 44s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}133m 50s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}236m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.viewfs.TestViewFileSystemLinkFallback |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
|   | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
|   | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
|   | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
\\
\\
|| Subsystem 

[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-04 Thread AMC-team (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151394#comment-17151394
 ] 

AMC-team commented on HDFS-15438:
-

I added a patch that changes the while loop condition to handle max disk 
errors = 0.



