[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2019-08-19 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13709:
---
Fix Version/s: 3.1.3
   3.2.1
   3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed 005 to trunk, branch-3.2 and branch-3.1.

There are conflicts cherry picking the commit into branch-2 and lower. 
[~zhangchen] if you are interested please provide a branch-2 patch.

Thanks [~zhangchen] and [~sodonnell]!

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-13709.002.patch, HDFS-13709.003.patch, 
> HDFS-13709.004.patch, HDFS-13709.005.patch, HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which have data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2019-08-14 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Attachment: HDFS-13709.004.patch

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-13709.002.patch, HDFS-13709.003.patch, 
> HDFS-13709.004.patch, HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which have data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2019-08-14 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Attachment: HDFS-13709.003.patch

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-13709.002.patch, HDFS-13709.003.patch, 
> HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which have data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2019-08-12 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Attachment: HDFS-13709.002.patch

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-13709.002.patch, HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which have data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2018-07-01 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Status: Patch Available  (was: Open)

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which have data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2018-06-29 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Description: 
In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
disk bad track may cause data loss.

For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs on 
A's replica data, and someday B and C crushed at the same time, NN will try to 
replicate data from A but failed, this block is corrupt now but no one knows, 
because NN think there is at least 1 healthy replica and it keep trying to 
replicate it.

When reading a replica which have data on bad track, OS will return an EIO 
error, if DN reports the bad block as soon as it got an EIO,  we can find this 
case ASAP and try to avoid data loss

  was:
In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
disk bad track may cause data loss.

For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs on 
A's replica data, and someday B and C crushed at the same time, NN will try to 
replicate data from A but failed, this block is corrupt now but no one knows, 
because NN think there is at least 1 healthy replica and it keep trying to 
replicate it.

When reading a replica which hav data on bad track, OS will return an EIO 
error, if DN reports the bad block as soon as it got an EIO,  we can find this 
case ASAP and try to avoid data loss


> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which have data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2018-06-29 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Attachment: HDFS-13709.patch

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which hav data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13709) Report bad block to NN when transfer block encounter EIO exception

2018-06-29 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang updated HDFS-13709:
--
Description: 
In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
disk bad track may cause data loss.

For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs on 
A's replica data, and someday B and C crushed at the same time, NN will try to 
replicate data from A but failed, this block is corrupt now but no one knows, 
because NN think there is at least 1 healthy replica and it keep trying to 
replicate it.

When reading a replica which hav data on bad track, OS will return an EIO 
error, if DN reports the bad block as soon as it got an EIO,  we can find this 
case ASAP and try to avoid data loss

> Report bad block to NN when transfer block encounter EIO exception
> --
>
> Key: HDFS-13709
> URL: https://issues.apache.org/jira/browse/HDFS-13709
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> disk bad track may cause data loss.
> For example, there are 3 replicas on 3 machines A/B/C, if a bad track occurs 
> on A's replica data, and someday B and C crushed at the same time, NN will 
> try to replicate data from A but failed, this block is corrupt now but no one 
> knows, because NN think there is at least 1 healthy replica and it keep 
> trying to replicate it.
> When reading a replica which hav data on bad track, OS will return an EIO 
> error, if DN reports the bad block as soon as it got an EIO,  we can find 
> this case ASAP and try to avoid data loss



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org