[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2023-05-11 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HDFS-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-11960:
--
Labels: pull-request-available  (was: )

> Successfully closed files can stay under-replicated.
> 
>
> Key: HDFS-11960
> URL: https://issues.apache.org/jira/browse/HDFS-11960
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11960-v2.branch-2.txt, HDFS-11960-v2.trunk.txt, 
> HDFS-11960.patch
>
>
> If a certain set of conditions holds at the time of file creation, a block 
> of the file can stay under-replicated. This is because the block is 
> mistakenly taken out of the under-replicated block queue and never gets 
> re-evaluated.
> Re-evaluation can be triggered only if one of the following happens:
> - a node containing a replica dies,
> - setrep is called, or
> - the NameNode (NN) replication queues are reinitialized (NN failover or restart).
> If none of these happens, the block stays under-replicated.
> Here is how it happens.
> 1) A replica is finalized, but the ACK does not reach the upstream node in time. 
> The incremental block report (IBR) is also delayed.
> 2) A close recovery happens, which updates the gen stamp of the "healthy" 
> replicas.
> 3) The file is closed with the healthy replicas. The block is added to the 
> replication queue.
> 4) A replication is scheduled, so the block is moved to the pending replication 
> list. The node that failed in step 1 is picked as the replication target.
> 5) The old IBR from the failed/excluded node is finally received. Meanwhile, 
> the replication fails because that node already holds a finalized replica 
> (with the older gen stamp).
> 6) IBR processing removes the block from the pending list, adds it to the 
> corrupt replicas list, and then issues an invalidation. Since the block is now 
> in neither the replication queue nor the pending list, it stays under-replicated.
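The six steps above can be traced with a toy model of the NameNode's per-block bookkeeping. This is a minimal sketch, not HDFS code and not the attached patch: it assumes a replication factor of 3, three hypothetical DataNodes (dn1 to dn3) and made-up gen stamps 1001 and 1002, and every class, field, and method name in it is invented for illustration. It shows the block ending with only two live replicas while being tracked by neither the replication queue nor the pending list.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of the per-block NameNode bookkeeping described in HDFS-11960.
 * All names are hypothetical; this is NOT HDFS's BlockManager code.
 */
public class StrandedBlockSketch {
    static final int EXPECTED_REPLICAS = 3; // assumed replication factor

    final long currentGenStamp;                          // gen stamp after close recovery
    final Set<String> liveReplicas = new HashSet<>();    // replicas with the current gen stamp
    final Set<String> corruptReplicas = new HashSet<>(); // replicas reported with a stale gen stamp
    boolean inReplicationQueue = false;                  // stand-in for the under-replicated queue
    boolean inPendingList = false;                       // stand-in for the pending replication list

    StrandedBlockSketch(long currentGenStamp) {
        this.currentGenStamp = currentGenStamp;
    }

    boolean isUnderReplicated() {
        return liveReplicas.size() < EXPECTED_REPLICAS;
    }

    // Step 3: the file is closed with only the healthy replicas.
    void closeWith(Set<String> healthyNodes) {
        liveReplicas.addAll(healthyNodes);
        if (isUnderReplicated()) {
            inReplicationQueue = true;
        }
    }

    // Step 4: replication is scheduled; the block leaves the queue and becomes pending.
    void scheduleReplication() {
        inReplicationQueue = false;
        inPendingList = true;
    }

    // Steps 5-6: a (possibly delayed) incremental block report arrives from a node.
    void processIncrementalReport(String node, long reportedGenStamp) {
        inPendingList = false; // processing the report removes the block from the pending list
        if (reportedGenStamp < currentGenStamp) {
            corruptReplicas.add(node); // stale replica: marked corrupt, to be invalidated
            // The bug being modelled: nothing puts the block back into the replication queue.
        } else {
            liveReplicas.add(node);
        }
    }

    // Invariant violated in HDFS-11960: an under-replicated block should be
    // tracked by either the replication queue or the pending list.
    boolean isStranded() {
        return isUnderReplicated() && !inReplicationQueue && !inPendingList;
    }

    static StrandedBlockSketch runScenario() {
        // Steps 1-2: dn3 finalized at gen stamp 1001, but its ACK and IBR are delayed;
        // close recovery moves the healthy replicas on dn1 and dn2 to gen stamp 1002.
        StrandedBlockSketch block = new StrandedBlockSketch(1002L);
        block.closeWith(Set.of("dn1", "dn2"));        // step 3
        block.scheduleReplication();                  // step 4, target picked: dn3
        block.processIncrementalReport("dn3", 1001L); // steps 5-6, replication to dn3 fails
        return block;
    }

    public static void main(String[] args) {
        StrandedBlockSketch block = runScenario();
        System.out.println("live=" + block.liveReplicas.size()
                + " queued=" + block.inReplicationQueue
                + " pending=" + block.inPendingList
                + " stranded=" + block.isStranded());
    }
}
```

Running `main` prints `live=2 queued=false pending=false stranded=true`. The `isStranded()` check encodes the condition from step 6: the block is still under-replicated, yet nothing in the model will ever schedule another replication for it unless one of the re-evaluation triggers listed above occurs.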



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2017-06-20 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-11960:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)




[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2017-06-20 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-11960:
--
Fix Version/s: 2.8.2
   3.0.0-alpha4
   2.9.0




[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2017-06-19 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-11960:
--
Attachment: HDFS-11960-v2.branch-2.txt

The branch-2 patch is identical except for the name change from 
"Reconstruction" to "Replication".




[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2017-06-19 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-11960:
--
Attachment: HDFS-11960-v2.trunk.txt

Added a unit test.




[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2017-06-09 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-11960:
--
Attachment: HDFS-11960.patch




[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2017-06-09 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-11960:
--
Status: Patch Available  (was: Open)




[jira] [Updated] (HDFS-11960) Successfully closed files can stay under-replicated.

2017-06-09 Thread Kihwal Lee (JIRA)

Kihwal Lee updated HDFS-11960:
--
Summary: Successfully closed files can stay under-replicated.  (was: Successfully closed file can stay under-replicated.)
