[jira] Commented: (HDFS-86) Corrupted blocks get deleted but not replicated

2010-08-19 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900357#action_12900357
 ] 

Hairong Kuang commented on HDFS-86:
---

This jira is too old. It should be closed.

HDFS now has a different policy for corrupt replicas. A corrupt replica does 
not get deleted until a good replica has been replicated. 

The problem you have is caused by the 2-node cluster. Because it does not have 
an extra node on which to place the good replica, the corrupt one never gets 
deleted. If you add one more node to the cluster, the problem will go away. 
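
To make the 2-node case concrete, here is a minimal toy model of the policy 
described above. It is not HDFS code: the class and method names are 
hypothetical, and it assumes that a node already holding any replica of the 
block (good or corrupt) cannot be picked as the target for a new copy.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only; class and method names are hypothetical, not HDFS code.
class CorruptReplicaPolicySketch {

    // Assumed rule: a node already holding any replica of the block,
    // good or corrupt, cannot be chosen to receive a new copy.
    static Set<String> candidateTargets(Set<String> allNodes,
                                        Set<String> goodHolders,
                                        Set<String> corruptHolders) {
        Set<String> targets = new HashSet<>(allNodes);
        targets.removeAll(goodHolders);
        targets.removeAll(corruptHolders);
        return targets;
    }

    // Returns true if the corrupt replica can eventually be deleted, i.e.
    // the number of good replicas can reach the replication factor first.
    static boolean corruptReplicaEventuallyDeleted(Set<String> allNodes,
                                                   Set<String> goodHolders,
                                                   Set<String> corruptHolders,
                                                   int replicationFactor) {
        Set<String> good = new HashSet<>(goodHolders);
        while (good.size() < replicationFactor) {
            Set<String> targets = candidateTargets(allNodes, good, corruptHolders);
            if (targets.isEmpty()) {
                return false; // nowhere to place a good copy, so the corrupt replica stays
            }
            good.add(targets.iterator().next());
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> good = new HashSet<>(Arrays.asList("dn1"));
        Set<String> corrupt = new HashSet<>(Arrays.asList("dn2"));
        Set<String> twoNodes = new HashSet<>(Arrays.asList("dn1", "dn2"));
        Set<String> threeNodes = new HashSet<>(Arrays.asList("dn1", "dn2", "dn3"));
        System.out.println(corruptReplicaEventuallyDeleted(twoNodes, good, corrupt, 2));   // false
        System.out.println(corruptReplicaEventuallyDeleted(threeNodes, good, corrupt, 2)); // true
    }
}
```

With two nodes there is no candidate target, so the good replica count never 
reaches the replication factor; adding dn3 gives the good copy a place to go. 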

> Corrupted blocks get deleted but not replicated
> ---
>
> Key: HDFS-86
> URL: https://issues.apache.org/jira/browse/HDFS-86
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Attachments: blockInvalidate.patch
>
>
> When I test the patch to HADOOP-1345 on a two-node dfs cluster, I see that 
> dfs correctly deletes the corrupted replica and successfully retries reading 
> from the other, correct replica, but the block does not get replicated. The 
> block remains with only 1 replica until the next block report comes in.
> In my testcase, since the dfs cluster has only 2 datanodes, the target of 
> replication is the same as the target of block invalidation. After poking 
> through the logs, I found out that the namenode sent the replication request 
> before the block invalidation request. 
> This is because the namenode does not invalidate a block well. In 
> FSNamesystem.invalidateBlock, it first puts the invalidate request in a queue 
> and then immediately removes the replica from its state, which triggers 
> choosing a replication target for the block. When requests are sent back to 
> the target datanode as a reply to a heartbeat message, the replication 
> requests have higher priority than the invalidate requests.
> This problem could be solved if the namenode removed an invalidated replica 
> from its state only after the invalidate request is sent to the datanode.
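
The ordering problem in the description above can be sketched with a toy 
model. Nothing here is taken from FSNamesystem: the class, the method names 
and the "REPLICATE"/"INVALIDATE" command strings are hypothetical, and a 
datanode is reduced to a set of block IDs to which commands are applied in 
the order they arrive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only; names and command strings are hypothetical, not HDFS code.
class InvalidateOrderingSketch {

    // Applies commands, in order, to one datanode's set of stored block IDs.
    static Set<String> apply(List<String> commands, Set<String> stored) {
        Set<String> result = new HashSet<>(stored);
        for (String command : commands) {
            String[] parts = command.split(" ");
            if (parts[0].equals("REPLICATE")) {
                result.add(parts[1]);      // datanode ends up holding the block
            } else if (parts[0].equals("INVALIDATE")) {
                result.remove(parts[1]);   // datanode deletes the block
            }
        }
        return result;
    }

    // Reported behavior: both requests are pending for the same datanode and
    // the replication request is sent ahead of the invalidate request.
    static List<String> reportedOrder(String blockId) {
        return Arrays.asList("REPLICATE " + blockId, "INVALIDATE " + blockId);
    }

    // Proposed behavior: the replica leaves the namenode's state only after
    // the invalidate request is sent, so replication is scheduled afterwards.
    static List<String> proposedOrder(String blockId) {
        return Arrays.asList("INVALIDATE " + blockId, "REPLICATE " + blockId);
    }

    public static void main(String[] args) {
        Set<String> stored = new HashSet<>(Arrays.asList("blk_1")); // corrupt copy on disk
        System.out.println(apply(reportedOrder("blk_1"), stored).contains("blk_1")); // false
        System.out.println(apply(proposedOrder("blk_1"), stored).contains("blk_1")); // true
    }
}
```

In the reported order the datanode is left without the block, which matches 
the block staying at 1 replica until the next block report. 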

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-86) Corrupted blocks get deleted but not replicated

2010-08-19 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900342#action_12900342
 ] 

Thanh Do commented on HDFS-86:
--

I have a cluster of two nodes. Say a block has 2 replicas, and one of them gets 
corrupted.
The corrupted block is reported to the NN, but it is never deleted or 
replicated, even after the NN restarts.
I am not sure whether this is a bug or just a policy.
I am playing with the append-trunk.

