[ 
https://issues.apache.org/jira/browse/HADOOP-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-1079:
-------------------------------------

    Attachment: blockReportPeriod.patch

Here is a sample patch that increases the block report interval from 1 hour 
to 1 day. It also causes a block report to be sent after a failed heartbeat.

I would like some comments/feedback on this approach.
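
To make the proposed behaviour concrete for reviewers, here is a minimal sketch of the scheduling logic described above: a full report once per interval, plus an extra report after a failed heartbeat. It is not taken from blockReportPeriod.patch; the class BlockReportScheduler and all of its method names are hypothetical and exist only for illustration.

```java
// Hypothetical sketch of the scheduling idea; none of these names come from
// the attached patch or from the Hadoop code base.
class BlockReportScheduler {
    // Proposed interval: 1 day, expressed in milliseconds.
    static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

    private final long intervalMs;
    private long lastReportMs;
    private boolean forceReport = false;

    BlockReportScheduler(long intervalMs, long nowMs) {
        this.intervalMs = intervalMs;
        this.lastReportMs = nowMs;
    }

    // Record a heartbeat failure so that the next check asks for a full report.
    void heartbeatFailed() {
        forceReport = true;
    }

    // True if the interval has elapsed or a heartbeat failed since the last report.
    boolean shouldSendReport(long nowMs) {
        return forceReport || nowMs - lastReportMs >= intervalMs;
    }

    // Call after a block report has been sent successfully.
    void reportSent(long nowMs) {
        lastReportMs = nowMs;
        forceReport = false;
    }
}
```

In this sketch the datanode loop would call shouldSendReport() on every iteration, so a failed heartbeat triggers a report on the next pass rather than waiting for the daily timer.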

> DFS Scalability: optimize processing time of block reports
> ----------------------------------------------------------
>
>                 Key: HADOOP-1079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1079
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Attachments: blockReportPeriod.patch
>
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 
> blocks and sends a block report to the namenode once every hour. This means 
> that the namenode processes a block report once every 2 seconds. Each block 
> report contains all blocks that the datanode currently hosts. This makes the 
> namenode compare a huge number of blocks that remain practically the same 
> between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make subsequent block reports (after a full block report 
> has been sent successfully) incremental. The namenode would then process only 
> those blocks that were added/deleted in the last period.
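
As a rough illustration of the figures in the description above, the sketch below works out the namenode load and shows what an incremental report could contain. It assumes that blocks are identified by long ids and that the datanode keeps the set it last reported; the class BlockReportSketch and its methods are hypothetical and are not part of Hadoop.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the names are illustrative and not part of Hadoop.
class BlockReportSketch {
    // Average gap between block reports arriving at the namenode, in seconds.
    static double secondsBetweenReports(int datanodes, long periodSeconds) {
        return (double) periodSeconds / datanodes;
    }

    // Average number of blocks the namenode must compare per second
    // when every report is a full report.
    static long blocksComparedPerSecond(int datanodes, long blocksPerNode, long periodSeconds) {
        return datanodes * blocksPerNode / periodSeconds;
    }

    // Blocks present now that were absent from the previous report.
    static Set<Long> added(Set<Long> previous, Set<Long> current) {
        Set<Long> result = new HashSet<>(current);
        result.removeAll(previous);
        return result;
    }

    // Blocks in the previous report that the datanode no longer hosts.
    static Set<Long> deleted(Set<Long> previous, Set<Long> current) {
        Set<Long> result = new HashSet<>(previous);
        result.removeAll(current);
        return result;
    }
}
```

With 1800 datanodes, 50000 blocks each, and a 3600-second period, secondsBetweenReports gives 2.0 and blocksComparedPerSecond gives 25000. An incremental report would carry only the added and deleted sets, which stay small when most blocks are unchanged between reports.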

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
