[jira] [Updated] (HDFS-12074) [branch-2] Add counters for block re-replication

2017-08-02 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12074:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Dup'ing against HDFS-12043 since it is committed.

> [branch-2] Add counters for block re-replication
> 
>
> Key: HDFS-12074
> URL: https://issues.apache.org/jira/browse/HDFS-12074
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-12043-branch-2.005.patch
>
>
> We occasionally see that the under-replicated block count is not going down 
> quickly enough. We've made at least one fix to speed up block replications 
> (HDFS-9205) but we need better insight into the current state and activity of 
> the block re-replication logic. For example, we need to understand whether 
> this is because re-replication is not making forward progress at all, or 
> because new under-replicated blocks are being added faster than existing 
> ones are re-replicated.
> We should include additional metrics:
> # Cumulative number of blocks that were successfully replicated.
> # Cumulative number of re-replications that timed out.
> # Cumulative number of blocks that were dequeued for re-replication but not 
> scheduled, e.g. because they were invalid, under construction, or 
> replication was postponed.
>  
> The growth rate of the above metrics will make it clear whether block 
> replication is making forward progress and, if not, provide potential 
> clues about why it is stalled.
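
For illustration only, the three cumulative counters and the growth-rate
calculation described above could be sketched as follows. This is a minimal
sketch, not the committed patch: the class name ReReplicationCounters, its
methods, and the use of LongAdder are all assumptions made for this example
and are not HDFS APIs.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder for the three cumulative counters described above.
// Class and method names are illustrative only; they are not HDFS APIs.
final class ReReplicationCounters {
    // Blocks that were successfully re-replicated.
    private final LongAdder successful = new LongAdder();
    // Re-replication attempts that timed out.
    private final LongAdder timedOut = new LongAdder();
    // Blocks dequeued for re-replication but not scheduled
    // (invalid, under construction, or postponed).
    private final LongAdder notScheduled = new LongAdder();

    public void incSuccessful() { successful.increment(); }
    public void incTimedOut() { timedOut.increment(); }
    public void incNotScheduled() { notScheduled.increment(); }

    public long getSuccessful() { return successful.sum(); }
    public long getTimedOut() { return timedOut.sum(); }
    public long getNotScheduled() { return notScheduled.sum(); }

    // Growth rate (events per second) between two samples of one
    // cumulative counter taken elapsedMillis apart.
    public static double ratePerSecond(long earlier, long later, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (later - earlier) * 1000.0 / elapsedMillis;
    }
}
```

With this sketch, two samples of the success counter taken 30 seconds apart
with values 100 and 160 give ratePerSecond(100, 160, 30000) = 2.0 blocks per
second. A success rate near zero alongside a growing timeout or not-scheduled
count would point at stalled re-replication rather than a fast-growing queue.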



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12074) [branch-2] Add counters for block re-replication

2017-06-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12074:
-
Description: 
We occasionally see that the under-replicated block count is not going down 
quickly enough. We've made at least one fix to speed up block replications 
(HDFS-9205) but we need better insight into the current state and activity of 
the block re-replication logic. For example, we need to understand whether 
this is because re-replication is not making forward progress at all, or 
because new under-replicated blocks are being added faster than existing 
ones are re-replicated.

We should include additional metrics:
# Cumulative number of blocks that were successfully replicated.
# Cumulative number of re-replications that timed out.
# Cumulative number of blocks that were dequeued for re-replication but not 
scheduled, e.g. because they were invalid, under construction, or replication 
was postponed.
 
The growth rate of the above metrics will make it clear whether block 
replication is making forward progress and, if not, provide potential clues 
about why it is stalled.


  was:
We occasionally see that the under-replicated block count is not going down 
quickly enough. We've made at least one fix to speed up block replications 
(HDFS-9205) but we need better insight into the current state and activity of 
the block re-replication logic. For example, we need to understand whether 
this is because re-replication is not making forward progress at all, or 
because new under-replicated blocks are being added faster than existing 
ones are re-replicated.

We should include additional metrics:
# Cumulative number of blocks that were successfully replicated.
# Cumulative number of re-replications that timed out.
# Cumulative number of blocks that were dequeued for re-replication but not 
scheduled, e.g. because they were invalid, under construction, or replication 
was postponed.
 
The growth rate of the above metrics will make it clear whether block 
replication is making forward progress and, if not, provide potential clues 
about why it is stalled.

Thanks [~arpitagarwal] for the offline discussions.






[jira] [Updated] (HDFS-12074) [branch-2] Add counters for block re-replication

2017-06-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12074:
-
Status: Patch Available  (was: Open)




[jira] [Updated] (HDFS-12074) [branch-2] Add counters for block re-replication

2017-06-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12074:
-
Attachment: HDFS-12043-branch-2.005.patch




[jira] [Updated] (HDFS-12074) [branch-2] Add counters for block re-replication

2017-06-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12074:
-
Target Version/s: 2.9.0




[jira] [Updated] (HDFS-12074) [branch-2] Add counters for block re-replication

2017-06-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12074:
-
Hadoop Flags:   (was: Reviewed)




[jira] [Updated] (HDFS-12074) [branch-2] Add counters for block re-replication

2017-06-30 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12074:
-
Fix Version/s: (was: 3.0.0-alpha4)
