[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-08-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-12043:
---
Fix Version/s: (was: 3.0.0-alpha4)
   3.0.0-beta1

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch, 
> HDFS-12043.003.patch, HDFS-12043.004.patch, HDFS-12043-branch-2.005.patch
>
>
> We occasionally see that the under-replicated block count is not going down 
> quickly enough. We've made at least one fix to speed up block replications 
> (HDFS-9205), but we need better insight into the current state and activity of 
> the block re-replication logic. For example, we need to understand whether 
> re-replication is not making forward progress at all, or whether new 
> under-replicated blocks are being added faster than existing ones are 
> re-replicated.
> We should include additional metrics:
> # Cumulative number of blocks that were successfully replicated. 
> # Cumulative number of re-replications that timed out.
> # Cumulative number of blocks that were dequeued for re-replication but not 
> scheduled, e.g. because they were invalid, under construction, or had their 
> replication postponed.
>  
> The growth rate of the above metrics will make it clear whether block 
> replication is making forward progress and, if not, provide potential 
> clues about why it is stalled.
> Thanks [~arpitagarwal] for the offline discussions.
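
To illustrate how the growth rate of such cumulative counters could be read, here is a minimal sketch. It is not the code from the attached patches: the class {{ReReplicationCounters}}, its method names, and the {{diagnose}} labels are all hypothetical and chosen only for this example.

```java
import java.util.concurrent.atomic.LongAdder;

/** Illustrative holder for the three cumulative counters (all names are hypothetical). */
final class ReReplicationCounters {
    private final LongAdder successful = new LongAdder();
    private final LongAdder timedOut = new LongAdder();
    private final LongAdder notScheduled = new LongAdder();

    void incrSuccessful()   { successful.increment(); }
    void incrTimedOut()     { timedOut.increment(); }
    void incrNotScheduled() { notScheduled.increment(); }

    /** Point-in-time values as {successful, timedOut, notScheduled}. */
    long[] snapshot() {
        return new long[] {successful.sum(), timedOut.sum(), notScheduled.sum()};
    }

    /** Growth of each counter between two snapshots taken at different times. */
    static long[] delta(long[] earlier, long[] later) {
        long[] d = new long[earlier.length];
        for (int i = 0; i < d.length; i++) {
            d[i] = later[i] - earlier[i];
        }
        return d;
    }

    /**
     * Rough reading of one sampling interval (an assumption of this sketch):
     * no successes means no forward progress; successes while the
     * under-replicated count does not shrink means new blocks are arriving
     * at least as fast as they are repaired.
     */
    static String diagnose(long successfulDelta, long underReplicatedDelta) {
        if (successfulDelta == 0) {
            return "STALLED";
        }
        if (underReplicatedDelta >= 0) {
            return "OUTPACED";
        }
        return "PROGRESSING";
    }
}
```

Sampling {{snapshot()}} periodically and comparing consecutive samples with {{delta()}} gives the growth rate; the timed-out and not-scheduled deltas then hint at why a stalled interval made no progress.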



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-08-02 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12043:
-
Fix Version/s: 2.9.0

Committed to branch-2. Thanks for the backport [~vagarychen].

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch, 
> HDFS-12043.003.patch, HDFS-12043.004.patch, HDFS-12043-branch-2.005.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-30 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12043:
--
Attachment: HDFS-12043-branch-2.005.patch

Posted the v005 patch for branch-2.

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch, 
> HDFS-12043.003.patch, HDFS-12043.004.patch, HDFS-12043-branch-2.005.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-29 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12043:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha4
   Status: Resolved  (was: Patch Available)

Committed to trunk, thanks for the contribution [~vagarychen]!

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch, 
> HDFS-12043.003.patch, HDFS-12043.004.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-29 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12043:
--
Attachment: HDFS-12043.004.patch

Thanks [~arpitagarwal], who pointed out offline that the use of {{Thread.sleep()}} 
in the test can be unreliable. Posted the v004 patch, which uses 
{{GenericTestUtils.waitFor()}} instead.
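
For readers unfamiliar with the pattern, the difference between a fixed sleep and a polling wait can be sketched as follows. This is a standalone illustration, not Hadoop's {{GenericTestUtils}} implementation: {{PollingWait}} and its parameter names are hypothetical, and it only assumes the general idea of re-checking a condition until a deadline.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper: re-check a condition instead of sleeping a fixed time. */
final class PollingWait {
    private PollingWait() {}

    /**
     * Returns as soon as the check passes; throws TimeoutException if it has
     * not passed within waitForMillis. A fixed Thread.sleep() would either
     * wait too long on a fast machine or too little on a slow one.
     */
    static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "Condition not met within " + waitForMillis + " ms");
            }
            Thread.sleep(checkEveryMillis);
        }
    }
}
```

A test would call something like {{PollingWait.waitFor(() -> counter.get() == expected, 100, 10000)}}, so it finishes quickly when the metric updates early and fails with a clear timeout when it never does.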

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch, 
> HDFS-12043.003.patch, HDFS-12043.004.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-29 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12043:
--
Attachment: HDFS-12043.003.patch

Thanks [~arpitagarwal] for the comments! Posted the v003 patch, which renames 
the metrics and adds a counter increment to the {{if (pendingNum > 0)}} check.

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch, 
> HDFS-12043.003.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-28 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12043:
--
Attachment: HDFS-12043.002.patch

Thanks [~arpitagarwal] for the comments! Posted the v002 patch.

The other comments are addressed. Regarding the {{if (pendingNum > 0)}} branch, 
my understanding is that it means a replication is still in progress, not 
necessarily that a re-replication has failed. It could still finish 
successfully, or it may time out and be counted by the separate timeout 
counter. What do you think?

Also, the v002 patch moves the increment of the re-replication timeout counter 
to the place where the timeout is detected, in {{PendingReconstructionBlocks}}'s 
thread. The v001 patch effectively delayed the increment by performing it in 
{{BlockManager}}'s thread.
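
The point about where the increment happens can be sketched roughly as below. This is a simplified illustration, not the patch itself: {{PendingTracker}} and its methods are hypothetical stand-ins, time is passed in explicitly to keep the example deterministic, and the real classes are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a tracker of pending re-replications. */
final class PendingTracker {
    private final Map<String, Long> pendingSince = new LinkedHashMap<>();
    private final List<String> timedOutQueue = new ArrayList<>();
    private final long timeoutMillis;
    private long timeoutCounter;

    PendingTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    synchronized void add(String blockId, long nowMillis) {
        pendingSince.put(blockId, nowMillis);
    }

    /** Monitor-thread scan: the counter is bumped at the moment of detection. */
    synchronized void scan(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                String blockId = entry.getKey();
                it.remove();
                timedOutQueue.add(blockId);
                timeoutCounter++; // counted here, not when the queue is drained
            }
        }
    }

    /** Called later by another thread; incrementing here would lag detection. */
    synchronized List<String> drainTimedOut() {
        List<String> drained = new ArrayList<>(timedOutQueue);
        timedOutQueue.clear();
        return drained;
    }

    synchronized long timeoutCounter() {
        return timeoutCounter;
    }
}
```

With this arrangement the counter already reflects a timeout before the consuming thread picks the block up again, so the metric is not skewed by how often that thread runs.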


> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12043:
-
Status: Open  (was: Patch Available)

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-28 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-12043:
-
Status: Patch Available  (was: Open)

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-27 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12043:
--
Status: Patch Available  (was: Open)

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch






[jira] [Updated] (HDFS-12043) Add counters for block re-replication

2017-06-27 Thread Chen Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-12043:
--
Attachment: HDFS-12043.001.patch

Posted the v001 patch.

> Add counters for block re-replication
> -
>
> Key: HDFS-12043
> URL: https://issues.apache.org/jira/browse/HDFS-12043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-12043.001.patch


