[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-25 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-11015:
-
   Resolution: Fixed
Fix Version/s: 2.7.4
   Status: Resolved  (was: Patch Available)

Committed to branch-2.7 as well. Thanks [~kihwal] for the nice work!

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, 
> HDFS-11015-3.patch, balancer.png
>
>
> 1) Hung node detection: HDFS-6247 removed the socket read timeout when it 
> added periodic responses for slow block moves. Removing the long timeout was 
> not necessary, however. The timeout is still useful for detecting hung nodes, 
> and it does not abort slow moves.
> 2) Enforcing the iteration limit: the 20-minute iteration limit is supposed to 
> be enforced, but it is not. An iteration can easily stretch to 30 to 40 
> minutes because of a long tail of slow moves, so balancer throughput does not 
> reach its full potential.
> 3) Slow move detection: for better throughput, it is sometimes necessary to 
> impose a block move timeout. We have seen an iteration take over 2 hours 
> because of a single slow block move. This is mainly meant to catch 
> exceptionally slow moves. Even if the balancer stops waiting, the move itself 
> continues and finishes.
> To avoid undoing what HDFS-6247 tried to achieve, it should be possible to 
> turn off 3) through configuration.
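A minimal sketch of item 1, assuming the periodic progress replies added by HDFS-6247 arrive as discrete messages on the reply stream. Each read is bounded by a timeout, but every in-progress reply effectively restarts it, so a target that keeps reporting progress is never flagged while one that goes silent is. `HungNodeCheck`, `Reply`, `Event`, and `awaitMove` are hypothetical names, not the Hadoop balancer API, and the timeout values are illustrative only.

```java
// Hypothetical model of one block-move reply stream; not the Hadoop balancer API.
final class HungNodeCheck {
    enum Reply { IN_PROGRESS, SUCCESS }

    // One reply, plus how long the reader blocked waiting for it since the previous reply.
    static final class Event {
        final long waitMillis;
        final Reply reply;

        Event(long waitMillis, Reply reply) {
            this.waitMillis = waitMillis;
            this.reply = reply;
        }
    }

    /**
     * Returns true if the move reports SUCCESS, false if any single read would
     * exceed the read timeout (target treated as hung) or the stream ends early.
     * Because the timeout applies per read, each IN_PROGRESS reply effectively
     * restarts it: a slow move that keeps reporting progress is never aborted.
     */
    static boolean awaitMove(long readTimeoutMillis, Event... replies) {
        for (Event e : replies) {
            if (e.waitMillis > readTimeoutMillis) {
                return false; // a real socket read would throw SocketTimeoutException here
            }
            if (e.reply == Reply.SUCCESS) {
                return true;
            }
            // IN_PROGRESS: the target is alive; keep reading.
        }
        return false; // stream ended without a final reply
    }
}
```

With a 2-minute read timeout, a move that sends a progress reply every minute and finishes after just over 5 minutes still succeeds, because no single gap between replies exceeds the timeout.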



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-25 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-11015:
-
Fix Version/s: 2.8.0

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, 
> HDFS-11015-3.patch, balancer.png



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-25 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-11015:
-
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
  Component/s: balancer & mover

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 3.0.0-alpha2
>
> Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, 
> HDFS-11015-3.patch, balancer.png



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-24 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11015:
--
Attachment: HDFS-11015-3.patch

Silly me. I used {{isIterationOver()}} instead of {{!isIterationOver()}} as the 
condition for continuing. Fixed it in the new patch.
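A sketch of why the negation matters, assuming the check guards a loop that keeps dispatching moves (the thread does not show the patch itself). The loop must continue only while `!isIterationOver()`; with the condition inverted it exits immediately while time still remains. Only the name `isIterationOver()` comes from the comment above and the 20-minute limit from the issue description; `IterationLoop`, `scheduleMoves`, and the injectable clock are hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; only the name isIterationOver() comes from the thread.
final class IterationLoop {
    private final LongSupplier clockMillis;   // injectable clock, e.g. System::currentTimeMillis
    private final long startMillis;
    private final long maxIterationMillis;

    IterationLoop(LongSupplier clockMillis, long maxIterationMillis) {
        this.clockMillis = clockMillis;
        this.startMillis = clockMillis.getAsLong();
        this.maxIterationMillis = maxIterationMillis;
    }

    // True once the iteration has used up its time budget.
    boolean isIterationOver() {
        return clockMillis.getAsLong() - startMillis >= maxIterationMillis;
    }

    /** Dispatches up to {@code pending} moves and returns how many were started. */
    int scheduleMoves(int pending, Runnable dispatchOne) {
        int dispatched = 0;
        // Continue only while the iteration is NOT over. With the condition
        // inverted (isIterationOver() without '!'), the loop would exit at once
        // while time remains.
        while (dispatched < pending && !isIterationOver()) {
            dispatchOne.run();
            dispatched++;
        }
        return dispatched;
    }
}
```

With a fake clock that advances 5 minutes per dispatched move and a 20-minute limit, the loop starts 4 of 10 pending moves and then stops because the deadline has been reached.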

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, 
> HDFS-11015-3.patch, balancer.png



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-21 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11015:
--
Attachment: HDFS-11015-2.patch

Attaching the updated patch. 

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-11015-1.patch, HDFS-11015-2.patch, balancer.png



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11015:
--
Attachment: balancer.png

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-11015-1.patch, balancer.png



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-17 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11015:
--
Attachment: HDFS-11015-1.patch

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-11015-1.patch



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-17 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11015:
--
Status: Patch Available  (was: Open)

> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-11015-1.patch



[jira] [Updated] (HDFS-11015) Enforce timeout in balancer

2016-10-14 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11015:
--
Description: 
1) Hung node detection: HDFS-6247 removed the socket read timeout when it 
added periodic responses for slow block moves. Removing the long timeout was 
not necessary, however. The timeout is still useful for detecting hung nodes, 
and it does not abort slow moves.

2) Enforcing the iteration limit: the 20-minute iteration limit is supposed to 
be enforced, but it is not. An iteration can easily stretch to 30 to 40 minutes 
because of a long tail of slow moves, so balancer throughput does not reach its 
full potential.

3) Slow move detection: for better throughput, it is sometimes necessary to 
impose a block move timeout. We have seen an iteration take over 2 hours 
because of a single slow block move. This is mainly meant to catch 
exceptionally slow moves. Even if the balancer stops waiting, the move itself 
continues and finishes.

To avoid undoing what HDFS-6247 tried to achieve, it should be possible to 
turn off 3) through configuration.

  was:
1) Hung node detection: HDFS-6247 removed the socket read timeout when it 
added periodic responses for slow block moves. Removing the long timeout was 
not necessary, however. The timeout is still useful for detecting hung nodes, 
and it does not interfere with slow moves.

2) Enforcing the iteration limit: the 20-minute iteration limit is supposed to 
be enforced, but it is not. An iteration can easily stretch to 30 to 40 minutes 
because of a long tail of slow moves, so balancer throughput does not reach its 
full potential.

3) Slow move detection: for better throughput, it is sometimes necessary to 
impose a block move timeout. We have seen an iteration take over 2 hours 
because of a single slow block move. This is mainly meant to catch 
exceptionally slow moves. Even if the balancer stops waiting, the move itself 
continues and finishes.

To avoid undoing what HDFS-6247 tried to achieve, it should be possible to 
turn off 3) through configuration.
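A sketch of item 3 using `java.util.concurrent.Future`: the balancer bounds how long it waits for a single move, and a timeout of 0 turns the check off, which is the configurability the description asks for. On timeout the future is deliberately not cancelled, mirroring the point that the move continues and finishes even after the balancer stops waiting. `MoveWaiter`, `Outcome`, and `blockMoveTimeoutMillis` are hypothetical; the thread does not name the actual configuration key, and a real move runs on the DataNodes rather than in a local future.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not the Hadoop balancer API.
final class MoveWaiter {
    enum Outcome { COMPLETED, FAILED, STOPPED_WAITING }

    // Hypothetical setting: max time to wait for one block move; 0 (or less) disables the check.
    private final long blockMoveTimeoutMillis;

    MoveWaiter(long blockMoveTimeoutMillis) {
        this.blockMoveTimeoutMillis = blockMoveTimeoutMillis;
    }

    Outcome await(Future<?> move) {
        try {
            if (blockMoveTimeoutMillis <= 0) {
                move.get(); // check disabled: wait however long the move takes
            } else {
                move.get(blockMoveTimeoutMillis, TimeUnit.MILLISECONDS);
            }
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // Stop waiting, but do NOT cancel: the move keeps running and can still finish.
            return Outcome.STOPPED_WAITING;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the move itself reported an error
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Outcome.STOPPED_WAITING;
        }
    }
}
```

Leaving the future uncancelled is the design choice that keeps this compatible with HDFS-6247: the timeout only frees the balancer to move on, it never aborts data that is already being copied.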


> Enforce timeout in balancer
> ---
>
> Key: HDFS-11015
> URL: https://issues.apache.org/jira/browse/HDFS-11015
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee