[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2015-05-04 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-7097:
--
Target Version/s:   (was: 2.6.0)

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
 HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-26 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-7097:

   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Fix For: 2.7.0

 Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
 HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-11-25 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7097:
-
Attachment: HDFS-7097.ultimate.trunk.patch

Here is the updated patch.

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
 HDFS-7097.patch, HDFS-7097.ultimate.trunk.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7097:
-
Attachment: HDFS-7097.patch

Attaching a new patch with the lock name/comment changes.

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch, 
 HDFS-7097.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-09 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7097:
-
Attachment: HDFS-7097.patch

Made the change in the test case. It now checks the result every 100ms for up 
to 5 seconds.

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-10-02 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7097:
-
Attachment: HDFS-7097.patch

Modified a test case to also check whether block reports are processed during 
checkpointing on standby nn.

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch, HDFS-7097.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-09-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7097:
-
Attachment: HDFS-7097.patch

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

2014-09-19 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-7097:
-
Status: Patch Available  (was: Open)

 Allow block reports to be processed during checkpointing on standby name node
 -

 Key: HDFS-7097
 URL: https://issues.apache.org/jira/browse/HDFS-7097
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical
 Attachments: HDFS-7097.patch


 On a reasonably busy HDFS cluster, there are stream of creates, causing data 
 nodes to generate incremental block reports.  When a standby name node is 
 checkpointing, RPC handler threads trying to process a full or incremental 
 block report is blocked on the name system's {{fsLock}}, because the 
 checkpointer acquires the read lock on it.  This can create a serious problem 
 if the size of name space is big and checkpointing takes a long time.
 All available RPC handlers can be tied up very quickly. If you have 100 
 handlers, it only takes 34 file creates.  If a separate service RPC port is 
 not used, HA transition will have to wait in the call queue for minutes. Even 
 if a separate service RPC port is configured, hearbeats from datanodes will 
 be blocked. A standby NN  with a big name space can lose all data nodes after 
 checkpointing.  The rpc calls will also be retransmitted by data nodes many 
 times, filling up the call queue and potentially causing listen queue 
 overflow.
 Since block reports are not modifying any state that is being saved to 
 fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)