[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Patch Available (was: In Progress) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
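A minimal sketch of the buffer-reuse idea from point 2 of the solutions above. This is not the actual Freon patch: the class, method, and parameter names (BufferedKeyWriter, writeKey, keySize, bufferSize) are hypothetical, and only the technique comes from the description, i.e. a long key size written through one small, reused buffer.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ThreadLocalRandom;

public class BufferedKeyWriter {

  // Hypothetical sketch: writes keySize bytes (a long, so sizes beyond 2GB
  // are representable) while allocating only one bufferSize-byte buffer,
  // instead of one buffer as large as the whole key.
  static void writeKey(OutputStream out, long keySize, int bufferSize)
      throws IOException {
    byte[] buffer = new byte[bufferSize];           // allocated once, reused
    ThreadLocalRandom.current().nextBytes(buffer);  // fill with test data
    long remaining = keySize;
    while (remaining > 0) {
      int chunk = (int) Math.min(remaining, buffer.length);
      out.write(buffer, 0, chunk);
      remaining -= chunk;
    }
  }
}
{code}

With the default 4KB buffer, per-thread memory stays constant regardless of key size, which addresses the OOM pattern described in point 2 of the problems.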
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: In Progress (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Open (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Patch Available (was: Open) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.002.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Open (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.001.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Patch Available (was: Open) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Open (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.001.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: HDDS-1530.001.patch HDDS-1530.002.patch Status: Patch Available (was: Open) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.002.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
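One plausible realization of the "--validateWrites" option described in these notifications is to record a digest of the bytes written for each key and compare it with a digest of the key read back. The sketch below is illustrative only, with hypothetical names (WriteValidator, digestOf, validate); it is not the actual Freon implementation.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class WriteValidator {

  // Streams the input through a SHA-256 digest using a small buffer,
  // so validation never needs the whole key in memory either.
  static byte[] digestOf(InputStream in, int bufferSize)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    byte[] buffer = new byte[bufferSize];
    int n;
    while ((n = in.read(buffer)) != -1) {
      md.update(buffer, 0, n);
    }
    return md.digest();
  }

  // Compares the digest recorded at write time with a digest of the
  // key read back after the write completes.
  static boolean validate(byte[] writtenDigest, InputStream readBack,
      int bufferSize) throws IOException, NoSuchAlgorithmException {
    return MessageDigest.isEqual(writtenDigest, digestOf(readBack, bufferSize));
  }
}
{code}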
[jira] [Commented] (HDDS-1532) Freon: Improve the concurrency testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao commented on HDDS-1532: -- Test performance comparison before and after this modification: > Freon: Improve the concurrency testing framework. > - > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
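The reworked concurrency policy described above, with volume, bucket, and key creation all submitted to one shared pool as equal tasks, can be sketched as follows. The class name and the three create* methods are hypothetical placeholders for the real Ozone client calls; only the fan-out structure reflects the description.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FreonTaskPool {
  public static void main(String[] args) throws InterruptedException {
    int numOfVolumes = 1, numOfBuckets = 10, numOfKeys = 1000, numOfThreads = 50;
    ExecutorService pool = Executors.newFixedThreadPool(numOfThreads);
    long total = numOfVolumes
        + (long) numOfVolumes * numOfBuckets
        + (long) numOfVolumes * numOfBuckets * numOfKeys;
    CountDownLatch done = new CountDownLatch((int) total);

    for (int v = 0; v < numOfVolumes; v++) {
      final int vol = v;
      // A volume task fans out bucket tasks, which fan out key tasks;
      // all three kinds run as equal tasks on the shared pool, so a
      // single volume no longer limits the run to a single thread.
      pool.submit(() -> {
        createVolume(vol);
        done.countDown();
        for (int b = 0; b < numOfBuckets; b++) {
          final int bkt = b;
          pool.submit(() -> {
            createBucket(vol, bkt);
            done.countDown();
            for (int k = 0; k < numOfKeys; k++) {
              final int key = k;
              pool.submit(() -> {
                createKey(vol, bkt, key);
                done.countDown();
              });
            }
          });
        }
      });
    }
    done.await();     // wait for every task, then release the pool
    pool.shutdown();
  }

  // Placeholders standing in for the real Ozone client calls.
  static void createVolume(int v) {}
  static void createBucket(int v, int b) {}
  static void createKey(int v, int b, int k) {}
}
{code}

A CountDownLatch is one simple way to learn when every task has finished; shutting the pool down right after the outer loop would reject the bucket and key tasks that the volume tasks submit later.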
[jira] [Comment Edited] (HDDS-1532) Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao edited comment on HDDS-1532 at 6/12/19 12:36 PM: Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). was (Author: xudongcao): Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). > Freon: Improve the concurrent testing framework. > > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by
[jira] [Updated] (HDDS-1532) Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1532: - Summary: Freon: Improve the concurrent testing framework. (was: Freon: Improve the concurrency testing framework.) > Freon: Improve the concurrent testing framework. > > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1532) Freon: Improve the concurrency testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao edited comment on HDDS-1532 at 6/12/19 12:25 PM: Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). was (Author: xudongcao): Test performance comparison before and after this modification: > Freon: Improve the concurrency testing framework. > - > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1532) Freon: Improve the concurrency testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao edited comment on HDDS-1532 at 6/12/19 12:34 PM: Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). was (Author: xudongcao): Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). > Freon: Improve the concurrency testing framework. > - > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA
[jira] [Updated] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result errors.
[ https://issues.apache.org/jira/browse/HDDS-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1703: - Summary: Freon uses wait/notify instead of polling to eliminate the test result errors. (was: Freon uses wait/notify instead of polling to eliminate the test result error.) > Freon uses wait/notify instead of polling to eliminate the test result errors. > -- > > Key: HDDS-1703 > URL: https://issues.apache.org/jira/browse/HDDS-1703 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > After HDDS-1532, Freon has an efficient concurrent testing framework. In the new framework, the main thread checks every 5 seconds whether the test has completed (or an exception has occurred), which introduces an error of up to 5 seconds in the measured result. > In most cases Freon's test results are on the order of minutes or tens of minutes, so a 5-second error is not significant, but in particularly short tests it may have a significant impact. > Therefore, we can use the combination of Object.wait() + Object.notify() instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
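A minimal sketch of the proposed wait/notify completion signal, assuming a hypothetical CompletionSignal helper shared between the main thread and the worker threads; the actual patch may structure this differently.

{code:java}
public class CompletionSignal {
  private final Object lock = new Object();
  private long remainingTasks;
  private Throwable failure;

  CompletionSignal(long totalTasks) {
    this.remainingTasks = totalTasks;
  }

  // Called by a worker when one task finishes.
  void taskCompleted() {
    synchronized (lock) {
      if (--remainingTasks == 0) {
        lock.notifyAll();   // wake the waiting main thread immediately
      }
    }
  }

  // Called by a worker that hit an exception.
  void taskFailed(Throwable t) {
    synchronized (lock) {
      failure = t;
      lock.notifyAll();
    }
  }

  // Main thread blocks here instead of polling every 5 seconds.
  void awaitCompletion() throws InterruptedException {
    synchronized (lock) {
      while (remainingTasks > 0 && failure == null) {
        lock.wait();        // no polling error: woken the moment work ends
      }
    }
  }
}
{code}

The guarded while loop also protects against spurious wakeups, which the old 5-second polling never had to consider.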
[jira] [Updated] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result errors.
[ https://issues.apache.org/jira/browse/HDDS-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1703: - Issue Type: Bug (was: Improvement) > Freon uses wait/notify instead of polling to eliminate the test result errors. > -- > > Key: HDDS-1703 > URL: https://issues.apache.org/jira/browse/HDDS-1703 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > After HDDS-1532, Freon has an efficient concurrent testing framework. In the new framework, the main thread checks every 5 seconds whether the test has completed (or an exception has occurred), which introduces an error of up to 5 seconds in the measured result. > In most cases Freon's test results are on the order of minutes or tens of minutes, so a 5-second error is not significant, but in particularly short tests it may have a significant impact. > Therefore, we can use the combination of Object.wait() + Object.notify() instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result error.
Xudong Cao created HDDS-1703: Summary: Freon uses wait/notify instead of polling to eliminate the test result error. Key: HDDS-1703 URL: https://issues.apache.org/jira/browse/HDDS-1703 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: test Affects Versions: 0.4.0 Reporter: Xudong Cao Assignee: Xudong Cao After HDDS-1532, Freon has an efficient concurrent testing framework. In the new framework, the main thread checks every 5 seconds whether the test has completed (or an exception has occurred), which introduces an error of up to 5 seconds in the measured result. In most cases Freon's test results are on the order of minutes or tens of minutes, so a 5-second error is not significant, but in particularly short tests it may have a significant impact. Therefore, we can use the combination of Object.wait() + Object.notify() instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1532) Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1532: - Affects Version/s: 0.4.0 Status: Patch Available (was: Open) > Freon: Improve the concurrent testing framework. > > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1532) Ozone: Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1532: - Summary: Ozone: Freon: Improve the concurrent testing framework. (was: Freon: Improve the concurrent testing framework.) > Ozone: Freon: Improve the concurrent testing framework. > --- > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload thread blocks writing to the socket for a long time: !blockedInWritingSocket.png! 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png! !get2.png! *Solution:* When the local SNN plans to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is: # Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process. # If the peer NN is truly the ANN and can receive the FsImage normally, it replies to the local SNN with HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image. *Note:* This problem needs to be reproduced in a large cluster (the size of the FsImage in our cluster is about 30GB), so a unit test is difficult to write. In our cluster, after the modification, the problem has been solved and there is no longer a large backlog in Send-Q.
was: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30G. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30G. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload thread blocks writing to the socket for a long time: !blockedInWritingSocket.png! 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png! !get2.png! *Solution:* When the local SNN plans to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is: # Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process. # If the peer NN is truly the ANN and can receive the FsImage normally, it replies to the local SNN with HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image. *Note:* This problem needs to be reproduced in a large cluster (the size of the FsImage in our cluster is about 30G), so a unit test is difficult to write. In our cluster, after the modification, the problem has been solved and there is no longer a large backlog in Send-Q.
was: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30G. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload thread will
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: get2.png)
> Standby NameNode should terminate the FsImage put process as soon as possible
> if the peer NN is not in the appropriate state to receive an image.
> -
>
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.2
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Major
> Attachments: blockedInWritingSocket.png, get1.png, get2.png, largeSendQ.png
>
> *Problem Description:*
> In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is the ANN or not). Even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed.
> In a relatively large HDFS cluster, the FsImage can often reach about 30GB. In this case, this invalid put brings two problems:
> # It wastes time and bandwidth.
> # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread is blocked writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours.
> *An example is as follows:*
> In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 immediately replies with a NOT_ACTIVE_NAMENODE_FAILURE error. In this case, the local SNN should terminate the put immediately, but in fact it has to wait until the image has been completely put to the peer NN, and only then can it read the response.
> # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png!
> # Moreover, the local SNN's ImageUpload thread is blocked writing to the socket for a long time: !blockedInWritingSocket.png!
> # Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png!
> *Solution:*
> When the local SNN is ready to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is:
> # Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process.
> # If the peer NN is truly the ANN and can receive the FsImage normally, it will reply to the local SNN with an HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image.
> *Note:*
> This problem needs to be reproduced in a large cluster (the FsImage in our cluster is about 30GB); therefore, a unit test is difficult to write.
> In our real cluster, after the modification, the problem has been solved: there is no longer a large Send-Q backlog.
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
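To make the blocking pattern above concrete, here is a minimal stand-in sketch; it is hypothetical illustration code, not the actual StandbyCheckpointer or TransferFsImage sources, and the class name and timings are assumptions. It shows why an untimed Future.get() pins the checkpointer thread for exactly as long as the socket write blocks:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in, NOT the real StandbyCheckpointer: it shows why the
// checkpointer thread can hang for hours when the peer NN stops reading.
public class CheckpointBlockingSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService uploader = Executors.newSingleThreadExecutor();
    Future<Boolean> put = uploader.submit(() -> {
      // Stands in for the ImageUpload thread: if the peer's ImageServlet
      // stops reading, writing a ~30GB image blocks here while the local
      // socket Send-Q fills up (the largeSendQ.png symptom).
      Thread.sleep(5000); // simulate a long-blocked socket write
      return true;
    });
    // The StandbyCheckpointer-style wait: an untimed Future.get() blocks
    // exactly as long as the upload does (the frames shown in get1.png).
    System.out.println("upload done: " + put.get());
    uploader.shutdown();
  }
}
{code}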
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: get2.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: blockWriiting.png)
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: largeSendQ.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: largeSendQ.png)
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: blockedInWritingSocket.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description:
*Problem Description:*
In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is the ANN or not). Even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed.
In a relatively large HDFS cluster, the FsImage can often reach about 30GB. In this case, this invalid put brings two problems:
# It wastes time and bandwidth.
# Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread is blocked writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours.
*An example is as follows:*
In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 immediately replies with a NOT_ACTIVE_NAMENODE_FAILURE error. In this case, the local SNN should terminate the put immediately, but in fact it has to wait until the image has been completely put to the peer NN, and only then can it read the response.
# At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png!
# Moreover, the local SNN's ImageUpload thread is blocked writing to the socket for a long time: !blockedInWritingSocket.png!
# Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png! !get2.png!
*Solution:*
When the local SNN plans to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is:
# Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process.
# If the peer NN is indeed the Active NameNode and is now in the appropriate state to receive an image, it will reply with an HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image.
*Note:*
This problem needs to be reproduced in a large cluster (the FsImage in our cluster is about 30GB); therefore, a unit test is difficult to write. In our cluster, after the modification, the problem has been solved and there is no longer a large Send-Q backlog.
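The probe described in the Solution can be outlined roughly as below. This is a sketch under stated assumptions, not the actual patch: the class name, the servlet path /imageTransfer, the zero-length probe request, and the numeric constant are all illustrative. The key move is reading the HTTP response before streaming the ~30GB image body:

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch of the pre-put test, NOT the actual HDFS patch.
public class ImagePutProbe {

  // Per the description above, 410 (HttpServletResponse.SC_GONE, i.e.
  // TransferResult.UNEXPECTED_FAILURE) is what a peer NN that is ready to
  // receive the image replies with at this stage.
  private static final int PEER_READY = 410;

  static boolean peerWillAcceptImage(URL putUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) putUrl.openConnection();
    try {
      conn.setRequestMethod("PUT");
      conn.setDoOutput(true);
      conn.setFixedLengthStreamingMode(0); // headers only, no image bytes yet
      conn.connect();
      conn.getOutputStream().close(); // complete the zero-length request
      // The key point: read the response BEFORE streaming the image, so an
      // AUTHENTICATION_FAILURE / NOT_ACTIVE_NAMENODE_FAILURE /
      // OLD_TRANSACTION_ID_FAILURE reply terminates the put immediately.
      return conn.getResponseCode() == PEER_READY;
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws Exception {
    URL peer = new URL("http://ubuntu1:9870/imageTransfer"); // hypothetical path
    if (peerWillAcceptImage(peer)) {
      System.out.println("peer is ready; start the real image upload");
    } else {
      System.out.println("peer refused; terminate the put process");
    }
  }
}
{code}

Only when such a probe succeeds would the SNN go on to stream the actual image, so a refusing peer costs one round trip instead of a multi-gigabyte write into a dead socket.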
[jira] [Created] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
Xudong Cao created HDFS-14646: - Summary: Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image. Key: HDFS-14646 URL: https://issues.apache.org/jira/browse/HDFS-14646 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.1.2 Reporter: Xudong Cao Assignee: Xudong Cao Attachments: blockWriiting.png, get1.png, get2.png, largeSendQ.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Summary: Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image. (was: Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.)
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885107#comment-16885107 ] Xudong Cao edited comment on HDFS-14646 at 7/15/19 11:35 AM: - *Test Result:* In a 3-node HDFS cluster (ubuntu1 (ANN), ubuntu2 (SNN), and ubuntu3 (SNN)), the upload logs on ubuntu2 and ubuntu3 are as follows:
was (Author: xudongcao): *Test Result:*
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885107#comment-16885107 ] Xudong Cao commented on HDFS-14646: --- *Test Result:*
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885107#comment-16885107 ] Xudong Cao edited comment on HDFS-14646 at 7/15/19 11:38 AM: - *Test Result:* in a 3-node HDFS cluster: ubuntu1 (ANN) + ubuntu2 (SNN) + ubuntu3 (SNN), the upload logs on ubuntu2 and ubuntu3 are as follows: 1. SNN ubuntu2: {code:java} root@ubuntu2:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu2.log 2019-07-16 01:52:24,801 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9573 to namenode at http://ubuntu1:9870 in 0.178 seconds 2019-07-16 01:53:24,912 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9759 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 01:54:25,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9777 to namenode at http://ubuntu1:9870 in 0.075 seconds 2019-07-16 01:55:25,147 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9961 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 01:56:25,253 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9981 to namenode at http://ubuntu1:9870 in 0.054 seconds 2019-07-16 01:57:25,323 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10171 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 01:58:25,388 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10191 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 01:59:25,479 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10383 to namenode at http://ubuntu1:9870 in 0.046 seconds{code} 2.
Another SNN, ubuntu3: {code:java} root@ubuntu3:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu3.log 2019-07-16 02:00:34,767 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds 2019-07-16 02:02:34,851 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds 2019-07-16 02:04:34,938 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10807 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:06:35,021 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11013 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 02:08:35,094 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11217 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:10:35,200 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11423 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 02:12:35,285 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11629 to namenode at http://ubuntu1:9870 in 0.026 seconds 2019-07-16 02:14:35,357 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11835 to namenode at http://ubuntu1:9870 in 0.023 seconds 2019-07-16 02:16:35,442 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12035 to namenode at http://ubuntu1:9870 in 0.042 seconds 2019-07-16 02:18:35,515 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12233 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 02:20:35,605 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12441 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:22:35,675 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12647 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:24:35,771 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12853 to namenode at http://ubuntu1:9870 in 0.041 seconds{code} was (Author: xudongcao): *Test Result:* in a 3-node HDFS cluster, ubuntu1 (ANN), ubuntu2 (SNN) and ubuntu3 (SNN), the upload logs on ubuntu2 and ubuntu3 are as follows: > Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 3.1.2 > Reporter: Xudong Cao > Assignee: Xudong Cao
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909018#comment-16909018 ] Xudong Cao edited comment on HDFS-14646 at 8/16/19 12:33 PM: - [~elgoiri] [~xkrogen] [~hexiaoqiao] [~jojochuang] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you! was (Author: xudongcao): [~elgoiri] [~xkrogen] [~hexiaoqiao] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you! > Standby NameNode should not upload fsimage to an inappropriate NameNode. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 3.1.2 > Reporter: Xudong Cao > Assignee: Xudong Cao > Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately, but puts the FsImage completely to the peer NN, and does not read the peer NN's reply until the put is completed. > Depending on the version of Jetty, this behavior can lead to different consequences; I tested it on 2.7.2 and on trunk. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN still pointlessly sends the FsImage to the peer NN continuously, wasting time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a big waste. > *2. In trunk (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes.
> java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage
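To picture the Jetty difference described above, here is a toy servlet-side sketch. It is illustrative only: the real rejection path lives in ImageServlet, and the 417 status code is a stand-in, since the comments name only the TransferResult constants, not their HTTP mappings.
{code:java}
// Toy servlet showing the rejection path; the 417 code is a stand-in, not
// necessarily what TransferResult.NOT_ACTIVE_NAMENODE_FAILURE maps to.
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RejectingImageServlet extends HttpServlet {
    @Override
    protected void doPut(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // sendError() answers before the request body has been consumed.
        // Under Jetty 6 the container drains the rest of the body itself, so
        // the sender keeps streaming; under Jetty 9 the TCP connection is
        // closed and the sender hits "Error writing request body to server"
        // mid-upload.
        resp.sendError(417, "This NameNode is not in a state to receive an image");
    }
}
{code}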
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909018#comment-16909018 ] Xudong Cao edited comment on HDFS-14646 at 8/16/19 2:42 PM: [~elgoiri] [~xkrogen] [~hexiaoqiao] [~jojochuang] This problem seems to have different behaviors in different Hadoop versions. I have corrected the problem description and submitted a new patch; please review it if you have time, thank you! was (Author: xudongcao): [~elgoiri] [~xkrogen] [~hexiaoqiao] [~jojochuang] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you!
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately, but puts the FsImage completely to the peer NN, and does not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it on 2.7.2 and on trunk. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN pointlessly sends the FsImage to the peer NN continuously, wasting time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a big waste. *2. In trunk (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload an fsimage to an inappropriate NameNode; when it plans to put an FsImage to the peer NN, it needs to check whether it really needs to put it at this time.
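For illustration, a rough sketch of where such a check could sit in the checkpoint upload path. Apart from uploadImageFromStorage(), which appears in the stack trace above, every name here is hypothetical and the real method signatures are not shown.
{code:java}
// Illustrative only: the guard and all names below are hypothetical.
import java.io.IOException;
import java.net.URL;
import java.util.List;

abstract class GuardedImageUploader {
    /** The probe from the earlier sketch: true iff the peer can take an image. */
    abstract boolean peerReadyToReceiveImage(URL peer) throws IOException;

    /** Wrapper around the real upload, e.g. TransferFsImage.uploadImageFromStorage(). */
    abstract void uploadImage(URL peer) throws IOException;

    void uploadToAllPeers(List<URL> peers) throws IOException {
        for (URL peer : peers) {
            if (!peerReadyToReceiveImage(peer)) {
                continue; // skip inappropriate NNs instead of streaming ~30GB to them
            }
            uploadImage(peer);
        }
    }
}
{code}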
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 3.1.2 > Reporter: Xudong Cao > Assignee: Xudong Cao > Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately, but puts the FsImage completely to the peer NN, and does not read the peer NN's reply until the put is completed. > Depending on the version of Jetty, this behavior can lead to different consequences; I tested it on 2.7.2 and on trunk. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN pointlessly sends the FsImage to the peer NN continuously, wasting time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a big waste. > *2. In trunk (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below; note that this test needs a relatively big FsImage (e.g. at the 10MB level): > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload an fsimage to an inappropriate NameNode; when it plans to put an FsImage to the peer NN, it needs to check whether it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: HDFS-14646.001.patch)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.001.patch
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.001.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: HDFS-14646.001.patch)
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909018#comment-16909018 ] Xudong Cao commented on HDFS-14646: --- [~elgoiri] [~xkrogen] [~hexiaoqiao] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you!
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.001.patch
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the trunk version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the trunk version (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below; note this test needs a relatively big FsImage (e.g. 10MB level): {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the trunk version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will needlessly send the FsImage to the peer NN continuously, causing a waste of time and
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the trunk version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the trunk version (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Summary: Standby NameNode should not upload fsimage to an inappropriate NameNode. (was: Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch, blockedInWritingSocket.png, > get1.png, get2.png, largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach about > 30GB. In this case, this invalid put brings two problems: > 1. Wasting time and bandwidth. > 2. Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN is very large, and the ImageUpload thread will > be blocked writing to the socket for a long time, eventually causing the local > StandbyCheckpointer thread to be blocked for several hours. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is a SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to the > socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN is waiting > for the execution result of the ImageUpload thread, blocking in Future.get(), > and the blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put a FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process. > # If the peer NN is indeed the Active NameNode AND it's now in the > appropriate state to receive an image, it will reply with an HTTP 410 response > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). 
At > this time, the local SNN can really begin to put the image. > *Note:* > This problem needs to be reproduced in a large cluster (the size of the FsImage > in our cluster is about 30GB); therefore, a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
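For illustration, here is a minimal sketch of the probe idea described in the solution above: send the PUT request line and headers first, then briefly wait for an early reply before streaming the image body. This is not the actual HDFS-14646 patch (which works inside TransferFsImage over HttpURLConnection); the class EarlyReplyProbe, the host peer-nn.example.com, the port, and the /imagetransfer path are hypothetical names used only for the sketch:
{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class EarlyReplyProbe {

  /** Returns true if the peer already replied with a non-200 status line. */
  static boolean peerRejectedEarly(Socket socket) throws IOException {
    socket.setSoTimeout(1000); // wait at most 1s for an early reply
    try {
      BufferedReader reader = new BufferedReader(
          new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
      String statusLine = reader.readLine(); // e.g. "HTTP/1.1 410 Gone"
      return statusLine != null && !statusLine.contains(" 200 ");
    } catch (SocketTimeoutException e) {
      return false; // no early reply; the peer is still reading, keep uploading
    }
  }

  public static void main(String[] args) throws IOException {
    try (Socket socket = new Socket("peer-nn.example.com", 9870)) { // hypothetical peer
      OutputStream out = socket.getOutputStream();
      // Send only the request line and headers; do not stream the body yet.
      out.write(("PUT /imagetransfer HTTP/1.1\r\n"
          + "Host: peer-nn.example.com\r\n"
          + "Transfer-Encoding: chunked\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
      out.flush();
      if (peerRejectedEarly(socket)) {
        System.err.println("Peer NN rejected the upload early; aborting the put.");
        return; // terminate instead of streaming a ~30GB image for nothing
      }
      // ... stream the fsimage body in chunks here ...
    }
  }
}
{code}
If the peer replies at once with an error status, the uploader aborts before sending gigabytes of image data; if nothing arrives within the short timeout, the upload proceeds as usual.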
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: blockedInWritingSocket.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. 
Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
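The server-side half of the Jetty behavior described above can also be sketched. The servlet below is illustrative only (RejectingPutServlet is a hypothetical stand-in, not the real ImageServlet, and the 410 status is just an example code): it rejects a put with sendError() without reading the request body. Under Jetty 6 the connection stays open and Jetty drains the body itself, so the client keeps sending; under Jetty 9 the connection is closed and the client's next write fails with "Error writing request body to server":
{code:java}
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RejectingPutServlet extends HttpServlet {
  @Override
  protected void doPut(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Reject immediately, without consuming req.getInputStream().
    // Whether the client notices before finishing its upload depends on
    // how the container handles the unread body (Jetty 6 vs. Jetty 9).
    resp.sendError(HttpServletResponse.SC_GONE,
        "This NameNode is not in a state to receive an image");
  }
}
{code}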
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: get1.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. 
Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large, and the ImageUpload thread will be blocked writing to the socket for a long time, eventually causing the local StandbyCheckpointer thread to be blocked for several hours. *An example is as follows:* In the
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: largeSendQ.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. 
Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: get2.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. 
Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Comment: was deleted (was: *Test Result:* in a 3-node HDFS cluster: ubuntu1 (ANN) + ubuntu2 (SNN) + ubuntu3 (SNN), the upload logs on ubuntu2 and ubuntu3 are as follows: 1. SNN ubuntu2: {code:java} root@ubuntu2:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu2.log 2019-07-16 01:52:24,801 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9573 to namenode at http://ubuntu1:9870 in 0.178 seconds 2019-07-16 01:53:24,912 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9759 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 01:54:25,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9777 to namenode at http://ubuntu1:9870 in 0.075 seconds 2019-07-16 01:55:25,147 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9961 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 01:56:25,253 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9981 to namenode at http://ubuntu1:9870 in 0.054 seconds 2019-07-16 01:57:25,323 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10171 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 01:58:25,388 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10191 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 01:59:25,479 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10383 to namenode at http://ubuntu1:9870 in 0.046 seconds{code} 2. 
another SNN ubuntu3: {code:java} root@ubuntu3:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu3.log 2019-07-16 02:00:34,767 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds 2019-07-16 02:02:34,851 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds 2019-07-16 02:04:34,938 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10807 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:06:35,021 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11013 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 02:08:35,094 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11217 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:10:35,200 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11423 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 02:12:35,285 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11629 to namenode at http://ubuntu1:9870 in 0.026 seconds 2019-07-16 02:14:35,357 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11835 to namenode at http://ubuntu1:9870 in 0.023 seconds 2019-07-16 02:16:35,442 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12035 to namenode at http://ubuntu1:9870 in 0.042 seconds 2019-07-16 02:18:35,515 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12233 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 02:20:35,605 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12441 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:22:35,675 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12647 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:24:35,771 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12853 to namenode at http://ubuntu1:9870 in 0.041 seconds{code}) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as >
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, > HDFS-14646.002.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the trunk version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the trunk version (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below; note this test needs a > relatively big FsImage (e.g. 10MB level): > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. 
> java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether it > really needs to put it at this time. > In detail, the local SNN should establish an HTTP connection with the peer NN, > send the put request, and then immediately read the response (this is the key > point). If the peer NN does not reply with an HTTP_OK, it means the local SNN > should not put the image at this
[jira] [Resolved] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result errors.
[ https://issues.apache.org/jira/browse/HDDS-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao resolved HDDS-1703. -- Resolution: Invalid > Freon uses wait/notify instead of polling to eliminate the test result errors. > -- > > Key: HDDS-1703 > URL: https://issues.apache.org/jira/browse/HDDS-1703 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > After HDDS-1532, Freon has an efficient concurrent testing framework. In the > new framework, the main thread checks every 5s to verify whether the test is > completed (or an exception has occurred), which will eventually introduce a > maximum error of 5s into the measured result. > In most cases, Freon's test results are at the minutes or tens-of-minutes level, > so a 5s error is not significant, but in particularly small tests, a > 5s error may have a significant impact. > Therefore, we can use the combination of Object.wait() + Object.notify() > instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
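A minimal sketch of the wait/notify idea described above (CompletionMonitor and its method names are illustrative, not the actual Freon classes): each worker signals the monitor when it finishes, so the main thread wakes up immediately instead of discovering completion on the next 5s poll. A java.util.concurrent.CountDownLatch would achieve the same effect more idiomatically:
{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class CompletionMonitor {
  private final AtomicLong remaining;
  private final Object lock = new Object();

  public CompletionMonitor(long taskCount) {
    this.remaining = new AtomicLong(taskCount);
  }

  /** Called by each worker thread when its task completes. */
  public void taskDone() {
    if (remaining.decrementAndGet() == 0) {
      synchronized (lock) {
        lock.notifyAll(); // wake the waiting main thread at once
      }
    }
  }

  /** Called by the main thread; returns as soon as all tasks are done. */
  public void awaitCompletion() throws InterruptedException {
    synchronized (lock) {
      while (remaining.get() > 0) {
        lock.wait(); // no polling, so no fixed-interval measurement error
      }
    }
  }
}
{code}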
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: HDFS-14646.003.patch)
> Standby NameNode should not upload fsimage to an inappropriate NameNode.
> --
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.2
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Major
> Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, HDFS-14646.002.patch, HDFS-14646.003.patch
>
> *Problem Description:*
> In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not). Even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put immediately; it uploads the FsImage completely to the peer NN and does not read the peer NN's reply until the put is finished.
> Depending on the version of Jetty, this behavior leads to different consequences; I tested it under 2.7.2 and the trunk version.
> *1. In Hadoop 2.7.2 (with Jetty 6.1.26)*
> After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection stays established, and the data the SNN sends is consumed by the Jetty framework itself on the peer NN side, so the SNN keeps needlessly streaming the FsImage to the peer NN, wasting time and bandwidth. In a relatively large HDFS cluster, the FsImage can often reach about 30GB, so this is a significant waste.
> *2. In the trunk version (with Jetty 9.3.27)*
> After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below. Note that reproducing this needs a relatively big FsImage (e.g., on the order of 10MB):
> {code:java}
> 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes.
> java.io.IOException: Error writing request body to server
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes.
> java.io.IOException: Error writing request body to server
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
> {code}
>
> *Solution:*
> A standby NameNode should not upload the fsimage to an inappropriate NameNode: before it puts an FsImage to a peer NN, it needs to check whether the put is really needed at this time.
> In detail, the local SNN should establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN does not reply with HTTP_OK, the local SNN should not put the image at this time.
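The handshake described in the Solution maps naturally onto HTTP's Expect: 100-continue mechanism: send the request headers, read the peer's verdict, and only then stream the body. Below is a hedged sketch of that idea using plain java.net.HttpURLConnection; it is not the actual HDFS-14646 patch, whether the real fix uses Expect: 100-continue or some other early read of the response is an assumption here, and ImagePutSketch/putImage are hypothetical names.

{code:java}
// Hedged sketch, not the actual TransferFsImage code: ask the peer NN to
// acknowledge the PUT before streaming the (potentially ~30GB) FsImage body.
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ImagePutSketch {
  public static void putImage(URL peerImageServlet, byte[] segment) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) peerImageServlet.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setChunkedStreamingMode(4096); // stream in small segments, as Freon/TransferFsImage do
    // With Expect: 100-continue, the client waits for the server's verdict
    // before sending the request body; a NOT_ACTIVE_NAMENODE_FAILURE-style
    // rejection then costs a few header bytes instead of the whole image.
    conn.setRequestProperty("Expect", "100-Continue");
    try (OutputStream out = conn.getOutputStream()) {
      // The JDK client fails fast here if the peer rejected the request
      // instead of replying 100 Continue, so no image bytes are wasted.
      out.write(segment); // stream the FsImage segments here
    }
    if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
      throw new IOException("Peer NN refused the image upload: " + conn.getResponseCode());
    }
  }
}
{code}

Either way, the essential point of the jira holds: read the peer NN's reply before committing to the full upload.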
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.003.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.003.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: updated (full text as quoted above)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.002.patch Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911885#comment-16911885 ] Xudong Cao commented on HDFS-14646: --- cc [~xkrogen], [~elgoiri], [~csun]: the failed unit test has nothing to do with this jira.
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ] Xudong Cao commented on HDFS-14646: --- Thanks [~csun], our production environment uses Hadoop 2.7.2, but with the multi-SBN feature merged. We have about 20,000 machines (divided into dozens of HDFS clusters). Not long ago we frequently encountered the problem described by this jira, and after this patch was merged the errors are no longer reported.
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ] Xudong Cao edited comment on HDFS-14646 at 8/26/19 1:58 AM (minor wording tweaks; final text as in the comment above)
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ] Xudong Cao edited comment on HDFS-14646 at 8/26/19 2:07 AM (minor wording tweaks; final text as in the comment above)
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925408#comment-16925408 ] Xudong Cao edited comment on HDFS-14646 at 9/9/19 6:44 AM: --- Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(); this is a very low-probability bug. # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later. was (Author: xudongcao): Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(). # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later.
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will upload the FsImage completely to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences: *1. Under Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will still be established, and the data the SNN sends will be read by the Jetty framework itself on the peer NN side, so the SNN will pointlessly keep sending the FsImage to the peer NN, wasting time and bandwidth. In a relatively large HDFS cluster, the FsImage can often reach about 30GB, so this is a significant waste. *2. Under the newest release-3.2.0-RC1 (with Jetty 9.3.24) and trunk (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be automatically closed, and the SNN will then directly get an "Error writing request body to server" exception, as below (note this test needs a relatively large FsImage, e.g. around 10MB): {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes.
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload an fsimage to an inappropriate NameNode: when it plans to put an FsImage to the peer NN, it needs to check whether it really needs to put it at this time. In detail, the local SNN should establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN does not reply with HTTP_OK, the local SNN should not put the image at this time.
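The solution above boils down to reading the peer's verdict before streaming the image body. Below is a minimal, hypothetical sketch of that handshake; the class and method names are illustrative assumptions, and the Expect: 100-continue probe is only one possible way to get an early verdict with plain HttpURLConnection (its handling of Expect is JDK-implementation-dependent), not necessarily what the actual patch does:

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

public class ImageUploadProbe {
  /** Uploads the image only if the peer NN accepts the PUT up front. */
  public static boolean uploadIfAccepted(URL putUrl, byte[] image) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) putUrl.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(image.length);
    // Ask the peer to judge the request headers before we commit the body.
    conn.setRequestProperty("Expect", "100-continue");
    OutputStream out;
    try {
      out = conn.getOutputStream(); // waits for "100 Continue" from the peer
    } catch (ProtocolException e) {
      conn.disconnect();            // peer rejected early: send nothing
      return false;
    }
    out.write(image);               // peer accepted: stream the image body
    out.close();
    return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
  }
}
{code}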
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: (same as the description above, with "newest" dropped before "release-3.2.0-RC1")
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925408#comment-16925408 ] Xudong Cao commented on HDFS-14646: --- Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(). # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later.
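The keep-alive fix mentioned in the comment above is easy to see in isolation. A minimal sketch, assuming a per-connection approach (the class name is hypothetical; this is not the actual TransferFsImage.setupConnection() code): sending "Connection: close" makes every PUT use a fresh TCP connection, so a repeated request cannot be silently absorbed by a stale persistent connection without doPut() being invoked.

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class NoKeepAliveConnections {
  /** Opens an HTTP connection with keep-alive disabled for this request. */
  public static HttpURLConnection open(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Connection", "close"); // disable HTTP keep-alive
    return conn;
  }
}
{code}

A process-wide alternative is the JDK's http.keepAlive system property (System.setProperty("http.keepAlive", "false")), at the cost of affecting all HttpURLConnection clients in the JVM.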
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.004.patch Status: Patch Available (was: Open)
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925408#comment-16925408 ] Xudong Cao edited comment on HDFS-14646 at 9/9/19 6:46 AM: --- Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(); this is a very low-probability problem. # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later. was (Author: xudongcao): Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(); this is a very low-probability bug. # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later.
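The check-should-put guard described in the comments above reduces to a small set-based mutual exclusion. A sketch under assumed names (a hypothetical stand-in for ImageServlet.currentlyDownloadingCheckpoints, not the actual patch): register the upload during the check-should-put step and release it only after the transfer-image step, so no other NN can slip in between the two.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CheckpointUploadGuard {
  // Uploads currently in flight, keyed by an identifier for the upload.
  private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

  /** Check-should-put step: returns false if the same upload is in flight. */
  public boolean tryBegin(String uploadKey) {
    return inFlight.add(uploadKey);
  }

  /** End of transfer-image step: releases the guard for this upload. */
  public void end(String uploadKey) {
    inFlight.remove(uploadKey);
  }
}
{code}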
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Description: In a production environment, there may be some differences among the JournalNodes (e.g. network condition, disk condition, and so on). For example, if a JN's network is much worse than the other JNs', the time taken by the NN to write to this JN will be much greater than for the other JNs. In this case, the IPC Logger thread corresponding to this JN will accumulate many pending edits; once the pending edits exceed the maximum limit (default 10MB), the new edits about to be written to this JN will be silently dropped, resulting in gaps in the editlog segment and causing this JN and the NN to repeatedly report the following errors: {code:java} org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1904164873 expecting nextTxId=1904164871{code} Unfortunately, the above error message cannot help us quickly find the root cause, so it's better to add a warning log that tells us the real reason, like this: {code:java} 2019-08-02 04:55:05,879 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits size:10224, will silently drop 174 bytes of edits!{code} This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
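The warning proposed in this description amounts to one size check at enqueue time. A minimal sketch with illustrative names (not the exact IPCLoggerChannel code; the field names and drop policy are assumptions), producing a message of the same shape as the example above:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueuedEditsGuard {
  private static final Logger LOG = LoggerFactory.getLogger(QueuedEditsGuard.class);

  private final String jnAddress;  // e.g. "192.168.202.13:8485"
  private final long limitBytes;   // maximum pending size, default 10MB
  private long queuedBytes;        // edits currently queued for this JN

  public QueuedEditsGuard(String jnAddress, long limitBytes) {
    this.jnAddress = jnAddress;
    this.limitBytes = limitBytes;
  }

  /** Queues the edits if they fit; otherwise warns and reports the drop. */
  public synchronized boolean offer(int editsBytes) {
    if (queuedBytes + editsBytes > limitBytes) {
      LOG.warn("Pending edits to {} is going to exceed limit size:{}, current "
          + "queued edits size:{}, will silently drop {} bytes of edits!",
          jnAddress, limitBytes, queuedBytes, editsBytes);
      return false;
    }
    queuedBytes += editsBytes;
    return true;
  }
}
{code}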
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Description: (same as the description above) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
Xudong Cao created HDFS-14693: - Summary: NameNode should log a warning when EditLog IPC logger's pending size exceeds limit. Key: HDFS-14693 URL: https://issues.apache.org/jira/browse/HDFS-14693 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xudong Cao -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898516#comment-16898516 ] Xudong Cao commented on HDFS-14693: --- This just adds a log; it does not need a unit test. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Component/s: namenode -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Affects Version/s: 3.1.2 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Assignee: Xudong Cao Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Attachment: HDFS-16493.001.patch -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Status: Patch Available (was: Open) > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Attachments: HDFS-14693.001.patch > > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Attachment: HDFS-14693.001.patch > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Attachments: HDFS-14693.001.patch > > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Status: Open (was: Patch Available) > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Attachment: (was: HDFS-16493.001.patch) > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Description: In a production environment, the JournalNodes may differ from one another (e.g. in network condition, disk condition, and so on). For example, if a JN's network is much worse than that of the other JNs, the time the NN takes to write to this JN will be much greater than for the other JNs. In this case, the IPC Logger thread corresponding to this JN accumulates many pending edits, and once the pending edits exceed the maximum limit (default 10MB), the new edits about to be written to this JN are silently dropped, leaving gaps in the editlog segment and causing this JN and the NN to repeatedly report the following errors: {code:java} org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1904164873 expecting nextTxId=1904164871{code} Unfortunately, the above error message cannot help us quickly find the root cause and it took us extra time to track it down, so it's better to add a warning log here, like this: {code:java} 2019-08-02 04:55:05,879 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits size:10224, will silently drop 174 bytes of edits!{code} This is just a very small improvement. was: In a production environment, the JournalNodes may differ from one another (e.g. in network condition, disk condition, and so on). For example, if a JN's network is much worse than that of the other JNs, the time the NN takes to write to this JN will be much greater than for the other JNs. In this case, the IPC Logger thread corresponding to this JN accumulates many pending edits, and once the pending edits exceed the maximum limit (default 10MB), the new edits about to be written to this JN are silently dropped, leaving gaps in the editlog segment and causing this JN and the NN to repeatedly report the following errors: {code:java} org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1904164873 expecting nextTxId=1904164871{code} Unfortunately, the above error message cannot help us quickly find the root cause, so it's better to add a warning log that tells us the real reason, like this: {code:java} 2019-08-02 04:55:05,879 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits size:10224, will silently drop 174 bytes of edits!{code} This is just a very small improvement. > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Attachments: HDFS-16493.001.patch > > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
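For context on why the dropped edits surface as the JournalOutOfSyncException quoted above: the JN tracks the next transaction id it expects, and any gap left by dropped edits trips that check on the very next write. A minimal sketch of the mechanism, with illustrative names (the real JournalNode code raises JournalOutOfSyncException rather than IllegalStateException):

{code:java}
// Illustrative sketch of the expected-txid check that dropped edits violate.
public class JournalTxidCheck {
  private long nextTxId = 1904164871L; // the value from the error above

  public synchronized void journal(long firstTxId, int numTxns) {
    if (firstTxId != nextTxId) {
      // The real JournalNode raises JournalOutOfSyncException here.
      throw new IllegalStateException("Can't write txid " + firstTxId
          + " expecting nextTxId=" + nextTxId);
    }
    // ... persist the numTxns transactions in the batch ...
    nextTxId = firstTxId + numTxns;
  }
}
{code}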
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Patch Available (was: Open) > Standby NameNode should terminate the FsImage put process immediately if the > peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: blockedInWritingSocket.png, get1.png, get2.png, > largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not > terminate the put process immediately, but will put the FsImage completely to > the peer NN, and will not read the peer NN's reply until the put is > completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach > about 30GB. In this case, this invalid put brings two problems: > # It wastes time and bandwidth. > # Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN grows very large, and the ImageUpload thread > will be blocked writing to the socket for a long time, eventually leaving the > local StandbyCheckpointer thread blocked for several hours at a time. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact, the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to > the socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the > execution result of the ImageUpload thread, blocking in Future.get(), and the > blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put an FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), terminate the put process > immediately. > # If the peer NN is indeed the Active NameNode AND it is now in the > appropriate state to receive an image, it will reply with HTTP response 410 > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At > this time, the local SNN can really begin to put the image. > *Note:* > This problem can only be reproduced in a large cluster (the size of the > FsImage in our cluster is about 30GB), so a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
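A minimal sketch of the probe-before-put idea from the *Solution* section above. The real TransferFsImage/ImageServlet exchange carries more parameters (txid, storage info, and so on), so the bare URL, the zero-length probe request, and the exact status-code mapping here are illustrative assumptions only:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch of "probe before put": read the peer's verdict from a
// zero-length request before streaming a ~30GB image at it.
public class ImageUploadProbe {

  /** Returns true only if the peer NN signals it is ready for an image. */
  static boolean peerReadyToReceive(URL putImageUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) putImageUrl.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(0); // probe only: send no body bytes
    conn.getOutputStream().close();
    int code = conn.getResponseCode();   // read the reply immediately (the key point)
    conn.disconnect();
    // Per the description above, a ready Active NN answers the probe with
    // 410 (SC_GONE); auth / not-active / old-txid errors mean: do not upload.
    return code == HttpURLConnection.HTTP_GONE;
  }

  static void putImage(URL putImageUrl, InputStream image) throws IOException {
    if (!peerReadyToReceive(putImageUrl)) {
      return; // terminate immediately instead of streaming the whole image
    }
    HttpURLConnection conn = (HttpURLConnection) putImageUrl.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setChunkedStreamingMode(64 * 1024);
    byte[] buf = new byte[64 * 1024];
    try (OutputStream out = conn.getOutputStream()) {
      for (int n; (n = image.read(buf)) > 0; ) {
        out.write(buf, 0, n); // the real upload, only after the probe passed
      }
    }
    conn.getResponseCode(); // consume the final verdict from the peer NN
  }
}
{code}

The design point is simply ordering: read the peer's verdict before committing bandwidth to the body, so a not-active or old-txid reply costs one round trip instead of a 30GB upload and a Send-Q backlog.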
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available) > Standby NameNode should terminate the FsImage put process immediately if the > peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: blockedInWritingSocket.png, get1.png, get2.png, > largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not > terminate the put process immediately, but will put the FsImage completely to > the peer NN, and will not read the peer NN's reply until the put is > completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach > about 30GB. In this case, this invalid put brings two problems: > # It wastes time and bandwidth. > # Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN grows very large, and the ImageUpload thread > will be blocked writing to the socket for a long time, eventually leaving the > local StandbyCheckpointer thread blocked for several hours at a time. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact, the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to > the socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the > execution result of the ImageUpload thread, blocking in Future.get(), and the > blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put an FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), terminate the put process > immediately. > # If the peer NN is indeed the Active NameNode AND it is now in the > appropriate state to receive an image, it will reply with HTTP response 410 > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At > this time, the local SNN can really begin to put the image. > *Note:* > This problem can only be reproduced in a large cluster (the size of the > FsImage in our cluster is about 30GB), so a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.000.patch Status: Patch Available (was: Open) > Standby NameNode should terminate the FsImage put process immediately if the > peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch, blockedInWritingSocket.png, > get1.png, get2.png, largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not > terminate the put process immediately, but will put the FsImage completely to > the peer NN, and will not read the peer NN's reply until the put is > completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach > about 30GB. In this case, this invalid put brings two problems: > # It wastes time and bandwidth. > # Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN grows very large, and the ImageUpload thread > will be blocked writing to the socket for a long time, eventually leaving the > local StandbyCheckpointer thread blocked for several hours at a time. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact, the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to > the socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the > execution result of the ImageUpload thread, blocking in Future.get(), and the > blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put an FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), terminate the put process > immediately. > # If the peer NN is indeed the Active NameNode AND it is now in the > appropriate state to receive an image, it will reply with HTTP response 410 > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At > this time, the local SNN can really begin to put the image. > *Note:* > This problem can only be reproduced in a large cluster (the size of the > FsImage in our cluster is about 30GB), so a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org