[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Patch Available (was: In Progress) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
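A minimal sketch of the buffer-reuse idea from point 2 of the solutions above. This is not the actual Freon patch: the class, method, and parameter names (BufferedKeyWriter, writeKey, keySize, bufferSize) are hypothetical, and only the technique comes from the description, i.e. a long key size written through one small, reused buffer.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ThreadLocalRandom;

public class BufferedKeyWriter {

  // Hypothetical sketch: writes keySize bytes (a long, so sizes beyond 2GB
  // are representable) while allocating only one bufferSize-byte buffer,
  // instead of one buffer as large as the whole key.
  static void writeKey(OutputStream out, long keySize, int bufferSize)
      throws IOException {
    byte[] buffer = new byte[bufferSize];           // allocated once, reused
    ThreadLocalRandom.current().nextBytes(buffer);  // fill with test data
    long remaining = keySize;
    while (remaining > 0) {
      int chunk = (int) Math.min(remaining, buffer.length);
      out.write(buffer, 0, chunk);
      remaining -= chunk;
    }
  }
}
{code}

With the default 4KB buffer, per-thread memory stays constant regardless of key size, which addresses the OOM pattern described in point 2 of the problems.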
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: In Progress (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Open (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Patch Available (was: Open) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.002.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Open (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.001.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Patch Available (was: Open) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Status: Open (was: Patch Available) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.001.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: HDDS-1530.001.patch HDDS-1530.002.patch Status: Patch Available (was: Open) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1530) Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and "--validateWrites" options.
[ https://issues.apache.org/jira/browse/HDDS-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1530: - Attachment: (was: HDDS-1530.002.patch) > Ozone: Freon: Support big files larger than 2GB and add "--bufferSize" and > "--validateWrites" options. > -- > > Key: HDDS-1530 > URL: https://issues.apache.org/jira/browse/HDDS-1530 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDDS-1530.001.patch, HDDS-1530.002.patch > > > *Current problems:* > 1. Freon does not support big files larger than 2GB because it uses an int type "keySize" parameter and an int-sized "keyValue" buffer. > 2. Freon allocates an entire buffer for each key at once, so if the key size is large and the concurrency is high, Freon frequently reports OOM exceptions. > 3. Freon lacks an option such as "--validateWrites", so users cannot manually specify that verification is required after writing. > *Some solutions:* > 1. Use a long type "keySize" parameter so that Freon can support big files larger than 2GB. > 2. Reuse a small buffer instead of allocating the entire key-size buffer at once; the default buffer size is 4KB and can be configured with the "--bufferSize" parameter. > 3. Add a "--validateWrites" option to the Freon command line; users can pass this option to indicate that validation is required after writing. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
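One plausible realization of the "--validateWrites" option described in these notifications is to record a digest of the bytes written for each key and compare it with a digest of the key read back. The sketch below is illustrative only, with hypothetical names (WriteValidator, digestOf, validate); it is not the actual Freon implementation.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class WriteValidator {

  // Streams the input through a SHA-256 digest using a small buffer,
  // so validation never needs the whole key in memory either.
  static byte[] digestOf(InputStream in, int bufferSize)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    byte[] buffer = new byte[bufferSize];
    int n;
    while ((n = in.read(buffer)) != -1) {
      md.update(buffer, 0, n);
    }
    return md.digest();
  }

  // Compares the digest recorded at write time with a digest of the
  // key read back after the write completes.
  static boolean validate(byte[] writtenDigest, InputStream readBack,
      int bufferSize) throws IOException, NoSuchAlgorithmException {
    return MessageDigest.isEqual(writtenDigest, digestOf(readBack, bufferSize));
  }
}
{code}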
[jira] [Commented] (HDDS-1532) Freon: Improve the concurrency testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao commented on HDDS-1532: -- Test performance comparison before and after this modification: > Freon: Improve the concurrency testing framework. > - > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
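The reworked concurrency policy described above, with volume, bucket, and key creation all submitted to one shared pool as equal tasks, can be sketched as follows. The class name and the three create* methods are hypothetical placeholders for the real Ozone client calls; only the fan-out structure reflects the description.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FreonTaskPool {
  public static void main(String[] args) throws InterruptedException {
    int numOfVolumes = 1, numOfBuckets = 10, numOfKeys = 1000, numOfThreads = 50;
    ExecutorService pool = Executors.newFixedThreadPool(numOfThreads);
    long total = numOfVolumes
        + (long) numOfVolumes * numOfBuckets
        + (long) numOfVolumes * numOfBuckets * numOfKeys;
    CountDownLatch done = new CountDownLatch((int) total);

    for (int v = 0; v < numOfVolumes; v++) {
      final int vol = v;
      // A volume task fans out bucket tasks, which fan out key tasks;
      // all three kinds run as equal tasks on the shared pool, so a
      // single volume no longer limits the run to a single thread.
      pool.submit(() -> {
        createVolume(vol);
        done.countDown();
        for (int b = 0; b < numOfBuckets; b++) {
          final int bkt = b;
          pool.submit(() -> {
            createBucket(vol, bkt);
            done.countDown();
            for (int k = 0; k < numOfKeys; k++) {
              final int key = k;
              pool.submit(() -> {
                createKey(vol, bkt, key);
                done.countDown();
              });
            }
          });
        }
      });
    }
    done.await();     // wait for every task, then release the pool
    pool.shutdown();
  }

  // Placeholders standing in for the real Ozone client calls.
  static void createVolume(int v) {}
  static void createBucket(int v, int b) {}
  static void createKey(int v, int b, int k) {}
}
{code}

A CountDownLatch is one simple way to learn when every task has finished; shutting the pool down right after the outer loop would reject the bucket and key tasks that the volume tasks submit later.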
[jira] [Comment Edited] (HDDS-1532) Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao edited comment on HDDS-1532 at 6/12/19 12:36 PM: Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). was (Author: xudongcao): Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). > Freon: Improve the concurrent testing framework. > > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by
[jira] [Updated] (HDDS-1532) Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1532: - Summary: Freon: Improve the concurrent testing framework. (was: Freon: Improve the concurrency testing framework.) > Freon: Improve the concurrent testing framework. > > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1532) Freon: Improve the concurrency testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao edited comment on HDDS-1532 at 6/12/19 12:25 PM: Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). was (Author: xudongcao): Test performance comparison before and after this modification: > Freon: Improve the concurrency testing framework. > - > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-1532) Freon: Improve the concurrency testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862046#comment-16862046 ] Xudong Cao edited comment on HDDS-1532 at 6/12/19 12:34 PM: Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). was (Author: xudongcao): Test performance comparison before and after this jira: h3. 1. Test Environment Tested on a three-node Ozone cluster; each node is equipped as below: ||Hardware ||Configuration|| |CPU, Memory|16-core Xeon 2.10GHz + 64GB RAM| |Disk|3.6TB HDD * 11 pcs| |Network|1000Mb network card| h3. 2. Test result before this jira *freon command (since there is only 1 volume, only 1 thread will be created):* bin/ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,040 Average Time spent in bucket creation: 00:00:00,020 Average Time spent in key creation: 00:00:14,493 Average Time spent in key write: 00:23:14,749 Total bytes written: 1048576 Total Execution time: 00:23:31,540 h3. 3. Test result after this jira *freon command (note that --numOfThreads 50 is set, so although there is only 1 volume, 50 threads are created):* ozone freon randomkeys --numOfVolumes=1 --numOfBuckets 10 --numOfKeys 1000 --numOfThreads 50 --keySize 1048576 --replicationType=RATIS --factor=THREE *freon result:* Number of Volumes created: 1 Number of Buckets created: 10 Number of Keys added: 1 Ratis replication factor: THREE Ratis replication type: RATIS Average Time spent in volume creation: 00:00:00,000 Average Time spent in bucket creation: 00:00:00,002 Average Time spent in key creation: 00:00:02,430 Average Time spent in key write: 00:03:14,370 Total bytes written: 1048576 Total Execution time: 00:03:21,064 h3. 4. Conclusion In this environment, writing the same amount of data, the improved concurrency framework makes the test about 7 times faster (total execution time 00:23:31 vs. 00:03:21). > Freon: Improve the concurrency testing framework. > - > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA
[jira] [Updated] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result errors.
[ https://issues.apache.org/jira/browse/HDDS-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1703: - Summary: Freon uses wait/notify instead of polling to eliminate the test result errors. (was: Freon uses wait/notify instead of polling to eliminate the test result error.) > Freon uses wait/notify instead of polling to eliminate the test result errors. > -- > > Key: HDDS-1703 > URL: https://issues.apache.org/jira/browse/HDDS-1703 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > After HDDS-1532, Freon has an efficient concurrent testing framework. In the new framework, the main thread checks every 5 seconds whether the test has completed (or an exception has occurred), which introduces an error of up to 5 seconds in the measured result. > In most cases Freon's test results are on the order of minutes or tens of minutes, so a 5-second error is not significant, but in particularly short tests it may have a significant impact. > Therefore, we can use the combination of Object.wait() + Object.notify() instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
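A minimal sketch of the proposed wait/notify completion signal, assuming a hypothetical CompletionSignal helper shared between the main thread and the worker threads; the actual patch may structure this differently.

{code:java}
public class CompletionSignal {
  private final Object lock = new Object();
  private long remainingTasks;
  private Throwable failure;

  CompletionSignal(long totalTasks) {
    this.remainingTasks = totalTasks;
  }

  // Called by a worker when one task finishes.
  void taskCompleted() {
    synchronized (lock) {
      if (--remainingTasks == 0) {
        lock.notifyAll();   // wake the waiting main thread immediately
      }
    }
  }

  // Called by a worker that hit an exception.
  void taskFailed(Throwable t) {
    synchronized (lock) {
      failure = t;
      lock.notifyAll();
    }
  }

  // Main thread blocks here instead of polling every 5 seconds.
  void awaitCompletion() throws InterruptedException {
    synchronized (lock) {
      while (remainingTasks > 0 && failure == null) {
        lock.wait();        // no polling error: woken the moment work ends
      }
    }
  }
}
{code}

The guarded while loop also protects against spurious wakeups, which the old 5-second polling never had to consider.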
[jira] [Updated] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result errors.
[ https://issues.apache.org/jira/browse/HDDS-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1703: - Issue Type: Bug (was: Improvement) > Freon uses wait/notify instead of polling to eliminate the test result errors. > -- > > Key: HDDS-1703 > URL: https://issues.apache.org/jira/browse/HDDS-1703 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > After HDDS-1532, Freon has an efficient concurrent testing framework. In the new framework, the main thread checks every 5 seconds whether the test has completed (or an exception has occurred), which introduces an error of up to 5 seconds in the measured result. > In most cases Freon's test results are on the order of minutes or tens of minutes, so a 5-second error is not significant, but in particularly short tests it may have a significant impact. > Therefore, we can use the combination of Object.wait() + Object.notify() instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result error.
Xudong Cao created HDDS-1703: Summary: Freon uses wait/notify instead of polling to eliminate the test result error. Key: HDDS-1703 URL: https://issues.apache.org/jira/browse/HDDS-1703 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: test Affects Versions: 0.4.0 Reporter: Xudong Cao Assignee: Xudong Cao After HDDS-1532, Freon has an efficient concurrent testing framework. In the new framework, the main thread checks every 5 seconds whether the test has completed (or an exception has occurred), which introduces an error of up to 5 seconds in the measured result. In most cases Freon's test results are on the order of minutes or tens of minutes, so a 5-second error is not significant, but in particularly short tests it may have a significant impact. Therefore, we can use the combination of Object.wait() + Object.notify() instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1532) Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1532: - Affects Version/s: 0.4.0 Status: Patch Available (was: Open) > Freon: Improve the concurrent testing framework. > > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1532) Ozone: Freon: Improve the concurrent testing framework.
[ https://issues.apache.org/jira/browse/HDDS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDDS-1532: - Summary: Ozone: Freon: Improve the concurrent testing framework. (was: Freon: Improve the concurrent testing framework.) > Ozone: Freon: Improve the concurrent testing framework. > --- > > Key: HDDS-1532 > URL: https://issues.apache.org/jira/browse/HDDS-1532 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, Freon's concurrency framework operates only at the volume level, but in actual testing users are likely to provide a small volume count (typically 1) and larger bucket and key counts, in which case the existing concurrency framework cannot make good use of the thread pool. > We need to improve the concurrency policy so that volume creation, bucket creation, and key creation tasks can all be submitted to the thread pool as general tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload thread blocks writing to the socket for a long time: !blockedInWritingSocket.png! 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png! !get2.png! *Solution:* When the local SNN plans to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is: # Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process. # If the peer NN is truly the ANN and can receive the FsImage normally, it replies to the local SNN with HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image. *Note:* This problem needs to be reproduced in a large cluster (the size of the FsImage in our cluster is about 30GB), so a unit test is difficult to write. In our cluster, after the modification, the problem has been solved and there is no longer a large backlog in Send-Q.
was: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30G. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30G. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload thread blocks writing to the socket for a long time: !blockedInWritingSocket.png! 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png! !get2.png! *Solution:* When the local SNN plans to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is: # Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process. # If the peer NN is truly the ANN and can receive the FsImage normally, it replies to the local SNN with HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image. *Note:* This problem needs to be reproduced in a large cluster (the size of the FsImage in our cluster is about 30G), so a unit test is difficult to write. In our cluster, after the modification, the problem has been solved and there is no longer a large backlog in Send-Q.
was: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; instead it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30G. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread blocks writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours. *An example is as follows:* In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 replies with a NOT_ACTIVE_NAMENODE_FAILURE error immediately. In this case, the local SNN should terminate the put immediately, but in fact the local SNN has to wait until the image has been completely put to the peer NN, and only then can it read the response. # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png! 2. Moreover, the local SNN's ImageUpload thread will
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: get2.png)
> Standby NameNode should terminate the FsImage put process as soon as possible
> if the peer NN is not in the appropriate state to receive an image.
> -
>
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.2
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Major
> Attachments: blockedInWritingSocket.png, get1.png, get2.png, largeSendQ.png
>
> *Problem Description:*
> In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is the ANN or not). Even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed.
> In a relatively large HDFS cluster, the FsImage can often reach about 30GB. In this case, this invalid put brings two problems:
> # It wastes time and bandwidth.
> # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread is blocked writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours.
> *An example is as follows:*
> In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 immediately replies with a NOT_ACTIVE_NAMENODE_FAILURE error. In this case, the local SNN should terminate the put immediately, but in fact it has to wait until the image has been completely put to the peer NN, and only then can it read the response.
> # At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png!
> # Moreover, the local SNN's ImageUpload thread is blocked writing to the socket for a long time: !blockedInWritingSocket.png!
> # Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png!
> *Solution:*
> When the local SNN is ready to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is:
> # Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process.
> # If the peer NN is truly the ANN and can receive the FsImage normally, it will reply to the local SNN with an HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image.
> *Note:*
> This problem needs to be reproduced in a large cluster (the FsImage in our cluster is about 30GB); therefore, a unit test is difficult to write.
> In our real cluster, after the modification, the problem has been solved: there is no longer a large Send-Q backlog.
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
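To make the blocking pattern above concrete, here is a minimal stand-in sketch; it is hypothetical illustration code, not the actual StandbyCheckpointer or TransferFsImage sources, and the class name and timings are assumptions. It shows why an untimed Future.get() pins the checkpointer thread for exactly as long as the socket write blocks:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in, NOT the real StandbyCheckpointer: it shows why the
// checkpointer thread can hang for hours when the peer NN stops reading.
public class CheckpointBlockingSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService uploader = Executors.newSingleThreadExecutor();
    Future<Boolean> put = uploader.submit(() -> {
      // Stands in for the ImageUpload thread: if the peer's ImageServlet
      // stops reading, writing a ~30GB image blocks here while the local
      // socket Send-Q fills up (the largeSendQ.png symptom).
      Thread.sleep(5000); // simulate a long-blocked socket write
      return true;
    });
    // The StandbyCheckpointer-style wait: an untimed Future.get() blocks
    // exactly as long as the upload does (the frames shown in get1.png).
    System.out.println("upload done: " + put.get());
    uploader.shutdown();
  }
}
{code}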
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: get2.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: blockWriiting.png)
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: largeSendQ.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: largeSendQ.png)
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: blockedInWritingSocket.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description:
*Problem Description:*
In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is the ANN or not). Even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately; it puts the FsImage completely to the peer NN and does not read the peer NN's reply until the put is completed.
In a relatively large HDFS cluster, the FsImage can often reach about 30GB. In this case, this invalid put brings two problems:
# It wastes time and bandwidth.
# Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN grows very large, and the ImageUpload thread is blocked writing to the socket for a long time, eventually leaving the local StandbyCheckpointer thread blocked, often for several hours.
*An example is as follows:*
In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN starts to put the FsImage, 170 immediately replies with a NOT_ACTIVE_NAMENODE_FAILURE error. In this case, the local SNN should terminate the put immediately, but in fact it has to wait until the image has been completely put to the peer NN, and only then can it read the response.
# At this time, since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large: !largeSendQ.png!
# Moreover, the local SNN's ImageUpload thread is blocked writing to the socket for a long time: !blockedInWritingSocket.png!
# Eventually, the StandbyCheckpointer thread of the local SNN waits for the execution result of the ImageUpload thread, blocking in Future.get(), and the blocking time may be as long as several hours: !get1.png! !get2.png!
*Solution:*
When the local SNN plans to put an FsImage to the peer NN, it first needs to test whether it really needs to put the image at this time. The test process is:
# Establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process.
# If the peer NN is indeed the Active NameNode and is now in the appropriate state to receive an image, it will reply with an HTTP response 410 (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At this time, the local SNN can really begin to put the image.
*Note:*
This problem needs to be reproduced in a large cluster (the FsImage in our cluster is about 30GB); therefore, a unit test is difficult to write. In our cluster, after the modification, the problem has been solved and there is no longer a large Send-Q backlog.
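The probe described in the Solution can be outlined roughly as below. This is a sketch under stated assumptions, not the actual patch: the class name, the servlet path /imageTransfer, the zero-length probe request, and the numeric constant are all illustrative. The key move is reading the HTTP response before streaming the ~30GB image body:

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch of the pre-put test, NOT the actual HDFS patch.
public class ImagePutProbe {

  // Per the description above, 410 (HttpServletResponse.SC_GONE, i.e.
  // TransferResult.UNEXPECTED_FAILURE) is what a peer NN that is ready to
  // receive the image replies with at this stage.
  private static final int PEER_READY = 410;

  static boolean peerWillAcceptImage(URL putUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) putUrl.openConnection();
    try {
      conn.setRequestMethod("PUT");
      conn.setDoOutput(true);
      conn.setFixedLengthStreamingMode(0); // headers only, no image bytes yet
      conn.connect();
      conn.getOutputStream().close(); // complete the zero-length request
      // The key point: read the response BEFORE streaming the image, so an
      // AUTHENTICATION_FAILURE / NOT_ACTIVE_NAMENODE_FAILURE /
      // OLD_TRANSACTION_ID_FAILURE reply terminates the put immediately.
      return conn.getResponseCode() == PEER_READY;
    } finally {
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws Exception {
    URL peer = new URL("http://ubuntu1:9870/imageTransfer"); // hypothetical path
    if (peerWillAcceptImage(peer)) {
      System.out.println("peer is ready; start the real image upload");
    } else {
      System.out.println("peer refused; terminate the put process");
    }
  }
}
{code}

Only when such a probe succeeds would the SNN go on to stream the actual image, so a refusing peer costs one round trip instead of a multi-gigabyte write into a dead socket.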
[jira] [Created] (HDFS-14646) Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.
Xudong Cao created HDFS-14646: - Summary: Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image. Key: HDFS-14646 URL: https://issues.apache.org/jira/browse/HDFS-14646 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.1.2 Reporter: Xudong Cao Assignee: Xudong Cao Attachments: blockWriiting.png, get1.png, get2.png, largeSendQ.png
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Summary: Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image. (was: Standby NameNode should terminate the FsImage put process as soon as possible if the peer NN is not in the appropriate state to receive an image.)
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885107#comment-16885107 ] Xudong Cao edited comment on HDFS-14646 at 7/15/19 11:35 AM: - *Test Result:* In a 3-node HDFS cluster (ubuntu1 (ANN), ubuntu2 (SNN), and ubuntu3 (SNN)), the upload logs on ubuntu2 and ubuntu3 are as follows:
was (Author: xudongcao): *Test Result:*
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885107#comment-16885107 ] Xudong Cao commented on HDFS-14646: --- *Test Result:*
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885107#comment-16885107 ] Xudong Cao edited comment on HDFS-14646 at 7/15/19 11:38 AM: - *Test Result:* in a 3-node HDFS cluster: ubuntu1 (ANN) + ubuntu2 (SNN) + ubuntu3 (SNN), the upload logs on ubuntu2 and ubuntu3 are as follows: 1. SNN ubuntu2: {code:java} root@ubuntu2:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu2.log 2019-07-16 01:52:24,801 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9573 to namenode at http://ubuntu1:9870 in 0.178 seconds 2019-07-16 01:53:24,912 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9759 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 01:54:25,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9777 to namenode at http://ubuntu1:9870 in 0.075 seconds 2019-07-16 01:55:25,147 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9961 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 01:56:25,253 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9981 to namenode at http://ubuntu1:9870 in 0.054 seconds 2019-07-16 01:57:25,323 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10171 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 01:58:25,388 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10191 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 01:59:25,479 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10383 to namenode at http://ubuntu1:9870 in 0.046 seconds{code} 2.
Another SNN, ubuntu3: {code:java} root@ubuntu3:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu3.log 2019-07-16 02:00:34,767 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds 2019-07-16 02:02:34,851 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds 2019-07-16 02:04:34,938 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10807 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:06:35,021 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11013 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 02:08:35,094 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11217 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:10:35,200 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11423 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 02:12:35,285 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11629 to namenode at http://ubuntu1:9870 in 0.026 seconds 2019-07-16 02:14:35,357 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11835 to namenode at http://ubuntu1:9870 in 0.023 seconds 2019-07-16 02:16:35,442 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12035 to namenode at http://ubuntu1:9870 in 0.042 seconds 2019-07-16 02:18:35,515 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12233 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 02:20:35,605 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12441 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:22:35,675 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12647 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:24:35,771 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12853 to namenode at http://ubuntu1:9870 in 0.041 seconds{code} was (Author: xudongcao): *Test Result:* in a 3-node HDFS cluster, ubuntu1 (ANN), ubuntu2 (SNN) and ubuntu3 (SNN), the upload logs on ubuntu2 and ubuntu3 are as follows: > Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 3.1.2 > Reporter: Xudong Cao > Assignee: Xudong Cao
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909018#comment-16909018 ] Xudong Cao edited comment on HDFS-14646 at 8/16/19 12:33 PM: - [~elgoiri] [~xkrogen] [~hexiaoqiao] [~jojochuang] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you! was (Author: xudongcao): [~elgoiri] [~xkrogen] [~hexiaoqiao] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you! > Standby NameNode should not upload fsimage to an inappropriate NameNode. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 3.1.2 > Reporter: Xudong Cao > Assignee: Xudong Cao > Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately, but puts the FsImage completely to the peer NN, and does not read the peer NN's reply until the put is completed. > Depending on the version of Jetty, this behavior can lead to different consequences; I tested it on 2.7.2 and on trunk. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN still pointlessly sends the FsImage to the peer NN continuously, wasting time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a big waste. > *2. In trunk (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes.
> java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage
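To picture the Jetty difference described above, here is a toy servlet-side sketch. It is illustrative only: the real rejection path lives in ImageServlet, and the 417 status code is a stand-in, since the comments name only the TransferResult constants, not their HTTP mappings.
{code:java}
// Toy servlet showing the rejection path; the 417 code is a stand-in, not
// necessarily what TransferResult.NOT_ACTIVE_NAMENODE_FAILURE maps to.
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RejectingImageServlet extends HttpServlet {
    @Override
    protected void doPut(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // sendError() answers before the request body has been consumed.
        // Under Jetty 6 the container drains the rest of the body itself, so
        // the sender keeps streaming; under Jetty 9 the TCP connection is
        // closed and the sender hits "Error writing request body to server"
        // mid-upload.
        resp.sendError(417, "This NameNode is not in a state to receive an image");
    }
}
{code}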
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909018#comment-16909018 ] Xudong Cao edited comment on HDFS-14646 at 8/16/19 2:42 PM: [~elgoiri] [~xkrogen] [~hexiaoqiao] [~jojochuang] This problem seems to have different behaviors in different Hadoop versions. I have corrected the problem description and submitted a new patch; please review it if you have time, thank you! was (Author: xudongcao): [~elgoiri] [~xkrogen] [~hexiaoqiao] [~jojochuang] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you!
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately, but puts the FsImage completely to the peer NN, and does not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it on 2.7.2 and on trunk. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN pointlessly sends the FsImage to the peer NN continuously, wasting time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a big waste. *2. In trunk (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload an fsimage to an inappropriate NameNode; when it plans to put an FsImage to the peer NN, it needs to check whether it really needs to put it at this time.
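For illustration, a rough sketch of where such a check could sit in the checkpoint upload path. Apart from uploadImageFromStorage(), which appears in the stack trace above, every name here is hypothetical and the real method signatures are not shown.
{code:java}
// Illustrative only: the guard and all names below are hypothetical.
import java.io.IOException;
import java.net.URL;
import java.util.List;

abstract class GuardedImageUploader {
    /** The probe from the earlier sketch: true iff the peer can take an image. */
    abstract boolean peerReadyToReceiveImage(URL peer) throws IOException;

    /** Wrapper around the real upload, e.g. TransferFsImage.uploadImageFromStorage(). */
    abstract void uploadImage(URL peer) throws IOException;

    void uploadToAllPeers(List<URL> peers) throws IOException {
        for (URL peer : peers) {
            if (!peerReadyToReceiveImage(peer)) {
                continue; // skip inappropriate NNs instead of streaming ~30GB to them
            }
            uploadImage(peer);
        }
    }
}
{code}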
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 3.1.2 > Reporter: Xudong Cao > Assignee: Xudong Cao > Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put process immediately, but puts the FsImage completely to the peer NN, and does not read the peer NN's reply until the put is completed. > Depending on the version of Jetty, this behavior can lead to different consequences; I tested it on 2.7.2 and on trunk. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN pointlessly sends the FsImage to the peer NN continuously, wasting time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a big waste. > *2. In trunk (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below; note that this test needs a relatively big FsImage (e.g. at the 10MB level): > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload an fsimage to an inappropriate NameNode; when it plans to put an FsImage to the peer NN, it needs to check whether it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: HDFS-14646.001.patch)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.001.patch
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.001.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: HDFS-14646.001.patch)
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909018#comment-16909018 ] Xudong Cao commented on HDFS-14646: --- [~elgoiri] [~xkrogen] [~hexiaoqiao] This problem has different behaviors under different Hadoop versions. I corrected the problem description and submitted a new patch. Please review it if you have time, thank you!
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.001.patch
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the trunk version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the trunk version (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below; note this test needs a relatively big FsImage (e.g. 10MB level): {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the trunk version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will needlessly send the FsImage to the peer NN continuously, causing a waste of time and
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the trunk version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the trunk version (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Summary: Standby NameNode should not upload fsimage to an inappropriate NameNode. (was: Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch, blockedInWritingSocket.png, > get1.png, get2.png, largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach about > 30GB. In this case, this invalid put brings two problems: > 1. Wasting time and bandwidth. > 2. Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN is very large, and the ImageUpload thread will > be blocked writing to the socket for a long time, eventually causing the local > StandbyCheckpointer thread to be blocked for several hours. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is a SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to the > socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN is waiting > for the execution result of the ImageUpload thread, blocking in Future.get(), > and the blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put a FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), immediately terminate the put process. > # If the peer NN is indeed the Active NameNode AND it's now in the > appropriate state to receive an image, it will reply with an HTTP 410 response > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). 
At > this time, the local SNN can really begin to put the image. > *Note:* > This problem needs to be reproduced in a large cluster (the size of the FsImage > in our cluster is about 30GB); therefore, a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
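For illustration, here is a minimal sketch of the probe idea described in the solution above: send the PUT request line and headers first, then briefly wait for an early reply before streaming the image body. This is not the actual HDFS-14646 patch (which works inside TransferFsImage over HttpURLConnection); the class EarlyReplyProbe, the host peer-nn.example.com, the port, and the /imagetransfer path are hypothetical names used only for the sketch:
{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class EarlyReplyProbe {

  /** Returns true if the peer already replied with a non-200 status line. */
  static boolean peerRejectedEarly(Socket socket) throws IOException {
    socket.setSoTimeout(1000); // wait at most 1s for an early reply
    try {
      BufferedReader reader = new BufferedReader(
          new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
      String statusLine = reader.readLine(); // e.g. "HTTP/1.1 410 Gone"
      return statusLine != null && !statusLine.contains(" 200 ");
    } catch (SocketTimeoutException e) {
      return false; // no early reply; the peer is still reading, keep uploading
    }
  }

  public static void main(String[] args) throws IOException {
    try (Socket socket = new Socket("peer-nn.example.com", 9870)) { // hypothetical peer
      OutputStream out = socket.getOutputStream();
      // Send only the request line and headers; do not stream the body yet.
      out.write(("PUT /imagetransfer HTTP/1.1\r\n"
          + "Host: peer-nn.example.com\r\n"
          + "Transfer-Encoding: chunked\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
      out.flush();
      if (peerRejectedEarly(socket)) {
        System.err.println("Peer NN rejected the upload early; aborting the put.");
        return; // terminate instead of streaming a ~30GB image for nothing
      }
      // ... stream the fsimage body in chunks here ...
    }
  }
}
{code}
If the peer replies at once with an error status, the uploader aborts before sending gigabytes of image data; if nothing arrives within the short timeout, the upload proceeds as usual.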
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: blockedInWritingSocket.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. 
Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
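The server-side half of the Jetty behavior described above can also be sketched. The servlet below is illustrative only (RejectingPutServlet is a hypothetical stand-in, not the real ImageServlet, and the 410 status is just an example code): it rejects a put with sendError() without reading the request body. Under Jetty 6 the connection stays open and Jetty drains the body itself, so the client keeps sending; under Jetty 9 the connection is closed and the client's next write fails with "Error writing request body to server":
{code:java}
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RejectingPutServlet extends HttpServlet {
  @Override
  protected void doPut(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Reject immediately, without consuming req.getInputStream().
    // Whether the client notices before finishing its upload depends on
    // how the container handles the unread body (Jetty 6 vs. Jetty 9).
    resp.sendError(HttpServletResponse.SC_GONE,
        "This NameNode is not in a state to receive an image");
  }
}
{code}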
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: get1.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. 
Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences; I tested it under 2.7.2 and the latest 3.3.0 version. *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection remains established, and the data the SNN sends is read by the Jetty framework itself on the peer NN side, so the SNN will still needlessly send the FsImage to the peer NN continuously, causing a waste of time and bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB, so this is a significant waste. *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be closed automatically, and the SNN will then directly get an "Error writing request body to server" exception, as below: {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes. 
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload fsimage to an inappropriate NameNode; when it plans to put a FsImage to the peer NN, it needs to check whether it really needs to put it at this time. was: *Problem Description:* In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will put the complete FsImage to the peer NN, and will not read the peer NN's reply until the put is completed. In a relatively large HDFS cluster, the size of the FsImage can often reach about 30GB. In this case, this invalid put brings two problems: # Wasting time and bandwidth. # Since the ImageServlet of the peer NN no longer receives the FsImage, the socket Send-Q of the local SNN is very large, and the ImageUpload thread will be blocked writing to the socket for a long time, eventually causing the local StandbyCheckpointer thread to be blocked for several hours. *An example is as follows:* In the
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: largeSendQ.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. 
Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: get2.png) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the latest 3.3.0 version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will still needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the newest Hadoop-3.3.0-SNAPSHOT (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below: > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. 
Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether > it really needs to put it at this time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Comment: was deleted (was: *Test Result:* in a 3-node HDFS cluster: ubuntu1 (ANN) + ubuntu2 (SNN) + ubuntu3 (SNN), the upload logs on ubuntu2 and ubuntu3 are as follows: 1. SNN ubuntu2: {code:java} root@ubuntu2:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu2.log 2019-07-16 01:52:24,801 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9573 to namenode at http://ubuntu1:9870 in 0.178 seconds 2019-07-16 01:53:24,912 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9759 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 01:54:25,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9777 to namenode at http://ubuntu1:9870 in 0.075 seconds 2019-07-16 01:55:25,147 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9961 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 01:56:25,253 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 9981 to namenode at http://ubuntu1:9870 in 0.054 seconds 2019-07-16 01:57:25,323 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10171 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 01:58:25,388 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10191 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 01:59:25,479 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10383 to namenode at http://ubuntu1:9870 in 0.046 seconds{code} 2. 
another SNN ubuntu3: {code:java} root@ubuntu3:~/hadoop-3.3.0-SNAPSHOT/logs# grep "Uploaded" hadoop-root-namenode-ubuntu3.log 2019-07-16 02:00:34,767 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10401 to namenode at http://ubuntu1:9870 in 0.028 seconds 2019-07-16 02:02:34,851 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10603 to namenode at http://ubuntu1:9870 in 0.03 seconds 2019-07-16 02:04:34,938 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 10807 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:06:35,021 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11013 to namenode at http://ubuntu1:9870 in 0.041 seconds 2019-07-16 02:08:35,094 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11217 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:10:35,200 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11423 to namenode at http://ubuntu1:9870 in 0.032 seconds 2019-07-16 02:12:35,285 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11629 to namenode at http://ubuntu1:9870 in 0.026 seconds 2019-07-16 02:14:35,357 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 11835 to namenode at http://ubuntu1:9870 in 0.023 seconds 2019-07-16 02:16:35,442 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12035 to namenode at http://ubuntu1:9870 in 0.042 seconds 2019-07-16 02:18:35,515 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12233 to namenode at http://ubuntu1:9870 in 0.031 seconds 2019-07-16 02:20:35,605 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12441 to namenode at http://ubuntu1:9870 in 0.033 seconds 2019-07-16 02:22:35,675 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12647 to namenode at http://ubuntu1:9870 in 0.029 seconds 2019-07-16 02:24:35,771 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 12853 to namenode at http://ubuntu1:9870 in 0.041 seconds{code}) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as >
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available) > Standby NameNode should not upload fsimage to an inappropriate NameNode. > > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, > HDFS-14646.002.patch > > > *Problem Description:* > In the multi-NameNode scenario, when a SNN uploads a FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, > etc.), the local SNN will not terminate the put process immediately, but will > put the complete FsImage to the peer NN, and will not read the peer NN's reply > until the put is completed. > Depending on the version of Jetty, this behavior can lead to different > consequences; I tested it under 2.7.2 and the trunk version. > *1. In Hadoop 2.7.2 (with Jetty 6.1.26)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection remains established, and the data the SNN sends is read by the > Jetty framework itself on the peer NN side, so the SNN will needlessly > send the FsImage to the peer NN continuously, causing a waste of time and > bandwidth. In a relatively large HDFS cluster, the size of the FsImage can often > reach about 30GB, so this is a significant waste. > *2. In the trunk version (with Jetty 9.3.27)* > After the peer NN calls HttpServletResponse.sendError(), the underlying TCP > connection will be closed automatically, and the SNN will then directly get an "Error > writing request body to server" exception, as below; note this test needs a > relatively big FsImage (e.g. 10MB level): > {code:java} > 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 524288 bytes. Size of last segment intended to send: > 4096 bytes. 
> java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) > at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: > /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: > 9864721. Sent total: 851968 bytes. Size of last segment intended to send: > 4096 bytes. > java.io.IOException: Error writing request body to server > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) > at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) > {code} > > *Solution:* > A standby NameNode should not upload fsimage to an inappropriate NameNode; > when it plans to put a FsImage to the peer NN, it needs to check whether it > really needs to put it at this time. > In detail, the local SNN should establish an HTTP connection with the peer NN, > send the put request, and then immediately read the response (this is the key > point). If the peer NN does not reply with an HTTP_OK, it means the local SNN > should not put the image at this
[jira] [Resolved] (HDDS-1703) Freon uses wait/notify instead of polling to eliminate the test result errors.
[ https://issues.apache.org/jira/browse/HDDS-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao resolved HDDS-1703. -- Resolution: Invalid > Freon uses wait/notify instead of polling to eliminate the test result errors. > -- > > Key: HDDS-1703 > URL: https://issues.apache.org/jira/browse/HDDS-1703 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.4.0 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > After HDDS-1532, Freon has an efficient concurrent testing framework. In the > new framework, the main thread checks every 5s to verify whether the test is > completed (or an exception has occurred), which will eventually introduce a > maximum error of 5s into the measured result. > In most cases, Freon's test results are at the minutes or tens-of-minutes level, > so a 5s error is not significant, but in particularly small tests, a > 5s error may have a significant impact. > Therefore, we can use the combination of Object.wait() + Object.notify() > instead of polling to completely eliminate this error. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
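A minimal sketch of the wait/notify idea described above (CompletionMonitor and its method names are illustrative, not the actual Freon classes): each worker signals the monitor when it finishes, so the main thread wakes up immediately instead of discovering completion on the next 5s poll. A java.util.concurrent.CountDownLatch would achieve the same effect more idiomatically:
{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class CompletionMonitor {
  private final AtomicLong remaining;
  private final Object lock = new Object();

  public CompletionMonitor(long taskCount) {
    this.remaining = new AtomicLong(taskCount);
  }

  /** Called by each worker thread when its task completes. */
  public void taskDone() {
    if (remaining.decrementAndGet() == 0) {
      synchronized (lock) {
        lock.notifyAll(); // wake the waiting main thread at once
      }
    }
  }

  /** Called by the main thread; returns as soon as all tasks are done. */
  public void awaitCompletion() throws InterruptedException {
    synchronized (lock) {
      while (remaining.get() > 0) {
        lock.wait(); // no polling, so no fixed-interval measurement error
      }
    }
  }
}
{code}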
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: (was: HDFS-14646.003.patch)
> Standby NameNode should not upload fsimage to an inappropriate NameNode.
> --
> Key: HDFS-14646
> URL: https://issues.apache.org/jira/browse/HDFS-14646
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.2
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Major
> Attachments: HDFS-14646.000.patch, HDFS-14646.001.patch, HDFS-14646.002.patch, HDFS-14646.003.patch
>
> *Problem Description:*
> In the multi-NameNode scenario, when an SNN uploads an FsImage, it puts the image to all other NNs (whether the peer NN is an ANN or not). Even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN does not terminate the put immediately; it uploads the FsImage completely to the peer NN and does not read the peer NN's reply until the put is finished.
> Depending on the version of Jetty, this behavior leads to different consequences; I tested it under 2.7.2 and the trunk version.
> *1. In Hadoop 2.7.2 (with Jetty 6.1.26)*
> After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection stays established, and the data the SNN sends is consumed by the Jetty framework itself on the peer NN side, so the SNN keeps needlessly streaming the FsImage to the peer NN, wasting time and bandwidth. In a relatively large HDFS cluster, the FsImage can often reach about 30GB, so this is a significant waste.
> *2. In the trunk version (with Jetty 9.3.27)*
> After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection is closed automatically, and the SNN then directly gets an "Error writing request body to server" exception, as below. Note that reproducing this needs a relatively big FsImage (e.g., on the order of 10MB):
> {code:java}
> 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes.
> java.io.IOException: Error writing request body to server
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277)
>         at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes.
> java.io.IOException: Error writing request body to server
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396)
>         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340)
> {code}
>
> *Solution:*
> A standby NameNode should not upload the fsimage to an inappropriate NameNode: before it puts an FsImage to a peer NN, it needs to check whether the put is really needed at this time.
> In detail, the local SNN should establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN does not reply with HTTP_OK, the local SNN should not put the image at this time.
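The handshake described in the Solution maps naturally onto HTTP's Expect: 100-continue mechanism: send the request headers, read the peer's verdict, and only then stream the body. Below is a hedged sketch of that idea using plain java.net.HttpURLConnection; it is not the actual HDFS-14646 patch, whether the real fix uses Expect: 100-continue or some other early read of the response is an assumption here, and ImagePutSketch/putImage are hypothetical names.

{code:java}
// Hedged sketch, not the actual TransferFsImage code: ask the peer NN to
// acknowledge the PUT before streaming the (potentially ~30GB) FsImage body.
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ImagePutSketch {
  public static void putImage(URL peerImageServlet, byte[] segment) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) peerImageServlet.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setChunkedStreamingMode(4096); // stream in small segments, as Freon/TransferFsImage do
    // With Expect: 100-continue, the client waits for the server's verdict
    // before sending the request body; a NOT_ACTIVE_NAMENODE_FAILURE-style
    // rejection then costs a few header bytes instead of the whole image.
    conn.setRequestProperty("Expect", "100-Continue");
    try (OutputStream out = conn.getOutputStream()) {
      // The JDK client fails fast here if the peer rejected the request
      // instead of replying 100 Continue, so no image bytes are wasted.
      out.write(segment); // stream the FsImage segments here
    }
    if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
      throw new IOException("Peer NN refused the image upload: " + conn.getResponseCode());
    }
  }
}
{code}

Either way, the essential point of the jira holds: read the peer NN's reply before committing to the full upload.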
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.003.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.003.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: updated (full text as quoted above)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.002.patch Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911885#comment-16911885 ] Xudong Cao commented on HDFS-14646: --- cc [~xkrogen], [~elgoiri], [~csun]: the failed unit test has nothing to do with this jira.
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ] Xudong Cao commented on HDFS-14646: --- Thanks [~csun], our production environment uses Hadoop 2.7.2, but with the multi-SBN feature merged. We have about 20,000 machines (divided into dozens of HDFS clusters). Not long ago we frequently encountered the problem described by this jira, and after this patch was merged the errors are no longer reported.
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ] Xudong Cao edited comment on HDFS-14646 at 8/26/19 1:58 AM (minor wording tweaks; final text as in the comment above)
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915412#comment-16915412 ] Xudong Cao edited comment on HDFS-14646 at 8/26/19 2:07 AM (minor wording tweaks; final text as in the comment above)
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925408#comment-16925408 ] Xudong Cao edited comment on HDFS-14646 at 9/9/19 6:44 AM: --- Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(); this is a very low-probability bug. # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later. was (Author: xudongcao): Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(). # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later.
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: *Problem Description:* In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put the image to all other NNs (whether the peer NN is an ANN or not), and even if the peer NN immediately replies with an error (such as TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not terminate the put process immediately, but will upload the FsImage completely to the peer NN, and will not read the peer NN's reply until the put is completed. Depending on the version of Jetty, this behavior can lead to different consequences: *1. Under Hadoop 2.7.2 (with Jetty 6.1.26)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will still be established, and the data the SNN sends will be read by the Jetty framework itself on the peer NN side, so the SNN will pointlessly keep sending the FsImage to the peer NN, wasting time and bandwidth. In a relatively large HDFS cluster, the FsImage can often reach about 30GB, so this is a significant waste. *2. Under the newest release-3.2.0-RC1 (with Jetty 9.3.24) and trunk (with Jetty 9.3.27)* After the peer NN calls HttpServletResponse.sendError(), the underlying TCP connection will be automatically closed, and the SNN will then directly get an "Error writing request body to server" exception, as below (note this test needs a relatively large FsImage, e.g. around 10MB): {code:java} 2019-08-17 03:59:25,413 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 524288 bytes. Size of last segment intended to send: 4096 bytes. java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:314) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:249) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:277) at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:272) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-08-17 03:59:25,422 INFO namenode.TransferFsImage: Sending fileName: /tmp/hadoop-root/dfs/name/current/fsimage_3364240, fileSize: 9864721. Sent total: 851968 bytes. Size of last segment intended to send: 4096 bytes.
java.io.IOException: Error writing request body to server at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:396) at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:340) {code} *Solution:* A standby NameNode should not upload an fsimage to an inappropriate NameNode: when it plans to put an FsImage to the peer NN, it needs to check whether it really needs to put it at this time. In detail, the local SNN should establish an HTTP connection with the peer NN, send the put request, and then immediately read the response (this is the key point). If the peer NN does not reply with HTTP_OK, the local SNN should not put the image at this time.
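The solution above boils down to reading the peer's verdict before streaming the image body. Below is a minimal, hypothetical sketch of that handshake; the class and method names are illustrative assumptions, and the Expect: 100-continue probe is only one possible way to get an early verdict with plain HttpURLConnection (its handling of Expect is JDK-implementation-dependent), not necessarily what the actual patch does:

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

public class ImageUploadProbe {
  /** Uploads the image only if the peer NN accepts the PUT up front. */
  public static boolean uploadIfAccepted(URL putUrl, byte[] image) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) putUrl.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(image.length);
    // Ask the peer to judge the request headers before we commit the body.
    conn.setRequestProperty("Expect", "100-continue");
    OutputStream out;
    try {
      out = conn.getOutputStream(); // waits for "100 Continue" from the peer
    } catch (ProtocolException e) {
      conn.disconnect();            // peer rejected early: send nothing
      return false;
    }
    out.write(image);               // peer accepted: stream the image body
    out.close();
    return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
  }
}
{code}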
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Description: (same as the description above, with "newest" dropped before "release-3.2.0-RC1")
[jira] [Commented] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925408#comment-16925408 ] Xudong Cao commented on HDFS-14646: --- Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(). # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later.
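The keep-alive fix mentioned in the comment above is easy to see in isolation. A minimal sketch, assuming a per-connection approach (the class name is hypothetical; this is not the actual TransferFsImage.setupConnection() code): sending "Connection: close" makes every PUT use a fresh TCP connection, so a repeated request cannot be silently absorbed by a stale persistent connection without doPut() being invoked.

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class NoKeepAliveConnections {
  /** Opens an HTTP connection with keep-alive disabled for this request. */
  public static HttpURLConnection open(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Connection", "close"); // disable HTTP keep-alive
    return conn;
  }
}
{code}

A process-wide alternative is the JDK's http.keepAlive system property (System.setProperty("http.keepAlive", "false")), at the cost of affecting all HttpURLConnection clients in the JVM.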
[jira] [Updated] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.004.patch Status: Patch Available (was: Open)
[jira] [Comment Edited] (HDFS-14646) Standby NameNode should not upload fsimage to an inappropriate NameNode.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925408#comment-16925408 ] Xudong Cao edited comment on HDFS-14646 at 9/9/19 6:46 AM: --- Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(); this is a very low-probability problem. # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later. was (Author: xudongcao): Thanks [~csun] very much for reviewing, I have corrected all the problems you pointed out. Additionally: # I tested this jira under the newest release-3.2.0-RC1 (with Jetty 9.3.24); it behaves just the same as trunk. # Disabled HTTP keep-alive in TransferFsImage.setupConnection(), because it sometimes causes bugs. Specifically, when the client sends the same put request again, the servlet sometimes will not call the corresponding doPut(); this is a very low-probability bug. # In ImageServlet.doPut(), an ImageUploadRequest should be added to the set ImageServlet.currentlyDownloadingCheckpoints in the check-should-put step, and removed at the end of the transfer-image step, so that between these two steps, other NNs cannot interfere. Another bug fixed: # ImageServlet.doPut() doesn't add anything to the set FSImage.currentlyCheckpointing, so it doesn't need to invoke currentlyCheckpointing.remove() to remove anything from this set either. I will fix the comment for FSImage.currentlyCheckpointing in another jira if needed later.
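The check-should-put guard described in the comments above reduces to a small set-based mutual exclusion. A sketch under assumed names (a hypothetical stand-in for ImageServlet.currentlyDownloadingCheckpoints, not the actual patch): register the upload during the check-should-put step and release it only after the transfer-image step, so no other NN can slip in between the two.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CheckpointUploadGuard {
  // Uploads currently in flight, keyed by an identifier for the upload.
  private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

  /** Check-should-put step: returns false if the same upload is in flight. */
  public boolean tryBegin(String uploadKey) {
    return inFlight.add(uploadKey);
  }

  /** End of transfer-image step: releases the guard for this upload. */
  public void end(String uploadKey) {
    inFlight.remove(uploadKey);
  }
}
{code}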
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Description: In a production environment, there may be some differences among the JournalNodes (e.g. network condition, disk condition, and so on). For example, if a JN's network is much worse than the other JNs', the time taken by the NN to write to this JN will be much greater than for the other JNs. In this case, the IPC Logger thread corresponding to this JN will accumulate many pending edits; once the pending edits exceed the maximum limit (default 10MB), the new edits about to be written to this JN will be silently dropped, resulting in gaps in the editlog segment and causing this JN and the NN to repeatedly report the following errors: {code:java} org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1904164873 expecting nextTxId=1904164871{code} Unfortunately, the above error message cannot help us quickly find the root cause, so it's better to add a warning log that tells us the real reason, like this: {code:java} 2019-08-02 04:55:05,879 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits size:10224, will silently drop 174 bytes of edits!{code} This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
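The warning proposed in this description amounts to one size check at enqueue time. A minimal sketch with illustrative names (not the exact IPCLoggerChannel code; the field names and drop policy are assumptions), producing a message of the same shape as the example above:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueuedEditsGuard {
  private static final Logger LOG = LoggerFactory.getLogger(QueuedEditsGuard.class);

  private final String jnAddress;  // e.g. "192.168.202.13:8485"
  private final long limitBytes;   // maximum pending size, default 10MB
  private long queuedBytes;        // edits currently queued for this JN

  public QueuedEditsGuard(String jnAddress, long limitBytes) {
    this.jnAddress = jnAddress;
    this.limitBytes = limitBytes;
  }

  /** Queues the edits if they fit; otherwise warns and reports the drop. */
  public synchronized boolean offer(int editsBytes) {
    if (queuedBytes + editsBytes > limitBytes) {
      LOG.warn("Pending edits to {} is going to exceed limit size:{}, current "
          + "queued edits size:{}, will silently drop {} bytes of edits!",
          jnAddress, limitBytes, queuedBytes, editsBytes);
      return false;
    }
    queuedBytes += editsBytes;
    return true;
  }
}
{code}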
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Description: (same as the description above) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
Xudong Cao created HDFS-14693: - Summary: NameNode should log a warning when EditLog IPC logger's pending size exceeds limit. Key: HDFS-14693 URL: https://issues.apache.org/jira/browse/HDFS-14693 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xudong Cao -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898516#comment-16898516 ] Xudong Cao commented on HDFS-14693: --- This just adds a log; it does not need a unit test. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Component/s: namenode -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Affects Version/s: 3.1.2 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Assignee: Xudong Cao Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Attachment: HDFS-16493.001.patch -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Status: Patch Available (was: Open) > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Attachments: HDFS-14693.001.patch > > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Attachment: HDFS-14693.001.patch > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Attachments: HDFS-14693.001.patch > > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Status: Open (was: Patch Available) > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Attachment: (was: HDFS-16493.001.patch) > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14693) NameNode should log a warning when EditLog IPC logger's pending size exceeds limit.
[ https://issues.apache.org/jira/browse/HDFS-14693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14693: -- Description: In a production environment, the JournalNodes may differ from one another (e.g. in network condition, disk condition, and so on). For example, if a JN's network is much worse than that of the other JNs, the time the NN takes to write to this JN will be much greater than for the other JNs. In this case, the IPC Logger thread corresponding to this JN accumulates many pending edits, and once the pending edits exceed the maximum limit (default 10MB), the new edits about to be written to this JN are silently dropped, leaving gaps in the editlog segment and causing this JN and the NN to repeatedly report the following errors: {code:java} org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1904164873 expecting nextTxId=1904164871{code} Unfortunately, the above error message cannot help us quickly find the root cause and it took us extra time to track it down, so it's better to add a warning log here, like this: {code:java} 2019-08-02 04:55:05,879 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits size:10224, will silently drop 174 bytes of edits!{code} This is just a very small improvement. was: In a production environment, the JournalNodes may differ from one another (e.g. in network condition, disk condition, and so on). For example, if a JN's network is much worse than that of the other JNs, the time the NN takes to write to this JN will be much greater than for the other JNs. In this case, the IPC Logger thread corresponding to this JN accumulates many pending edits, and once the pending edits exceed the maximum limit (default 10MB), the new edits about to be written to this JN are silently dropped, leaving gaps in the editlog segment and causing this JN and the NN to repeatedly report the following errors: {code:java} org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write txid 1904164873 expecting nextTxId=1904164871{code} Unfortunately, the above error message cannot help us quickly find the root cause, so it's better to add a warning log that tells us the real reason, like this: {code:java} 2019-08-02 04:55:05,879 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits size:10224, will silently drop 174 bytes of edits!{code} This is just a very small improvement. > NameNode should log a warning when EditLog IPC logger's pending size exceeds > limit. > --- > > Key: HDFS-14693 > URL: https://issues.apache.org/jira/browse/HDFS-14693 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Minor > Attachments: HDFS-16493.001.patch > > > In a production environment, the JournalNodes may differ from one another > (e.g. in network condition, disk condition, and so on). For example, if a > JN's network is much worse than that of the other JNs, the time the NN takes > to write to this JN will be much greater than for the other JNs. In this > case, the IPC Logger thread corresponding to this JN accumulates many pending > edits, and once the pending edits exceed the maximum limit (default 10MB), > the new edits about to be written to this JN are silently dropped, leaving > gaps in the editlog segment and causing this JN and the NN to repeatedly > report the following errors: > {code:java} > org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't > write txid 1904164873 expecting nextTxId=1904164871{code} > Unfortunately, the above error message cannot help us quickly find the root > cause and it took us extra time to track it down, so it's better to add a > warning log here, like this: > {code:java} > 2019-08-02 04:55:05,879 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager:Pending edits to > 192.168.202.13:8485 is going to exceed limit size:10240, current queued edits > size:10224, will silently drop 174 bytes of edits!{code} > This is just a very small improvement. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
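For context on why the dropped edits surface as the JournalOutOfSyncException quoted above: the JN tracks the next transaction id it expects, and any gap left by dropped edits trips that check on the very next write. A minimal sketch of the mechanism, with illustrative names (the real JournalNode code raises JournalOutOfSyncException rather than IllegalStateException):

{code:java}
// Illustrative sketch of the expected-txid check that dropped edits violate.
public class JournalTxidCheck {
  private long nextTxId = 1904164871L; // the value from the error above

  public synchronized void journal(long firstTxId, int numTxns) {
    if (firstTxId != nextTxId) {
      // The real JournalNode raises JournalOutOfSyncException here.
      throw new IllegalStateException("Can't write txid " + firstTxId
          + " expecting nextTxId=" + nextTxId);
    }
    // ... persist the numTxns transactions in the batch ...
    nextTxId = firstTxId + numTxns;
  }
}
{code}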
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Patch Available (was: Open) > Standby NameNode should terminate the FsImage put process immediately if the > peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: blockedInWritingSocket.png, get1.png, get2.png, > largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not > terminate the put process immediately, but will put the FsImage completely to > the peer NN, and will not read the peer NN's reply until the put is > completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach > about 30GB. In this case, this invalid put brings two problems: > # It wastes time and bandwidth. > # Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN grows very large, and the ImageUpload thread > will be blocked writing to the socket for a long time, eventually leaving the > local StandbyCheckpointer thread blocked for several hours at a time. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact, the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to > the socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the > execution result of the ImageUpload thread, blocking in Future.get(), and the > blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put an FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), terminate the put process > immediately. > # If the peer NN is indeed the Active NameNode AND it is now in the > appropriate state to receive an image, it will reply with HTTP response 410 > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At > this time, the local SNN can really begin to put the image. > *Note:* > This problem can only be reproduced in a large cluster (the size of the > FsImage in our cluster is about 30GB), so a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
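A minimal sketch of the probe-before-put idea from the *Solution* section above. The real TransferFsImage/ImageServlet exchange carries more parameters (txid, storage info, and so on), so the bare URL, the zero-length probe request, and the exact status-code mapping here are illustrative assumptions only:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch of "probe before put": read the peer's verdict from a
// zero-length request before streaming a ~30GB image at it.
public class ImageUploadProbe {

  /** Returns true only if the peer NN signals it is ready for an image. */
  static boolean peerReadyToReceive(URL putImageUrl) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) putImageUrl.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(0); // probe only: send no body bytes
    conn.getOutputStream().close();
    int code = conn.getResponseCode();   // read the reply immediately (the key point)
    conn.disconnect();
    // Per the description above, a ready Active NN answers the probe with
    // 410 (SC_GONE); auth / not-active / old-txid errors mean: do not upload.
    return code == HttpURLConnection.HTTP_GONE;
  }

  static void putImage(URL putImageUrl, InputStream image) throws IOException {
    if (!peerReadyToReceive(putImageUrl)) {
      return; // terminate immediately instead of streaming the whole image
    }
    HttpURLConnection conn = (HttpURLConnection) putImageUrl.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setChunkedStreamingMode(64 * 1024);
    byte[] buf = new byte[64 * 1024];
    try (OutputStream out = conn.getOutputStream()) {
      for (int n; (n = image.read(buf)) > 0; ) {
        out.write(buf, 0, n); // the real upload, only after the probe passed
      }
    }
    conn.getResponseCode(); // consume the final verdict from the peer NN
  }
}
{code}

The design point is simply ordering: read the peer's verdict before committing bandwidth to the body, so a not-active or old-txid reply costs one round trip instead of a 30GB upload and a Send-Q backlog.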
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Status: Open (was: Patch Available) > Standby NameNode should terminate the FsImage put process immediately if the > peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: blockedInWritingSocket.png, get1.png, get2.png, > largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not > terminate the put process immediately, but will put the FsImage completely to > the peer NN, and will not read the peer NN's reply until the put is > completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach > about 30GB. In this case, this invalid put brings two problems: > # It wastes time and bandwidth. > # Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN grows very large, and the ImageUpload thread > will be blocked writing to the socket for a long time, eventually leaving the > local StandbyCheckpointer thread blocked for several hours at a time. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact, the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to > the socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the > execution result of the ImageUpload thread, blocking in Future.get(), and the > blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put an FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), terminate the put process > immediately. > # If the peer NN is indeed the Active NameNode AND it is now in the > appropriate state to receive an image, it will reply with HTTP response 410 > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At > this time, the local SNN can really begin to put the image. > *Note:* > This problem can only be reproduced in a large cluster (the size of the > FsImage in our cluster is about 30GB), so a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14646) Standby NameNode should terminate the FsImage put process immediately if the peer NN is not in the appropriate state to receive an image.
[ https://issues.apache.org/jira/browse/HDFS-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xudong Cao updated HDFS-14646: -- Attachment: HDFS-14646.000.patch Status: Patch Available (was: Open) > Standby NameNode should terminate the FsImage put process immediately if the > peer NN is not in the appropriate state to receive an image. > - > > Key: HDFS-14646 > URL: https://issues.apache.org/jira/browse/HDFS-14646 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Xudong Cao >Assignee: Xudong Cao >Priority: Major > Attachments: HDFS-14646.000.patch, blockedInWritingSocket.png, > get1.png, get2.png, largeSendQ.png > > > *Problem Description:* > In the multi-NameNode scenario, when an SNN uploads an FsImage, it will put > the image to all other NNs (whether the peer NN is an ANN or not), and even > if the peer NN immediately replies with an error (such as > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE, etc.), the local SNN will not > terminate the put process immediately, but will put the FsImage completely to > the peer NN, and will not read the peer NN's reply until the put is > completed. > In a relatively large HDFS cluster, the size of the FsImage can often reach > about 30GB. In this case, this invalid put brings two problems: > # It wastes time and bandwidth. > # Since the ImageServlet of the peer NN no longer receives the FsImage, the > socket Send-Q of the local SNN grows very large, and the ImageUpload thread > will be blocked writing to the socket for a long time, eventually leaving the > local StandbyCheckpointer thread blocked for several hours at a time. > *An example is as follows:* > In the following figure, the local NN 100.76.3.234 is an SNN, the peer NN > 100.76.3.170 is another SNN, and 8080 is the NN HTTP port. When the local SNN > starts to put the FsImage, 170 will reply with a NOT_ACTIVE_NAMENODE_FAILURE > error immediately. In this case, the local SNN should terminate the put > immediately, but in fact, the local SNN has to wait until the image has been > completely put to the peer NN, and only then can it read the response. > 1. At this time, since the ImageServlet of the peer NN no longer receives the > FsImage, the socket Send-Q of the local SNN is very large: > !largeSendQ.png! > 2. Moreover, the local SNN's ImageUpload thread will be blocked writing to > the socket for a long time: > !blockedInWritingSocket.png! > > 3. Eventually, the StandbyCheckpointer thread of the local SNN waits for the > execution result of the ImageUpload thread, blocking in Future.get(), and the > blocking time may be as long as several hours: > !get1.png! > > !get2.png! > > > *Solution:* > When the local SNN plans to put an FsImage to the peer NN, it needs to test > whether it really needs to put it at this time. The test process is: > # Establish an HTTP connection with the peer NN, send the put request, and > then immediately read the response (this is the key point). If the peer NN > replies with any of the following errors (TransferResult.AUTHENTICATION_FAILURE, > TransferResult.NOT_ACTIVE_NAMENODE_FAILURE, > TransferResult.OLD_TRANSACTION_ID_FAILURE), terminate the put process > immediately. > # If the peer NN is indeed the Active NameNode AND it is now in the > appropriate state to receive an image, it will reply with HTTP response 410 > (HttpServletResponse.SC_GONE, which is TransferResult.UNEXPECTED_FAILURE). At > this time, the local SNN can really begin to put the image. > *Note:* > This problem can only be reproduced in a large cluster (the size of the > FsImage in our cluster is about 30GB), so a unit test is difficult to write. > In our cluster, after the modification, the problem has been solved and there > is no longer a large Send-Q backlog. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org