[jira] [Updated] (HADOOP-14660) wasb: improve throughput by 34% when account limit exceeded
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14660:
    Resolution: Fixed
    Fix Version/s: 3.0.0-beta1
                   2.9.0
    Status: Resolved (was: Patch Available)

+1, committed. Thanks for wrapping this up; closing the issue now.

> wasb: improve throughput by 34% when account limit exceeded
>
> Key: HADOOP-14660
> URL: https://issues.apache.org/jira/browse/HADOOP-14660
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Thomas Marquardt
> Assignee: Thomas Marquardt
> Fix For: 2.9.0, 3.0.0-beta1
> Attachments: HADOOP-14660-001.patch, HADOOP-14660-002.patch, HADOOP-14660-003.patch, HADOOP-14660-004.patch, HADOOP-14660-005.patch, HADOOP-14660-006.patch, HADOOP-14660-007.patch, HADOOP-14660-008.patch, HADOOP-14660-010.patch, HADOOP-14660-branch-2-001.patch
>
> Big data workloads frequently exceed the Azure Storage max ingress and egress limits (https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits). For example, the max ingress limit for a GRS account in the United States is currently 10 Gbps. When the limit is exceeded, the Azure Storage service fails a percentage of incoming requests, which causes the client to initiate its retry policy. The retry policy delays requests by sleeping, but the sleep duration is independent of the client throughput and the account limit. This results in low throughput, due to the high number of failed requests and the thrashing caused by the retry policy.
> To fix this, we introduce a client-side throttle which minimizes failed requests and maximizes throughput. Tests have shown that this improves throughput by ~34% when the storage account max ingress and/or egress limits are exceeded.
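The core idea of the fix is that the client observes its own success/failure rate and adapts the delay before the next request, instead of sleeping for a fixed, load-independent duration. A minimal sketch of that idea follows; the class name, thresholds, and linear delay mapping here are illustrative only, not the actual ClientThrottlingAnalyzer code from the patch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a client-side throttling analyzer. Each wasb
// client instance tracks its own outcomes; there is no coordination
// between clients, yet their aggregate load converges toward the
// account limit as rejections push each one to slow down.
class ThrottlingAnalyzerSketch {
  private final AtomicLong succeeded = new AtomicLong();
  private final AtomicLong failed = new AtomicLong();

  /** Record the outcome of one storage operation. */
  void recordOperation(boolean success) {
    (success ? succeeded : failed).incrementAndGet();
  }

  /**
   * Suggested delay in milliseconds before the next request. With no
   * failures the client runs at full speed; as the server-side
   * rejection rate rises, the delay grows proportionally.
   */
  long suggestedDelayMs() {
    long ok = succeeded.get();
    long bad = failed.get();
    long total = ok + bad;
    if (total == 0 || bad == 0) {
      return 0;
    }
    double failureRate = (double) bad / total;
    // Illustrative mapping: up to 1 second of delay at a 100% failure rate.
    return (long) (failureRate * 1000);
  }
}
```

A caller would invoke suggestedDelayMs() before each request and sleep for that long, turning server-side rejections into a smooth client-side pacing signal rather than a burst of retries.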
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt updated HADOOP-14660:
    Attachment: HADOOP-14660-branch-2-001.patch

Re-attaching HADOOP-14660-branch-2-001.patch for another QA pass now that the dependency HADOOP-14662 is committed. All hadoop-azure tests passed against my tmarql3 endpoint:
Tests run: 736, Failures: 0, Errors: 0, Skipped: 95
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt updated HADOOP-14660:
    Attachment: (was: HADOOP-14660-branch-2.patch)
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Marquardt updated HADOOP-14660:
    Attachment: HADOOP-14660-branch-2.patch

Attaching HADOOP-14660-branch-2.patch. This is the branch-2 patch. It depends on the branch-2 patch attached to https://issues.apache.org/jira/browse/HADOOP-14662. All tests are passing against my tmarql3 endpoint:
Tests run: 736, Failures: 0, Errors: 0, Skipped: 95
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14660:
    Issue Type: Sub-task (was: Improvement)
    Parent: HADOOP-14552
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14660:
    Status: Patch Available (was: Open)
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14660:
    Attachment: HADOOP-14660-010.patch

Patch 010: patch 009 with the checkstyle cleanups of ContractTestUtils pulled out, which avoids some merge conflicts. Testing in progress.
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14660:
    Status: Open (was: Patch Available)
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14660:
    Attachment: HADOOP-14660-008.patch

Renaming and attaching HADOOP-14660-008.patch. I'm able to apply the patches and am not sure why Jenkins is failing.
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14660:
    Attachment: HADOOP-14660-007.patch

Something wrong with Jenkins? Attaching HADOOP-14660-007.patch, which is the same as HADOOP-14660-006.patch.
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14660:
    Attachment: HADOOP-14660-006.patch

Attaching HADOOP-14660-006.patch with {{checkstyle}} fixes for all the files modified by the patch.
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14660:
    Attachment: HADOOP-14660-005.patch

Attaching HADOOP-14660-005.patch. Thanks, Steve! I've addressed your feedback in this patch. All hadoop-tools/hadoop-azure tests are passing against my tmarql3 Azure Storage account:
Tests run: 743, Failures: 0, Errors: 0, Skipped: 129
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14660:
    Attachment: HADOOP-14660-004.patch

It looks like re-submitting doesn't work, so I've renamed the patch to HADOOP-14660-004.patch.
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14660:
    Status: Patch Available (was: Open)

Re-submitting HADOOP-14660-003.patch due to a transient build failure.
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14660:
    Status: Open (was: Patch Available)

Cancelling so I can re-submit due to a transient build failure.
[ https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas updated HADOOP-14660: Attachment: HADOOP-14660-003.patch Attached HADOOP-14660-003.patch. Thanks for the feedback Steve! My response below: We tested this on an HDInsight cluster using real loads against Azure Storage and used Storage Metrics (https://docs.microsoft.com/en-us/rest/api/storageservices/Storage-Analytics-Metrics-Table-Schema) to analyze the performance. For example, with the client-side throttling feature enabled for an account with 10 Gbps ingress limit, the total ingress throughput levels off at approximately 10 Gbps, whereas without the feature the throughput varies between 5 Gbps and 10 Gbps as the service rejects requests and the retry policy kicks in. For example, a teragen job that attempts to upload more than the ingress limit allows can complete 30 to 40% faster with the feature enabled. There can be tens or hundreds of wasb clients and they independently throttle themselves with no knowledge of each other, and together they minimize errors and maximize throughput. I do not believe there will be any contention with TCP congestion control. I'm confident, but not overly so, which is why the feature is disabled by default. I'd like to obtain feedback and perhaps enable it by default in the future. The Azure Storage service exposes account level metrics so we're not in the dark today. My team plans to review and implement client-side metrics for wasb. We're a small team, and metrics are important to get right, so we don't want to do it piecemeal, but instead will review and make comprehensive changes. This change will not compile without the 5.4.0 SDK, as it uses the new ErrorReceivingResponseEvent, but I'll create a separate JIRA as you requested. I agree the configuration key should have the suffix "enabled", but for consistency I'm using "enable" because all the other wasb configuration keys are this way. 
ClientThrottlingInercept logs a debug message saying "Client-side throttling is enabled for the WASB file system." There are no other configurable options for the feature, so nothing else is included in the message. The HTTP request/response parsing (BlobOperationDescriptor.getContentLengthIfKnown) is best effort. It should not throw or fail for the data that it processes. There are tests to validate this parsing logic and its integration with the Storage SDK, but it is not a general purpose parser--it works inconjunction with the SDK. I switched to SLF4J. Removed superfluous InterfaceAudience.Private. Changed timer name to "wasb-timer-client-throttling-analyzer-". The timer task returns quickly--no blocking, waiting, loops, etc. I updated ClientThrottlingAnalyzer, TestBlobOperationDescriptor, and TestClientThrottlingAnalyzer as requeseted. > wasb: improve throughput by 34% when account limit exceeded > --- > > Key: HADOOP-14660 > URL: https://issues.apache.org/jira/browse/HADOOP-14660 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/azure >Reporter: Thomas >Assignee: Thomas > Attachments: HADOOP-14660-001.patch, HADOOP-14660-002.patch, > HADOOP-14660-003.patch > > > Big data workloads frequently exceed the Azure Storage max ingress and egress > limits > (https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits). > For example, the max ingress limit for a GRS account in the United States is > currently 10 Gbps. When the limit is exceeded, the Azure Storage service > fails a percentage of incoming requests, and this causes the client to > initiate the retry policy. The retry policy delays requests by sleeping, but > the sleep duration is independent of the client throughput and account limit. > This results in low throughput, due to the high number of failed requests > and thrashing causes by the retry policy. > To fix this, we introduce a client-side throttle which minimizes failed > requests and maximizes throughput. 
Thomas updated HADOOP-14660: Attachment: HADOOP-14660-002.patch

Attaching HADOOP-14660-002.patch with a findbugs fix for the missing switch default. The unit test failure is not related to this change. All 743 hadoop-azure tests pass with this patch: Tests run: 743, Failures: 0, Errors: 0, Skipped: 129.
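The findbugs fix mentioned above is for a switch statement without a default case. A generic illustration of that pattern follows; the enum and method are hypothetical (only the blob operation names come from this thread), not code from the patch.

```java
// Hypothetical illustration of the findbugs "missing switch default" fix
// (pattern SF_SWITCH_NO_DEFAULT): every switch should handle unexpected values.
// The operation names mirror those discussed in this issue.
enum BlobOperationKind { GET_BLOB, PUT_BLOCK, PUT_PAGE, APPEND_BLOCK }

class SwitchDefaultExample {
  /** Classifies a blob operation as a write (ingress) or a read (egress). */
  static boolean isWrite(BlobOperationKind kind) {
    switch (kind) {
      case PUT_BLOCK:
      case PUT_PAGE:
      case APPEND_BLOCK:
        return true;
      case GET_BLOB:
        return false;
      default:
        // The default case findbugs asks for: fail loudly on unknown values
        // rather than falling through silently.
        throw new IllegalArgumentException("Unknown operation: " + kind);
    }
  }

  public static void main(String[] args) {
    System.out.println(isWrite(BlobOperationKind.PUT_BLOCK));  // a write
    System.out.println(isWrite(BlobOperationKind.GET_BLOB));   // a read
  }
}
```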
Thomas updated HADOOP-14660: Status: Patch Available (was: Open)
Thomas updated HADOOP-14660: Attachment: HADOOP-14660-001.patch

Attaching HADOOP-14660-001.patch. Client-side throttling works as follows:

When *fs.azure.selfthrottling* is *false* and *fs.azure.autothrottling* is *true*, the feature is enabled. It is disabled by default. When enabled, it listens to the SendRequestEvent and ErrorReceivingResponseEvent exposed by the Azure Storage SDK. In SendRequestEvent it will sleep, if necessary, to reduce errors caused by exceeding the account ingress/egress limit and to throttle throughput. In ErrorReceivingResponseEvent, it will inspect the HTTP request/response and update metrics. The metrics it calculates are "bytes successfully transferred", "bytes failed to transfer", "number of successful operations", and "number of failed operations". It treats reads and writes separately, so there are actually two groups of metrics: one for reads (GetBlob) and another for writes (PutBlock, PutPage, and AppendBlock).

There is a timer that fires every 10 seconds. The timer callback analyzes the metrics from the last 10 seconds and updates the "sleep duration" used in SendRequestEvent. (There are actually two sleep durations, one for reads and one for writes.) To update the sleep duration, the timer callback first calculates the error percentage:

  Error Percentage = 100 * Bytes Failed / (Bytes Failed + Bytes Successful)

The sleep duration is then updated as follows:

  if (Error Percentage < .1) {
    Sleep Duration = Sleep Duration * .975
  } else if (Error Percentage < 1) {
    // Do nothing in an attempt to stabilize. Less than 1% errors is acceptable.
  } else {
    Additional Delay = (Bytes Failed + Bytes Successful) * 10 Seconds / Bytes Successful - 10 Seconds
    Sleep Duration = Additional Delay / (Operations Failed + Operations Successful)
  }

The above describes the algorithm in a nutshell, omitting special handling (to avoid divide by zero, etc.).
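The update rule above can be sketched in Java. This is a minimal illustration under stated assumptions, not the actual ClientThrottlingAnalyzer implementation: the class and method names are hypothetical, and the zero guard stands in for the special handling the description omits.

```java
// Minimal sketch of the sleep-duration update described above.
// Names are hypothetical; this is not the ClientThrottlingAnalyzer API.
class ThrottlingSketch {

  // The analyzer's timer fires every 10 seconds.
  static final double ANALYSIS_PERIOD_MS = 10_000.0;

  /**
   * Recomputes the per-operation sleep duration (in ms) from the metrics
   * gathered during the last 10-second analysis period.
   */
  static double updateSleepDuration(double sleepMs,
                                    long bytesFailed, long bytesSuccessful,
                                    long opsFailed, long opsSuccessful) {
    long totalBytes = bytesFailed + bytesSuccessful;
    long totalOps = opsFailed + opsSuccessful;
    if (totalBytes == 0 || bytesSuccessful == 0 || totalOps == 0) {
      return sleepMs;  // guard against divide by zero; keep the current duration
    }
    double errorPercentage = 100.0 * bytesFailed / totalBytes;
    if (errorPercentage < 0.1) {
      return sleepMs * 0.975;  // errors are rare: gradually shorten the sleep
    } else if (errorPercentage < 1.0) {
      return sleepMs;          // under 1% errors is acceptable: hold steady
    } else {
      // Extra wall-clock time needed to move all bytes at the observed
      // successful rate, spread evenly across all operations.
      double additionalDelayMs =
          totalBytes * ANALYSIS_PERIOD_MS / bytesSuccessful - ANALYSIS_PERIOD_MS;
      return additionalDelayMs / totalOps;
    }
  }

  public static void main(String[] args) {
    // 20% of bytes failed (200 KB of 1 MB) across 100 operations:
    // additional delay = 1_000_000 * 10_000 / 800_000 - 10_000 = 2_500 ms,
    // spread over 100 operations = 25 ms of sleep per operation.
    System.out.println(updateSleepDuration(0, 200_000, 800_000, 20, 80));
    // No failed bytes: the 100 ms sleep decays by 2.5% toward zero.
    System.out.println(updateSleepDuration(100, 0, 1_000_000, 0, 10));
  }
}
```

Because the sleep shrinks multiplicatively when errors are rare and grows in proportion to the failure ratio when they are not, each client converges toward the largest request rate the account will accept without independent clients needing to coordinate.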