[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-15729:
------------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Closing as fixed. Sean, how far back should this patch go?

> [s3a] stop treating fs.s3a.max.threads as the long-term minimum
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15729
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15729
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Major
>             Fix For: 3.3.0
>
>      Attachments: HADOOP-15729.001.patch, HADOOP-15729.002.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the
> AWS SDK requires an unbounded threadpool. It places monitoring tasks on the
> work queue before the tasks they wait on, so it's possible (it has even
> happened with larger-than-default threadpools) for the executor to become
> permanently saturated and deadlock.
>
> So we started giving an unbounded threadpool executor to the SDK, and using
> a bounded, blocking threadpool service for everything else S3A needs
> (although currently that's only in the S3ABlockOutputStream).
> fs.s3a.max.threads then only limits this threadpool; however, we also
> specified fs.s3a.max.threads as the number of core threads in the unbounded
> threadpool, which in hindsight is pretty terrible.
>
> Currently those core threads do not time out, so this is actually setting a
> sort of minimum. Once that many tasks have been submitted, the threadpool
> will be locked at that number until it bursts beyond it, but it will only
> spin down that far. If fs.s3a.max.threads is set reasonably high and
> someone uses a bunch of S3 buckets, they could easily have thousands of
> idle threads at all times.
>
> We should either stop using fs.s3a.max.threads for the core pool size and
> introduce a new configuration, or simply allow core threads to time out.
> I'm reading the OpenJDK source now to see what subtle differences there are
> between core threads and other threads if core threads can time out.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
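The behaviour the ticket describes comes down to one `java.util.concurrent.ThreadPoolExecutor` detail: core threads never expire unless `allowCoreThreadTimeOut(true)` is set, so a core pool size of `fs.s3a.max.threads` acts as a permanent floor once that many tasks have run. A minimal standalone sketch of the second proposed fix (letting core threads time out); the pool sizes and the very short keep-alive are illustrative for the demo, not the actual S3A executor configuration:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        int coreThreads = 8; // stand-in for fs.s3a.max.threads

        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreThreads,                 // core pool size: a permanent floor by default
                Integer.MAX_VALUE,           // effectively no upper bound, as the SDK needs
                100L, TimeUnit.MILLISECONDS, // idle keep-alive, shortened for the demo
                new LinkedBlockingQueue<>());

        // The one-line fix: without this call, once coreThreads tasks have
        // been submitted the pool never shrinks below coreThreads, even
        // when it is completely idle.
        pool.allowCoreThreadTimeOut(true);

        for (int i = 0; i < coreThreads; i++) {
            pool.execute(() -> { }); // burst of trivial work spins up core threads
        }
        Thread.sleep(1000);          // let the now-idle core threads expire

        System.out.println("idle pool size: " + pool.getPoolSize()); // 0 with the fix
        pool.shutdown();
    }
}
```

With the default `allowCoreThreadTimeOut(false)`, the final pool size would stay at `coreThreads` indefinitely, which is exactly the thousands-of-idle-threads problem described above for deployments touching many S3 buckets.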
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-15729:
-----------------------------------
    Attachment: (was: HADOOP-15729.002.patch)
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-15729:
-----------------------------------
    Attachment: HADOOP-15729.002.patch
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-15729:
-----------------------------------
    Status: Patch Available  (was: Open)

Resubmitting as the checkstyle output has expired.
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-15729:
-----------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-15729:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-15729:
-----------------------------------
    Attachment: HADOOP-15729.002.patch
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-15729:
-----------------------------------
    Attachment: HADOOP-15729.001.patch
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-15729:
------------------------------------
    Component/s: fs/s3
[jira] [Updated] (HADOOP-15729) [s3a] stop treating fs.s3a.max.threads as the long-term minimum

[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-15729:
------------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: HADOOP-15620