[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13858:
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0
           Status: Resolved  (was: Patch Available)

> LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13858
>                 URL: https://issues.apache.org/jira/browse/HIVE-13858
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Blocker
>              Labels: llap
>             Fix For: 2.1.0
>
>         Attachments: HIVE-13858.01.patch, HIVE-13858.02.patch, HIVE-13858.03.patch
>
>
> An interrupt along with a HiveProcessor.abort call is made when attempting to preempt a task.
> In this specific case, the task was in the middle of HDFS IO - which 'handled' the interrupt by retrying. As a result the interrupt status on the thread was reset - so instead of skipping the future.get in completeInitialization - the task ended up blocking there.
> End result - a single executor slot permanently blocked in LLAP. Depending on what else is running - this can cause a cluster level deadlock.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
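The failure mode in the description can be illustrated with a small sketch. This is not Hive or HDFS code: the class and method names below are hypothetical, and Thread.sleep stands in for the blocking IO call. It shows a retry loop that drops the interrupt (so later code cannot see that a preemption was requested) next to the usual idiom of restoring the interrupt status once the retry is done.

```java
public class SwallowedInterruptDemo {

    // Hypothetical stand-in for an IO layer that "handles" an interrupt by retrying.
    // The blocking call clears the interrupt status when it throws, and nothing sets it again.
    public static void retryingIoThatSwallows() {
        while (true) {
            try {
                Thread.sleep(10); // stands in for a blocking read
                return;
            } catch (InterruptedException e) {
                // Retry. The interrupt is now lost to every caller further up the stack.
            }
        }
    }

    // Same retry loop, but the interrupt status is restored before returning.
    public static void retryingIoThatRestores() {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    Thread.sleep(10);
                    return;
                } catch (InterruptedException e) {
                    interrupted = true; // remember it, then retry
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Simulates a preemption interrupt, runs the IO, and reports whether
    // the interrupt is still visible afterwards (reading it also clears it).
    public static boolean interruptVisibleAfter(Runnable io) {
        Thread.currentThread().interrupt();
        io.run();
        return Thread.interrupted();
    }

    public static void main(String[] args) {
        System.out.println("swallowed: interrupt visible = "
                + interruptVisibleAfter(SwallowedInterruptDemo::retryingIoThatSwallows)); // false
        System.out.println("restored:  interrupt visible = "
                + interruptVisibleAfter(SwallowedInterruptDemo::retryingIoThatRestores)); // true
    }
}
```

In the swallowing variant, any later check of the interrupt status sees nothing, which is the situation in which a subsequent blocking wait is entered instead of skipped.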
[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-13858:
-------------------------------------------
    Priority: Blocker  (was: Critical)
[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13858:
----------------------------------
    Attachment: HIVE-13858.03.patch

Updated patch with comments addressed.
Throwing an InterruptedException does not clear the InterruptStatus (catching it probably does). Going by most recommendations - I've removed the code to clear the interrupt status.
Also, propagating InterruptedException all the way out of the Hive processor.
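For reference, the interrupt-status behaviour discussed in this comment can be checked with a few lines of plain Java. The class below is a hypothetical illustration, not part of the patch. As far as the JDK documentation goes, constructing and throwing an InterruptedException by hand leaves the status untouched, and a catch block does not change it either; it is the blocking JDK method (Thread.sleep here) that clears the status at the point where it throws.

```java
public class InterruptStatusDemo {

    // Set the interrupt status, throw InterruptedException by hand, then read the status.
    public static boolean statusAfterManualThrow() {
        Thread.currentThread().interrupt();
        try {
            throw new InterruptedException("thrown by hand");
        } catch (InterruptedException e) {
            // Thread.interrupted() returns the current status and clears it.
            return Thread.interrupted();
        }
    }

    // Set the interrupt status, let Thread.sleep throw, then read the status.
    public static boolean statusAfterSleepThrows() {
        Thread.currentThread().interrupt();
        try {
            Thread.sleep(1000); // throws immediately because the status is already set
            return Thread.interrupted(); // not expected to be reached
        } catch (InterruptedException e) {
            return Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println("after manual throw: " + statusAfterManualThrow()); // true
        System.out.println("after sleep throws: " + statusAfterSleepThrows()); // false
    }
}
```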
[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13858:
----------------------------------
    Attachment: HIVE-13858.02.patch

Updated patch with review comments addressed.
[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13858:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13858:
----------------------------------
    Attachment: HIVE-13858.01.patch

Patch which should fix this.
1. It changes an 'aborted && interrupted' check to just say 'aborted'.
2. It changes completeInitialization to do a future.get with a timeout, and to check the abort status between these waits.
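The second change can be sketched roughly as follows. This is a minimal illustration of the technique (a timed future.get in a loop, with an abort flag checked between waits), not the code in the attached patch: the class name, the waitUnlessAborted helper, the AtomicBoolean flag, and the poll interval are all assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortAwareWait {

    // Hypothetical helper: waits for the future in short slices and gives up
    // as soon as the abort flag is set, even if the thread's interrupt status
    // was cleared by some other code along the way.
    public static <T> T waitUnlessAborted(Future<T> future, AtomicBoolean aborted, long pollMillis)
            throws InterruptedException, ExecutionException {
        while (true) {
            if (aborted.get()) {
                throw new InterruptedException("aborted while waiting for initialization");
            }
            try {
                return future.get(pollMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Not done yet: loop around and re-check the abort flag.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean aborted = new AtomicBoolean(false);
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();

        // Simulate an abort call arriving from another thread after a short delay.
        Thread aborter = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
                // Nothing to clean up in this demo thread.
            }
            aborted.set(true);
        });
        aborter.start();

        try {
            waitUnlessAborted(neverCompletes, aborted, 20);
        } catch (InterruptedException e) {
            System.out.println("wait abandoned: " + e.getMessage());
        }
        aborter.join();
    }
}
```

The point of the design is that the wait no longer depends on the interrupt status alone: the abort flag is an independent signal that survives even when a lower layer swallows the interrupt, at the cost of waking up once per poll interval.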
[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13858:
----------------------------------
    Affects Version/s: 2.0.0
[jira] [Updated] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt
[ https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated HIVE-13858:
----------------------------------
    Labels: llap  (was: )