[jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

2016-06-01 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310790#comment-15310790
 ] 

Siddharth Seth commented on HIVE-13858:
---

The test failures are unrelated. This particular run took over 4 hours, compared 
to other runs which take about 2 hours. However, from the logs it looks like the 
mvn build started at 23:00 and the test run completed at 01:19, which is about 
2h20m and consistent with other runs. The extra time looks to be a delay in the 
run actually starting.

Committing.

> LLAP: A preempted task can end up waiting on completeInitialization if some 
> part of the executing code suppressed the interrupt
> ---
>
> Key: HIVE-13858
> URL: https://issues.apache.org/jira/browse/HIVE-13858
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Blocker
> Labels: llap
> Attachments: HIVE-13858.01.patch, HIVE-13858.02.patch, HIVE-13858.03.patch
>
>
> An interrupt along with a HiveProcessor.abort call is made when attempting to 
> preempt a task.
> In this specific case, the task was in the middle of HDFS IO - which 
> 'handled' the interrupt by retrying. As a result the interrupt status on the 
> thread was reset - so instead of skipping the future.get in 
> completeInitialization - the task ended up blocking there.
> End result - a single executor slot permanently blocked in LLAP. Depending on 
> what else is running - this can cause a cluster level deadlock.
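
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from the patch: the class, the `aborted` flag, and the method names are hypothetical stand-ins, and the bounded `get` exists only so that the sketch terminates where the real task would block indefinitely. It shows that once some code catches `InterruptedException` without restoring the interrupt status, a later `Future.get` no longer sees the interrupt, whereas a separately tracked abort flag still allows the wait to be skipped.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class SwallowedInterruptSketch {

    // Hypothetical abort flag, standing in for state recorded by an abort() call.
    static final AtomicBoolean aborted = new AtomicBoolean(false);

    // Stand-in for IO code that "handles" an interrupt by carrying on.
    // Thread.sleep clears the interrupt status when it throws, and the
    // catch block does not restore it.
    static void ioThatSwallowsInterrupt() {
        try {
            Thread.sleep(1_000);
        } catch (InterruptedException e) {
            // Swallowed: no Thread.currentThread().interrupt() here.
        }
    }

    // Returns true if the wait on the future was skipped or interrupted,
    // false if the caller ended up blocking on it.
    static boolean waitSkipped(Future<?> initFuture, boolean checkAbortFlag) {
        if (checkAbortFlag && aborted.get()) {
            return true; // the abort flag survives even if the interrupt was lost
        }
        try {
            // Bounded wait so the sketch terminates; an unbounded get() would hang.
            initFuture.get(200, TimeUnit.MILLISECONDS);
            return false;
        } catch (InterruptedException e) {
            return true; // only reached if the interrupt status was still set
        } catch (TimeoutException | ExecutionException e) {
            return false; // blocked until the timeout: the interrupt was lost
        }
    }

    // Simulates a preemption (abort plus interrupt) followed by IO that
    // swallows the interrupt, then the wait on a future that never completes.
    static boolean simulate(boolean checkAbortFlag) {
        aborted.set(true);
        Thread.currentThread().interrupt();
        ioThatSwallowsInterrupt();
        Future<Void> neverCompleted = new CompletableFuture<>();
        return waitSkipped(neverCompleted, checkAbortFlag);
    }

    public static void main(String[] args) {
        System.out.println("interrupt only: skipped=" + simulate(false));
        System.out.println("with abort flag: skipped=" + simulate(true));
    }
}
```

Relying on the interrupt status alone leaves the wait in place, while checking the flag skips it. Whether the actual patch uses a flag of this kind is not stated in this thread.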



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

2016-05-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309008#comment-15309008
 ] 

Hive QA commented on HIVE-13858:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12806662/HIVE-13858.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10195 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestJdbcWithMiniHA - did not produce a TEST-*.xml file
TestJdbcWithMiniMr - did not produce a TEST-*.xml file
TestOperationLoggingAPIWithTez - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.testLocksInSubquery
{noformat}

Test results: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/472/testReport
Console output: 
http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/472/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-472/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12806662 - PreCommit-HIVE-MASTER-Build

> LLAP: A preempted task can end up waiting on completeInitialization if some 
> part of the executing code suppressed the interrupt
> ---
>
> Key: HIVE-13858
> URL: https://issues.apache.org/jira/browse/HIVE-13858
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Labels: llap
> Attachments: HIVE-13858.01.patch, HIVE-13858.02.patch, HIVE-13858.03.patch


[jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

2016-05-31 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308553#comment-15308553
 ] 

Siddharth Seth commented on HIVE-13858:
---

[~jcamachorodriguez] - this is pending a Jenkins run, which is next in line. 
If it comes back clean, I'll commit it.
I would ideally like to include it in 2.1, given that preemption in LLAP can 
run into issues without this. If you're creating the RC tomorrow and it has not 
gone in by then, I guess it gets documented as a known issue and gets fixed in 
2.1.1 or 2.2.



[jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

2016-05-31 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308453#comment-15308453
 ] 

Jesus Camacho Rodriguez commented on HIVE-13858:


[~sseth], what is the status of this one? I plan to create the first 2.1.0 RC 
tomorrow and this is marked as Critical. Should it go in, or can it be 
deferred? Thanks



[jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

2016-05-27 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304781#comment-15304781
 ] 

Prasanth Jayachandran commented on HIVE-13858:
--

LGTM, +1. Pending tests.



[jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

2016-05-27 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303670#comment-15303670
 ] 

Prasanth Jayachandran commented on HIVE-13858:
--

Looks like RB went down while I was commenting, so I am adding the comments here.
1) CancellationException does not seem to be caught. Is it not expected?
2) I think we can remove the TODO for throwing HiveException and replace it 
with InterruptedException. IIRC throwing InterruptedException will also clear 
the interrupt status flag, so the Thread.interrupted() call is also not 
required. TezProcessor catches Throwable anyway, so it should be safe to throw 
InterruptedException.
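
The JDK behaviours behind the two review points above can be checked in isolation. This is a minimal sketch using only standard `java.util.concurrent` and `Thread` calls; the class and method names are made up for illustration and nothing here is taken from the patch. It shows that `get()` on a cancelled future throws the unchecked `CancellationException`, that `Thread.interrupted()` reports and clears the interrupt status, and that a blocking JDK call which raises `InterruptedException` has already cleared the status by the time the exception is caught.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

class InterruptStatusSketch {

    // get() on a cancelled future throws CancellationException, which is
    // unchecked and therefore easy to leave uncaught.
    static boolean cancelledGetThrowsCancellation() throws Exception {
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.cancel(true);
        try {
            future.get();
            return false;
        } catch (CancellationException e) {
            return true;
        }
    }

    // Thread.interrupted() returns the current status and clears it.
    static boolean interruptedClearsStatus() {
        Thread.currentThread().interrupt();
        boolean wasSet = Thread.interrupted();
        boolean stillSet = Thread.currentThread().isInterrupted();
        return wasSet && !stillSet;
    }

    // A blocking JDK call (here Thread.sleep) clears the status before it
    // throws InterruptedException.
    static boolean blockingCallClearsStatus() {
        Thread.currentThread().interrupt();
        try {
            Thread.sleep(1_000);
            return false;
        } catch (InterruptedException e) {
            return !Thread.currentThread().isInterrupted();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cancelledGetThrowsCancellation());
        System.out.println(interruptedClearsStatus());
        System.out.println(blockingCallClearsStatus());
    }
}
```

In the JDK, the clearing is done by the blocking method that raises the exception, or explicitly by `Thread.interrupted()`.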

> LLAP: A preempted task can end up waiting on completeInitialization if some 
> part of the executing code suppressed the interrupt
> ---
>
> Key: HIVE-13858
> URL: https://issues.apache.org/jira/browse/HIVE-13858
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Labels: llap
> Attachments: HIVE-13858.01.patch, HIVE-13858.02.patch


[jira] [Commented] (HIVE-13858) LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

2016-05-25 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301307#comment-15301307
 ] 

Prasanth Jayachandran commented on HIVE-13858:
--

Left some comments in RB

> LLAP: A preempted task can end up waiting on completeInitialization if some 
> part of the executing code suppressed the interrupt
> ---
>
> Key: HIVE-13858
> URL: https://issues.apache.org/jira/browse/HIVE-13858
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Labels: llap
> Attachments: HIVE-13858.01.patch