[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-25 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-9976:
-
Fix Version/s: (was: 1.0.1)

 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0
Reporter: Gopal V
Assignee: Siddharth Seth
 Attachments: HIVE-9976.1.patch, llap_vertex_200ms.png


 Race condition in the DynamicPartitionPruner between 
 DynamicPartitionPruner::processVertex() and 
 DynamicPartitionPruner::addEvent() for tasks which respond with both the 
 result and success in a single heartbeat sequence.
 {code}
 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
 tez.DynamicPartitionPruner: Expecting: 1, received: 0
 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
 Vertex Input: store_sales initializer failed, 
 vertex=vertex_1424502260528_1113_4_04 [Map 1]
 org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
 dynamic parition pruning
 {code}
 !llap_vertex_200ms.png!
 All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
 this, which seems to be consistently happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-25 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-9976:
-
Attachment: HIVE-9976.2.patch

Thanks for the review. Updated patch with comments addressed, and some more 
changes.

bq. Not your fault - but there are 2 paths through HiveSplitGenerator.
Moved the methods into SplitGrouper. There's a static cache in there which 
seems a little strange. Will create a follow up jira to investigate this. For 
now I've changed that to a ConcurrentMap since split generation can run in 
parallel.
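
A minimal sketch of the ConcurrentMap change described above, not the code from the patch. The class name SplitCacheSketch, the String key and value types, and the placeholder computation are all assumptions made for illustration. It only shows why an atomic putIfAbsent is a safer fit than a plain map when several split generators can hit the cache at once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class SplitCacheSketch {
    // Hypothetical stand-in for the static cache; the String key and value
    // types are assumptions made for illustration only.
    private static final ConcurrentMap<String, String> CACHE =
            new ConcurrentHashMap<String, String>();

    /**
     * Several threads may compute a value for the same key, but every caller
     * ends up with the single instance that was stored first.
     */
    static String lookup(String key) {
        String cached = CACHE.get(key);
        if (cached != null) {
            return cached;
        }
        String computed = "grouping-for-" + key; // placeholder for real work
        // putIfAbsent is atomic: if another thread won the race, use its value.
        String previous = CACHE.putIfAbsent(key, computed);
        return previous != null ? previous : computed;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    lookup("store_sales");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(lookup("store_sales"));
    }
}
```

With a plain HashMap the concurrent put calls could corrupt the map or hand different callers different instances; the concurrent map avoids both without an explicit lock.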

bq. i see you've fixed calling close consistently on the data input stream. 
maybe use try{}finally there?
Fixed. There was a bug with some of the other conditions which I'd changed. 
Fixed that as well.
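
For illustration, the try/finally pattern suggested in the review could look roughly like the sketch below. The class EventPayloadReaderSketch, the TrackingStream helper, and the single-UTF-string payload layout are assumptions for the example and are not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class EventPayloadReaderSketch {

    /** Hypothetical helper that records whether close() was called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads one UTF string; the stream is closed whether or not the read fails. */
    static String readColumnName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        try {
            return in.readUTF();
        } finally {
            in.close(); // also closes the wrapped stream
        }
    }

    /** Returns true if the stream was closed even though the read threw. */
    static boolean closedAfterFailedRead() {
        // Length prefix claims 5 bytes but only 1 follows, so readUTF fails.
        TrackingStream truncated = new TrackingStream(new byte[] {0, 5, 'a'});
        try {
            readColumnName(truncated);
            return false;
        } catch (IOException expected) {
            return truncated.closed;
        }
    }

    /** Encodes a string with writeUTF and reads it back through readColumnName. */
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeUTF(value);
            return readColumnName(new TrackingStream(bytes.toByteArray()));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ss_sold_date_sk"));
        System.out.println(closedAfterFailedRead());
    }
}
```

On Java 7 and later, try-with-resources gives the same guarantee with less code; the explicit finally is shown because that is what the review comment asks for.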

bq. it seems you're setting numexpectedevents to 0 first and then turn around 
and call decrement. Why not just set to -1? Also - why atomic integers? as far 
as i can tell all access to these maps is synchronized.
numExpectedEvents is decremented once for each column for which a source will 
send events. That's used to track the total number of expected events from that 
source. Added a comment for this.
Moved from AtomicInteger to MutableInt. The AtomicIntegers were there only to 
avoid re-inserting the Integer into the map, not for thread safety.
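
The difference between a boxed Integer and a mutable holder in a map can be sketched as follows. The nested MutableInt here is a tiny hypothetical stand-in written for this example (the patch refers to an existing MutableInt class), and the method names and the "Map 2" source name are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class ExpectedEventCounterSketch {

    /** Minimal hypothetical holder; stands in for a library MutableInt. */
    static final class MutableInt {
        private int value;

        void decrement() {
            value--;
        }

        int intValue() {
            return value;
        }
    }

    /** Boxed Integer: every update has to put a new value back into the map. */
    static void decrementBoxed(Map<String, Integer> counts, String source) {
        Integer current = counts.get(source);
        counts.put(source, (current == null ? 0 : current) - 1);
    }

    /** Mutable holder: inserted once, then updated in place. */
    static void decrementInPlace(Map<String, MutableInt> counts, String source) {
        MutableInt holder = counts.get(source);
        if (holder == null) {
            holder = new MutableInt();
            counts.put(source, holder);
        }
        holder.decrement();
    }

    /** Decrements once per column and returns the resulting (negative) tally. */
    static int tallyForColumns(int columns) {
        Map<String, MutableInt> counts = new HashMap<String, MutableInt>();
        for (int i = 0; i < columns; i++) {
            decrementInPlace(counts, "Map 2");
        }
        return counts.get("Map 2").intValue();
    }

    public static void main(String[] args) {
        System.out.println(tallyForColumns(3));
    }
}
```

Neither variant is thread-safe on its own, which matches the comment above: the holder is a convenience, and safety still comes from the synchronization around the maps.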

bq. does it make sense to make initialize in the pruner private now? (can't be 
used to init anymore - only from the constr). Also, the parameters aren't used 
anymore, right?
Done, along with some other methods.


 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0
Reporter: Gopal V
Assignee: Siddharth Seth
 Attachments: HIVE-9976.1.patch, HIVE-9976.2.patch, 
 llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-9976:
-
Affects Version/s: (was: 1.1.0)
   1.0.0
Fix Version/s: (was: 1.1.1)
   (was: 1.2.0)
   1.0.1
 Assignee: Siddharth Seth  (was: Gunther Hagleitner)

This is not limited to LLAP. Assigning to myself - to change the handling of 
vertex success / init events.

 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Sub-task
  Components: Tez
Affects Versions: 1.0.0
Reporter: Gopal V
Assignee: Siddharth Seth
 Fix For: 1.0.1

 Attachments: llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-9976:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: HIVE-7926)

 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0
Reporter: Gopal V
Assignee: Siddharth Seth
 Fix For: 1.0.1

 Attachments: llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-9976:
-
Fix Version/s: 1.1.1
   1.2.0

 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Sub-task
  Components: Tez
Affects Versions: 1.1.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
 Fix For: 1.2.0, 1.1.1

 Attachments: llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-9976:
-
Summary: Possible race condition in DynamicPartitionPruner for 200ms tasks 
 (was: LLAP: Possible race condition in DynamicPartitionPruner for 200ms tasks)

 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Sub-task
  Components: Tez
Affects Versions: 1.1.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
 Fix For: 1.2.0, 1.1.1

 Attachments: llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-9976:
-
Affects Version/s: (was: llap)
   1.1.0

 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Sub-task
  Components: Tez
Affects Versions: 1.1.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
 Fix For: 1.2.0, 1.1.1

 Attachments: llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-9976:
-
Attachment: HIVE-9976.1.patch

Patch to handle out-of-order events. It also initializes the pruner during 
Input construction, so that events don't show up before the pruner is 
initialized. Adds a bunch of tests.

[~hagleitn], [~vikram.dixit] - please review.
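
One way to picture order-independent handling is the sketch below. It is an illustration under assumptions, not the patch: the class OutOfOrderEventTrackerSketch, its method names, and the idea of passing the expected event count along with the success notification are all hypothetical. All per-source state is created in the constructor, so neither kind of notification can arrive before the tracker is ready.

```java
import java.util.HashMap;
import java.util.Map;

final class OutOfOrderEventTrackerSketch {

    private static final class SourceState {
        int received = 0;
        int expected = -1; // unknown until the source vertex reports success
    }

    private final Map<String, SourceState> sources = new HashMap<String, SourceState>();

    /** All state is set up here, before any event can be delivered. */
    OutOfOrderEventTrackerSketch(String... sourceVertices) {
        for (String name : sourceVertices) {
            sources.put(name, new SourceState());
        }
    }

    /** Records one pruning event from the given source vertex. */
    synchronized void addEvent(String source) {
        state(source).received++;
    }

    /** Records that a source vertex finished and how many events it should have sent. */
    synchronized void vertexSucceeded(String source, int expectedEvents) {
        state(source).expected = expectedEvents;
    }

    /** Complete only when every source has reported success and delivered its events. */
    synchronized boolean isComplete() {
        for (SourceState s : sources.values()) {
            if (s.expected < 0 || s.received < s.expected) {
                return false;
            }
        }
        return true;
    }

    private SourceState state(String source) {
        SourceState s = sources.get(source);
        if (s == null) {
            throw new IllegalArgumentException("Unknown source vertex: " + source);
        }
        return s;
    }

    public static void main(String[] args) {
        OutOfOrderEventTrackerSketch tracker = new OutOfOrderEventTrackerSketch("Map 2");
        tracker.vertexSucceeded("Map 2", 1); // success notification arrives first
        System.out.println(tracker.isComplete()); // false: the event is still missing
        tracker.addEvent("Map 2");
        System.out.println(tracker.isComplete()); // true: both pieces have arrived
    }
}
```

The point of the design is that completion is checked against accumulated state rather than at the moment a success notification arrives, so "Expecting: 1, received: 0" is treated as "not done yet" instead of an error.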

 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 1.0.0
Reporter: Gopal V
Assignee: Siddharth Seth
 Fix For: 1.0.1

 Attachments: HIVE-9976.1.patch, llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9976:
--
Description: 
Race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks which respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

!llap_vertex_200ms.png!

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be happening with LLAP.

  was:
Race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks which respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be happening with LLAP.


 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap
Reporter: Gopal V
Assignee: Gunther Hagleitner
 Attachments: llap_vertex_200ms.png




[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9976:
--
Description: 
Race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks which respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

!llap_vertex_200ms.png!

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be consistently happening with LLAP.

  was:
Race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks which respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

!llap_vertex_200ms.png!

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be happening with LLAP.


 Possible race condition in DynamicPartitionPruner for 200ms tasks
 --

 Key: HIVE-9976
 URL: https://issues.apache.org/jira/browse/HIVE-9976
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: llap
Reporter: Gopal V
Assignee: Gunther Hagleitner
 Attachments: llap_vertex_200ms.png

