[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-9976: - Fix Version/s: (was: 1.0.1) Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.0 Reporter: Gopal V Assignee: Siddharth Seth Attachments: HIVE-9976.1.patch, llap_vertex_200ms.png
Race condition in the DynamicPartitionPruner between DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() for tasks which respond with both the result and success in a single heartbeat sequence.
{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: Vertex Input: store_sales initializer failed, vertex=vertex_1424502260528_1113_4_04 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in dynamic parition pruning
{code}
!llap_vertex_200ms.png!
All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, which seems to be consistently happening with LLAP.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
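The ordering problem described above can be illustrated with a minimal sketch, assuming a simplified model in which each upstream source vertex sends a known number of pruning events and separately reports success. `OrderTolerantPruner` and its methods are hypothetical names, not Hive's `DynamicPartitionPruner` API. In this model a source counts as complete only when both conditions hold, in whichever order they arrive; comparing the event count at the moment of vertex success is what yields an error like `Expecting: 1, received: 0` when the event is delivered second.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the ordering problem, not Hive's DynamicPartitionPruner.
 * A source is complete only once it has reported success AND all of its expected
 * events have arrived, in whichever order those two things happen.
 */
class OrderTolerantPruner {
    private final Map<String, Integer> expected = new HashMap<>();
    private final Map<String, Integer> received = new HashMap<>();
    private final Set<String> succeeded = new HashSet<>();
    private final Set<String> completed = new HashSet<>();

    OrderTolerantPruner(Map<String, Integer> expectedEventsPerSource) {
        expected.putAll(expectedEventsPerSource);
        for (String source : expected.keySet()) {
            received.put(source, 0);
        }
    }

    /** A pruning event from a source task; may arrive before or after its success. */
    synchronized void addEvent(String source) {
        checkKnown(source);
        received.merge(source, 1, Integer::sum);
        maybeComplete(source);
    }

    /** The source vertex reported success; its last events may still be in flight. */
    synchronized void processVertexSuccess(String source) {
        checkKnown(source);
        succeeded.add(source);
        // A strict version would compare counts right here and fail with
        // "Expecting: 1, received: 0" whenever the event is delivered second.
        maybeComplete(source);
    }

    synchronized boolean isComplete() {
        return completed.size() == expected.size();
    }

    private void maybeComplete(String source) {
        int got = received.get(source);
        int want = expected.get(source);
        if (got > want) {
            // More events than expected is still a genuine error.
            throw new IllegalStateException("Expecting: " + want + ", received: " + got);
        }
        if (got == want && succeeded.contains(source)) {
            completed.add(source);
        }
    }

    private void checkKnown(String source) {
        if (!expected.containsKey(source)) {
            throw new IllegalArgumentException("Unknown source: " + source);
        }
    }
}
```

Rejecting surplus events keeps a genuine incorrect-event-count failure, while a success notification that merely overtakes its own event is no longer treated as an error.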
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9976: - Attachment: HIVE-9976.2.patch
Thanks for the review. Updated patch with comments addressed, and some more changes.
bq. Not your fault - but there are 2 paths through HiveSplitGenerator.
Moved the methods into SplitGrouper. There's a static cache in there which seems a little strange. Will create a follow-up JIRA to investigate this. For now I've changed that to a ConcurrentMap since split generation can run in parallel.
bq. i see you've fixed calling close consistently on the data input stream. maybe use try{}finally there?
Fixed. There was a bug with some of the other conditions which I'd changed. Fixed that as well.
bq. it seems you're setting numexpectedevents to 0 first and then turn around and call decrement. Why not just set to -1? Also - why atomic integers? as far as i can tell all access to these maps is synchronized.
numExpectedEvents is decremented once for each column for which a source will send events; this is used to track the total number of expected events from that source. Added a comment for this. Moved from AtomicIntegers to MutableInt - this was just to avoid re-inserting the Integer into the map, and not for thread safety.
bq. does it make sense to make initialize in the pruner private now? (can't be used to init anymore - only from the constr). Also, the parameters aren't used anymore, right?
Done, along with some other methods.
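The numExpectedEvents explanation above can be illustrated with a small sketch. It assumes that every task of a source vertex sends one event per pruned column, and that the negative per-column count is turned into a positive total once the task count is known at vertex success; both are assumptions for illustration, not a description of the patch. `Counter` and `ExpectedEventTracker` are hypothetical names; `Counter` plays the role of commons-lang's `MutableInt` so the sketch needs only the JDK.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for commons-lang's MutableInt so the sketch depends only on the JDK. */
final class Counter {
    private int value;
    void decrement() { value--; }
    int intValue() { return value; }
    void setValue(int v) { value = v; }
}

/**
 * Hypothetical expected-event accounting. Assumption: every task of a source
 * vertex sends exactly one event per column it prunes.
 */
class ExpectedEventTracker {
    // Until the source's task count is known, each counter holds minus the
    // number of columns that source sends events for.
    private final Map<String, Counter> expectedPerSource = new HashMap<>();

    /** One entry per pruned column, naming the source vertex that feeds it. */
    ExpectedEventTracker(List<String> sourceForEachColumn) {
        for (String source : sourceForEachColumn) {
            // Start at 0 and decrement once per column: a source feeding two
            // columns ends at -2, which a plain "set to -1" would not capture.
            expectedPerSource.computeIfAbsent(source, s -> new Counter()).decrement();
        }
    }

    /** On vertex success, convert "-columns" into a positive total for the source. */
    synchronized int onVertexSuccess(String source, int numTasks) {
        Counter c = expectedPerSource.get(source);
        if (c == null) {
            throw new IllegalArgumentException("Unknown source: " + source);
        }
        if (c.intValue() < 0) {
            // Updated in place: no boxed Integer has to be put back into the map.
            c.setValue(-c.intValue() * numTasks);
        }
        return c.intValue();
    }
}
```

With a mutable holder the map entry is updated in place, which matches the stated reason for preferring `MutableInt` over re-inserting an `Integer`; thread safety here comes from the `synchronized` method, not from the counter type.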
Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.0 Reporter: Gopal V Assignee: Siddharth Seth Attachments: HIVE-9976.1.patch, HIVE-9976.2.patch, llap_vertex_200ms.png
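Two smaller review points in the comment above, a `ConcurrentMap` for a static cache used by split generation that may run in parallel, and closing the data input stream on every path, can be sketched as follows. The cache key, the loader and the payload layout (`writeUTF` column name, `writeInt` count, then `writeUTF` values) are hypothetical and do not reflect Hive's actual event format; the sketch uses try-with-resources, the modern equivalent of the suggested try/finally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class ReviewPointsSketch {
    // Hypothetical static cache shared by split generators running in parallel.
    // ConcurrentHashMap.computeIfAbsent applies the loader at most once per key.
    private static final ConcurrentMap<String, Integer> CACHE = new ConcurrentHashMap<>();

    static int cachedLookup(String key, Function<String, Integer> loader) {
        return CACHE.computeIfAbsent(key, loader);
    }

    // Hypothetical payload layout: column name, value count, then the values.
    static byte[] writePayload(String column, List<String> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(column);
            out.writeInt(values.size());
            for (String v : values) {
                out.writeUTF(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<String> readPayload(byte[] payload) {
        // try-with-resources closes the stream on every exit path, including
        // exceptions, which is the guarantee the try/finally suggestion is after.
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            String column = in.readUTF();
            int n = in.readInt();
            List<String> values = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                values.add(column + "=" + in.readUTF());
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A plain `HashMap` in the same static field could be corrupted by concurrent writers; `computeIfAbsent` on a `ConcurrentHashMap` also avoids a check-then-put race between two generators loading the same key.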
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9976: - Affects Version/s: (was: 1.1.0) 1.0.0 Fix Version/s: (was: 1.1.1) (was: 1.2.0) 1.0.1 Assignee: Siddharth Seth (was: Gunther Hagleitner) This is not limited to LLAP. Assigning to myself to change the handling of vertex success and init events. Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 1.0.0 Reporter: Gopal V Assignee: Siddharth Seth Fix For: 1.0.1 Attachments: llap_vertex_200ms.png
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9976: - Issue Type: Bug (was: Sub-task) Parent: (was: HIVE-7926) Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.0 Reporter: Gopal V Assignee: Siddharth Seth Fix For: 1.0.1 Attachments: llap_vertex_200ms.png Race condition in the DynamicPartitionPruner between DynamicPartitionPruner::processVertex() and DynamicPartitionpruner::addEvent() for tasks which respond with both the result and success in a single heartbeat sequence. {code} 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] tez.DynamicPartitionPruner: Expecting: 1, received: 0 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: Vertex Input: store_sales initializer failed, vertex=vertex_1424502260528_1113_4_04 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in dynamic parition pruning {code} !llap_vertex_200ms.png! All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, which seems to be consistently happening with LLAP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9976: - Fix Version/s: 1.1.1 1.2.0 Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 1.1.0 Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: 1.2.0, 1.1.1 Attachments: llap_vertex_200ms.png
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9976: - Summary: Possible race condition in DynamicPartitionPruner for 200ms tasks (was: LLAP: Possible race condition in DynamicPartitionPruner for 200ms tasks) Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 1.1.0 Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: 1.2.0, 1.1.1 Attachments: llap_vertex_200ms.png
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9976: - Affects Version/s: (was: llap) 1.1.0 Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 1.1.0 Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: 1.2.0, 1.1.1 Attachments: llap_vertex_200ms.png
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-9976: - Attachment: HIVE-9976.1.patch Patch to handle out-of-order events. It also initializes the pruner during Input construction, so that events cannot show up before the pruner is initialized. Adds a bunch of tests. [~hagleitn], [~vikram.dixit] - please review. Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 1.0.0 Reporter: Gopal V Assignee: Siddharth Seth Fix For: 1.0.1 Attachments: HIVE-9976.1.patch, llap_vertex_200ms.png
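Why initializing the pruner during Input construction matters can be shown with a minimal sketch. `RecordingPruner`, `LazyInitializer` and `EagerInitializer` are hypothetical classes, not Tez or Hive APIs; `handleEvent` stands in for however the framework delivers an event, which this sketch assumes can happen before `initialize()` is called.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical pruner that just records events in place of real pruning logic. */
class RecordingPruner {
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    void addEvent(String event) { events.add(event); }

    int eventCount() { return events.size(); }
}

/** Creates its pruner lazily in initialize(): an early event has nowhere to go. */
class LazyInitializer {
    private volatile RecordingPruner pruner;  // null until initialize() runs

    boolean handleEvent(String event) {
        RecordingPruner p = pruner;
        if (p == null) {
            return false;  // event arrived before initialization and is lost
        }
        p.addEvent(event);
        return true;
    }

    int initialize() {
        pruner = new RecordingPruner();
        return pruner.eventCount();
    }
}

/** Creates its pruner in the constructor, so early events are always recorded. */
class EagerInitializer {
    // final field: safely published to any thread that later calls handleEvent
    private final RecordingPruner pruner = new RecordingPruner();

    boolean handleEvent(String event) {
        pruner.addEvent(event);
        return true;
    }

    int initialize() {
        return pruner.eventCount();
    }
}
```

The eager variant removes one ordering dependency entirely; it does not by itself handle events racing with vertex success, which is the separate out-of-order handling the patch describes.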
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9976: -- Description:
Race condition in the DynamicPartitionPruner between DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() for tasks which respond with both the result and success in a single heartbeat sequence.
{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: Vertex Input: store_sales initializer failed, vertex=vertex_1424502260528_1113_4_04 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in dynamic parition pruning
{code}
!llap_vertex_200ms.png!
All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, which seems to be happening with LLAP.
was:
Race condition in the DynamicPartitionPruner between DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() for tasks which respond with both the result and success in a single heartbeat sequence.
{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: Vertex Input: store_sales initializer failed, vertex=vertex_1424502260528_1113_4_04 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in dynamic parition pruning
{code}
All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, which seems to be happening with LLAP.
Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: llap_vertex_200ms.png
[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for 200ms tasks
[ https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9976: -- Description:
Race condition in the DynamicPartitionPruner between DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() for tasks which respond with both the result and success in a single heartbeat sequence.
{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: Vertex Input: store_sales initializer failed, vertex=vertex_1424502260528_1113_4_04 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in dynamic parition pruning
{code}
!llap_vertex_200ms.png!
All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, which seems to be consistently happening with LLAP.
was:
Race condition in the DynamicPartitionPruner between DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() for tasks which respond with both the result and success in a single heartbeat sequence.
{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: Vertex Input: store_sales initializer failed, vertex=vertex_1424502260528_1113_4_04 [Map 1] org.apache.tez.dag.app.dag.impl.AMUserCodeException: org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in dynamic parition pruning
{code}
!llap_vertex_200ms.png!
All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, which seems to be happening with LLAP.
Possible race condition in DynamicPartitionPruner for 200ms tasks -- Key: HIVE-9976 URL: https://issues.apache.org/jira/browse/HIVE-9976 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap Reporter: Gopal V Assignee: Gunther Hagleitner Attachments: llap_vertex_200ms.png