[jira] [Commented] (HIVE-14778) document threading model of Streaming API
[ https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533378#comment-15533378 ]

Alan Gates commented on HIVE-14778:
-----------------------------------

+1, makes sense.

> document threading model of Streaming API
> -----------------------------------------
>
>                 Key: HIVE-14778
>                 URL: https://issues.apache.org/jira/browse/HIVE-14778
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Transactions
>    Affects Versions: 0.14.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-14778.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The model is not obvious and needs to be documented properly.
> A StreamingConnection internally maintains 2 MetaStoreClient objects (each
> has 1 Thrift client for actual RPC). Let's call them "primary" and
> "heartbeat". Each TransactionBatch created from a given StreamingConnection,
> gets a reference to both of these MetaStoreClients.
> So the model is that there is at most 1 outstanding (not closed)
> TransactionBatch for any given StreamingConnection and for any given
> TransactionBatch there can be at most 2 threads accessing it concurrently. 1
> thread calling TransactionBatch.heartbeat() (and nothing else) and the other
> calling all other methods.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
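To make the "at most 2 threads per TransactionBatch" rule in the issue description concrete, here is a minimal sketch. It does not use the Hive classes: FakeBatch, writeAndCommit() and run() are hypothetical stand-ins invented for illustration, and the 5 ms heartbeat interval is arbitrary. The only point is the division of labor: one worker thread calls everything except heartbeat(), and one dedicated thread calls heartbeat() and nothing else.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatSketch {
    /** Hypothetical stand-in for a TransactionBatch; NOT the Hive class. */
    static final class FakeBatch {
        // Touched only by the worker thread (the "primary" client in the issue text).
        private int committed = 0;
        // Touched only by the heartbeat thread (the "heartbeat" client).
        private final AtomicInteger heartbeats = new AtomicInteger();

        void writeAndCommit(String record) { committed++; }
        void heartbeat() { heartbeats.incrementAndGet(); }
        int committed() { return committed; }
        int heartbeats() { return heartbeats.get(); }
    }

    /** Runs one worker (the calling thread) plus one heartbeat-only thread. */
    static FakeBatch run(int records) {
        FakeBatch batch = new FakeBatch();
        CountDownLatch firstBeat = new CountDownLatch(1);
        ScheduledExecutorService hb = Executors.newSingleThreadScheduledExecutor();
        // The heartbeat thread calls heartbeat() and nothing else on the batch.
        hb.scheduleAtFixedRate(() -> {
            batch.heartbeat();
            firstBeat.countDown();
        }, 0, 5, TimeUnit.MILLISECONDS);
        try {
            // The worker thread calls all other methods, strictly sequentially.
            for (int i = 0; i < records; i++) {
                batch.writeAndCommit("row-" + i);
            }
            // Wait until at least one heartbeat has been sent before stopping.
            firstBeat.await(1, TimeUnit.SECONDS);
            hb.shutdownNow();
            hb.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            hb.shutdownNow();
        }
        return batch;
    }

    public static void main(String[] args) {
        FakeBatch b = run(3);
        System.out.println("committed=" + b.committed()
                + ", heartbeat sent=" + (b.heartbeats() > 0));
    }
}
```

In this sketch the two threads never touch the same field, which mirrors the description's separate "primary" and "heartbeat" clients; whether the real classes add any further synchronization is not something this thread states.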
[jira] [Commented] (HIVE-14778) document threading model of Streaming API
[ https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15527166#comment-15527166 ]

Eugene Koifman commented on HIVE-14778:
---------------------------------------

It depends on what you mean by multiple threads. TransactionBatches are made from StreamingConnection objects. So the model is that if you want to write to the same HiveEndPoint in parallel, you create different StreamingConnection objects but from any given StreamingConnection you open/close TransactionBatches sequentially. (The exception is that you can heartbeat any given TransactionBatch using a separate thread).

This seems like a reasonable model. For example, JDBC (usually) has the same model. You can create any number of connections but operations on a given Connection are expected to be sequential.
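The "one connection per parallel writer, batches sequential within a connection" model described above can be sketched as follows. Again this is not the Hive API: FakeConnection, openBatch(), closeBatch() and writeInParallel() are hypothetical names made up for the illustration, and the IllegalStateException guard is an assumption added here to make the rule visible, not documented behavior of StreamingConnection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConnectionPerWriterSketch {
    /** Hypothetical stand-in for a StreamingConnection; NOT the Hive class. */
    static final class FakeConnection {
        private boolean batchOpen = false;
        private int batchesClosed = 0;

        // Assumed guard: at most one outstanding (not closed) batch per connection.
        void openBatch() {
            if (batchOpen) {
                throw new IllegalStateException("previous batch still open");
            }
            batchOpen = true;
        }

        void closeBatch() {
            batchOpen = false;
            batchesClosed++;
        }

        int batchesClosed() { return batchesClosed; }
    }

    /** Each writer thread owns its own connection and uses it sequentially. */
    static int writeInParallel(int writers, int batchesPerWriter) {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int w = 0; w < writers; w++) {
                results.add(pool.submit(() -> {
                    FakeConnection conn = new FakeConnection(); // never shared
                    for (int b = 0; b < batchesPerWriter; b++) {
                        conn.openBatch();   // open ...
                        conn.closeBatch();  // ... and close before opening the next
                    }
                    return conn.batchesClosed();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("batches closed: " + writeInParallel(4, 3));
    }
}
```

Parallelism comes from the number of connections, not from sharing one connection across threads, which is the same shape as the JDBC comparison in the comment.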
[jira] [Commented] (HIVE-14778) document threading model of Streaming API
[ https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526802#comment-15526802 ]

Alan Gates commented on HIVE-14778:
-----------------------------------

These changes appear to say that the streaming is single threaded. I don't think that's what you mean, but I want to make sure I understand what you're saying, which I think is the following:

{quote}
A single HiveEndPoint object cannot support having more than one TransactionBatch open and being committed to at the same time. Also it does not properly support multiple threads committing in parallel, even inside one TransactionBatch. However, it does support multiple threads as long as the commits are serialized.
{quote}

Is that correct?
[jira] [Commented] (HIVE-14778) document threading model of Streaming API
[ https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497739#comment-15497739 ]

Hive QA commented on HIVE-14778:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12828885/HIVE-14778.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10527 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_bitmap_auto_partitioned]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1215/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1215/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1215/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12828885 - PreCommit-HIVE-MASTER-Build