[jira] [Commented] (HIVE-14778) document threading model of Streaming API

2016-09-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533378#comment-15533378
 ] 

Alan Gates commented on HIVE-14778:
---

+1, makes sense.

> document threading model of Streaming API
> -
>
> Key: HIVE-14778
> URL: https://issues.apache.org/jira/browse/HIVE-14778
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Transactions
> Affects Versions: 0.14.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Attachments: HIVE-14778.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The model is not obvious and needs to be documented properly.
> A StreamingConnection internally maintains 2 MetaStoreClient objects (each 
> has 1 Thrift client for actual RPC). Let's call them "primary" and 
> "heartbeat". Each TransactionBatch created from a given StreamingConnection
> gets a reference to both of these MetaStoreClients.
> So the model is that there is at most 1 outstanding (not closed)
> TransactionBatch for any given StreamingConnection, and for any given
> TransactionBatch there can be at most 2 threads accessing it concurrently: 1
> thread calling TransactionBatch.heartbeat() (and nothing else) and the other
> calling all other methods.
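
The two-thread rule in the description above can be illustrated with a minimal sketch. It does not use the Hive classes: SketchBatch is a hypothetical stand-in for TransactionBatch, and writeWithHeartbeat, the 10 ms heartbeat interval, and the record values are all assumptions made for illustration. The only point is the division of labor: one scheduled thread calls heartbeat() and nothing else, while the calling thread does all writes, the commit, and the close.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatThreadingSketch {

    /** Hypothetical stand-in for a TransactionBatch; NOT the Hive class. */
    public static class SketchBatch {
        private final AtomicInteger heartbeats = new AtomicInteger();
        private int recordsWritten; // touched only by the writer thread
        private boolean closed;     // touched only by the writer thread

        // The only method the heartbeat thread is allowed to call.
        public void heartbeat() { heartbeats.incrementAndGet(); }

        // Everything below belongs to the single writer thread.
        public void write(String record) {
            if (closed) throw new IllegalStateException("batch is closed");
            recordsWritten++;
        }
        public void commit() { /* no-op in this sketch */ }
        public void close() { closed = true; }

        public int heartbeatCount() { return heartbeats.get(); }
        public int recordsWritten() { return recordsWritten; }
    }

    /** Writes on the calling thread while a second thread heartbeats. */
    public static SketchBatch writeWithHeartbeat(String[] records)
            throws InterruptedException {
        SketchBatch batch = new SketchBatch();
        ScheduledExecutorService heartbeater =
                Executors.newSingleThreadScheduledExecutor();
        // Thread 2: calls heartbeat() and nothing else (interval is an assumption).
        heartbeater.scheduleAtFixedRate(batch::heartbeat, 0, 10, TimeUnit.MILLISECONDS);
        try {
            // Thread 1 (the caller): all other methods, strictly in sequence.
            for (String r : records) {
                batch.write(r);
            }
            batch.commit();
        } finally {
            // Stop heartbeating first so the batch is not touched after close.
            heartbeater.shutdownNow();
            try {
                heartbeater.awaitTermination(1, TimeUnit.SECONDS);
            } finally {
                batch.close();
            }
        }
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        SketchBatch batch = writeWithHeartbeat(new String[] {"a", "b", "c"});
        System.out.println("records=" + batch.recordsWritten());
    }
}
```

In this sketch the heartbeat counter is atomic because it is the only state shared between the two threads; the writer-side fields need no synchronization precisely because a single thread owns them.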



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14778) document threading model of Streaming API

2016-09-27 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15527166#comment-15527166
 ] 

Eugene Koifman commented on HIVE-14778:
---

It depends on what you mean by multiple threads.  TransactionBatches are made 
from StreamingConnection objects.  So the model is that if you want to write to 
the same HiveEndPoint in parallel, you create different StreamingConnection 
objects but from any given StreamingConnection you open/close 
TransactionBatches sequentially.  (The exception is that you can heartbeat any 
given TransactionBatch using a separate thread).

This seems like a reasonable model.  For example, JDBC (usually) has the same 
model.  You can create any number of connections but operations on a given 
Connection are expected to be sequential.
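
As a rough illustration of the model described in this comment, the sketch below gives each writer thread its own connection and has it open and close batches strictly one after another. SketchConnection is a hypothetical stand-in, not the Hive StreamingConnection, and openBatch, commitAndCloseBatch, and writeInParallel are names invented for this example. Heartbeating is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConnectionPerThreadSketch {

    /** Hypothetical stand-in for a StreamingConnection; NOT the Hive class. */
    public static class SketchConnection {
        private boolean batchOpen; // at most one outstanding batch per connection
        private int committed;

        public void openBatch() {
            if (batchOpen) throw new IllegalStateException("previous batch still open");
            batchOpen = true;
        }

        public void commitAndCloseBatch() {
            if (!batchOpen) throw new IllegalStateException("no open batch");
            committed++;
            batchOpen = false;
        }

        public int committedBatches() { return committed; }
    }

    /** Runs one connection per writer thread; returns total committed batches. */
    public static int writeInParallel(int writers, int batchesPerWriter)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int w = 0; w < writers; w++) {
                Callable<Integer> task = () -> {
                    // Parallelism comes from separate connections,
                    // never from sharing one connection across threads.
                    SketchConnection conn = new SketchConnection();
                    for (int b = 0; b < batchesPerWriter; b++) {
                        conn.openBatch();           // sequential within a connection
                        conn.commitAndCloseBatch();
                    }
                    return conn.committedBatches();
                };
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeInParallel(4, 3)); // prints 12
    }
}
```

The stand-in connection deliberately has no locking: as with a JDBC Connection, correctness here relies on each connection being confined to one thread rather than on internal synchronization.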



[jira] [Commented] (HIVE-14778) document threading model of Streaming API

2016-09-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15526802#comment-15526802
 ] 

Alan Gates commented on HIVE-14778:
---

These changes appear to say that streaming is single-threaded.  I don't 
think that's what you mean, but I want to make sure I understand what you're 
saying, which I think is the following:

{quote}
A single HiveEndPoint object cannot support having more than one 
TransactionBatch open and being committed to at the same time.  Also it does 
not properly support multiple threads committing in parallel, even inside one 
TransactionBatch.  However, it does support multiple threads as long as the 
commits are serialized.
{quote}

Is that correct?



[jira] [Commented] (HIVE-14778) document threading model of Streaming API

2016-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15497739#comment-15497739
 ] 

Hive QA commented on HIVE-14778:
---

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12828885/HIVE-14778.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10527 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_bitmap_auto_partitioned]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1215/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1215/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1215/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12828885 - PreCommit-HIVE-MASTER-Build
