[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-03-09 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904023#comment-15904023
 ] 

Roshan Naik commented on HIVE-15691:


[~ekoifman] Looks like this has been idle for a bit. Is there anything you 
need from [~kalyanhadoop] to move forward on this?

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> There is a dependency in Flume to commit: FLUME-3036 (Create a RegexSerializer for Hive Sink).
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865168#comment-15865168
 ] 

Roshan Naik commented on HIVE-15691:


OK. Flume has not yet switched over to Hive 2.x.




[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864832#comment-15864832
 ] 

Roshan Naik commented on HIVE-15691:


I see.



[jira] [Comment Edited] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-13 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864761#comment-15864761
 ] 

Roshan Naik edited comment on HIVE-15691 at 2/14/17 12:39 AM:
--

[~ekoifman] & [~kalyanhadoop]

1) I think the class needs only 2 constructors:
- StrictRegexWriter(String regex, HiveEndPoint endPoint)
- StrictRegexWriter(String regex, HiveEndPoint endPoint, HiveConf conf)

The 'connection' param should be eliminated.

2) I cannot say much about the correctness of the createSerde() & encode() 
methods, as I have not worked with RegexSerDe. It would be good to know whether 
this has been validated via a manual test in which data streamed through this 
writer was properly queryable from Hive.


Looks fine other than that.









[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-10 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861858#comment-15861858
 ] 

Roshan Naik commented on HIVE-15691:


Should be able to get to it on Monday.



[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-08 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858601#comment-15858601
 ] 

Roshan Naik commented on HIVE-15691:


StrictRegexWriter allows extracting fields from the incoming text based on 
user-specified regexes. DelimitedWriter is for extracting fields from 
delimiter-separated text, such as comma- or tab-separated values. My 
understanding is that, implementation-wise, this is closely modeled on 
DelimitedWriter.

[~kalyanhadoop] can better speak to the implementation details.
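To illustrate the distinction, a standalone sketch is below. The class and method names are invented for illustration; this is not the actual StrictRegexWriter or DelimitedWriter code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the core idea of regex-based field extraction
// (each capture group becomes one column) vs. delimiter-based splitting.
public class RegexFieldExtraction {

    // Regex-based: each capture group of the pattern yields one field.
    static List<String> extractWithRegex(String line, String regex) {
        Matcher m = Pattern.compile(regex).matcher(line);
        List<String> fields = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    // Delimiter-based (conceptually what DelimitedWriter does): split on a separator.
    static List<String> extractWithDelimiter(String line, String sep) {
        return Arrays.asList(line.split(sep));
    }

    public static void main(String[] args) {
        // An access-log-style line parsed into fields by capture groups.
        System.out.println(extractWithRegex("127.0.0.1 GET /index.html",
                "(\\S+) (\\S+) (\\S+)"));            // [127.0.0.1, GET, /index.html]
        System.out.println(extractWithDelimiter("a,b,c", ","));  // [a, b, c]
    }
}
```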



[jira] [Commented] (HIVE-12421) Streaming API add TransactionBatch.beginNextTransaction(long timeout)

2015-11-18 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010502#comment-15010502
 ] 

Roshan Naik commented on HIVE-12421:


Though not having a timeout option on a blocking call is an issue, IMO.

> Streaming API add TransactionBatch.beginNextTransaction(long timeout)
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will 
> throw an Exception to the client.  This doesn't seem like the right behavior.  It 
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to  
> give the client more control.
> cc [~alangates]  [~sriharsha]





[jira] [Commented] (HIVE-12421) Streaming API TransactionBatch.beginNextTransaction() does not wait for locks

2015-11-17 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009713#comment-15009713
 ] 

Roshan Naik commented on HIVE-12421:


I believe msClient.lock() itself is a blocking call.

The return-value check is done because that is the mechanism the msClient.lock() 
API uses to indicate success.

It is definitely a good idea to provide an msClient.lock(timeout) overload 
for the API to invoke.
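As a generic illustration of the suggested overload (plain java.util.concurrent as an analogy, not the metastore client API):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Generic illustration, not the metastore API: a blocking acquire with an
// optional timeout, the shape suggested for an msClient.lock(timeout) overload.
public class TimedAcquire {

    static boolean acquire(ReentrantLock lock, long timeoutMs) throws InterruptedException {
        if (timeoutMs <= 0) {
            lock.lock();   // block indefinitely, like the current behavior
            return true;
        }
        // Block for at most timeoutMs, then report failure instead of hanging forever.
        return lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        System.out.println(acquire(lock, 100));  // true: the lock is free
        lock.unlock();
    }
}
```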

> Streaming API TransactionBatch.beginNextTransaction() does not wait for locks
> -
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will 
> throw an Exception to the client.  This doesn't seem like the right behavior.  It 
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to  
> give the client more control.
> cc [~alangates]  [~sriharsha]





[jira] [Commented] (HIVE-12421) Streaming API TransactionBatch.beginNextTransaction() does not wait for locks

2015-11-17 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009724#comment-15009724
 ] 

Roshan Naik commented on HIVE-12421:


I think the timeout should be set on the connection object as a property and 
reused internally on all blocking calls to the backend, instead of per method, 
e.g. TransactionBatch.beginNextTransaction(long timeoutMs).
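A sketch of this design, with hypothetical names (not the real streaming API): the timeout lives on the connection and every blocking call reuses it, rather than each method taking its own timeout parameter.

```java
// Hypothetical sketch of a connection-level timeout property.
public class ConnectionTimeoutSketch {
    private long callTimeoutMs = 0;   // 0 = block indefinitely (current behavior)

    public void setCallTimeout(long timeoutMs) { this.callTimeoutMs = timeoutMs; }
    public long getCallTimeout() { return callTimeoutMs; }

    // Every blocking backend call consults the shared property internally,
    // so callers never pass a per-call timeout.
    public boolean beginNextTransaction() {
        long deadline = System.currentTimeMillis() + callTimeoutMs;
        // ... acquire locks here, giving up once 'deadline' passes if a timeout is set ...
        return callTimeoutMs == 0 || System.currentTimeMillis() <= deadline;
    }

    public static void main(String[] args) {
        ConnectionTimeoutSketch conn = new ConnectionTimeoutSketch();
        conn.setCallTimeout(5000);   // set once on the connection
        System.out.println(conn.beginNextTransaction());  // true
    }
}
```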



[jira] [Commented] (HIVE-11672) Hive Streaming API handles bucketing incorrectly

2015-11-02 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985919#comment-14985919
 ] 

Roshan Naik commented on HIVE-11672:


yes.

> Hive Streaming API handles bucketing incorrectly
> 
>
> Key: HIVE-11672
> URL: https://issues.apache.org/jira/browse/HIVE-11672
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Hive, Transactions
>Affects Versions: 1.2.1
>Reporter: Raj Bains
>Assignee: Roshan Naik
>Priority: Critical
>
> Hive Streaming API allows the clients to get a random bucket and then insert 
> data into it. However, this leads to incorrect bucketing as Hive expects data 
> to be distributed into buckets based on a hash function applied to bucket 
> key. The data is inserted randomly by the clients right now. They have no way 
> of
> # Knowing what bucket a row (tuple) belongs to
> # Asking for a specific bucket
> There are optimizations such as Sort Merge Join and Bucket Map Join that rely 
> on the data being correctly distributed across buckets and these will cause 
> incorrect read results if the data is not distributed correctly.
> There are two obvious design choices
> # Hive Streaming API should fix this internally by distributing the data 
> correctly
> # Hive Streaming API should expose data distribution scheme to the clients 
> and allow them to distribute the data correctly
> The first option will mean every client thread will write to many buckets, 
> causing many small files in each bucket and too many connections open. this 
> does not seem feasible. The second option pushes more functionality into the 
> client of the Hive Streaming API, but can maintain high throughput and write 
> good sized ORC files. This option seems preferable.





[jira] [Commented] (HIVE-9582) HCatalog should use IMetaStoreClient interface

2015-10-21 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967419#comment-14967419
 ] 

Roshan Naik commented on HIVE-9582:
---

In this patch, HCatUtil.getHiveMetastoreClient() uses the double-checked 
locking pattern to implement a singleton, which is a broken pattern.

Created HIVE-12221
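For context, the standard repair of double-checked locking in Java is to make the instance field volatile. The sketch below is illustrative only; ClientHolder is a hypothetical type, not the actual HCatUtil code.

```java
// Why plain double-checked locking is broken: without 'volatile', a thread can
// observe a half-constructed instance. Declaring the field volatile fixes it.
public class ClientHolder {
    private static volatile ClientHolder instance;  // 'volatile' is the crucial fix

    private ClientHolder() {}

    public static ClientHolder get() {
        ClientHolder local = instance;              // single volatile read
        if (local == null) {
            synchronized (ClientHolder.class) {
                local = instance;
                if (local == null) {
                    instance = local = new ClientHolder();
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());  // true: always the same singleton
    }
}
```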

> HCatalog should use IMetaStoreClient interface
> --
>
> Key: HIVE-9582
> URL: https://issues.apache.org/jira/browse/HIVE-9582
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog, Metastore
>Affects Versions: 0.14.0, 0.13.1
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>  Labels: hcatalog, metastore, rolling_upgrade
> Fix For: 1.2.0
>
> Attachments: HIVE-9582.1.patch, HIVE-9582.2.patch, HIVE-9582.3.patch, 
> HIVE-9582.4.patch, HIVE-9582.5.patch, HIVE-9582.6.patch, HIVE-9582.7.patch, 
> HIVE-9582.8.patch, HIVE-9583.1.patch
>
>
> Hive uses IMetaStoreClient and it makes using RetryingMetaStoreClient easy. 
> Hence during a failure, the client retries and possibly succeeds. But 
> HCatalog has long been using HiveMetaStoreClient directly and hence failures 
> are costly, especially if they are during the commit stage of a job. Its also 
> not possible to do rolling upgrade of MetaStore Server.





[jira] [Updated] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-10-08 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-12003:
---
Attachment: HIVE-12003.3.patch

Uploading patch v3.

> Hive Streaming API : Add check to ensure table is transactional
> ---
>
> Key: HIVE-12003
> URL: https://issues.apache.org/jira/browse/HIVE-12003
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Hive, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: HIVE-12003.2.patch, HIVE-12003.3.patch, HIVE-12003.patch
>
>
> Check if TBLPROPERTIES ('transactional'='true') is set when opening a connection





[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-10-08 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949405#comment-14949405
 ] 

Roshan Naik commented on HIVE-12003:


Was misled into thinking it was unrelated because all the tests are in a 
different module.

Looks like the TestCompactor* tests are related, but as best as I can tell these 
two are not:
- TestHCatClient.testTableSchemaPropagation
- TestSSL.testSSLVersion


*WRT error message:* The actual logs emitted are unfortunately getting swallowed 
in the UT run; only the stack trace and the exception's internal string are 
shown, which is not sufficiently descriptive. The LOG.error and LOG.warn are 
quite clear in checkEndPoint().

Will upload a new patch to improve the message string in the exception and also 
fix the TestCompactor tests.





[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-10-02 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Attachment: HIVE-11983.5.patch

Uploading patch v5 addressing [~ekoifman]'s comments

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: streaming, streaming_api
> Attachments: HIVE-11983.3.patch, HIVE-11983.4.patch, 
> HIVE-11983.5.patch, HIVE-11983.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of a TransactionBatch go to the 
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 
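The hash-based fix described above can be sketched as follows. This is a minimal illustration of the general scheme (mask off the sign bit of the key's hash, then take it modulo the bucket count), not the actual patch code.

```java
// Minimal sketch of hash-based bucket assignment for incoming records.
public class BucketAssign {

    static int bucketFor(Object bucketKey, int numBuckets) {
        int hash = bucketKey.hashCode();
        // Mask the sign bit so the result is a valid, non-negative bucket id.
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Identical keys always land in the same bucket.
        System.out.println(bucketFor("user-42", 8) == bucketFor("user-42", 8));  // true
    }
}
```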





[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-10-01 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940402#comment-14940402
 ] 

Roshan Naik commented on HIVE-12003:


I am revising this patch to exclude the -w option.



[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-10-01 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Attachment: HIVE-11983.4.patch

Uploading v4 patch. Found that the patch was not applying due to the 
above-noted -w option. This patch is created without the -w option and applies 
cleanly using 'patch -p0' and 'git apply -p0'. RB, however, still doesn't like it.

This patch is on top of commit SHA 24988f7 (HIVE-11972).



[jira] [Updated] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-10-01 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-12003:
---
Attachment: HIVE-12003.2.patch



[jira] [Updated] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-09-30 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-12003:
---
Attachment: HIVE-12003.patch

Uploading patch (fix +UT)



[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-09-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939001#comment-14939001
 ] 

Roshan Naik commented on HIVE-12003:


It's better to normalize the case in the code that sets the tblproperties() on 
the table, to either lower- or uppercase, instead of having every lookup try to 
do a case-insensitive lookup on the map.
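The normalize-on-write approach suggested here can be sketched like this (hypothetical helper, not Hive's actual table-properties code): lowercase keys once when the properties map is built, so every later lookup is a plain O(1) get.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: normalize property keys once at write time.
public class TblProps {

    static Map<String, String> normalized(Map<String, String> raw) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            out.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("Transactional", "true");   // mixed-case, as users may set it
        Map<String, String> props = normalized(raw);
        // Lookups can now use the lowercase key directly.
        System.out.println("true".equalsIgnoreCase(props.get("transactional")));  // true
    }
}
```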



[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-09-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939025#comment-14939025
 ] 

Roshan Naik commented on HIVE-12003:


Doing it in tblproperties would not be in scope of this JIRA. My point is that 
it's better to do it there instead of here, or in the other places where the 
lookup is done; otherwise it defeats the purpose of using a Map<>.



[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-30 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Attachment: HIVE-11983.v2.patch

Uploading patch generated with the -w option to skip whitespace changes.



[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-09-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938999#comment-14938999
 ] 

Roshan Naik commented on HIVE-12003:


It seems strange to make the lookup key case insensitive. I don't think maps 
support case-insensitive lookups; the only option would be to iterate. Is that 
what you are suggesting?

I could switch to using the constant, but that won't make it case insensitive.
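As an aside on the point above: Java does offer a case-insensitive map lookup via a TreeMap with a case-insensitive comparator, at O(log n) per lookup rather than HashMap's O(1). This is purely illustrative, not code from the patch.

```java
import java.util.Map;
import java.util.TreeMap;

// A map whose lookups ignore key case, via a case-insensitive comparator.
public class CaseInsensitiveLookup {
    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        props.put("Transactional", "true");
        // Both lookups hit the same entry regardless of case.
        System.out.println(props.get("transactional"));  // true
        System.out.println(props.get("TRANSACTIONAL"));  // true
    }
}
```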



[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-30 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Attachment: HIVE-11983.3.patch

[~ekoifman] Looks like the patch generation had an issue. Some of those lines 
should not be deleted in OI-Utils. Uploading a revised patch v3.



[jira] [Commented] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939107#comment-14939107
 ] 

Roshan Naik commented on HIVE-11983:


Deleted the v2 patch to ensure the bot does not balk.

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: streaming, streaming_api
> Attachments: HIVE-11983.3.patch, HIVE-11983.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of TransactionBatch goes to the 
> same bucket and a new bucket number is chose for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 





[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional

2015-09-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938977#comment-14938977
 ] 

Roshan Naik commented on HIVE-12003:


[~ekoifman] .. it is already case-insensitive:
{code}
if (transactionalProp != null && transactionalProp.equalsIgnoreCase("true")) {
{code}

> Hive Streaming API : Add check to ensure table is transactional
> ---
>
> Key: HIVE-12003
> URL: https://issues.apache.org/jira/browse/HIVE-12003
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Hive, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: HIVE-12003.patch
>
>
> Check if TBLPROPERTIES ('transactional'='true') is set when opening connection
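
A minimal sketch of such a check is below; the class, helper names, and the `Map` representation of TBLPROPERTIES are hypothetical, not the actual HiveEndPoint code, but the comparison mirrors the case-insensitive snippet quoted in the comment thread:

```java
import java.util.Map;

// Hypothetical sketch, not the actual HiveEndPoint code: validate that a
// table's TBLPROPERTIES mark it transactional before opening a streaming
// connection, using a case-insensitive comparison on the property value.
public class TransactionalTableCheck {
    static boolean isTransactional(Map<String, String> tblProperties) {
        String transactionalProp = tblProperties.get("transactional");
        return transactionalProp != null && transactionalProp.equalsIgnoreCase("true");
    }

    static void verifyTransactional(Map<String, String> tblProperties) {
        if (!isTransactional(tblProperties)) {
            // Fail fast at connection time rather than at commit time.
            throw new IllegalStateException(
                "Streaming requires TBLPROPERTIES ('transactional'='true')");
        }
    }
}
```

Failing at `newConnection()` time gives the caller a clear error instead of a confusing failure later in the transaction lifecycle.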





[jira] [Commented] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939056#comment-14939056
 ] 

Roshan Naik commented on HIVE-11983:


Created a review on RB, but RB is having some trouble showing the patch info; 
not sure why.

https://reviews.apache.org/r/38911/


> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: streaming, streaming_api
> Attachments: HIVE-11983.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of a TransactionBatch go to the 
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 





[jira] [Commented] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939082#comment-14939082
 ] 

Roshan Naik commented on HIVE-11983:


Tried a few different things, but was unable to resolve the RB issue.

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: streaming, streaming_api
> Attachments: HIVE-11983.patch, HIVE-11983.v2.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of a TransactionBatch go to the 
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 





[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-30 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Attachment: (was: HIVE-11983.v2.patch)

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: streaming, streaming_api
> Attachments: HIVE-11983.3.patch, HIVE-11983.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of a TransactionBatch go to the 
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 





[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-29 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Attachment: HIVE-11983.patch

Uploading patch

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
> Attachments: HIVE-11983.patch
>
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of a TransactionBatch go to the 
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 





[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records

2015-09-28 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated HIVE-11983:
---
Summary: Hive streaming API uses incorrect logic to assign buckets to 
incoming records  (was: Hive streaming API's uses incorrect logic to assign 
buckets to incoming records)

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>
> The Streaming API tries to distribute records evenly into buckets. 
> All records in every Transaction that is part of a TransactionBatch go to the 
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: API needs to hash each record to determine which bucket it belongs to. 





[jira] [Commented] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI

2015-08-05 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659125#comment-14659125
 ] 

Roshan Naik commented on HIVE-11089:



- That 'proxyUser' string argument was a parameter to a private method prior to 
kerberos support. It was never exposed externally and was always set to null 
internally. At the time the thought was to support proxying, but it never got 
fully tested. So I think I pulled it from the public interface very late in the 
dev cycle and did not reflect that in the wiki. I just updated the wiki.

- With the introduction of kerberos support, the internal 'proxyUser' was 
dropped, and a UGI-based 'authenticatedUser' argument was exposed publicly in a 
new overload of newConnection(). So to acquire a connection as a user other 
than the process user, kerberos will be needed.

- The wiki has a secure/kerberos example at the bottom; that should work. The 
API reference is in the Java Docs: http://hive.apache.org/javadocs/r1.2.1/api/. 
References to proxyUser in the javadocs need to be fixed.

> Hive Streaming: connection fails when using a proxy user UGI
> ------------------------------------------------------------
>
> Key: HIVE-11089
> URL: https://issues.apache.org/jira/browse/HIVE-11089
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
>Reporter: Adam Kunicki
>  Labels: ACID, Streaming
>
> HIVE-7508 (Add Kerberos Support) seems to also remove the ability to specify 
> a proxy user.
> HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the 
> connection is supposed to be a secure connection.
> This, however, breaks support for proxy users, as a proxy user UGI will 
> always return false from hasKerberosCredentials().
> See lines 273, 274 of HiveEndPoint.java
> {code}
> this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials();
> this.msClient = getMetaStoreClient(endPoint, conf, secureMode);
> {code}
> It also seems that between 0.13.1 and 0.14 the newConnection() method that 
> includes a proxy user has been removed.
> For reference: 
> https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a
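
The failure mode described above can be illustrated with a toy stand-in for Hadoop's UserGroupInformation (an assumed simplification, not the real API): a proxy UGI carries no Kerberos credentials of its own, so a credentials check on the proxy object reports false even when the real user behind it is kerberized.

```java
// Toy model (NOT Hadoop's real UserGroupInformation) illustrating the bug:
// the secure-mode decision keys off hasKerberosCredentials() of the UGI it
// is handed, and a proxy UGI answers false even when the real user behind
// it holds Kerberos credentials.
public class ProxyUgiSketch {
    static class ToyUgi {
        final boolean kerberosCredentials;
        final ToyUgi realUser; // non-null for proxy UGIs

        ToyUgi(boolean kerberosCredentials, ToyUgi realUser) {
            this.kerberosCredentials = kerberosCredentials;
            this.realUser = realUser;
        }

        boolean hasKerberosCredentials() {
            return kerberosCredentials; // a proxy UGI carries none of its own
        }
    }

    // Mirrors the logic from the HiveEndPoint.java snippet quoted above.
    static boolean secureMode(ToyUgi ugi) {
        return ugi == null ? false : ugi.hasKerberosCredentials();
    }

    public static void main(String[] args) {
        ToyUgi kerberizedUser = new ToyUgi(true, null);
        ToyUgi proxy = new ToyUgi(false, kerberizedUser);
        // The proxy should be treated as secure, but secureMode() says false.
        System.out.println(secureMode(proxy));          // false -- the bug
        System.out.println(secureMode(kerberizedUser)); // true
    }
}
```

A fix along these lines would need to also consult the real (doAs) user's credentials rather than only the proxy object itself.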


