[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
[ https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904023#comment-15904023 ]

Roshan Naik commented on HIVE-15691:

[~ekoifman] Looks like this has been idle for a bit. Is there anything you need from [~kalyanhadoop] to move forward on this?

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -------------------------------------------------------------------------
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
> Issue Type: New Feature
> Components: HCatalog, Transactions
> Reporter: Kalyan
> Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, HIVE-15691-updated.patch
>
> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink.
> It is similar to StrictJsonWriter, available in Hive.
> There is a dependency in Flume to commit:
> FLUME-3036 : Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
[ https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865168#comment-15865168 ]

Roshan Naik commented on HIVE-15691:

OK. Flume has not yet switched over to Hive 2.x.
[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
[ https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864832#comment-15864832 ]

Roshan Naik commented on HIVE-15691:

I see.
[jira] [Comment Edited] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
[ https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864761#comment-15864761 ]

Roshan Naik edited comment on HIVE-15691 at 2/14/17 12:39 AM:

[~ekoifman] & [~kalyanhadoop]

1) I think the class needs only 2 constructors:
- StrictRegexWriter(String regex, HiveEndPoint endPoint)
- StrictRegexWriter(String regex, HiveEndPoint endPoint, HiveConf conf)

The 'connection' param should be eliminated.

2) I cannot say much about the correctness of the createSerde() & encode() methods, as I have not worked with RegexSerDe. It would be good to know whether this has been validated via a manual test in which data streamed via this writer was properly queryable from Hive.

Looks fine other than that.
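The two-constructor shape proposed above can be sketched as follows. This is a hypothetical skeleton, not Hive's actual StrictRegexWriter: the HiveEndPoint and HiveConf stubs stand in for the real Hive classes, and the point shown is only that the short form delegates to the long form so no 'connection' parameter is needed.

```java
// Empty stubs standing in for the real Hive classes (hypothetical).
class HiveEndPoint {}
class HiveConf {}

class StrictRegexWriter {
    final String regex;
    final HiveEndPoint endPoint;
    final HiveConf conf;

    // Convenience form: delegates to the full form with a null conf,
    // so only one constructor holds the real initialization logic.
    StrictRegexWriter(String regex, HiveEndPoint endPoint) {
        this(regex, endPoint, null);
    }

    StrictRegexWriter(String regex, HiveEndPoint endPoint, HiveConf conf) {
        this.regex = regex;
        this.endPoint = endPoint;
        this.conf = conf;
    }
}
```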
[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
[ https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861858#comment-15861858 ]

Roshan Naik commented on HIVE-15691:

Should be able to get to it on Monday.
[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
[ https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858601#comment-15858601 ]

Roshan Naik commented on HIVE-15691:

StrictRegexWriter allows extracting fields from the incoming text based on user-specified regexes. DelimitedWriter is for extracting fields from things like comma- or tab-separated text. My understanding is that, implementation-wise, this is closely modeled on DelimitedWriter. [~kalyanhadoop] can better speak to the implementation details.
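The distinction described above can be illustrated with plain java.util.regex. This is a standalone sketch, not the actual RegexSerDe or DelimitedWriter code: a delimiter split handles separated text, while capture groups let a user-supplied pattern pull fields out of arbitrary text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtraction {
    // Delimited style: fields separated by a known character or string.
    static List<String> splitDelimited(String line, String delim) {
        return Arrays.asList(line.split(delim));
    }

    // Regex style: fields defined by capture groups in a user-supplied pattern.
    static List<String> extractByRegex(String line, String regex) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("line does not match regex");
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }
}
```

The regex form subsumes the delimited form (a delimiter can be expressed as a pattern), at the cost of requiring the user to write a pattern whose groups line up with the table columns.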
[jira] [Commented] (HIVE-12421) Streaming API add TransactionBatch.beginNextTransaction(long timeout)
[ https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010502#comment-15010502 ]

Roshan Naik commented on HIVE-12421:

Though not having a timeout option on a blocking call is an issue, IMO.

> Streaming API add TransactionBatch.beginNextTransaction(long timeout)
> ---------------------------------------------------------------------
>
> Key: HIVE-12421
> URL: https://issues.apache.org/jira/browse/HIVE-12421
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog, Transactions
> Affects Versions: 0.14.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
>
> TransactionBatchImpl.beginNextTransactionImpl() has
> {noformat}
> LockResponse res = msClient.lock(lockRequest);
> if (res.getState() != LockState.ACQUIRED) {
>   throw new TransactionError("Unable to acquire lock on " + endPt);
> }
> {noformat}
> This means that if there are any competing locks already taken, this will
> throw an Exception to the client. This doesn't seem like the right behavior. It
> should block.
> We could also add TransactionBatch.beginNextTransaction(long timeoutMs) to
> give the client more control.
> cc [~alangates] [~sriharsha]

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12421) Streaming API TransactionBatch.beginNextTransaction() does not wait for locks
[ https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009713#comment-15009713 ]

Roshan Naik commented on HIVE-12421:

I believe msClient.lock() itself is a blocking call. The return-value check is done because that is how the msClient.lock() API indicates success. It is definitely a good idea to provide an msClient.lock(timeout) overload for the API to invoke.
[jira] [Commented] (HIVE-12421) Streaming API TransactionBatch.beginNextTransaction() does not wait for locks
[ https://issues.apache.org/jira/browse/HIVE-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009724#comment-15009724 ]

Roshan Naik commented on HIVE-12421:

I think the timeout should be set on the connection object as a property and reused internally on all blocking calls to the backend, instead of a per-method TransactionBatch.beginNextTransaction(long timeoutMs).
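One generic way to put a timeout on a blocking call, in the spirit of the discussion above, is to run it on an executor and bound the wait with Future.get. This is only a sketch under assumed names (the real metastore client API is not shown); the timeout value would come from a connection-level property rather than being passed per call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedCall {
    // Runs blockingCall with an upper bound on the wait; returns null on timeout.
    // In the scenario above, timeoutMs would be read from a property on the
    // connection object and reused for every backend call.
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeoutMs) throws Exception {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = ex.submit(blockingCall);
            try {
                return f.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException te) {
                f.cancel(true); // interrupt the still-blocked call
                return null;
            }
        } finally {
            ex.shutdownNow();
        }
    }
}
```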
[jira] [Commented] (HIVE-11672) Hive Streaming API handles bucketing incorrectly
[ https://issues.apache.org/jira/browse/HIVE-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985919#comment-14985919 ]

Roshan Naik commented on HIVE-11672:

Yes.

> Hive Streaming API handles bucketing incorrectly
> ------------------------------------------------
>
> Key: HIVE-11672
> URL: https://issues.apache.org/jira/browse/HIVE-11672
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Hive, Transactions
> Affects Versions: 1.2.1
> Reporter: Raj Bains
> Assignee: Roshan Naik
> Priority: Critical
>
> The Hive Streaming API allows clients to get a random bucket and then insert
> data into it. However, this leads to incorrect bucketing, as Hive expects data
> to be distributed into buckets based on a hash function applied to the bucket
> key. The data is inserted randomly by the clients right now. They have no way of
> # Knowing what bucket a row (tuple) belongs to
> # Asking for a specific bucket
> There are optimizations such as Sort Merge Join and Bucket Map Join that rely
> on the data being correctly distributed across buckets, and these will produce
> incorrect read results if the data is not distributed correctly.
> There are two obvious design choices:
> # The Hive Streaming API should fix this internally by distributing the data correctly
> # The Hive Streaming API should expose the data distribution scheme to the clients
> and allow them to distribute the data correctly
> The first option means every client thread would write to many buckets,
> causing many small files in each bucket and too many open connections; this
> does not seem feasible. The second option pushes more functionality into the
> client of the Hive Streaming API, but can maintain high throughput and write
> good-sized ORC files. This option seems preferable.
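The hash-based assignment the issue calls for can be sketched with a hypothetical helper. Note this simplified version is not Hive's actual bucketing function (Hive computes the hash over the bucket columns with its own object-inspector machinery), so it is illustrative only: the key point is that the same key always maps to the same bucket in [0, numBuckets).

```java
public class Bucketing {
    // Hypothetical sketch: map a record's bucket key to a bucket number.
    // Masking off the sign bit keeps the result non-negative even when
    // hashCode() is negative.
    static int bucketFor(String bucketKey, int numBuckets) {
        return (bucketKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }
}
```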
[jira] [Commented] (HIVE-9582) HCatalog should use IMetaStoreClient interface
[ https://issues.apache.org/jira/browse/HIVE-9582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967419#comment-14967419 ]

Roshan Naik commented on HIVE-9582:

In this patch, HCatUtil.getHiveMetastoreClient() uses the double-checked locking pattern to implement a singleton, which is a broken pattern. Created HIVE-12221.

> HCatalog should use IMetaStoreClient interface
> ----------------------------------------------
>
> Key: HIVE-9582
> URL: https://issues.apache.org/jira/browse/HIVE-9582
> Project: Hive
> Issue Type: Sub-task
> Components: HCatalog, Metastore
> Affects Versions: 0.14.0, 0.13.1
> Reporter: Thiruvel Thirumoolan
> Assignee: Thiruvel Thirumoolan
> Labels: hcatalog, metastore, rolling_upgrade
> Fix For: 1.2.0
>
> Attachments: HIVE-9582.1.patch, HIVE-9582.2.patch, HIVE-9582.3.patch, HIVE-9582.4.patch, HIVE-9582.5.patch, HIVE-9582.6.patch, HIVE-9582.7.patch, HIVE-9582.8.patch, HIVE-9583.1.patch
>
> Hive uses IMetaStoreClient, and it makes using RetryingMetaStoreClient easy.
> Hence during a failure the client retries and possibly succeeds. But
> HCatalog has long been using HiveMetaStoreClient directly, and hence failures
> are costly, especially if they occur during the commit stage of a job. It is also
> not possible to do a rolling upgrade of the MetaStore Server.
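For context on why double-checked locking is called broken: without a volatile field, the unsynchronized first read can observe a partially constructed object under the Java memory model. A minimal corrected sketch (hypothetical holder class, not HCatUtil's actual code) declares the cached field volatile:

```java
public class MetaClientHolder {
    // Hypothetical stand-in for the cached metastore client.
    static class Client {}

    // 'volatile' is what makes double-checked locking safe on Java 5+;
    // without it the fast-path read may see a half-built object.
    private static volatile Client instance;

    static Client get() {
        Client local = instance;     // single volatile read on the fast path
        if (local == null) {
            synchronized (MetaClientHolder.class) {
                local = instance;    // re-check under the lock
                if (local == null) {
                    instance = local = new Client();
                }
            }
        }
        return local;
    }
}
```

An often-simpler alternative is the static holder idiom, which leans on class initialization instead of explicit locking.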
[jira] [Updated] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated HIVE-12003:

Attachment: HIVE-12003.3.patch

Uploading patch v3.

> Hive Streaming API : Add check to ensure table is transactional
> ---------------------------------------------------------------
>
> Key: HIVE-12003
> URL: https://issues.apache.org/jira/browse/HIVE-12003
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Hive, Transactions
> Affects Versions: 1.2.1
> Reporter: Roshan Naik
> Assignee: Roshan Naik
> Attachments: HIVE-12003.2.patch, HIVE-12003.3.patch, HIVE-12003.patch
>
> Check if TBLPROPERTIES ('transactional'='true') is set when opening a connection
[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949405#comment-14949405 ]

Roshan Naik commented on HIVE-12003:

Was misled into thinking it was unrelated because all the tests are in a different module. Looks like the TestCompactor* tests are related, but as best I can tell these two are not:
- TestHCatClient.testTableSchemaPropagation
- TestSSL.testSSLVersion

*WRT the error message:* the actual logs emitted are unfortunately getting swallowed in the UT run. Only the stack trace and the exception's internal string are shown, which is not sufficiently descriptive. The LOG.error and LOG.warn in checkEndPoint() are quite clear. Will upload a new patch that improves the message string in the exception and also fixes the TestCompactor tests.
[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated HIVE-11983:

Attachment: HIVE-11983.5.patch

Uploading patch v5 addressing [~ekoifman]'s comments.

> Hive streaming API uses incorrect logic to assign buckets to incoming records
> -----------------------------------------------------------------------------
>
> Key: HIVE-11983
> URL: https://issues.apache.org/jira/browse/HIVE-11983
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Transactions
> Affects Versions: 1.2.1
> Reporter: Roshan Naik
> Assignee: Roshan Naik
> Labels: streaming, streaming_api
> Attachments: HIVE-11983.3.patch, HIVE-11983.4.patch, HIVE-11983.5.patch, HIVE-11983.patch
>
> The Streaming API tries to distribute records evenly into buckets.
> All records in every Transaction that is part of a TransactionBatch go to the
> same bucket, and a new bucket number is chosen for each TransactionBatch.
> Fix: the API needs to hash each record to determine which bucket it belongs to.
[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940402#comment-14940402 ]

Roshan Naik commented on HIVE-12003:

I am revising this patch to exclude the -w option.
[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated HIVE-11983:

Attachment: HIVE-11983.4.patch

Uploading the v4 patch. Found that the patch was not applying due to the above-noted -w option. This patch is created without the -w option and applies cleanly using 'patch -p0' and 'git apply -p0'. RB, however, still doesn't like it. This patch is on top of commit SHA 24988f7 (HIVE-11972).
[jira] [Updated] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated HIVE-12003:

Attachment: HIVE-12003.2.patch
[jira] [Updated] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated HIVE-12003:

Attachment: HIVE-12003.patch

Uploading patch (fix + UT).
[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939001#comment-14939001 ]

Roshan Naik commented on HIVE-12003:

It's better to normalize the case in the code that sets the tblproperties() on the table, to either lower or upper case, instead of every lookup trying to do a case-insensitive lookup on the map.
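The normalize-on-write idea above can be sketched like this (a standalone illustration with a hypothetical TblProps class, not Hive's actual table-properties code): lower-case every key once at write time, so every later lookup is a plain Map.get instead of a case-insensitive scan.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TblProps {
    private final Map<String, String> props = new HashMap<>();

    // Normalize once, at write time; lookups then stay O(1) map gets.
    void set(String key, String value) {
        props.put(key.toLowerCase(Locale.ROOT), value);
    }

    String get(String key) {
        return props.get(key.toLowerCase(Locale.ROOT));
    }
}
```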
[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939025#comment-14939025 ]

Roshan Naik commented on HIVE-12003:

Doing it in tblproperties would not be in scope of this JIRA. My point is that it's better to do it there instead of here or the other places where the lookup is done; otherwise it defeats the purpose of using a Map<>.
[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated HIVE-11983:

Attachment: HIVE-11983.v2.patch

Uploading a patch generated with the -w option to skip whitespace changes.
[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938999#comment-14938999 ]

Roshan Naik commented on HIVE-12003:

It seems strange to make the lookup key case insensitive. I don't think maps support case-insensitive lookups; the only option would be to iterate. Is that what you are suggesting? I could switch to using the constant, but that won't make it case insensitive.
[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roshan Naik updated HIVE-11983:

Attachment: HIVE-11983.3.patch

[~ekoifman] Looks like that patch generation had an issue; some of those lines should not be deleted in OI-Utils. Uploading a revised patch, v3.
[jira] [Commented] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939107#comment-14939107 ]

Roshan Naik commented on HIVE-11983:

Deleted the v2 patch to ensure the bot does not balk.
[jira] [Commented] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
[ https://issues.apache.org/jira/browse/HIVE-12003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938977#comment-14938977 ]

Roshan Naik commented on HIVE-12003:

[~ekoifman] It is already case insensitive:
{code}
if (transactionalProp != null && transactionalProp.equalsIgnoreCase("true")) {
{code}
[jira] [Commented] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939056#comment-14939056 ]

Roshan Naik commented on HIVE-11983:

Created a review on RB, but RB is having some trouble showing the patch info due to some issue; not sure why.
https://reviews.apache.org/r/38911/
[jira] [Commented] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939082#comment-14939082 ] Roshan Naik commented on HIVE-11983: Tried a few different things, but was unable to resolve the RB issue. > Hive streaming API uses incorrect logic to assign buckets to incoming records > - > > Key: HIVE-11983 > URL: https://issues.apache.org/jira/browse/HIVE-11983 > Project: Hive > Issue Type: Bug > Components: HCatalog, Transactions >Affects Versions: 1.2.1 >Reporter: Roshan Naik >Assignee: Roshan Naik > Labels: streaming, streaming_api > Attachments: HIVE-11983.patch, HIVE-11983.v2.patch > > > The Streaming API tries to distribute records evenly into buckets. > All records in every Transaction that is part of a TransactionBatch go to the > same bucket, and a new bucket number is chosen for each TransactionBatch. > Fix: The API needs to hash each record to determine which bucket it belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-11983: --- Attachment: (was: HIVE-11983.v2.patch) > Hive streaming API uses incorrect logic to assign buckets to incoming records > - > > Key: HIVE-11983 > URL: https://issues.apache.org/jira/browse/HIVE-11983 > Project: Hive > Issue Type: Bug > Components: HCatalog, Transactions >Affects Versions: 1.2.1 >Reporter: Roshan Naik >Assignee: Roshan Naik > Labels: streaming, streaming_api > Attachments: HIVE-11983.3.patch, HIVE-11983.patch > > > The Streaming API tries to distribute records evenly into buckets. > All records in every Transaction that is part of a TransactionBatch go to the > same bucket, and a new bucket number is chosen for each TransactionBatch. > Fix: The API needs to hash each record to determine which bucket it belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-11983: --- Attachment: HIVE-11983.patch Uploading patch > Hive streaming API uses incorrect logic to assign buckets to incoming records > - > > Key: HIVE-11983 > URL: https://issues.apache.org/jira/browse/HIVE-11983 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Roshan Naik >Assignee: Roshan Naik > Attachments: HIVE-11983.patch > > > The Streaming API tries to distribute records evenly into buckets. > All records in every Transaction that is part of a TransactionBatch go to the > same bucket, and a new bucket number is chosen for each TransactionBatch. > Fix: The API needs to hash each record to determine which bucket it belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
[ https://issues.apache.org/jira/browse/HIVE-11983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-11983: --- Summary: Hive streaming API uses incorrect logic to assign buckets to incoming records (was: Hive streaming API's uses incorrect logic to assign buckets to incoming records) > Hive streaming API uses incorrect logic to assign buckets to incoming records > - > > Key: HIVE-11983 > URL: https://issues.apache.org/jira/browse/HIVE-11983 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 >Reporter: Roshan Naik >Assignee: Roshan Naik > > The Streaming API tries to distribute records evenly into buckets. > All records in every Transaction that is part of a TransactionBatch go to the > same bucket, and a new bucket number is chosen for each TransactionBatch. > Fix: The API needs to hash each record to determine which bucket it belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI
[ https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659125#comment-14659125 ] Roshan Naik commented on HIVE-11089: - That 'proxyUser' string argument was a parameter to a private method prior to Kerberos support. It was never exposed externally and was always set to null internally. At the time the thought was to support proxying, but it never got fully tested. So I think I pulled it from the public interface very late in the dev cycle and did not reflect that in the wiki. I just updated the wiki. - With the introduction of Kerberos support, the internal 'proxyUser' was dropped, and a UGI-based 'authenticatedUser' argument was exposed publicly ... in a new overload of newConnection(). So to acquire a connection as a user other than the process user, Kerberos will be needed. - The wiki has a secure/Kerberos example at the bottom; that should work. The API reference is in the Javadocs: http://hive.apache.org/javadocs/r1.2.1/api/. References to proxyUser in the javadocs need to be fixed. Hive Streaming: connection fails when using a proxy user UGI Key: HIVE-11089 URL: https://issues.apache.org/jira/browse/HIVE-11089 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Adam Kunicki Labels: ACID, Streaming HIVE-7508 Add Kerberos Support seems to also remove the ability to specify a proxy user. HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for proxy users, as a proxy user UGI will always return false from hasKerberosCredentials(). See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} It also seems that between 0.13.1 and 0.14 the newConnection() method that includes a proxy user has been removed. 
for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a -- This message was sent by Atlassian JIRA (v6.3.4#6332)
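The problematic decision quoted in the issue above can be illustrated with a minimal stand-in. The `Ugi` interface below is an invented stand-in for Hadoop's `UserGroupInformation` (which does provide `hasKerberosCredentials()` and `getRealUser()`): a proxy-user UGI reports `hasKerberosCredentials() == false` even when the real user behind it holds Kerberos credentials, so the shipped check always treats proxy users as insecure. The "fix" variant is an assumption made for illustration, not the committed change.

```java
// Stand-in for Hadoop's UserGroupInformation, reduced to the two calls
// relevant here. Real proxy-user UGIs wrap a "real" (doAs) user.
interface Ugi {
    boolean hasKerberosCredentials();
    Ugi getRealUser(); // null unless this is a proxy-user UGI
}

public class SecureModeCheck {
    // The logic from HiveEndPoint.java lines 273-274: a proxy-user UGI
    // always comes out insecure here, which is the reported bug.
    static boolean secureModeAsShipped(Ugi ugi) {
        return ugi == null ? false : ugi.hasKerberosCredentials();
    }

    // One possible repair (an assumption, not Hive's actual fix):
    // also consult the real user behind a proxy-user UGI.
    static boolean secureModeConsideringRealUser(Ugi ugi) {
        if (ugi == null) return false;
        if (ugi.hasKerberosCredentials()) return true;
        Ugi real = ugi.getRealUser();
        return real != null && real.hasKerberosCredentials();
    }
}
```

With a proxy UGI whose real user is Kerberos-authenticated, the shipped check returns false while the real-user-aware variant returns true, which is exactly the discrepancy the reporter describes.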