[jira] [Created] (HIVE-12221) Concurrency issue in HCatUtil.getHiveMetastoreClient()
Roshan Naik created HIVE-12221: -- Summary: Concurrency issue in HCatUtil.getHiveMetastoreClient() Key: HIVE-12221 URL: https://issues.apache.org/jira/browse/HIVE-12221 Project: Hive Issue Type: Bug Reporter: Roshan Naik HCatUtil.getHiveMetastoreClient() uses the double-checked locking pattern to implement a singleton, which is a broken pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332)
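The failure mode of unsynchronized double-checked locking is that a thread can observe a non-null reference to a partially constructed object. A minimal self-contained sketch of the standard volatile fix (plain Java; the class name is made up and this is not Hive's actual HCatUtil code):

```java
// Hypothetical stand-in for the cached metastore client (not Hive code).
class ClientHolder {
    // Without volatile, another thread may see 'instance' non-null before
    // the constructor's field writes are visible -- the classic DCL bug.
    // Declaring the field volatile restores the needed happens-before edge.
    private static volatile ClientHolder instance;

    private ClientHolder() { }

    static ClientHolder getInstance() {
        ClientHolder local = instance;              // one volatile read
        if (local == null) {                        // first check, no lock
            synchronized (ClientHolder.class) {
                local = instance;
                if (local == null) {                // second check, locked
                    instance = local = new ClientHolder();
                }
            }
        }
        return local;
    }
}
```

When the singleton takes no construction arguments, an initialization-on-demand holder class is an even simpler alternative.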
[jira] [Created] (HIVE-12003) Hive Streaming API : Add check to ensure table is transactional
Roshan Naik created HIVE-12003: -- Summary: Hive Streaming API : Add check to ensure table is transactional Key: HIVE-12003 URL: https://issues.apache.org/jira/browse/HIVE-12003 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.1 Reporter: Roshan Naik Assignee: Roshan Naik Check whether TBLPROPERTIES ('transactional'='true') is set when opening a connection -- This message was sent by Atlassian JIRA (v6.3.4#6332)
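The proposed check amounts to a guard on the table's properties before a streaming connection is opened. A hypothetical helper (not the actual HIVE-12003 patch; it assumes TBLPROPERTIES have been fetched from the metastore as a String map):

```java
import java.util.Map;

// Hypothetical guard (not the actual patch): fail fast when the target
// table is not transactional, instead of failing later mid-stream.
class TransactionalCheck {
    static void requireTransactional(String table, Map<String, String> tblProps) {
        if (!"true".equalsIgnoreCase(tblProps.get("transactional"))) {
            throw new IllegalArgumentException("Cannot stream to table " + table
                    + ": TBLPROPERTIES ('transactional'='true') is not set");
        }
    }
}
```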
[jira] [Created] (HIVE-11983) Hive streaming API uses incorrect logic to assign buckets to incoming records
Roshan Naik created HIVE-11983: -- Summary: Hive streaming API uses incorrect logic to assign buckets to incoming records Key: HIVE-11983 URL: https://issues.apache.org/jira/browse/HIVE-11983 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 1.2.1 Reporter: Roshan Naik Assignee: Roshan Naik The Streaming API tries to distribute records evenly into buckets. All records in every Transaction that is part of a TransactionBatch go to the same bucket, and a new bucket number is chosen for each TransactionBatch. Fix: the API needs to hash each record to determine which bucket it belongs to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
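The proposed fix, hashing each record into a bucket, can be sketched as follows (an assumed helper, not the actual patch; it mirrors Hive's usual convention of masking the sign bit before the modulo so bucket ids are non-negative):

```java
// Hypothetical per-record bucket assignment (not the actual patch):
// hash the record's bucketing-column value instead of reusing one
// bucket number for the whole TransactionBatch.
class BucketAssigner {
    static int bucketFor(Object bucketColumnValue, int numBuckets) {
        int hash = (bucketColumnValue == null) ? 0 : bucketColumnValue.hashCode();
        // Mask the sign bit so the result is always in [0, numBuckets).
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }
}
```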
[jira] [Commented] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189697#comment-14189697 ] Roshan Naik commented on HIVE-8629: --- I added the doc to https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming, TODOC14 Fix For: 0.14.0 Attachments: HIVE-8629.patch, HIVE-8629.v2.patch When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189705#comment-14189705 ] Roshan Naik commented on HIVE-8629: --- "before" should not be there.. just deleted it. Thanks for catching it Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.14.0 Attachments: HIVE-8629.patch, HIVE-8629.v2.patch When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187329#comment-14187329 ] Roshan Naik commented on HIVE-8629: --- [~alangates] without setugi, the directories created by the metastore during add partition etc. are created as the hive user instead of the client user of the metastore process, consequently leading to incorrect permissions and later failure to stream to those directories. WRT Log.info(): since this is done each time a new connection is created, which occurs multiple times over the duration of a long-running streaming process, I am wondering if we should document this instead of log.info(), to reduce noise in the log output? Either way I am fine. Let me know what you think. Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Attachments: HIVE-8629.patch When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8629: -- Attachment: HIVE-8629.v2.patch Revised patch incorporating Alan's comments Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Attachments: HIVE-8629.patch, HIVE-8629.v2.patch When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8629: -- Summary: Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez (was: Streaming / ACID : hive cli session creation takes too long and times out of execution engine is tez) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out of execution engine is tez
Roshan Naik created HIVE-8629: - Summary: Streaming / ACID : hive cli session creation takes too long and times out of execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
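The override described above can be sketched like this (a simplified stand-in using java.util.Properties; the real fix would set hive.execution.engine on the HiveConf used inside the streaming session):

```java
import java.util.Properties;

// Sketch of the proposed override (not the actual patch): streaming only
// issues simple DDL such as 'alter table ... add partition', so Tez
// session startup cost is pure overhead; pin the session's engine to mr.
class EngineOverride {
    static Properties sessionConf(Properties clusterDefaults) {
        Properties conf = new Properties();
        conf.putAll(clusterDefaults);
        // Applied last, so it wins regardless of the cluster-wide default.
        conf.setProperty("hive.execution.engine", "mr");
        return conf;
    }
}
```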
[jira] [Updated] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8629: -- Attachment: HIVE-8629.patch Uploading patch Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Attachments: HIVE-8629.patch When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8629) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez
[ https://issues.apache.org/jira/browse/HIVE-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8629: -- Status: Patch Available (was: Open) Streaming / ACID : hive cli session creation takes too long and times out if execution engine is tez Key: HIVE-8629 URL: https://issues.apache.org/jira/browse/HIVE-8629 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Attachments: HIVE-8629.patch When creating a hive session to run basic alter table create partition queries, the session creation takes too long (more than 5 sec) if the hive execution engine is set to tez. Since the streaming clients don't care about Tez, they can explicitly override the setting to mr. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8476) JavaDoc updates to HiveEndPoint.newConnection() for secure streaming with Kerberos
Roshan Naik created HIVE-8476: - Summary: JavaDoc updates to HiveEndPoint.newConnection() for secure streaming with Kerberos Key: HIVE-8476 URL: https://issues.apache.org/jira/browse/HIVE-8476 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 Add additional notes on using a Kerberos-authenticated streaming connection in the HiveEndPoint.newConnection() method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8476) JavaDoc updates to HiveEndPoint.newConnection() for secure streaming with Kerberos
[ https://issues.apache.org/jira/browse/HIVE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8476: -- Status: Patch Available (was: Open) JavaDoc updates to HiveEndPoint.newConnection() for secure streaming with Kerberos -- Key: HIVE-8476 URL: https://issues.apache.org/jira/browse/HIVE-8476 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 Attachments: HIVE-8476.patch Add additional notes on using a Kerberos-authenticated streaming connection in the HiveEndPoint.newConnection() method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8476) JavaDoc updates to HiveEndPoint.newConnection() for secure streaming with Kerberos
[ https://issues.apache.org/jira/browse/HIVE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8476: -- Attachment: HIVE-8476.patch JavaDoc updates to HiveEndPoint.newConnection() for secure streaming with Kerberos -- Key: HIVE-8476 URL: https://issues.apache.org/jira/browse/HIVE-8476 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 Attachments: HIVE-8476.patch Add additional notes on using a Kerberos-authenticated streaming connection in the HiveEndPoint.newConnection() method -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8427) Hive Streaming : throws NPE seen when streaming to secure Hive
Roshan Naik created HIVE-8427: - Summary: Hive Streaming : throws NPE seen when streaming to secure Hive Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 {code} 2014-10-08 08:13:48,745 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows. org.apache.flume.EventDeliveryException: java.lang.NullPointerException at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:375) at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.init(HiveEndPoint.java:265) at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.init(HiveEndPoint.java:238) at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:175) at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:152) at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:293) at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:290) at org.apache.flume.sink.hive.HiveWriter$9.call(HiveWriter.java:347) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8427) Hive Streaming : secure streaming hangs.
[ https://issues.apache.org/jira/browse/HIVE-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8427: -- Summary: Hive Streaming : secure streaming hangs. (was: Hive Streaming : throws NPE seen when streaming to secure Hive) Hive Streaming : secure streaming hangs. - Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 {code} 2014-10-08 08:13:48,745 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows. org.apache.flume.EventDeliveryException: java.lang.NullPointerException at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:375) at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.init(HiveEndPoint.java:265) at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.init(HiveEndPoint.java:238) at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:175) at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:152) at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:293) at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:290) at org.apache.flume.sink.hive.HiveWriter$9.call(HiveWriter.java:347) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8427) Hive Streaming : secure streaming hangs leading to time outs.
[ https://issues.apache.org/jira/browse/HIVE-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8427: -- Summary: Hive Streaming : secure streaming hangs leading to time outs. (was: Hive Streaming : secure streaming hangs.) Hive Streaming : secure streaming hangs leading to time outs. -- Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 The enableSasl setting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8427) Hive Streaming : secure streaming hangs.
[ https://issues.apache.org/jira/browse/HIVE-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8427: -- Description: The enableSasl setting (was: {code} 2014-10-08 08:13:48,745 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows. org.apache.flume.EventDeliveryException: java.lang.NullPointerException at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:375) at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.init(HiveEndPoint.java:265) at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.init(HiveEndPoint.java:238) at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:175) at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:152) at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:293) at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:290) at org.apache.flume.sink.hive.HiveWriter$9.call(HiveWriter.java:347) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ... 1 more {code}) Hive Streaming : secure streaming hangs. - Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 The enableSasl setting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8427) Hive Streaming : secure streaming hangs leading to time outs.
[ https://issues.apache.org/jira/browse/HIVE-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8427: -- Attachment: HIVE-8427.patch Patch sets METASTORE_USE_THRIFT_SASL for secure streaming Hive Streaming : secure streaming hangs leading to time outs. -- Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 0.14.0 Attachments: HIVE-8427.patch The enableSasl setting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
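The shape of that fix can be sketched as below (a simplified stand-in, not the actual HIVE-8427 patch; METASTORE_USE_THRIFT_SASL corresponds to the hive.metastore.sasl.enabled key in HiveConf):

```java
import java.util.Properties;

// Sketch (not the actual patch): enable the Thrift SASL transport for the
// metastore connection exactly when the client side has Kerberos
// credentials, so the secure handshake matches on both ends instead of
// hanging until a timeout.
class SaslToggle {
    static Properties metastoreConf(boolean clientHasKerberosCredentials) {
        Properties conf = new Properties();
        conf.setProperty("hive.metastore.sasl.enabled",
                Boolean.toString(clientHasKerberosCredentials));
        return conf;
    }
}
```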
[jira] [Updated] (HIVE-8427) Hive Streaming : secure streaming hangs leading to time outs.
[ https://issues.apache.org/jira/browse/HIVE-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8427: -- Labels: ACID Streaming (was: ) Affects Version/s: 0.14.0 Status: Patch Available (was: Open) Hive Streaming : secure streaming hangs leading to time outs. -- Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.14.0 Attachments: HIVE-8427.patch The enableSasl setting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8427) Hive Streaming : secure streaming hangs leading to time outs.
[ https://issues.apache.org/jira/browse/HIVE-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-8427: -- Description: Need to enable the Thrift SASL setting for secure mode communication (was: The enableSasl setting ) Hive Streaming : secure streaming hangs leading to time outs. -- Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.14.0 Attachments: HIVE-8427.patch Need to enable the Thrift SASL setting for secure mode communication -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8427) Hive Streaming : secure streaming hangs leading to time outs.
[ https://issues.apache.org/jira/browse/HIVE-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167763#comment-14167763 ] Roshan Naik commented on HIVE-8427: --- Not a security expert, but here are my thoughts... The streaming client (like Flume) does not run on the hadoop cluster. It may in fact be streaming to one or more clusters. The reason for using ugi.hasKerberosCredentials() is not to detect whether the hadoop cluster is security enabled, but to check whether the streaming client has decided to use a secure mode connection (by initializing kerberos on the ugi object). The client should be able to maintain multiple connections .. one to a secure cluster, and another to a non-secure cluster. Hive Streaming : secure streaming hangs leading to time outs. -- Key: HIVE-8427 URL: https://issues.apache.org/jira/browse/HIVE-8427 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.14.0 Attachments: HIVE-8427.patch Need to enable the Thrift SASL setting for secure mode communication -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7508) Kerberos support for streaming
[ https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136516#comment-14136516 ] Roshan Naik commented on HIVE-7508: --- [~leftylev] FYI... I have updated the wiki Kerberos support for streaming -- Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming, TODOC14 Fix For: 0.14.0 Attachments: HIVE-7508.patch Add kerberos support for streaming to secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7508) Kerberos support for streaming
[ https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120448#comment-14120448 ] Roshan Naik commented on HIVE-7508: --- [~leftylev] Yes, thanks for bringing it up. I will work with [~alangates] on updating that. Kerberos support for streaming -- Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming, TODOC14 Fix For: 0.14.0 Attachments: HIVE-7508.patch Add kerberos support for streaming to secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7508) Kerberos support for streaming
[ https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120854#comment-14120854 ] Roshan Naik commented on HIVE-7508: --- [~leftylev] or [~ashutoshc] can you grant me write permission on that wiki? Kerberos support for streaming -- Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming, TODOC14 Fix For: 0.14.0 Attachments: HIVE-7508.patch Add kerberos support for streaming to secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7508) Kerberos support for streaming
[ https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075028#comment-14075028 ] Roshan Naik commented on HIVE-7508: --- Above errors are unrelated to the patch. Kerberos support for streaming -- Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Labels: Streaming Attachments: HIVE-7508.patch Add kerberos support for streaming to secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7508) Kerberos support for streaming
Roshan Naik created HIVE-7508: - Summary: Kerberos support for streaming Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Add kerberos support for streaming to secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7508) Kerberos support for streaming
[ https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7508: -- Attachment: HIVE-7508.patch Kerberos support for streaming -- Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Labels: Streaming Attachments: HIVE-7508.patch Add kerberos support for streaming to secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7508) Kerberos support for streaming
[ https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7508: -- Status: Patch Available (was: Open) Kerberos support for streaming -- Key: HIVE-7508 URL: https://issues.apache.org/jira/browse/HIVE-7508 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Roshan Naik Labels: Streaming Attachments: HIVE-7508.patch Add kerberos support for streaming to secure Hive cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
Roshan Naik created HIVE-7192: - Summary: Hive Streaming - Some required settings are not mentioned in the documentation Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Specifically: - hive.support.concurrency on metastore - hive.vectorized.execution.enabled for query client -- This message was sent by Atlassian JIRA (v6.2#6252)
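For reference, the two settings would look like this in hive-site.xml. The ticket names only the keys; the values below are the ones commonly documented for Hive Streaming in 0.13, so treat them as assumptions:

```xml
<!-- Assumed values; the ticket lists only the property names. -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>   <!-- required on the metastore side -->
</property>
<property>
  <name>hive.vectorized.execution.enabled</name>
  <value>false</value>  <!-- on query clients reading the streamed table -->
</property>
```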
[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7192: -- Attachment: HIVE-7192.patch uploading patch Hive Streaming - Some required settings are not mentioned in the documentation -- Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming Attachments: HIVE-7192.patch Specifically: - hive.support.concurrency on metastore - hive.vectorized.execution.enabled for query client -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7192) Hive Streaming - Some required settings are not mentioned in the documentation
[ https://issues.apache.org/jira/browse/HIVE-7192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7192: -- Status: Patch Available (was: Open) Hive Streaming - Some required settings are not mentioned in the documentation -- Key: HIVE-7192 URL: https://issues.apache.org/jira/browse/HIVE-7192 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.0 Reporter: Roshan Naik Assignee: Roshan Naik Labels: Streaming Attachments: HIVE-7192.patch Specifically: - hive.support.concurrency on metastore - hive.vectorized.execution.enabled for query client -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: (was: Hive Streaming Ingest API for v4 patch.pdf) Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: Hive Streaming Ingest API for v4 patch.pdf updating 'Hive Streaming Ingest API for v4 patch.pdf' document with requirements Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Labels: ACID, Streaming Fix For: 0.13.0 Attachments: 5687-api-spec4.pdf, 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687-unit-test-fix.patch, HIVE-5687.patch, HIVE-5687.v2.patch, HIVE-5687.v3.patch, HIVE-5687.v4.patch, HIVE-5687.v5.patch, HIVE-5687.v6.patch, HIVE-5687.v7.patch, Hive Streaming Ingest API for v3 patch.pdf, Hive Streaming Ingest API for v4 patch.pdf, package.html Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7153) HiveStreamin - Minor bug in TransactionBatch.toString() method
Roshan Naik created HIVE-7153: - Summary: HiveStreamin - Minor bug in TransactionBatch.toString() method Key: HIVE-7153 URL: https://issues.apache.org/jira/browse/HIVE-7153 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Roshan Naik Assignee: Roshan Naik The toString() method currently returns: {code} return "TxnIds=[" + txnIds.get(0) + "src/gen/thrift" + txnIds.get(txnIds.size()-1) + "] on endPoint=" + endPt; {code} The "src/gen/thrift" there is a typo and needs to be replaced with "...".
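For illustration, a self-contained sketch of the corrected logic (the class and method names here are hypothetical stand-ins, not the actual TransactionBatchImpl):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the corrected toString() logic, assuming txnIds
// is a List<Long>; the stray "src/gen/thrift" literal is replaced with "..."
// so the transaction-id range reads naturally.
class TxnBatchToString {
    static String format(List<Long> txnIds, Object endPt) {
        return "TxnIds=[" + txnIds.get(0) + "..."
                + txnIds.get(txnIds.size() - 1) + "] on endPoint=" + endPt;
    }

    public static void main(String[] args) {
        // Prints: TxnIds=[101...110] on endPoint=db.tbl
        System.out.println(format(Arrays.asList(101L, 102L, 110L), "db.tbl"));
    }
}
```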
[jira] [Updated] (HIVE-7153) HiveStreamin - Minor bug in TransactionBatch.toString() method
[ https://issues.apache.org/jira/browse/HIVE-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7153: -- Description: The TransactionBatchImpl.toString() method currently returns: {code} return "TxnIds=[" + txnIds.get(0) + "src/gen/thrift" + txnIds.get(txnIds.size()-1) + "] on endPoint=" + endPt; {code} The "src/gen/thrift" there is a typo and needs to be replaced with "...". was: The toString() method currently returns: {code} return "TxnIds=[" + txnIds.get(0) + "src/gen/thrift" + txnIds.get(txnIds.size()-1) + "] on endPoint=" + endPt; {code} The "src/gen/thrift" there is a typo and needs to be replaced with "...".
[jira] [Updated] (HIVE-7153) HiveStreamin - Minor bug in TransactionBatch.toString() method
[ https://issues.apache.org/jira/browse/HIVE-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7153: -- Attachment: HIVE-7153.patch. Uploading patch.
[jira] [Updated] (HIVE-7153) HiveStreamin - Minor bug in TransactionBatch.toString() method
[ https://issues.apache.org/jira/browse/HIVE-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7153: -- Affects Version/s: 0.13.0 Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-7153) HiveStreaming - Bug in TransactionBatch.toString() method
[ https://issues.apache.org/jira/browse/HIVE-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-7153: -- Summary: HiveStreaming - Bug in TransactionBatch.toString() method (was: HiveStreamin - Minor bug in TransactionBatch.toString() method)
[jira] [Commented] (HIVE-6890) Bug in HiveStreaming API causes problems if hive-site.xml is missing on streaming client side
[ https://issues.apache.org/jira/browse/HIVE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13966966#comment-13966966 ] Roshan Naik commented on HIVE-6890: --- Test failure is unrelated to patch. Bug in HiveStreaming API causes problems if hive-site.xml is missing on streaming client side - Key: HIVE-6890 URL: https://issues.apache.org/jira/browse/HIVE-6890 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-6890.patch An incorrect conf object being passed to the MetaStore client in AbstractRecordWriter is causing the issue.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13965129#comment-13965129 ] Roshan Naik commented on HIVE-5687: --- [~leftylev] Yes, it looks like it went unnoticed due to the short time frame. For some reason I never got a notification of your review. We can get it in via another patch, but it appears to be too late to get it into this release. [~orahive] You can query the data while it is being streamed into Hive. Queries always see a consistent view of the data, since this feature relies on the new ACID support in Hive; a query will not see data that was committed after it began executing. FLUME-1734 consumes this API to implement a Flume sink that streams data continuously into Hive.
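As context for the comment above, the documented shape of the streaming client flow (per the Streaming Data Ingest wiki for the 0.13-era API) is roughly the following. The metastore URI, table, and partition values are placeholders, and this needs a running metastore plus a transactional, bucketed target table, so it is a sketch rather than a runnable test:

```java
import java.util.Arrays;

import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: metastore URI, database, table, partition values.
        HiveEndPoint endPt = new HiveEndPoint("thrift://metastore-host:9083",
                "default", "alerts", Arrays.asList("2014", "04"));
        // true = auto-create the partition if it does not exist yet.
        StreamingConnection conn = endPt.newConnection(true);
        // Map delimited input fields onto the table's columns.
        DelimitedInputWriter writer =
                new DelimitedInputWriter(new String[]{"id", "msg"}, ",", endPt);

        TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
        batch.beginNextTransaction();
        batch.write("1,hello".getBytes());
        batch.commit();   // the batch's records become visible atomically here
        batch.close();
        conn.close();
    }
}
```

A query started before the commit keeps its snapshot; one started after it sees the new records, which is the consistency behavior described in the comment.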
[jira] [Updated] (HIVE-6890) Bug in HiveStreaming API causes problems if hive-site.xml is missing on streaming client side
[ https://issues.apache.org/jira/browse/HIVE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-6890: -- Attachment: HIVE-6890.patch. Uploading patch.
[jira] [Created] (HIVE-6890) Bug in HiveStreaming API causes problems if hive-site.xml is missing on streaming client side
Roshan Naik created HIVE-6890: - Summary: Bug in HiveStreaming API causes problems if hive-site.xml is missing on streaming client side Key: HIVE-6890 URL: https://issues.apache.org/jira/browse/HIVE-6890 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-6890.patch Incorrect conf object being passed to MetaStore client in AbstractRecordWriter is causing the issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6890) Bug in HiveStreaming API causes problems if hive-site.xml is missing on streaming client side
[ https://issues.apache.org/jira/browse/HIVE-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-6890: -- Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.v7.patch. Patch v7, using package.html from Owen and fixing a bug in packaging.
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963560#comment-13963560 ] Roshan Naik commented on HIVE-5687: --- Owen: Thanks a lot for revising package.html
[jira] [Commented] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963820#comment-13963820 ] Roshan Naik commented on HIVE-5687: --- I had posted the revised patch on RB.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.v6.patch. Addressing review comments from Alan, Owen, and some of Lars's. Owen: DDL was used there mostly for convenience and correctness. The other places where the API is used cannot be accomplished via DDL.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: (was: HIVE-5687.v5.patch)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.v5.patch. Refreshing patch v5 with a minor fix to compile with the hadoop1 profile.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.v5.patch. The v5 patch addresses Owen's comments: fixes for unit test issues.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Tags: Streaming ACID Fix Version/s: 0.13.0 Labels: ACID Streaming (was: ) Release Note: New transactional APIs to support streaming data directly into Hive. Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: Hive Streaming Ingest API for v4 patch.pdf, HIVE-5687.v4.patch. v4 patch: adding JSON writer support, tweaks to JavaDocs. Updated the PDF document.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.v3.patch
- Addressed review comments from Alan
- Tweaked the APIs
- Wrote Java docs
- Added heartbeat support
- Improved log messages
- More tests
- Fixes for multiple bugs found during unit testing and manual testing
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: Hive Streaming Ingest API for v3 patch.pdf. Adding design spec documentation for the v3 patch.
[jira] [Assigned] (HIVE-2442) Metastore upgrade script and schema DDL for Hive 0.8.0
[ https://issues.apache.org/jira/browse/HIVE-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned HIVE-2442: - Assignee: Roshan Naik (was: Carl Steinbach) Metastore upgrade script and schema DDL for Hive 0.8.0 -- Key: HIVE-2442 URL: https://issues.apache.org/jira/browse/HIVE-2442 Project: Hive Issue Type: Task Components: Metastore Reporter: Carl Steinbach Assignee: Roshan Naik Priority: Blocker Fix For: 0.8.0 Attachments: HIVE-2442-branch-08.1.patch.txt, HIVE-2442-trunk.1.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Issue Type: Sub-task (was: Bug) Parent: HIVE-5317
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: (was: HIVE-5687.v2.patch)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.v2.patch. Updating patch v2 with minor tweaks.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: 5687-api-spec4.docx, HIVE-5687.v2.patch. Revising the API in the patch and spec to handle mapping of the incoming data format to the corresponding columns in the table (RecordWriter interface). Adding out-of-the-box support for delimited text formats; more formats are pluggable. Added support for auto-creation of new partitions for streaming clients.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: (was: 5687-api-spec4.docx)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: 5687-api-spec4.pdf. Fixing typos in spec v4.
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: HIVE-5687.patch. Initial patch (depends upon HIVE-5843 and HIVE-6060).
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: 5687-draft-api-spec3.pdf spec updated to match first draft patch Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf, 5687-draft-api-spec3.pdf, HIVE-5687.patch Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6315) MetaStoreDirectSql ctor should not throw
[ https://issues.apache.org/jira/browse/HIVE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883136#comment-13883136 ] Roshan Naik commented on HIVE-6315: --- I see the following exception with this patch applied:
org.datanucleus.api.jdo.exceptions.ClassNotPersistenceCapableException: The class org.apache.hadoop.hive.metastore.model.MVersionTable is not persistable. This means that it either hasn't been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found.
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:380)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at org.apache.hadoop.hive.metastore.ObjectStore.setMetaStoreSchemaVersion(ObjectStore.java:6025)
at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:5935)
at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:5913)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122)
at com.sun.proxy.$Proxy7.verifySchema(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:389)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:427)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:314)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:274)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.init(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4175)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:115)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:98)
at org.apache.hive.streaming.TestStreaming.setup(TestStreaming.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:24)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:77) at
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: 5687-draft-api-spec2.pdf revising draft spec Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: 5687-draft-api-spec.pdf, 5687-draft-api-spec2.pdf Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: 5687-draft-api-spec.pdf Attaching draft api spec for comments Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: 5687-draft-api-spec.pdf Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik resolved HIVE-4196. --- Resolution: Won't Fix In view of HIVE-5317, which brings insert/update/delete support to Hive, introducing streaming partitions is no longer necessary. Streaming support can be provided with far less complexity by leveraging HIVE-5317 Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.pdf, HIVE-4196.v1.patch Motivation: Allow Hive users to immediately query data streaming in through clients such as Flume. Currently Hive partitions must be created after all the data for the partition is available. Thereafter, data in the partitions is considered immutable. This proposal introduces the notion of a streaming partition into which new files can be committed periodically and made available for queries before the partition is closed and converted into a standard partition. The admin enables streaming partition on a table using DDL. He provides the following pieces of information: - Name of the partition in the table on which streaming is enabled - Frequency at which the streaming partition should be closed and converted into a standard partition. Tables with streaming partition enabled will be partitioned by one and only one column. It is assumed that this column will contain a timestamp. Closing the current streaming partition converts it into a standard partition. Based on the specified frequency, the current streaming partition is closed and a new one created for future writes. This is referred to as 'rolling the partition'. 
A streaming partition's life cycle is as follows: - A new streaming partition is instantiated for writes - Streaming clients request (via webhcat) an HDFS file name into which they can write a chunk of records for a specific table. - Streaming clients write a chunk (via webhdfs) to that file and commit it (via webhcat). Committing merely indicates that the chunk has been written completely and is ready to serve queries. - When the partition is rolled, all committed chunks are swept into a single directory and a standard partition pointing to that directory is created. The streaming partition is closed and a new streaming partition is created. Rolling the partition is atomic. Streaming clients are agnostic of partition rolling. - Hive queries will be able to query the partition that is currently open for streaming. Only committed chunks will be visible. Read consistency will be ensured so that repeated reads of the same partition will be idempotent for the lifespan of the query. Partition rolling requires an active agent/thread running to check when it is time to roll and trigger the roll. This could be achieved either by using an external agent such as Oozie (preferably) or an internal agent. -- This message was sent by Atlassian JIRA (v6.1#6144)
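The roll step in the life cycle above — sweep committed chunks into one directory, then publish it atomically — can be sketched minimally. This is an illustration of the design, not Hive's implementation; the function name, the `.committed` suffix convention, and the directory layout are assumptions.

```python
import os
import shutil

def roll_partition(streaming_dir, partitions_root, partition_name):
    """Hypothetical sketch of 'rolling' a streaming partition: sweep the
    committed chunks into one staging directory, then publish it with a
    single atomic rename, so queries see either the old state or the
    complete new partition."""
    staging = os.path.join(partitions_root, "." + partition_name + ".staging")
    os.makedirs(staging)
    # Only committed chunks are swept; committing is modeled here as the
    # chunk file having previously been renamed to a *.committed name.
    for name in sorted(os.listdir(streaming_dir)):
        if name.endswith(".committed"):
            shutil.move(os.path.join(streaming_dir, name), staging)
    # Atomic publish: the standard partition appears in a single step,
    # which is what lets streaming clients stay agnostic of rolling.
    os.rename(staging, os.path.join(partitions_root, partition_name))
```

Uncommitted chunks are simply left behind in the streaming directory for the next roll (or for abort), which matches the read-consistency rule that only committed chunks are visible.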
[jira] [Created] (HIVE-5687) Streaming support in Hive
Roshan Naik created HIVE-5687: - Summary: Streaming support in Hive Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Description: Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm was: Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computational from Storm Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned HIVE-5687: - Assignee: Roshan Naik Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808548#comment-13808548 ] Roshan Naik commented on HIVE-4196: --- Moving the streaming work to a new jira HIVE-5687 since it will be based on a different design. Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.pdf, HIVE-4196.v1.patch Motivation: Allow Hive users to immediately query data streaming in through clients such as Flume. Currently Hive partitions must be created after all the data for the partition is available. Thereafter, data in the partitions is considered immutable. This proposal introduces the notion of a streaming partition into which new files can be committed periodically and made available for queries before the partition is closed and converted into a standard partition. The admin enables streaming partition on a table using DDL. He provides the following pieces of information: - Name of the partition in the table on which streaming is enabled - Frequency at which the streaming partition should be closed and converted into a standard partition. Tables with streaming partition enabled will be partitioned by one and only one column. It is assumed that this column will contain a timestamp. Closing the current streaming partition converts it into a standard partition. Based on the specified frequency, the current streaming partition is closed and a new one created for future writes. This is referred to as 'rolling the partition'. 
A streaming partition's life cycle is as follows: - A new streaming partition is instantiated for writes - Streaming clients request (via webhcat) an HDFS file name into which they can write a chunk of records for a specific table. - Streaming clients write a chunk (via webhdfs) to that file and commit it (via webhcat). Committing merely indicates that the chunk has been written completely and is ready to serve queries. - When the partition is rolled, all committed chunks are swept into a single directory and a standard partition pointing to that directory is created. The streaming partition is closed and a new streaming partition is created. Rolling the partition is atomic. Streaming clients are agnostic of partition rolling. - Hive queries will be able to query the partition that is currently open for streaming. Only committed chunks will be visible. Read consistency will be ensured so that repeated reads of the same partition will be idempotent for the lifespan of the query. Partition rolling requires an active agent/thread running to check when it is time to roll and trigger the roll. This could be achieved either by using an external agent such as Oozie (preferably) or an internal agent. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776946#comment-13776946 ] Roshan Naik commented on HIVE-5138: --- Capturing API related comments from [~ashutoshc] noted [here|https://issues.apache.org/jira/browse/HIVE-4196?focusedCommentId=13770314page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13770314] in HIVE-4196 {quote} We should try to eliminate the need of intermediate staging area while rolling on new partitions. Seems like there should not be any gotchas while moving data from streaming dir to partition dir directly. We should make thrift apis in metastore forward compatible. One way to do that is to use struct (which contains all parameters) instead of passing in list of arguments. We should try to leave TBLS table untouched in backend db. That will simplify upgrade story. One way to do that is to have all new columns in a new table and than add constraints for this new table. {quote} Streaming - Web HCat API - Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore, WebHCat Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4196.v2.patch, HIVE-5138.v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776952#comment-13776952 ] Roshan Naik commented on HIVE-4196: --- Thanks Ashutosh. Since your recommendations apply to subtask HIVE-5138, I have copied your comments over to it. I will address them there. Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.pdf, HIVE-4196.v1.patch Motivation: Allow Hive users to immediately query data streaming in through clients such as Flume. Currently Hive partitions must be created after all the data for the partition is available. Thereafter, data in the partitions is considered immutable. This proposal introduces the notion of a streaming partition into which new files can be committed periodically and made available for queries before the partition is closed and converted into a standard partition. The admin enables streaming partition on a table using DDL. He provides the following pieces of information: - Name of the partition in the table on which streaming is enabled - Frequency at which the streaming partition should be closed and converted into a standard partition. Tables with streaming partition enabled will be partitioned by one and only one column. It is assumed that this column will contain a timestamp. Closing the current streaming partition converts it into a standard partition. Based on the specified frequency, the current streaming partition is closed and a new one created for future writes. This is referred to as 'rolling the partition'. 
A streaming partition's life cycle is as follows: - A new streaming partition is instantiated for writes - Streaming clients request (via webhcat) an HDFS file name into which they can write a chunk of records for a specific table. - Streaming clients write a chunk (via webhdfs) to that file and commit it (via webhcat). Committing merely indicates that the chunk has been written completely and is ready to serve queries. - When the partition is rolled, all committed chunks are swept into a single directory and a standard partition pointing to that directory is created. The streaming partition is closed and a new streaming partition is created. Rolling the partition is atomic. Streaming clients are agnostic of partition rolling. - Hive queries will be able to query the partition that is currently open for streaming. Only committed chunks will be visible. Read consistency will be ensured so that repeated reads of the same partition will be idempotent for the lifespan of the query. Partition rolling requires an active agent/thread running to check when it is time to roll and trigger the roll. This could be achieved either by using an external agent such as Oozie (preferably) or an internal agent. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776954#comment-13776954 ] Roshan Naik commented on HIVE-5138: --- bq. We should try to eliminate the need of intermediate staging area while rolling on new partitions. Seems like there should not be any gotchas while moving data from streaming dir to partition dir directly. Thanks. That change is already part of the patch. bq. We should make thrift apis in metastore forward compatible. One way to do that is to use struct (which contains all parameters) instead of passing in list of arguments. Hate it .. but Ok. :-) bq. We should try to leave TBLS table untouched in backend db. Sure. Will move them to a new table. Streaming - Web HCat API - Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore, WebHCat Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4196.v2.patch, HIVE-5138.v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
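The "struct which contains all parameters instead of a list of arguments" recommendation being accepted above generalizes beyond Thrift. A minimal Python sketch of the idea (the function and field names are hypothetical, chosen only to mirror the chunk API discussed here):

```python
from dataclasses import dataclass

# Positional-argument style: adding a parameter later changes the call
# signature, breaking every existing caller (and, in Thrift's case,
# changing the method's wire contract).
def chunk_get_v1(db_name, table_name):
    return (db_name, table_name)

# Struct style: all parameters travel inside one request object. A field
# added later gets a default, so existing callers keep working unchanged --
# the same reason Thrift structs with optional fields stay forward
# compatible across client/server version skew.
@dataclass
class ChunkGetRequest:
    db_name: str
    table_name: str
    max_chunk_size: int = 0   # hypothetical field added in a later version

def chunk_get_v2(req):
    return (req.db_name, req.table_name, req.max_chunk_size)
```

With the struct style, evolving the API is a matter of appending defaulted fields to `ChunkGetRequest`; `chunk_get_v1` would instead need a new overload or a breaking signature change.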
[jira] [Updated] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5138: -- Attachment: HIVE-5138.v1.patch Patch addresses comments from Eugene, adds an additional unit test, and adds some additional checks in partitionRoll for better error reporting. Streaming - Web HCat API - Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore, WebHCat Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4196.v2.patch, HIVE-5138.v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769209#comment-13769209 ] Roshan Naik commented on HIVE-5138: --- Thanks [~ekoifman] for the comments: h5. On Pt 1. Thanks. I need to take a closer look at this. h5. On Pt 2. I think you mean 'safe to invoke concurrently' instead of 'atomic', since the intermediate states are going to be visible when an operation spans both file system and meta store. Here is a summary of the reasons why each operation is concurrency safe: - *streamingStatus* : Readonly metastore operation - *chunkGet* : This is an atomic metastore operation - *chunkAbort* : Just deletes a file. So no concurrency issues here. - *chunkCommit* : Just renames a file. So only one of concurrent operations will succeed. - *disableStreaming* : This is an atomic metastore operation - *enableStreaming* : Does a couple of mkdirs (for setup) followed by an atomic metastore operation. mkdirs() is idempotent, so all concurrent calls succeed. All concurrent invocations enter a transaction to do the metastore update atomically...only one should update metastore. - *partitionRoll* : Creates empty dir for the new current partition then atomically updates metastore as follows: -# Make note of this new current partition dir -# Do an addPartition() on the previous current partition. - If concurrent partitionRoll() invocations use same arguments, the addPartition() step will fail on all but one. If arguments are not same in concurrent invocations, they all succeed and updates made by the last invocation to exit the metastore transaction would override the others. Streaming - Web HCat API - Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore, WebHCat Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4196.v2.patch -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
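The chunkCommit argument above — "just renames a file, so only one of concurrent operations will succeed" — rests on rename being atomic. A sketch of that property under POSIX rename semantics (the function name and `.committed` suffix are hypothetical, mirroring the operation described, not Hive's code):

```python
import os
import tempfile

def chunk_commit(chunk_path):
    """Hypothetical chunkCommit: mark a chunk committed with one rename.
    rename is atomic on POSIX filesystems (and on HDFS), and once the
    source file is gone every other attempt fails, so exactly one of
    several racing commits can succeed."""
    os.rename(chunk_path, chunk_path + ".committed")

d = tempfile.mkdtemp()
chunk = os.path.join(d, "chunk_0001")
open(chunk, "w").close()

chunk_commit(chunk)              # the first committer wins
try:
    chunk_commit(chunk)          # a racing second commit: source is gone
    second_outcome = "succeeded"
except FileNotFoundError:
    second_outcome = "failed"
```

This is why chunkCommit needs no lock of its own: the filesystem serializes the race, and the loser gets a clean error instead of a partially visible chunk.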
[jira] [Updated] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5138: -- Attachment: HIVE-4196.v2.patch Patch v2 is based on git commit version 9e9e711 Streaming - Web HCat API - Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4196.v2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760863#comment-13760863 ] Roshan Naik commented on HIVE-5138: --- Patch v2 addresses the review comments from https://issues.apache.org/jira/browse/HIVE-4196?focusedCommentId=13714235page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13714235 Streaming - Web HCat API - Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4196.v2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752946#comment-13752946 ] Roshan Naik commented on HIVE-4196: --- {quote} According to the Hive coding conventions lines should be bounded at 100 characters. Many lines in this patch exceed that. {quote} Will fix the ones which are not in the thrift generated files. {quote} I'm surprised to see that streamingStatus sets the chunk id for the table. {quote} Seems like a bug. Will fix. {quote} The logic at the end of these functions doesn't look right. Take getNextChunkID for example. If commitTransaction fails (line 2132) rollback will be called but the next chunk id will still be returned. It seems you need a check on success after commit. I realize many of the calls in the class follow this, but it doesn't seem right. {quote} Good catch. At the time I thought commitTxn() would only fail with an exception and never return false. But on closer inspection there is indeed a corner case (if rollback was called) where it also returns false. It's a bizarre thing for a function to fail without an exception, but for now I will fix my code to live with it. {quote} In HiveMetaStoreClient.java, is assert what you want? Are you ok with the validity of the arguments not being checked most of the time?{quote} Not all checks are in place. Some checks will happen at lower layers, some at higher. Will be adding more checks. {quote} I'm trying to figure out whether the chunk files are moved, deleted, or left alone during the partition rolling. {quote} That would depend on whether the table is defined to be an external or internal table. It is essentially an add_partition of the new partition. It calls HiveMetastore.add_partition_core_notxn() inside a transaction. 
Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.pdf, HIVE-4196.v1.patch Motivation: Allow Hive users to immediately query data streaming in through clients such as Flume. Currently Hive partitions must be created after all the data for the partition is available. Thereafter, data in the partitions is considered immutable. This proposal introduces the notion of a streaming partition into which new files can be committed periodically and made available for queries before the partition is closed and converted into a standard partition. The admin enables streaming partition on a table using DDL. He provides the following pieces of information: - Name of the partition in the table on which streaming is enabled - Frequency at which the streaming partition should be closed and converted into a standard partition. Tables with streaming partition enabled will be partitioned by one and only one column. It is assumed that this column will contain a timestamp. Closing the current streaming partition converts it into a standard partition. Based on the specified frequency, the current streaming partition is closed and a new one created for future writes. This is referred to as 'rolling the partition'. A streaming partition's life cycle is as follows: - A new streaming partition is instantiated for writes - Streaming clients request (via webhcat) an HDFS file name into which they can write a chunk of records for a specific table. - Streaming clients write a chunk (via webhdfs) to that file and commit it (via webhcat). 
Committing merely indicates that the chunk has been written completely and is ready to serve queries. - When the partition is rolled, all committed chunks are swept into a single directory and a standard partition pointing to that directory is created. The streaming partition is closed and a new streaming partition is created. Rolling the partition is atomic. Streaming clients are agnostic of partition rolling. - Hive queries will be able to query the partition that is currently open for streaming. Only committed chunks will be visible. Read consistency will be ensured so that repeated reads of the same partition will be idempotent for the lifespan of the query. Partition rolling requires an active agent/thread running to check when it is time to roll and trigger the roll. This could be achieved either by using an external agent such as Oozie (preferably) or an internal agent. -- This message is automatically generated by JIRA. If you think
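The getNextChunkID bug conceded in the review exchange above — a commit API that signals failure by returning false rather than raising, and a caller that ignores the return value — reduces to a small pattern. This is a language-neutral sketch with hypothetical names, not the metastore code:

```python
class TxnStore:
    """Hypothetical stand-in for a store whose commit, like the
    commitTransaction discussed above, can signal failure by returning
    False (e.g. when a rollback has already been triggered) instead of
    raising an exception."""
    def __init__(self, fail_commit=False):
        self.fail_commit = fail_commit
        self.committed = None

    def commit_transaction(self, value):
        if self.fail_commit:
            return False
        self.committed = value
        return True

def get_next_chunk_id(store, next_id):
    # The corner case from the review: without this check, next_id would
    # be handed to the caller even though the commit silently failed.
    if not store.commit_transaction(next_id):
        raise RuntimeError("commit failed; chunk id was not persisted")
    return next_id
```

The fix is exactly the reviewer's "check on success after commit": convert the boolean failure into an error at the call site, so no caller can observe a chunk id that was never durably assigned.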
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753233#comment-13753233 ] Roshan Naik commented on HIVE-5107: --- Curious ... is ant's 'makepom' task (to convert an ivy file into a pom file) a useful starting point for such an effort? Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will github the entire thing for review. Then we can talk about switching the project somehow.
[jira] [Created] (HIVE-5138) Web HCat API for Streaming
Roshan Naik created HIVE-5138: - Summary: Web HCat API for Streaming Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Reporter: Roshan Naik Assignee: Roshan Naik
[jira] [Created] (HIVE-5139) Streaming - DDL support for enabling and disabling streaming
Roshan Naik created HIVE-5139: - Summary: Streaming - DDL support for enabling and disabling streaming Key: HIVE-5139 URL: https://issues.apache.org/jira/browse/HIVE-5139 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik
[jira] [Updated] (HIVE-5139) Streaming - DDL support for enabling and disabling streaming
[ https://issues.apache.org/jira/browse/HIVE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5139: -- Assignee: Roshan Naik Streaming - DDL support for enabling and disabling streaming Key: HIVE-5139 URL: https://issues.apache.org/jira/browse/HIVE-5139 Project: Hive Issue Type: Sub-task Components: Database/Schema, HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Labels: ddl, streaming
[jira] [Updated] (HIVE-5139) Streaming - DDL support for enabling and disabling streaming
[ https://issues.apache.org/jira/browse/HIVE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5139: -- Labels: ddl streaming (was: ) Streaming - DDL support for enabling and disabling streaming Key: HIVE-5139 URL: https://issues.apache.org/jira/browse/HIVE-5139 Project: Hive Issue Type: Sub-task Components: Database/Schema, HCatalog Reporter: Roshan Naik Labels: ddl, streaming
[jira] [Created] (HIVE-5140) Streaming - Active agent for rolling a streaming partition into a standard partition
Roshan Naik created HIVE-5140: - Summary: Streaming - Active agent for rolling a streaming partition into a standard partition Key: HIVE-5140 URL: https://issues.apache.org/jira/browse/HIVE-5140 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik The task is to implement an entity which rolls all the committed transactions from the streaming partition into a new standard partition atomically.
[jira] [Updated] (HIVE-5138) Streaming- Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5138: -- Summary: Streaming- Web HCat API (was: Web HCat API for Streaming) Streaming- Web HCat API Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Reporter: Roshan Naik Assignee: Roshan Naik
[jira] [Commented] (HIVE-5139) Streaming - DDL support for enabling and disabling streaming
[ https://issues.apache.org/jira/browse/HIVE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748149#comment-13748149 ] Roshan Naik commented on HIVE-5139: --- Task is to implement support for enabling and disabling streaming functionality on a Hive table via DDL. Streaming - DDL support for enabling and disabling streaming Key: HIVE-5139 URL: https://issues.apache.org/jira/browse/HIVE-5139 Project: Hive Issue Type: Sub-task Components: Database/Schema, HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Labels: ddl, streaming
[jira] [Commented] (HIVE-5138) Streaming- Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748165#comment-13748165 ] Roshan Naik commented on HIVE-5138: --- Implement Webhcat API to:
1) Enable and disable streaming on a table
2) Check streaming status
3) Transaction support: get a chunk file, commit a chunk file, abort the chunk
4) Roll partition: roll the committed chunks from the streaming partition into a new standard partition
Streaming- Web HCat API Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Reporter: Roshan Naik Assignee: Roshan Naik
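As a sketch of how a client might address these operations, the helper below builds request URLs against the templeton base path used in the draft patch. The `streaming_url` function and its parameter handling are hypothetical conveniences, not part of webhcat.

```python
from urllib.parse import urlencode

# Base path taken from the usage examples in the HIVE-4196 draft patch;
# the host and port are assumptions for a local templeton server.
BASE = "http://localhost:50111/templeton/v1/streaming"

def streaming_url(op, **params):
    # op is one of: status, enable, disable, chunkget, chunkcommit,
    # chunkabort, partitionroll. Keyword arguments become query parameters.
    return "%s/%s?%s" % (BASE, op, urlencode(params))
```

For example, `streaming_url("status", database="sdb", table="log")` yields `.../streaming/status?database=sdb&table=log`. Note that `urlencode` percent-encodes values such as HDFS chunk-file paths, which the raw example URLs in the patch leave unescaped.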
[jira] [Updated] (HIVE-5138) Streaming - Web HCat API
[ https://issues.apache.org/jira/browse/HIVE-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5138: -- Summary: Streaming - Web HCat API (was: Streaming- Web HCat API) Streaming - Web HCat API Key: HIVE-5138 URL: https://issues.apache.org/jira/browse/HIVE-5138 Project: Hive Issue Type: Sub-task Components: HCatalog, Metastore Reporter: Roshan Naik Assignee: Roshan Naik
[jira] [Updated] (HIVE-5142) Streaming - Query committed chunks
[ https://issues.apache.org/jira/browse/HIVE-5142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5142: -- Assignee: Roshan Naik Streaming - Query committed chunks Key: HIVE-5142 URL: https://issues.apache.org/jira/browse/HIVE-5142 Project: Hive Issue Type: Sub-task Components: Database/Schema, HCatalog Reporter: Roshan Naik Assignee: Roshan Naik Task is to enable queries to read through the chunks committed into the streaming partition.
[jira] [Created] (HIVE-5142) Streaming - Query committed chunks
Roshan Naik created HIVE-5142: - Summary: Streaming - Query committed chunks Key: HIVE-5142 URL: https://issues.apache.org/jira/browse/HIVE-5142 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Task is to enable queries to read through the chunks committed into the streaming partition.
[jira] [Created] (HIVE-5143) Streaming - Compaction of partitions
Roshan Naik created HIVE-5143: - Summary: Streaming - Compaction of partitions Key: HIVE-5143 URL: https://issues.apache.org/jira/browse/HIVE-5143 Project: Hive Issue Type: Sub-task Reporter: Roshan Naik Assignee: Roshan Naik Task is to support compaction of partitions. Rationale: Streaming partitions are composed of a large number of small files (each commit is one file). Since compaction can be a potentially expensive operation (e.g. converting to a single ORC file), we do not compact the streaming partition at the time of rolling it into a standard partition. This allows rolling to be quick and atomic. Compaction will be performed at a later time: the streaming partition is converted as is (typically with many small files) into a standard partition, and this new standard partition is queued up for compaction by a separate job. This decouples the compaction feature from streaming support, and makes it more generally available for any partition.
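The deferred compaction step above can be sketched minimally, under the assumption that a rolled partition is just a directory of small chunk files and that plain concatenation stands in for the (potentially expensive) ORC conversion; `compact_partition` and the `compacted-0` file name are invented for illustration.

```python
import os

def compact_partition(part_dir, out_name="compacted-0"):
    # Merge the many small chunk files of an already-rolled partition into
    # a single file, then delete the originals. Runs as a separate job,
    # decoupled from the quick, atomic roll.
    chunks = sorted(f for f in os.listdir(part_dir) if f != out_name)
    out_path = os.path.join(part_dir, out_name)
    with open(out_path, "w") as out:
        for name in chunks:
            chunk_path = os.path.join(part_dir, name)
            with open(chunk_path) as chunk:
                out.write(chunk.read())
            os.remove(chunk_path)
    return out_path
```

Because compaction operates on an ordinary standard partition, the same job could in principle be pointed at any partition with many small files, which is the generality the rationale calls out.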
[jira] [Updated] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-4196: -- Attachment: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.pdf pdf version of design spec doc Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.pdf, HIVE-4196.v1.patch
[jira] [Updated] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-4196: -- Attachment: HIVE-4196.v1.patch Draft patch for review, based on the phase mentioned in the design doc. Deviates slightly:
1) Adds a couple of (temporary) REST calls to enable/disable streaming on a table. Later these will be replaced with support in DDL.
2) All HTTP methods are GET, for easy testing with a web browser.
3) Authentication is disabled on the new streaming HTTP methods.
Usage examples on a db named 'sdb' with a table named 'log':
1) Set up the db and table with a single partition column 'date':
hcat -e "create database sdb; use sdb; create table log(msg string, region string) partitioned by (date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;"
2) Check streaming status:
http://localhost:50111/templeton/v1/streaming/status?database=sdb&table=log
3) Enable streaming:
http://localhost:50111/templeton/v1/streaming/enable?database=sdb&table=log&col=date&value=1000
4) Get a chunk file to write to:
http://localhost:50111/templeton/v1/streaming/chunkget?database=sdb&table=log&schema=blah&format=blah&record_separator=blah&field_separator=blah
5) Commit the chunk file:
http://localhost:50111/templeton/v1/streaming/chunkcommit?database=sdb&table=log&chunkfile=/user/hive/streaming/tmp/sdb/log/2
6) Abort the chunk file:
http://localhost:50111/templeton/v1/streaming/chunkabort?database=sdb&table=log&chunkfile=/user/hive/streaming/tmp/sdb/log/3
7) Roll the partition:
http://localhost:50111/templeton/v1/streaming/partitionroll?database=sdb&table=log&partition_column=date&partition_value=3000
Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign.docx, HIVE-4196.v1.patch
[jira] [Updated] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-4196: -- Attachment: (was: HCatalogStreamingIngestFunctionalSpecificationandDesign.docx) Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HIVE-4196.v1.patch
[jira] [Updated] (HIVE-4196) Support for Streaming Partitions in Hive
[ https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-4196: -- Attachment: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx Support for Streaming Partitions in Hive Key: HIVE-4196 URL: https://issues.apache.org/jira/browse/HIVE-4196 Project: Hive Issue Type: New Feature Components: Database/Schema, HCatalog Affects Versions: 0.10.1 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign- apr 29- patch1.docx, HIVE-4196.v1.patch
[jira] [Commented] (HIVE-4300) ant thriftif generated code that is checkedin is not up-to-date
[ https://issues.apache.org/jira/browse/HIVE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638318#comment-13638318 ] Roshan Naik commented on HIVE-4300: --- Namit, those files are no longer part of patch v2. The files you point out got updated by HIVE-4322, which got committed first. ant thriftif generated code that is checkedin is not up-to-date Key: HIVE-4300 URL: https://issues.apache.org/jira/browse/HIVE-4300 Project: Hive Issue Type: Bug Components: Thrift API Affects Versions: 0.10.0 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4300.2.patch, HIVE-4300.patch running 'ant thriftif -Dthrift.home=/usr/local' on a freshly checked-out trunk should be a no-op as per [instructions|https://cwiki.apache.org/Hive/howtocontribute.html#HowToContribute-GeneratingThriftCode]. However, this is not the case: some files seem to have been relocated, or the classes in them are now in a different file. Below is the git status showing the state after the command is run:
# On branch trunk
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   build.properties
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
#   modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
#   deleted:    metastore/src/gen/thrift/gen-php/ThriftHiveMetastore.php
#   deleted:    metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
#   deleted:    metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php
#   deleted:    metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_types.php
#   deleted:    metastore/src/gen/thrift/gen-php/hive_metastore_constants.php
#   deleted:    metastore/src/gen/thrift/gen-php/hive_metastore_types.php
#   modified:   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
#   deleted:    ql/src/gen/thrift/gen-php/queryplan/queryplan_types.php
#   modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/InnerStruct.java
#   modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java
#   modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java
#   modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/IntString.java
#   modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java
#   modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java
#   deleted:    serde/src/gen/thrift/gen-php/serde/serde_constants.php
#   deleted:    serde/src/gen/thrift/gen-php/serde/serde_types.php
#   deleted:    service/src/gen/thrift/gen-php/hive_service/ThriftHive.php
#   deleted:    service/src/gen/thrift/gen-php/hive_service/hive_service_types.php
#   modified:   service/src/gen/thrift/gen-py/TCLIService/TCLIService-remote
#   modified:   service/src/gen/thrift/gen-py/hive_service/ThriftHive-remote
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   serde/src/gen/thrift/gen-cpp/complex_constants.cpp
#   serde/src/gen/thrift/gen-cpp/complex_constants.h
#   serde/src/gen/thrift/gen-cpp/complex_types.cpp
#   serde/src/gen/thrift/gen-cpp/complex_types.h
#   serde/src/gen/thrift/gen-cpp/megastruct_constants.cpp
#
[jira] [Commented] (HIVE-4300) ant thriftif generated code that is checkedin is not up-to-date
[ https://issues.apache.org/jira/browse/HIVE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638750#comment-13638750 ] Roshan Naik commented on HIVE-4300: --- just that much. ant thriftif generated code that is checkedin is not up-to-date Key: HIVE-4300 URL: https://issues.apache.org/jira/browse/HIVE-4300 Project: Hive Issue Type: Bug Components: Thrift API Affects Versions: 0.10.0 Reporter: Roshan Naik Assignee: Roshan Naik Attachments: HIVE-4300.2.patch, HIVE-4300.patch
[jira] [Commented] (HIVE-4300) ant thriftif generated code that is checkedin is not up-to-date
[ https://issues.apache.org/jira/browse/HIVE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633119#comment-13633119 ]

Roshan Naik commented on HIVE-4300:
-----------------------------------

FYI: HIVE-4322 makes manual changes to auto-generated code, which will be a maintenance headache. I have incorporated those changes into this patch as part of the rebasing in patch v2.

ant thriftif generated code that is checkedin is not up-to-date

Key: HIVE-4300
URL: https://issues.apache.org/jira/browse/HIVE-4300
Project: Hive
Issue Type: Bug
Components: Thrift API
Affects Versions: 0.10.0
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HIVE-4300.2.patch, HIVE-4300.patch

Running 'ant thriftif -Dthrift.home=/usr/local' on a freshly checked-out trunk should be a no-op, as per the [instructions|https://cwiki.apache.org/Hive/howtocontribute.html#HowToContribute-GeneratingThriftCode]. However, this is not the case: some of the files seem to have been relocated, or the classes in them are now in a different file. Below is the git status showing the state after the command is run:

# On branch trunk
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   build.properties
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
#	modified:   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
#	deleted:    metastore/src/gen/thrift/gen-php/ThriftHiveMetastore.php
#	deleted:    metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
#	deleted:    metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_constants.php
#	deleted:    metastore/src/gen/thrift/gen-php/hive_metastore/hive_metastore_types.php
#	deleted:    metastore/src/gen/thrift/gen-php/hive_metastore_constants.php
#	deleted:    metastore/src/gen/thrift/gen-php/hive_metastore_types.php
#	modified:   metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
#	deleted:    ql/src/gen/thrift/gen-php/queryplan/queryplan_types.php
#	modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/InnerStruct.java
#	modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/test/ThriftTestObj.java
#	modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java
#	modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/IntString.java
#	modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MegaStruct.java
#	modified:   serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/MiniStruct.java
#	deleted:    serde/src/gen/thrift/gen-php/serde/serde_constants.php
#	deleted:    serde/src/gen/thrift/gen-php/serde/serde_types.php
#	deleted:    service/src/gen/thrift/gen-php/hive_service/ThriftHive.php
#	deleted:    service/src/gen/thrift/gen-php/hive_service/hive_service_types.php
#	modified:   service/src/gen/thrift/gen-py/TCLIService/TCLIService-remote
#	modified:   service/src/gen/thrift/gen-py/hive_service/ThriftHive-remote
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	serde/src/gen/thrift/gen-cpp/complex_constants.cpp
#	serde/src/gen/thrift/gen-cpp/complex_constants.h
#	serde/src/gen/thrift/gen-cpp/complex_types.cpp
#	serde/src/gen/thrift/gen-cpp/complex_types.h
#
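The "should be a no-op" property described in the ticket can be checked mechanically: `git status --porcelain` prints one line per modified or untracked file and nothing at all when the tree is clean. The sketch below is not from the ticket; in a real Hive checkout the regeneration step would be `ant thriftif -Dthrift.home=/usr/local`, but here a scratch repository and a dummy generated file stand in for it so the check itself can be demonstrated.

```shell
set -e
# Scratch repository standing in for a fresh Hive checkout (hypothetical setup).
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init

# The actual check: a non-empty porcelain status means the checked-in
# generated code no longer matches what the generator produces.
check_clean() {
  if [ -z "$(git status --porcelain)" ]; then
    echo "up to date"
  else
    echo "stale generated code"
  fi
}

check_clean                 # clean tree: regeneration was a no-op
echo x > Database.java      # simulate 'ant thriftif' rewriting a generated file
check_clean                 # dirty tree: checked-in code is out of date
```

Running this prints "up to date" before the simulated regeneration and "stale generated code" after it; the same one-liner could gate a commit hook or CI step.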