[jira] [Updated] (HIVE-2390) Expand support for union types
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-2390: - Labels: TODOC14 uniontype (was: uniontype) Expand support for union types -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: TODOC14, uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
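For readers unfamiliar with the feature, here is a minimal HiveQL sketch (table and column names are hypothetical) of the kind of union usage that routes rows through LazyBinarySerDe, e.g. in a shuffle stage; whether it actually runs depends on exactly the gaps this ticket describes:
{code:sql}
-- Hypothetical table with a union-typed column.
CREATE TABLE union_demo (
  id INT,
  payload UNIONTYPE<INT, STRING>
);

-- create_union(tag, value0, value1, ...) selects the branch by tag.
INSERT INTO TABLE union_demo
SELECT 1, create_union(0, 42, 'unused') FROM some_src LIMIT 1;

-- A GROUP BY forces an intermediate shuffle, whose rows are serialized with
-- LazyBinarySerDe; this is the path that raised "Unrecognized type: UNION".
SELECT payload, count(*) FROM union_demo GROUP BY payload;
{code}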
[jira] [Updated] (HIVE-8019) Missing commit from trunk : `export/import statement update`
[ https://issues.apache.org/jira/browse/HIVE-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8019: Attachment: HIVE-8019.2.patch HIVE-8019.2.patch - fixes test failures Missing commit from trunk : `export/import statement update` Key: HIVE-8019 URL: https://issues.apache.org/jira/browse/HIVE-8019 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.14.0 Reporter: Mohit Sabharwal Assignee: Thejas M Nair Priority: Blocker Attachments: HIVE-8019.1.patch, HIVE-8019.2.patch Noticed that commit 1882de7810fc55a2466dd4cbe74ed67bb41cb667 exists in the 0.13 branch, but not in trunk. https://github.com/apache/hive/commit/1882de7810fc55a2466dd4cbe74ed67bb41cb667 {code} (trunk) $ git branch -a --contains 1882de7810fc55a2466dd4cbe74ed67bb41cb667 remotes/origin/branch-0.13 {code} I looked through some of the changes in this commit and don't see them in trunk. Nor do I see a commit that reverts these changes in trunk. [~thejas], should we port this over to trunk? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
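For anyone verifying this locally, a sketch of the checks involved (plain git; the cherry-pick is only a suggestion for how the port could be done):
{code}
# Which branches contain the commit? (expected output: only branch-0.13)
git branch -a --contains 1882de7810fc55a2466dd4cbe74ed67bb41cb667

# Summarize what the commit changed, to compare against trunk.
git show --stat 1882de7810fc55a2466dd4cbe74ed67bb41cb667

# If porting is agreed on, replay it onto trunk with a back-reference (-x).
git checkout trunk
git cherry-pick -x 1882de7810fc55a2466dd4cbe74ed67bb41cb667
{code}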
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7694: -- Release Note: SMB join on tables differing by number of sorted by columns with same join prefix (was: I just committed this. Thanks Suma!) SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For example, if two tables T1, sorted by (a, b, c) and clustered by (a), and T2, sorted by (a) and clustered by (a), are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
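A hedged HiveQL sketch of a setup matching that description (table names and bucket counts hypothetical; the settings are the usual flags that ask the optimizer to attempt the SMB conversion):
{code:sql}
CREATE TABLE t1 (a INT, b INT, c INT)
CLUSTERED BY (a) SORTED BY (a, b, c) INTO 4 BUCKETS;

CREATE TABLE t2 (a INT)
CLUSTERED BY (a) SORTED BY (a) INTO 4 BUCKETS;

-- Ask the optimizer to try converting the bucket map join to an SMB join.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;

-- checkSortColsAndJoinCols then indexes past t2's single sort column,
-- producing the IndexOutOfBoundsException in the trace above.
EXPLAIN SELECT * FROM t1 JOIN t2 ON (t1.a = t2.a);
{code}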
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-7694: -- Resolution: Fixed Release Note: I just committed this. Thanks Suma! Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For example, if two tables T1, sorted by (a, b, c) and clustered by (a), and T2, sorted by (a) and clustered by (a), are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128116#comment-14128116 ] Amareshwari Sriramadasu commented on HIVE-7694: --- I just committed this. Thanks Suma! SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For example, if two tables T1, sorted by (a, b, c) and clustered by (a), and T2, sorted by (a) and clustered by (a), are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 3:07 p.m., Brock Noland wrote: serde/pom.xml, line 73 https://reviews.apache.org/r/25468/diff/1/?file=683466#file683466line73 These should only be indented by two spaces, not four. Have you tried submitting an MR job on a cluster with this patch? The reason I ask is that I think the serde must be in here: https://github.com/apache/hive/blob/trunk/ql/pom.xml#L563 for it to be available to MR jobs. I don't think we need to add the class separately, because org.apache.hive:hive-serde is already included. BTW, I ran a test with the following steps: (1) create a table with the CSV format: create table csv_table(a string, b string) row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde' with serdeproperties ("separatorChar" = ",", "quoteChar" = "'", "escapeChar" = "\") stored as textfile; (2) load data by: load data local inpath '/root/workspace/data' overwrite into table csv_table; (3) cat /root/workspace/data: aa,bb dd,cc (4) select a from csv_table: +-----+--+ | a | +-----+--+ | aa | | dd | +-----+--+ If I am missing anything, please help me figure it out. Thanks! - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52723 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Remove hive.metastore.metadb.dir from HiveConf.java?
Nevermind, it's already been done by HIVE-1879 https://issues.apache.org/jira/browse/HIVE-1879. Sorry about the spam. Thanks Lars. -- Lefty On Wed, Sep 10, 2014 at 1:35 AM, Lefty Leverenz leftylever...@gmail.com wrote: Lars Francke updated the Metastore Admin doc https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-AdditionalConfigurationParameters as follows: hive.metastore.metadb.dir The location of filestore metadata base directory. (Functionality removed in 0.4.0 with HIVE-143 https://issues.apache.org/jira/browse/HIVE-143) But hive.metastore.metadb.dir still exists in HiveConf.java. As I'm making various other fixes to HiveConf.java in HIVE-6586 https://issues.apache.org/jira/browse/HIVE-6586, should I remove this obsolete parameter? -- Lefty
[jira] [Updated] (HIVE-2390) Add UNIONTYPE serialization support to LazyBinarySerDe
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2390: - Summary: Add UNIONTYPE serialization support to LazyBinarySerDe (was: Expand support for union types) Add UNIONTYPE serialization support to LazyBinarySerDe -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: TODOC14, uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2390) Add UNIONTYPE serialization support to LazyBinarySerDe
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128124#comment-14128124 ] Carl Steinbach commented on HIVE-2390: -- I updated the description of this ticket to accurately reflect the change that was made in this patch. My impression is that this patch doesn't really change the situation in Hive with respect to UNIONTYPEs -- this feature is still unusable. If I'm wrong about this I would appreciate someone setting me straight. Add UNIONTYPE serialization support to LazyBinarySerDe -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: TODOC14, uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7086) TestHiveServer2.testConnection is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128134#comment-14128134 ] Vaibhav Gumashta commented on HIVE-7086: [~ashutoshc] The failed test looks flaky. Does this look good now? TestHiveServer2.testConnection is failing on trunk -- Key: HIVE-7086 URL: https://issues.apache.org/jira/browse/HIVE-7086 Project: Hive Issue Type: Test Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7086.1.patch, HIVE-7086.2.patch, HIVE-7086.3.patch Able to repro locally on fresh checkout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128138#comment-14128138 ] Amareshwari Sriramadasu commented on HIVE-7892: --- Code changes look fine. Can you update the test output for convert_enum_to_string.q and upload the patch? Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However the thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required set<i32> ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive> describe settable; OK ids struct<> from deserializer name string from deserializer {noformat} Issuing a select query on the set column throws a SemanticException: {noformat} hive> select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
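Assembled from the fragments above, the DDL presumably looks like the following (the storage clause is an assumption; ThriftDeserializer tables are typically sequence files):
{code:sql}
CREATE TABLE settable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer'
WITH SERDEPROPERTIES (
  'serialization.class' = 'sample.thrift.setrow',
  'serialization.format' = 'org.apache.thrift.protocol.TBinaryProtocol')
STORED AS SEQUENCEFILE;
{code}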
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: Looks good apart from minor comments. Maybe add a test for the serialization part? https://issues.apache.org/jira/browse/HIVE-5976 integration might be nice: STORED AS CSV. Unfortunately there's no documentation yet so I'm not sure if it's feasible. Good point! Why not file it in a new JIRA ticket as future work? - cheng --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
[jira] [Updated] (HIVE-7777) add CSV support for Serde
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-7777: --- Attachment: HIVE-7777.1.patch add CSV support for Serde - Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7777.1.patch, HIVE-7777.patch, csv-serde-master.zip There is no official CSV serde support in Hive, while there is an open source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very frequently used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7935) Support dynamic service discovery for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128148#comment-14128148 ] Lefty Leverenz commented on HIVE-7935: -- +1 for parameter descriptions in HiveConf.java (although I'm surprised to see parameter values represented in the form ${hive.param.xyz}). Support dynamic service discovery for HiveServer2 - Key: HIVE-7935 URL: https://issues.apache.org/jira/browse/HIVE-7935 Project: Hive Issue Type: New Feature Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7935.1.patch, HIVE-7935.2.patch, HIVE-7935.3.patch To support Rolling Upgrade / HA, we need a mechanism by which a JDBC client can dynamically resolve a HiveServer2 instance to connect to. *High Level Design:* Whether dynamic service discovery is supported can be configured by setting HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY. ZooKeeper is used to support this. * When an instance of HiveServer2 comes up, it adds itself as a znode to ZooKeeper under a configurable namespace (HIVE_SERVER2_ZOOKEEPER_NAMESPACE). * A JDBC/ODBC client now specifies the ZooKeeper ensemble in its connection string, instead of pointing to a specific HiveServer2 instance. The JDBC driver uses the ZooKeeper ensemble to pick an instance of HiveServer2 to connect to for the entire session. * When an instance is removed from ZooKeeper, existing client sessions continue till completion. When the last client session completes, the instance shuts down. * All new client connections pick one of the available HiveServer2 URIs from ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
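For illustration, the client side of this design would look roughly like the following JDBC URL (host names hypothetical; the parameter names follow the patch discussion, so treat them as an assumption until the docs land):
{noformat}
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
{noformat}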
[jira] [Commented] (HIVE-8022) Recursive root scratch directory creation is not using hdfs umask properly
[ https://issues.apache.org/jira/browse/HIVE-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128152#comment-14128152 ] Vaibhav Gumashta commented on HIVE-8022: Just ran the failed test - it looks like a flaky test which has been failing on other precommits. I'll commit this tomorrow. Thanks for the review [~thejas]! Recursive root scratch directory creation is not using hdfs umask properly --- Key: HIVE-8022 URL: https://issues.apache.org/jira/browse/HIVE-8022 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-8022.1.patch, HIVE-8022.2.patch, HIVE-8022.3.patch Changes made in HIVE-6847 removed the helper methods that were added in HIVE-7001 to get around this problem. Since the root scratch dir must be writable by all, its creation should use those methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
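For background, a minimal Java sketch of the underlying Hadoop FileSystem behavior (assuming the usual semantics: mkdirs applies the configured fs umask to the permission you pass, so a world-writable scratch root needs an explicit setPermission afterwards, which is what the HIVE-7001 helpers did):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ScratchDirSketch {
  public static void createRootScratchDir(Configuration conf, Path root) throws Exception {
    FileSystem fs = root.getFileSystem(conf);
    FsPermission writableByAll = new FsPermission((short) 0733);
    // mkdirs(path, perm) still has the fs umask applied to perm...
    fs.mkdirs(root, writableByAll);
    // ...so the permission must be set explicitly afterwards.
    fs.setPermission(root, writableByAll);
  }
}
{code}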
[jira] [Commented] (HIVE-8030) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128153#comment-14128153 ] Lars Francke commented on HIVE-8030: I had a typo in my comment from yesterday. I meant that it looks very similar to HIVE-2069. Here's the code as of version 0.13.1: https://github.com/apache/hive/blob/release-0.13.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveMetaDataResultSet.java#L32 As you can see, there's no ArrayList at line 32, and even if there were, all of them are guarded by null checks. Are you 100% sure you are using Hive 0.13.1? NullPointerException on getSchemas -- Key: HIVE-8030 URL: https://issues.apache.org/jira/browse/HIVE-8030 Project: Hive Issue Type: Bug Components: Database/Schema, JDBC Affects Versions: 0.13.1 Environment: Linux (Ubuntu 12.04) Reporter: Shiv Prakash Labels: hadoop Fix For: 0.13.1 java.lang.NullPointerException at java.util.ArrayList.<init>(ArrayList.java:164) at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.<init>(HiveMetaDataResultSet.java:32) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.<init>(HiveDatabaseMetaData.java:482) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:481) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$DatabaseMetaDataInvocationHandler.invoke(DriverProxyInvocationChain.java:368) at com.sun.proxy.$Proxy20.getSchemas(Unknown Source) at org.pentaho.di.core.database.Database.getSchemas(Database.java:3857) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.getSchemaNames(TableOutputDialog.java:1036) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.access$2400(TableOutputDialog.java:94) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog$24.widgetSelected(TableOutputDialog.java:863) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.open(TableOutputDialog.java:884) at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:124) at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8648) at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3020) at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:737) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1297) at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7801) at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9130) at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:638) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.commons.launcher.Launcher.main(Launcher.java:151) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
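If it helps narrow down the version question, a minimal sketch of the call path in the trace, using the old HiveServer1 driver that the org.apache.hadoop.hive.jdbc frames point at (connection details hypothetical):
{code:java}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class GetSchemasRepro {
  public static void main(String[] args) throws Exception {
    // HiveServer1 driver, matching the stack frames above.
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    try (Connection conn =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "")) {
      DatabaseMetaData md = conn.getMetaData();
      // The call that throws the NullPointerException in the report.
      try (ResultSet rs = md.getSchemas()) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      }
    }
  }
}
{code}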
[jira] [Resolved] (HIVE-6747) TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync is failing
[ https://issues.apache.org/jira/browse/HIVE-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta resolved HIVE-6747. Resolution: Duplicate Duplicate. TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync is failing --- Key: HIVE-6747 URL: https://issues.apache.org/jira/browse/HIVE-6747 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-952) Support analytic NTILE function
[ https://issues.apache.org/jira/browse/HIVE-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-952: Fix Version/s: 0.11.0 Support analytic NTILE function --- Key: HIVE-952 URL: https://issues.apache.org/jira/browse/HIVE-952 Project: Hive Issue Type: New Feature Components: OLAP, Query Processor, UDF Reporter: Carl Steinbach Fix For: 0.11.0 The NTILE function divides a set of ordered rows into equally sized buckets and assigns a bucket number to each row. Useful for calculating tertiles, quartiles, quintiles, etc. Example: {code:sql} SELECT last_name, salary, NTILE(4) OVER (ORDER BY salary DESC) AS quartile FROM employees WHERE department_id = 100; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Mittal updated HIVE-7892: Attachment: HIVE-7892.patch.1.txt Attaching updated patch. The test convert_enum_to_string.q works with the existing MegaStruct thrift table, which contained set columns with the older description. Fixed the description. Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.1.txt, HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However the thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required set<i32> ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive> describe settable; OK ids struct<> from deserializer name string from deserializer {noformat} Issuing a select query on the set column throws a SemanticException: {noformat} hive> select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25473: Thrift Set type not working with Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25473/ --- (Updated Sept. 10, 2014, 7:03 a.m.) Review request for hive, Amareshwari Sriramadasu, Ashutosh Chauhan, and Navis Ryu. Changes --- The test convert_enum_to_string.q works with the existing MegaStruct thrift table, which contained set columns with the older description. Fixed the column descriptions in the updated patch. Bugs: HIVE-7892 https://issues.apache.org/jira/browse/HIVE-7892 Repository: hive-git Description --- Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However the thrift Set type doesn't get mapped to any Hive type, and hence doesn't work with the ThriftDeserializer serde. Diffs (updated) - ql/src/test/results/beelinepositive/convert_enum_to_string.q.out 24acdcd ql/src/test/results/clientpositive/convert_enum_to_string.q.out a1ef04f serde/if/test/complex.thrift 308b64c serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 9a226b3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardListObjectInspector.java 6eb8803 serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestThriftObjectInspectors.java 5f692fb Diff: https://reviews.apache.org/r/25473/diff/ Testing --- 1) Added Unit test along with the fix. 2) Manually tested by creating a table with the ThriftDeserializer serde and thrift set columns: a) described the table b) issued a query to select the set column Thanks, Satish Mittal
Re: Timeline for release of Hive 0.14
Hi, Can you please include HIVE-7892 (Thrift Set type not working with Hive) as well? It is under code review. Regards, Satish On Tue, Sep 9, 2014 at 2:10 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Please include https://issues.apache.org/jira/browse/HIVE-7694 as well. It is currently under review by Amareshwari and should be done in the next couple of days. Thanks Suma On Mon, Sep 8, 2014 at 5:44 PM, Alan Gates ga...@hortonworks.com wrote: I'll review that. I just need the time to test it against mysql, oracle, and hopefully sqlserver. But I think we can do this post branch if we need to, as it's a bug fix rather than a feature. Alan. Damien Carol dca...@blitzbs.com September 8, 2014 at 3:19 Same request for https://issues.apache.org/jira/browse/HIVE-7689 I already provided a patch, re-based it many times and I'm waiting for a review. Regards, On 08/09/2014 12:08, amareshwarisr . wrote: Would like to include https://issues.apache.org/jira/browse/HIVE-2390 and https://issues.apache.org/jira/browse/HIVE-7936 . I can review and merge them. Thanks Amareshwari Vikram Dixit vik...@hortonworks.com September 5, 2014 at 17:53 Hi Folks, I am going to start consolidating the items mentioned in this list and create a wiki page to track it. I will wait till the end of next week to create the branch taking into account Ashutosh's request. Thanks Vikram. On Fri, Sep 5, 2014 at 5:39 PM, Ashutosh Chauhan hashut...@apache.org wrote: Vikram, Some of us are working on stabilizing the cbo branch and trying to get it merged into trunk. We feel we are close. May I request to defer cutting the branch for a few more days? Folks interested in this can track our progress here: https://issues.apache.org/jira/browse/HIVE-7946 Thanks, Ashutosh On Fri, Aug 22, 2014 at 4:09 PM, Lars Francke lars.fran...@gmail.com wrote: Thank you for volunteering to do the release. I think a 0.14 release is a good idea. I have a couple of issues I'd like to get in too: * Either HIVE-7107[0] (Fix an issue in the HiveServer1 JDBC driver) or HIVE-6977[1] (Delete HiveServer1). The former needs a review, the latter a patch * HIVE-6123[2] Checkstyle in Maven needs a review HIVE-7622[3] HIVE-7543[4] are waiting for any reviews or comments on my previous thread[5]. I'd still appreciate any helpers for reviews or even just comments. I'd feel very sad if I had done all that work for nothing. Hoping this thread gives me a wider audience. Both patches fix up issues that should have been caught in earlier reviews as they are almost all Checkstyle or other style violations, but they make for huge patches.
I could also create hundreds of small issues or stop doing these things entirely [0] https://issues.apache.org/jira/browse/HIVE-7107 [1] https://issues.apache.org/jira/browse/HIVE-6977 [2] https://issues.apache.org/jira/browse/HIVE-6123 [3] https://issues.apache.org/jira/browse/HIVE-7622 [4] https://issues.apache.org/jira/browse/HIVE-7543 On Fri, Aug 22, 2014 at 11:01 PM, John Pullokkaran
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128178#comment-14128178 ] Hive QA commented on HIVE-7950: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667581/HIVE-7950.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6194 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/719/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/719/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-719/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667581 StorageHandler resources aren't added to Tez Session if Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Noticed that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 23352: Support non-constant expressions for MAP type indices.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23352/#review52831 --- I'm really looking forward to this. Thanks for working on it! ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java https://reviews.apache.org/r/23352/#comment91998 I suggest stating here that only integers are supported. "Currently only integers are supported for array indexes" or something like that. - Lars Francke On July 9, 2014, 6:57 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23352/ --- (Updated July 9, 2014, 6:57 a.m.) Review request for hive. Bugs: HIVE-7325 https://issues.apache.org/jira/browse/HIVE-7325 Repository: hive-git Description --- Here is my sample: {code} CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,D:BatchDate,D:Country") TBLPROPERTIES ("hbase.table.name" = "RECORD"); CREATE TABLE KEY_RECORD(KeyValue String, RecordId map<string,string>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, K:") TBLPROPERTIES ("hbase.table.name" = "KEY_RECORD"); {code} The following join statement doesn't work. {code} SELECT a.*, b.* from KEY_RECORD a join RECORD b WHERE a.RecordId[b.RecordID] is not null; {code} FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID' Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9889cfe ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java e44f5ae ql/src/test/queries/clientpositive/array_map_access_nonconstant.q PRE-CREATION ql/src/test/queries/negative/invalid_list_index.q c40f079 ql/src/test/queries/negative/invalid_list_index2.q 99d0b3d ql/src/test/queries/negative/invalid_map_index2.q 5828f07 ql/src/test/results/clientpositive/array_map_access_nonconstant.q.out PRE-CREATION ql/src/test/results/compiler/errors/invalid_list_index.q.out a4179cd ql/src/test/results/compiler/errors/invalid_list_index2.q.out aaa9455 ql/src/test/results/compiler/errors/invalid_map_index2.q.out edc9bda serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java 5ccacf1 Diff: https://reviews.apache.org/r/23352/diff/ Testing --- Thanks, Navis Ryu
[jira] [Updated] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7704: - Attachment: HIVE-7704.9.patch Hopefully this will fix the test case. Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch, HIVE-7704.9.patch Currently tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to tez tasks to make use of the performance gains from tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7405: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Matt! Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Fix For: 0.14.0 Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch, HIVE-7405.996.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6550) SemanticAnalyzer.reset() doesn't clear all the state
[ https://issues.apache.org/jira/browse/HIVE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6550: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Sergey! SemanticAnalyzer.reset() doesn't clear all the state Key: HIVE-6550 URL: https://issues.apache.org/jira/browse/HIVE-6550 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Laljo John Pullokkaran Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-6550.01.patch, HIVE-6550.02.patch, HIVE-6550.03.patch, HIVE-6550.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6550) SemanticAnalyzer.reset() doesn't clear all the state
[ https://issues.apache.org/jira/browse/HIVE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6550: --- Component/s: Query Processor SemanticAnalyzer.reset() doesn't clear all the state Key: HIVE-6550 URL: https://issues.apache.org/jira/browse/HIVE-6550 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Laljo John Pullokkaran Assignee: Sergey Shelukhin Fix For: 0.14.0 Attachments: HIVE-6550.01.patch, HIVE-6550.02.patch, HIVE-6550.03.patch, HIVE-6550.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128192#comment-14128192 ] Damien Carol commented on HIVE-7689: Test errors are not related to this patch. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS, COMPACTION and fixes errors in STATS on a Postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7405: - Labels: TODOC14 (was: ) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch, HIVE-7405.996.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7818) Support boolean PPD for ORC
[ https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128194#comment-14128194 ] Prasanth J commented on HIVE-7818: -- Tested this patch with a small dataset. It works fine. +1 Support boolean PPD for ORC --- Key: HIVE-7818 URL: https://issues.apache.org/jira/browse/HIVE-7818 Project: Hive Issue Type: Improvement Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.14.0 Attachments: HIVE-7818.1.patch Currently ORC does collect stats for boolean fields. However, the boolean stats are not range based; instead, they collect counts of true records. RecordReaderImpl.evaluatePredicate currently only deals with range-based stats; we need to improve it to deal with the boolean stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52832 --- serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java https://reviews.apache.org/r/25468/#comment91999 Thanks for moving these out. Could you make them static? serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java https://reviews.apache.org/r/25468/#comment92000 no need to wrap this serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java https://reviews.apache.org/r/25468/#comment92001 missing spaces serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java https://reviews.apache.org/r/25468/#comment92003 There's a couple more puts in this file that can be replaced with setProperty - Lars Francke On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 151 https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line151 I don't quite get this comment. Looking at the two CSVReader constructors they seem to do the same in this case. From how I understand it this if-statement is not needed. Same for the newWriter method. Maybe I'm missing something? cheng xu wrote: The CSVParser checks whether the separator, quotechar, or escape characters are the same; if so, it throws an exception. For this reason, we have to fall back to CSVParser.DEFAULT_ESCAPE_CHARACTER ('\') when the configured escape matches the default escape character. Ahh! I see now. That's a bit weird indeed. Thanks for the explanation. - Lars --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
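A sketch of the constructor choice being discussed, against the opencsv 2.x API (the exact default constant the serde compares against is an assumption here):
{code:java}
import java.io.Reader;
import au.com.bytecode.opencsv.CSVParser;
import au.com.bytecode.opencsv.CSVReader;

class CsvReaderFactory {
  // CSVParser rejects separator/quote/escape values that collide, so only
  // pass the escape character through when it differs from the default.
  static CSVReader newReader(Reader reader, char separator, char quote, char escape) {
    if (escape == CSVParser.DEFAULT_ESCAPE_CHARACTER) {
      return new CSVReader(reader, separator, quote);
    }
    return new CSVReader(reader, separator, quote, escape);
  }
}
{code}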
[jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128198#comment-14128198 ] Lefty Leverenz commented on HIVE-7405: -- Doc note: This adds configuration parameter *hive.vectorized.execution.reduce.enabled* to HiveConf.java, so it needs to be documented in the wiki: * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch, HIVE-7405.996.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
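Once documented, usage should presumably be the standard toggle:
{code}
-- Enable reduce-side vectorized execution (the parameter added by this JIRA).
set hive.vectorized.execution.reduce.enabled=true;
{code}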
[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128202#comment-14128202 ] Lars Francke commented on HIVE-7946: I'll try to look at the code issues in the next few days. CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25178: Add DROP TABLE PURGE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25178/#review52834 --- ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java https://reviews.apache.org/r/25178/#comment92005 typo: "falser" should be "false" - Lefty Leverenz On Sept. 9, 2014, 6:51 p.m., david seraf wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25178/ --- (Updated Sept. 9, 2014, 6:51 p.m.) Review request for hive and Xuefu Zhang. Repository: hive-git Description --- Add PURGE option to DROP TABLE command to skip saving table data to the trash Diffs - hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java be7134f hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java af952f2 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2.java da51a55 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9489949 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java a94a7a3 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java cff0718 metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java cbdba30 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreFS.java a141793 metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 613b709 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e387b8f ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f31a409 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 32db0c7 ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 406aae9 ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveRemote.java 1a5ba87 ql/src/test/queries/clientpositive/drop_table_purge.q PRE-CREATION ql/src/test/results/clientpositive/drop_table_purge.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25178/diff/ Testing --- Added a code test and a QL test. Tests passed in CI, but other, unrelated tests failed. Thanks, david seraf
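For reference, the syntax under review, per the description (table name hypothetical; final grammar subject to the patch):
{code:sql}
-- PURGE skips the trash: the table data is deleted immediately and cannot be restored.
DROP TABLE IF EXISTS page_views PURGE;
{code}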
[jira] [Updated] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6147: - Labels: TODOC14 (was: ) Support avro data stored in HBase columns - Key: HIVE-6147 URL: https://issues.apache.org/jira/browse/HIVE-6147 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0, 0.13.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt Presently, the HBase Hive integration supports querying only primitive data types in columns. It would be nice to be able to store and query Avro objects in HBase columns by making them visible as structs to Hive. This will allow Hive to perform ad hoc analysis of HBase data which can be deeply structured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128218#comment-14128218 ] Lefty Leverenz commented on HIVE-6147: -- Doc question: Will this be documented in the HBase Integration design doc or the Avro SerDe doc, or a new doc? (The HBase doc has a list of open issues, but this one isn't on the list.) * [HBase Integration -- Open Issues (JIRA) | https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-OpenIssues(JIRA)] * [Avro SerDe | https://cwiki.apache.org/confluence/display/Hive/AvroSerDe] Support avro data stored in HBase columns - Key: HIVE-6147 URL: https://issues.apache.org/jira/browse/HIVE-6147 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0, 0.13.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt Presently, the HBase Hive integration supports querying only primitive data types in columns. It would be nice to be able to store and query Avro objects in HBase columns by making them visible as structs to Hive. This will allow Hive to perform ad hoc analysis of HBase data which can be deeply structured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Attachment: HIVE-7776.1-spark.patch Hive gets the task ID in two ways in Utilities::getTaskId: # read the value of mapred.task.id from the configuration. # generate a random value when #1 returns null. Currently, Hive on Spark can't read mapred.task.id from the configuration. FileSinkOperator uses the task ID to distinguish bucket file names, so it should hold the task ID in a field and initialize it only once, since one FileSinkOperator instance is only referenced in one task. Instead, FileSinkOperator calls Utilities::getTaskId for a new task ID each time; with this issue, that produces more bucket files than the bucket count, which leads to unexpected results for tablesample queries. enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch sample10.q contains a dynamic partition operation; this qtest should be enabled once Hive on Spark supports dynamic partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
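To make the described fix concrete, here is a minimal Java sketch of caching the task ID once per operator instance; the class and method names are illustrative, not the actual patch, and only Utilities.getTaskId comes from the code under discussion:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.Utilities;

// Sketch only: cache the task ID once per operator instance so every
// bucket file written by this instance shares the same ID, instead of
// drawing a fresh (possibly random) ID from Utilities.getTaskId per file.
public class FileSinkTaskIdSketch {
  private String taskId; // initialized exactly once per operator instance

  public void initialize(Configuration hconf) {
    // On Spark, mapred.task.id is absent, so getTaskId falls back to a
    // random value; calling it repeatedly would yield different IDs.
    taskId = Utilities.getTaskId(hconf);
  }

  public String bucketFileName(int bucketIdx) {
    return taskId + "_" + bucketIdx; // hypothetical naming scheme
  }
}
{code}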
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Status: Patch Available (was: Open) enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 25495: HIVE-7776, enable sample10.q
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25495/ --- Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7776 https://issues.apache.org/jira/browse/HIVE-7776 Repository: hive-git Description --- Hive gets the task ID in two ways in Utilities::getTaskId: (1) read the value of mapred.task.id from the configuration; (2) generate a random value when (1) returns null. Currently, Hive on Spark can't read mapred.task.id from the configuration. FileSinkOperator uses the task ID to distinguish bucket file names, so it should hold the task ID in a field and initialize it only once, since one FileSinkOperator instance is only referenced in one task. Instead, FileSinkOperator calls Utilities::getTaskId for a new task ID each time; with this issue, that produces more bucket files than the bucket count, which leads to unexpected results for tablesample queries. Diffs - itests/src/test/resources/testconfiguration.properties 155abad ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 3ff0782 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 02f9d99 ql/src/test/results/clientpositive/spark/sample10.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25495/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-8035) Add SORT_QUERY_RESULTS for test that doesn't guarantee order
[ https://issues.apache.org/jira/browse/HIVE-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128233#comment-14128233 ] Hive QA commented on HIVE-8035: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667610/HIVE-8035.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6193 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/721/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/721/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-721/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667610 Add SORT_QUERY_RESULTS for test that doesn't guarantee order Key: HIVE-8035 URL: https://issues.apache.org/jira/browse/HIVE-8035 Project: Hive Issue Type: Test Components: Tests Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-8035.patch Some test query doesn't guarantee output order, e.g. group by, union all. Therefore we should add {{-- SORT_QUERY_RESULTS}} to the qfiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8035) Add SORT_QUERY_RESULTS for test that doesn't guarantee order
[ https://issues.apache.org/jira/browse/HIVE-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128236#comment-14128236 ] Rui Li commented on HIVE-8035: -- I noted there's an age-1 failure: {code} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {code} But I'm not sure if it's related to the patch. cc [~xuefuz], [~brocknoland] Add SORT_QUERY_RESULTS for test that doesn't guarantee order Key: HIVE-8035 URL: https://issues.apache.org/jira/browse/HIVE-8035 Project: Hive Issue Type: Test Components: Tests Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-8035.patch Some test query doesn't guarantee output order, e.g. group by, union all. Therefore we should add {{-- SORT_QUERY_RESULTS}} to the qfiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7627: Status: Patch Available (was: Open) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7627: Attachment: HIVE-7627.2-spark.patch Making taskId a field variable of FSStatsPublisher would also resolve this issue. Since SPARK-2895 is still under review, we could enable randomly generated task IDs first (see the sketch after the stack trace below). FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at
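The lookup the comment above refers to can be modeled in a few lines of Java. This is a toy sketch, not the actual Utilities.getTaskId implementation; it only assumes the mapred.task.id configuration key named in the description:
{code}
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;

// Toy model of the two-way task-ID lookup: prefer mapred.task.id from
// the configuration, fall back to a randomly generated value when it is
// absent (the Hive-on-Spark case). Callers must cache the result in a
// field, since the fallback differs on every call.
public class TaskIdLookupSketch {
  public static String getTaskId(Configuration conf) {
    String id = (conf == null) ? null : conf.get("mapred.task.id");
    return (id != null && !id.isEmpty()) ? id : "task_" + UUID.randomUUID();
  }
}
{code}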
[jira] [Commented] (HIVE-8035) Add SORT_QUERY_RESULTS for test that doesn't guarantee order
[ https://issues.apache.org/jira/browse/HIVE-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128249#comment-14128249 ] Rui Li commented on HIVE-8035: -- I also have a concern: some qfiles contain both queries with and without a guaranteed order. For example, in {{limit_pushdown.q}} we have both {{select key,value from src order by key desc limit 20;}} and {{select value, sum(key + 1) as sum from src group by value limit 20;}} If we add {{-- SORT_QUERY_RESULTS}}, the generated results can differ from the expected output, e.g. for an {{order by desc}} query. Do you think this is OK? Add SORT_QUERY_RESULTS for test that doesn't guarantee order Key: HIVE-8035 URL: https://issues.apache.org/jira/browse/HIVE-8035 Project: Hive Issue Type: Test Components: Tests Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-8035.patch Some test query doesn't guarantee output order, e.g. group by, union all. Therefore we should add {{-- SORT_QUERY_RESULTS}} to the qfiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
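The concern is easiest to see in a toy model of what the directive implies for result checking. This is a sketch of the semantics only, not the actual test-driver code:
{code}
import java.util.*;

// Toy model of -- SORT_QUERY_RESULTS: expected and actual output are
// sorted line by line before diffing, so row-order differences (e.g.
// from an ORDER BY ... DESC query) are deliberately ignored.
public class SortQueryResultsSketch {
  static boolean matches(List<String> expected, List<String> actual, boolean sortResults) {
    List<String> e = new ArrayList<>(expected), a = new ArrayList<>(actual);
    if (sortResults) { Collections.sort(e); Collections.sort(a); }
    return e.equals(a);
  }

  public static void main(String[] args) {
    // A reversed ORDER BY result still "passes" once sorted -- Rui Li's concern.
    System.out.println(matches(Arrays.asList("3", "2", "1"),
                               Arrays.asList("1", "2", "3"), true)); // true
  }
}
{code}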
Review Request 25497: HIVE-7627, FSStatsPublisher does fit into Spark multi-thread task mode
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25497/ --- Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-7627 https://issues.apache.org/jira/browse/HIVE-7627 Repository: hive-git Description --- Making taskId a field variable of FSStatsPublisher would also resolve this issue. Since SPARK-2895 is still under review, we could enable randomly generated task IDs first. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java cb010fb Diff: https://reviews.apache.org/r/25497/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128293#comment-14128293 ] Hive QA commented on HIVE-7776: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667632/HIVE-7776.1-spark.patch {color:red}ERROR:{color} -1 due to 161 failed/errored test(s), 6344 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explode_null org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_noskew_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby4_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby5_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby6_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby6_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby6_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_map_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_noskew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_map_ppr_multi_distinct 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input41 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_leftsemijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_4
[jira] [Commented] (HIVE-8019) Missing commit from trunk : `export/import statement update`
[ https://issues.apache.org/jira/browse/HIVE-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128309#comment-14128309 ] Hive QA commented on HIVE-8019: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667613/HIVE-8019.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6195 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/722/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/722/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-722/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667613 Missing commit from trunk : `export/import statement update` Key: HIVE-8019 URL: https://issues.apache.org/jira/browse/HIVE-8019 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.14.0 Reporter: Mohit Sabharwal Assignee: Thejas M Nair Priority: Blocker Attachments: HIVE-8019.1.patch, HIVE-8019.2.patch Noticed that commit 1882de7810fc55a2466dd4cbe74ed67bb41cb667 exists in 0.13 branch, but not it trunk. https://github.com/apache/hive/commit/1882de7810fc55a2466dd4cbe74ed67bb41cb667 {code} (trunk) $ git branch -a --contains 1882de7810fc55a2466dd4cbe74ed67bb41cb667 remotes/origin/branch-0.13 {code} I looked through some of the changes in this commit and don't see those in trunk. Nor do I see a commit that reverts these changes in trunk. [~thejas], should we port this over to trunk ? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25492: HIVE-7936 - Thrift Union support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25492/#review52844 --- serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ThriftObjectInspectorUtils.java https://reviews.apache.org/r/25492/#comment92014 Remove this method if not used - Amareshwari Sriramadasu On Sept. 10, 2014, 5:27 a.m., Suma Shivaprasad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25492/ --- (Updated Sept. 10, 2014, 5:27 a.m.) Review request for hive, Amareshwari Sriramadasu and Ashutosh Chauhan. Bugs: HIVE-7936 https://issues.apache.org/jira/browse/HIVE-7936 Repository: hive-git Description --- ThriftDeserializer currently does not support UNION types Diffs - contrib/src/test/results/clientpositive/udf_example_arraymapstruct.q.out e876cdd data/files/complex.seq c27d5c09b1da881d8fd6fb2aaa1f5d169d1de3ae ql/src/test/queries/clientpositive/input_lazyserde.q 69c0d04 ql/src/test/results/clientnegative/describe_xpath1.q.out d81c96e ql/src/test/results/clientnegative/describe_xpath2.q.out 2bd0f06 ql/src/test/results/clientpositive/case_sensitivity.q.out 8684557 ql/src/test/results/clientpositive/columnarserde_create_shortcut.q.out 4805836 ql/src/test/results/clientpositive/input17.q.out 8fff21b ql/src/test/results/clientpositive/input5.q.out 7524ca7 ql/src/test/results/clientpositive/input_columnarserde.q.out 13cfb7f ql/src/test/results/clientpositive/input_dynamicserde.q.out ebcf1d8 ql/src/test/results/clientpositive/input_lazyserde.q.out 0f685f2 ql/src/test/results/clientpositive/input_testxpath.q.out 3f4b96e ql/src/test/results/clientpositive/input_testxpath2.q.out af1e999 ql/src/test/results/clientpositive/input_testxpath3.q.out b31b2f3 ql/src/test/results/clientpositive/input_testxpath4.q.out 3dca8bf ql/src/test/results/clientpositive/inputddl8.q.out fc13356 ql/src/test/results/clientpositive/join_thrift.q.out e1588c5 ql/src/test/results/clientpositive/udf_case_thrift.q.out 0fc8e84 ql/src/test/results/clientpositive/udf_coalesce.q.out 0d32476 ql/src/test/results/clientpositive/udf_isnull_isnotnull.q.out 1f600b4 ql/src/test/results/clientpositive/udf_size.q.out d7a4fa2 ql/src/test/results/clientpositive/union21.q.out 0e47ff4 serde/if/test/complex.thrift 308b64c serde/src/gen/thrift/gen-cpp/complex_types.h 17991d4 serde/src/gen/thrift/gen-cpp/complex_types.cpp 9526d3d serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/Complex.java e36a792 serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/PropValueUnion.java PRE-CREATION serde/src/gen/thrift/gen-py/complex/ttypes.py 7283e4c serde/src/gen/thrift/gen-rb/complex_types.rb 5527096 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 9a226b3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ReflectionStructObjectInspector.java ee5b0d0 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ThriftObjectInspectorUtils.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ThriftUnionObjectInspector.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestObjectInspectorUtils.java a18f4a7 serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestThriftObjectInspectors.java 5f692fb serde/src/test/org/apache/hadoop/hive/serde2/thrift_test/CreateSequenceFile.java 7269cd0 Diff: https://reviews.apache.org/r/25492/diff/ Testing --- input_lazyserde.q Thanks, Suma Shivaprasad
[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128313#comment-14128313 ] Amareshwari Sriramadasu commented on HIVE-7936: --- The code changes look fine. I put a few comments on the review board. Since the patch involves a binary file change, I think Jenkins won't be able to apply the patch. Can you run the tests on a local machine and update the results here? Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.patch, complex.seq Currently Hive does not support Thrift unions through ThriftDeserializer. We need to add support for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
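For background on what a union-aware inspector has to surface, here is a small Java sketch against Thrift's generic TUnion API; getSetField and getFieldValue are standard Thrift methods, while the describe helper itself is hypothetical:
{code}
import org.apache.thrift.TFieldIdEnum;
import org.apache.thrift.TUnion;

// A Thrift union carries exactly one set field at a time, so inspecting
// it means reporting the tag (which field is set) together with that
// field's value -- the two pieces a union object inspector must expose.
public class ThriftUnionSketch {
  static void describe(TUnion<?, ? extends TFieldIdEnum> u) {
    TFieldIdEnum tag = u.getSetField(); // which member is populated
    Object value = u.getFieldValue();   // that member's value
    System.out.println(tag.getFieldName() + " = " + value);
  }
}
{code}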
[jira] [Commented] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128338#comment-14128338 ] Hive QA commented on HIVE-7627: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667637/HIVE-7627.2-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6343 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/123/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/123/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-123/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667637 FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id: 20277 , 20278 for file
[jira] [Commented] (HIVE-7777) add CSV support for Serde
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128381#comment-14128381 ] Hive QA commented on HIVE-7777: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667616/HIVE-7777.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6196 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestParse.testParse_union org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/723/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/723/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-723/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667616 add CSV support for Serde - Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7777.1.patch, HIVE-7777.patch, csv-serde-master.zip There is no official CSV SerDe support in Hive, though there is an open source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very commonly used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8030) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128403#comment-14128403 ] Shiv Prakash commented on HIVE-8030: Yes, I'm using hive-0.13.1. NullPointerException on getSchemas -- Key: HIVE-8030 URL: https://issues.apache.org/jira/browse/HIVE-8030 Project: Hive Issue Type: Bug Components: Database/Schema, JDBC Affects Versions: 0.13.1 Environment: Linux (Ubuntu 12.04) Reporter: Shiv Prakash Labels: hadoop Fix For: 0.13.1 java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:164) at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.init(HiveMetaDataResultSet.java:32) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.init(HiveDatabaseMetaData.java:482) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:481) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$DatabaseMetaDataInvocationHandler.invoke(DriverProxyInvocationChain.java:368) at com.sun.proxy.$Proxy20.getSchemas(Unknown Source) at org.pentaho.di.core.database.Database.getSchemas(Database.java:3857) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.getSchemaNames(TableOutputDialog.java:1036) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.access$2400(TableOutputDialog.java:94) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog$24.widgetSelected(TableOutputDialog.java:863) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.open(TableOutputDialog.java:884) at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:124) at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8648) at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3020) at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:737) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1297) at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7801) at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9130) at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.commons.launcher.Launcher.main(Launcher.java:151) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128404#comment-14128404 ] Alan Gates commented on HIVE-7689: -- Are you 100% certain? The TestCompactor uses the transaction tables in the metastore. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes an error in STATS on a Postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8030) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128413#comment-14128413 ] Lars Francke commented on HIVE-8030: I'm sorry but those numbers don't match up. See for yourself in the link above what's going on in line 32 in release 0.13.1. It instead matches up perfectly with what was available in version 0.7.1 and before: https://github.com/apache/hive/blob/release-0.7.1/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveMetaDataResultSet.java The other line numbers don't match up either. Could you paste more information for example about your classpath? I'm relatively certain that somehow you are not using a vanilla 0.13.1 release. Maybe Pentaho messes something up. NullPointerException on getSchemas -- Key: HIVE-8030 URL: https://issues.apache.org/jira/browse/HIVE-8030 Project: Hive Issue Type: Bug Components: Database/Schema, JDBC Affects Versions: 0.13.1 Environment: Linux (Ubuntu 12.04) Reporter: Shiv Prakash Labels: hadoop Fix For: 0.13.1 java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:164) at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.init(HiveMetaDataResultSet.java:32) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.init(HiveDatabaseMetaData.java:482) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:481) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$DatabaseMetaDataInvocationHandler.invoke(DriverProxyInvocationChain.java:368) at com.sun.proxy.$Proxy20.getSchemas(Unknown Source) at org.pentaho.di.core.database.Database.getSchemas(Database.java:3857) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.getSchemaNames(TableOutputDialog.java:1036) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.access$2400(TableOutputDialog.java:94) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog$24.widgetSelected(TableOutputDialog.java:863) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.open(TableOutputDialog.java:884) at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:124) at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8648) at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3020) at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:737) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1297) at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7801) at 
org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9130) at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.pentaho.commons.launcher.Launcher.main(Launcher.java:151) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128427#comment-14128427 ] Damien Carol commented on HIVE-7689: Using this command with the latest trunk and this patch applied: {code} mvn -B -o test -Phadoop-2 -Dtest=TestCompactor {code} Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes an error in STATS on a Postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7627: Status: Open (was: Patch Available) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: spark-m1 Attachments: HIVE-7627.1-spark.patch, HIVE-7627.2-spark.patch Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native
[jira] [Commented] (HIVE-2149) Fix ant target generate-schema
[ https://issues.apache.org/jira/browse/HIVE-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128448#comment-14128448 ] Lars Francke commented on HIVE-2149: Okay, I'll reopen the issue but won't work on it. The problem is that the DataNucleus Maven Plugin requires a database connection even to create the schema. That means we need to provide a profile or some other way to get a driver on the classpath. Fix ant target generate-schema --- Key: HIVE-2149 URL: https://issues.apache.org/jira/browse/HIVE-2149 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Priority: Minor Running generate-schema target in metastore dir results in generate-schema: [java] Exception in thread main java.lang.NoClassDefFoundError: org/jpox/SchemaTool -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-2149) Provide a way to generate an SQL file with the Metastore schema
[ https://issues.apache.org/jira/browse/HIVE-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-2149: --- Summary: Provide a way to generate an SQL file with the Metastore schema (was: Fix ant target generate-schema ) Provide a way to generate an SQL file with the Metastore schema --- Key: HIVE-2149 URL: https://issues.apache.org/jira/browse/HIVE-2149 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Priority: Minor Running generate-schema target in metastore dir results in generate-schema: [java] Exception in thread main java.lang.NoClassDefFoundError: org/jpox/SchemaTool -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2149) Provide a way to generate an SQL file with the Metastore schema
[ https://issues.apache.org/jira/browse/HIVE-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128449#comment-14128449 ] Lars Francke commented on HIVE-2149: Turns out I can't reopen issues, can you? Provide a way to generate an SQL file with the Metastore schema --- Key: HIVE-2149 URL: https://issues.apache.org/jira/browse/HIVE-2149 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Priority: Minor Running generate-schema target in metastore dir results in generate-schema: [java] Exception in thread main java.lang.NoClassDefFoundError: org/jpox/SchemaTool -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7776: Status: Open (was: Patch Available) enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7776.1-spark.patch sample10.q contain dynamic partition operation, should enable this qtest after hive on spark support dynamic partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128454#comment-14128454 ] Hive QA commented on HIVE-7704: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667624/HIVE-7704.9.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6201 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/724/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/724/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-724/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667624 Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch, HIVE-7704.9.patch Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert merge file tasks to Tez tasks to make use of the performance gains from Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7689.8.patch Added modifications to the {{prepDB}} and {{cleanDb}} methods of {{TxnDbUtil}}. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes an error in STATS on a Postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2390) Add UNIONTYPE serialization support to LazyBinarySerDe
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128486#comment-14128486 ] Suma Shivaprasad commented on HIVE-2390: Carl, I am working on a related feature to support UNIONTYPE in ThriftDeserializer as well. Since I am a fairly new contributor to Hive and not aware of the existing issues in the UNIONTYPE feature, if someone could identify the missing pieces and raise JIRAs, I can take a stab at them. Add UNIONTYPE serialization support to LazyBinarySerDe -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: TODOC14, uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128501#comment-14128501 ] Brock Noland commented on HIVE-8017: No that's an infra issue. The test framework uses ec2 which can at times be flaky. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch, HIVE-8017.3-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
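To make the partitioning point concrete, here is a hypothetical sketch of a Spark partitioner that honors the hash carried by HiveKey; it illustrates the idea only and is not the Spark-branch patch itself, which wires this into the shuffle differently.

```java
import org.apache.hadoop.hive.ql.io.HiveKey;
import org.apache.spark.Partitioner;

// Sketch: route each record by the hash code Hive's operator tree stored in
// the HiveKey (e.g. the join or bucket hash), instead of hashing raw bytes
// the way a plain BytesWritable key would.
public class HiveKeyPartitioner extends Partitioner {
  private final int numPartitions;

  public HiveKeyPartitioner(int numPartitions) {
    this.numPartitions = numPartitions;
  }

  @Override
  public int numPartitions() {
    return numPartitions;
  }

  @Override
  public int getPartition(Object key) {
    // Assumption: HiveKey.hashCode() returns the hash set by the sink operator.
    int hash = ((HiveKey) key).hashCode();
    return (hash & Integer.MAX_VALUE) % numPartitions;
  }
}
```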
[jira] [Commented] (HIVE-8035) Add SORT_QUERY_RESULTS for test that doesn't guarantee order
[ https://issues.apache.org/jira/browse/HIVE-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128507#comment-14128507 ] Brock Noland commented on HIVE-8035: +1 The test {{TestOrcHCatLoader.testReadDataPrimitiveTypes}} seems to be quite flaky. With regards to {{limit_pushdown.q}}, I think the test that is important there is that the limit is pushed down, as shown in the explain plan. It's not a correctness test for order by. Add SORT_QUERY_RESULTS for test that doesn't guarantee order Key: HIVE-8035 URL: https://issues.apache.org/jira/browse/HIVE-8035 Project: Hive Issue Type: Test Components: Tests Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-8035.patch Some test queries don't guarantee output order, e.g. group by, union all. Therefore we should add {{-- SORT_QUERY_RESULTS}} to the qfiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128565#comment-14128565 ] Hive QA commented on HIVE-7689: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667660/HIVE-7689.8.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6193 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/725/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/725/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-725/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667660 Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7689.8.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS and COMPACTION and fixes an error in STATS on a Postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Attachment: HIVE-7812.patch Updated per Ashutosh's comments. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch Currently CombineHiveInputFormat complains when called on an ACID directory. Modify CombineHiveInputFormat so that HiveInputFormat is used instead if the directory is in ACID format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
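For context, a hedged sketch of the fallback being described; the isAcidDirectory() check below is a hypothetical stand-in, and the committed patch's actual logic (which likely consults org.apache.hadoop.hive.ql.io.AcidUtils) differs in detail.

```java
import org.apache.hadoop.fs.Path;

// Sketch: pick the input format class for a directory, falling back from
// CombineHiveInputFormat to HiveInputFormat when the layout is ACID, so
// ACID base/delta files are never combined across directories.
public class AcidInputFormatFallback {
  static String inputFormatFor(Path dir) {
    return isAcidDirectory(dir)
        ? "org.apache.hadoop.hive.ql.io.HiveInputFormat"
        : "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat";
  }

  // Hypothetical check: ACID directories contain base_N / delta_X_Y subdirs.
  private static boolean isAcidDirectory(Path dir) {
    String name = dir.getName();
    return name.startsWith("base_") || name.startsWith("delta_");
  }
}
```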
Debugging Hive frontend in eclipse
Hello, I am new to the Hive dev community. I am trying to debug the Hive frontend (up to semantic analysis) from Eclipse. I want to start from Main in CliDriver. I don't need to debug all the way to execution and don't care if it fails there. As described in https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode , I am able to debug TestCliDriver from Eclipse, and several tests pass, showing that the metastore is working fine. The problem is that when I start debugging from CliDriver, the metastore is not initialized properly, so semantic analysis fails at the getMetadata call. Is any additional setup required to get metadata to work properly when debugging from Eclipse? -- Saumitra S. Shahapure
[jira] [Updated] (HIVE-8034) Don't add colon when no port is specified
[ https://issues.apache.org/jira/browse/HIVE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8034: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Don't add colon when no port is specified - Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-8034.1.patch In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-2149) Provide a way to generate an SQL file with the Metastore schema
[ https://issues.apache.org/jira/browse/HIVE-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-2149: --- Component/s: Metastore Provide a way to generate an SQL file with the Metastore schema --- Key: HIVE-2149 URL: https://issues.apache.org/jira/browse/HIVE-2149 Project: Hive Issue Type: Bug Components: Metastore Reporter: Ashutosh Chauhan Priority: Minor Running generate-schema target in metastore dir results in generate-schema: [java] Exception in thread "main" java.lang.NoClassDefFoundError: org/jpox/SchemaTool -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-2149) Provide a way to generate an SQL file with the Metastore schema
[ https://issues.apache.org/jira/browse/HIVE-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reopened HIVE-2149: Provide a way to generate an SQL file with the Metastore schema --- Key: HIVE-2149 URL: https://issues.apache.org/jira/browse/HIVE-2149 Project: Hive Issue Type: Bug Components: Metastore Reporter: Ashutosh Chauhan Priority: Minor Running generate-schema target in metastore dir results in generate-schema: [java] Exception in thread "main" java.lang.NoClassDefFoundError: org/jpox/SchemaTool -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated HIVE-7936: --- Attachment: HIVE-7936.2.patch Fixed test case output mismatches in the parsing tests. Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, complex.seq Currently Hive does not support Thrift unions through ThriftDeserializer. Need to add support for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52883 --- This looks great! I think we are almost ready to commit. Can you add a new test (e.g. ql/src/test/queries/clientpositive/serde_csv.q) which runs a couple of queries? e.g. ql/src/test/queries/clientpositive/serde_regex.q - Brock Noland On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - pom.xml 8973c2b52d0797d1f34859951de7349f7e5b996f serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestOpenCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
[jira] [Created] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
Pankit Thapar created HIVE-8038: --- Summary: Decouple ORC files split calculation logic from Filesystem's get file location implementation Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0
What is the Current Logic
==
1. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
2. In SplitGenerator.createSplit(), check if the split spans only one block or multiple blocks.
3. If the split spans just one block, then, using the array index (index = offset / blockSize), get the corresponding host having the blockLocation.
4. If the split spans multiple blocks, then get all hosts that have at least 80% of the max of total data in the split hosted by any host.
5. Add the split to a list of splits.
Issue with Current Logic
=
Dependency on the FileSystem API's logic for block location calculations. It returns an array, and we need to rely on the FileSystem to make all blocks the same size if we want to directly access a block from the array.
What is the Fix
=
1a. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
1b. Convert the array into a tree map <offset, BlockLocation> and return it through getLocationsWithOffSet().
2. In SplitGenerator.createSplit(), check if the split spans only one block or multiple blocks.
3. If the split spans just one block, then, using TreeMap.floorEntry(key), get the highest entry smaller than the split's offset and get the corresponding host.
4a. If the split spans multiple blocks, get a submap which contains all entries with blockLocations from offset to offset + length.
4b. Get all hosts that have at least 80% of the max of total data in the split hosted by any host.
5. Add the split to a list of splits.
What are the major changes in logic
==
1. Store BlockLocations in a map instead of an array.
2. Call SHIMS.getLocationsWithOffSet() instead of getLocations().
3. The one-block case is checked by if (offset + length <= start.getOffset() + start.getLength()) instead of if ((offset % blockSize) + length <= blockSize).
What is the effect on complexity (Big O)
=
1. We add an O(n) loop to build a TreeMap from an array, but it is a one-time cost and would not be incurred for each split.
2. In the one-block case, we can get the block in O(log n) worst case, which was O(1) before.
3. Getting the submap is O(log n).
4. In the multiple-block case, building the list of hosts is O(m), which was O(n), with m < n, as previously we were iterating over all the block locations but now we iterate only over the blocks that belong to the range of offsets that we need.
What are the benefits of the change
==
1. With this fix, we do not depend on the blockLocations returned by the FileSystem to figure out the block corresponding to the offset and blockSize.
2. Also, it is not necessary that block lengths are the same for all blocks for all FileSystems.
3. Previously we were using blockSize for the one-block case and block.length for the multiple-block case, which is not the case now. We figure out the block depending upon the actual length and offset of the block.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
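To illustrate steps 1b, 3, and 4a, here is a minimal, self-contained sketch of the TreeMap-based lookup. The class and method names are illustrative only; the actual patch routes this through a shim method (getLocationsWithOffSet()) rather than a standalone helper.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.fs.BlockLocation;

public class BlockLookupSketch {
  // Step 1b: index block locations by starting offset
  // (a one-time O(n log n) cost per file, as the description notes).
  static TreeMap<Long, BlockLocation> indexByOffset(BlockLocation[] blocks) {
    TreeMap<Long, BlockLocation> map = new TreeMap<>();
    for (BlockLocation b : blocks) {
      map.put(b.getOffset(), b);
    }
    return map;
  }

  // Steps 3 and 4a: find the blocks overlapping a split [offset, offset + length).
  // Assumes a non-empty map whose first key is <= offset (block 0 starts at 0).
  static Iterable<BlockLocation> blocksForSplit(
      TreeMap<Long, BlockLocation> byOffset, long offset, long length) {
    Map.Entry<Long, BlockLocation> start = byOffset.floorEntry(offset); // O(log n)
    BlockLocation first = start.getValue();
    if (offset + length <= first.getOffset() + first.getLength()) {
      return Collections.singletonList(first); // one-block case
    }
    // Multiple-block case: submap of every block starting before offset + length.
    return byOffset.subMap(start.getKey(), true, offset + length, false).values();
  }
}
```

Note that no blockSize variable appears anywhere: only the actual offsets and lengths of the blocks are consulted, which is the point of the change.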
[jira] [Updated] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankit Thapar updated HIVE-8038: Attachment: HIVE-8038.patch Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankit Thapar updated HIVE-8038: Status: Patch Available (was: Open) Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7086) TestHiveServer2.testConnection is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128724#comment-14128724 ] Ashutosh Chauhan commented on HIVE-7086: +1 TestHiveServer2.testConnection is failing on trunk -- Key: HIVE-7086 URL: https://issues.apache.org/jira/browse/HIVE-7086 Project: Hive Issue Type: Test Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7086.1.patch, HIVE-7086.2.patch, HIVE-7086.3.patch Able to repro locally on fresh checkout -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128749#comment-14128749 ] Hive QA commented on HIVE-7812: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667764/HIVE-7812.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6193 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadataonly1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_select_dummy_source org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/726/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/726/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-726/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667764 Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch Currently CombineHiveInputFormat complains when called on an ACID directory. Modify CombineHiveInputFormat so that HiveInputFormat is used instead if the directory is in ACID format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Debugging Hive frontend in eclipse
There have been some threads on how to get around the metastore initialization issue. But another easy way to work around this issue is to build Hive and then run hive --debug. Hive will wait for the debugger to connect on port 8000. You can configure Eclipse debugging to connect to that port.
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128768#comment-14128768 ] Gopal V commented on HIVE-8038: --- This is an interesting change-set. bq. 4a. If the split spans multiple blocks, get a submap which contains all entries with blockLocations from offset to offset + length For ORC to be really fast, we enforce that a stripe (the smallest split you can get) always fits within a block - this is true for HDFS at least, because it can specify a preferred block size when creating files. From an elegance point of view, I like TreeMap.floorEntry() over a for loop - but I have never seen the 4a/4b scenarios when using Hive 0.13. bq. 2. Also, it is not necessary that block lengths are the same for all blocks for all FileSystems This is something to be fixed anyway - as HDFS-3689 will allow variable-length blocks in HDFS as well. Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7984) AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez
[ https://issues.apache.org/jira/browse/HIVE-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7984: - Attachment: HIVE-7984-1.patch Same changes, but I named the original attachment wrong. Fixing the suffix to trigger Hive QA. AccumuloOutputFormat Configuration items from StorageHandler not re-set in Configuration in Tez --- Key: HIVE-7984 URL: https://issues.apache.org/jira/browse/HIVE-7984 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7984-1.diff, HIVE-7984-1.patch Ran AccumuloStorageHandler queries with Tez and found that configuration elements that are pulled from the {{-hiveconf}} and passed to the inputJobProperties or outputJobProperties by the AccumuloStorageHandler aren't available inside the Tez container. I'm guessing that there is a disconnect between the configuration that the StorageHandler creates and what the Tez container sees. The HBaseStorageHandler likely doesn't run into this because it expects to have hbase-site.xml available via tmpjars (and can extrapolate connection information from that file). Accumulo's site configuration file is not meant to be shared with consumers, which means that this exact approach is not sufficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Hive User Group Meeting
Hi all, I'm very excited as we are just about one month away from the meetup. Here is a list of talks that will be delivered in the coming Hive user group meeting. 1. Julian Hyde, cost-based optimization, Optiq, and materialized views 2. Xuefu Zhang, Hive on Spark 3. George Chow, Updates on Hive Thrift Protocol 4. Prasad Mujumdar, What's new in Apache Sentry We still have a couple of slots open, so please let me know if you're interested in giving a talk. In the meantime, please RSVP if you plan to join the event. Thanks, Xuefu On Tue, Aug 26, 2014 at 6:37 PM, Xuefu Zhang xzh...@cloudera.com wrote: Dear Apache Hive users and developers, The next Hive user group meeting mentioned previously was officially announced here: http://www.meetup.com/Hive-User-Group-Meeting/events/202007872/. As it's only about one and a half months away, please RSVP if you plan to go so that the organizers can plan the meeting accordingly. Currently, we still have a few talk slots open. Please let me know if you're interested in giving a talk. Regards, Xuefu On Mon, Jul 7, 2014 at 6:01 PM, Xuefu Zhang xzh...@cloudera.com wrote: Dear Hive users, The Hive community is considering a user group meeting during Hadoop World, which will be held in New York October 15-17th. To make this happen, your support is essential. First, I'm wondering if any user, especially those in the New York area, would be willing to host the meetup. Secondly, I'm soliciting talks from users as well as developers, so please propose or share your thoughts on the contents of the meetup. I will soon set up a meetup event to formally announce this. In the meantime, your suggestions, comments, and kind assistance are greatly appreciated. Sincerely, Xuefu
[jira] [Updated] (HIVE-8022) Recursive root scratch directory creation is not using hdfs umask properly
[ https://issues.apache.org/jira/browse/HIVE-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8022: --- Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the review [~thejas]! Recursive root scratch directory creation is not using hdfs umask properly --- Key: HIVE-8022 URL: https://issues.apache.org/jira/browse/HIVE-8022 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-8022.1.patch, HIVE-8022.2.patch, HIVE-8022.3.patch Changes made in HIVE-6847 removed the helper methods that were added in HIVE-7001 to get around this problem. Since the root scratch dir must be writable by all, its creation should use those methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
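The helper-style fix the issue alludes to looks roughly like the sketch below: create the directory with the desired permission and then set it explicitly, since mkdirs alone applies the filesystem umask. Names are hypothetical; HIVE-7001's actual helpers differ.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ScratchDirSketch {
  // Ensure the root scratch dir ends up writable by all, regardless of
  // the configured fs.permissions.umask-mode.
  static void ensureWritableScratchDir(Configuration conf, Path dir)
      throws java.io.IOException {
    FileSystem fs = dir.getFileSystem(conf);
    FsPermission writableByAll = new FsPermission((short) 01777); // sticky + rwx
    if (!fs.exists(dir)) {
      fs.mkdirs(dir, writableByAll); // the effective mode may still be masked...
    }
    fs.setPermission(dir, writableByAll); // ...so set the permission explicitly
  }
}
```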
[jira] [Updated] (HIVE-7818) Support boolean PPD for ORC
[ https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7818: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~daijy] for the patch. Support boolean PPD for ORC --- Key: HIVE-7818 URL: https://issues.apache.org/jira/browse/HIVE-7818 Project: Hive Issue Type: Improvement Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.14.0 Attachments: HIVE-7818.1.patch Currently ORC does collect stats for boolean fields. However, the boolean stats are not range based; instead, they collect counts of true records. RecordReaderImpl.evaluatePredicate currently only deals with range-based stats, so we need to improve it to deal with the boolean stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
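A sketch of how count-based boolean stats could feed predicate pushdown, with hypothetical names (ORC's RecordReaderImpl and its truth-value handling differ in detail): knowing only how many values in a row group are true is enough to rule the group out for an equality predicate.

```java
// Sketch: can a row group be skipped for the predicate `col = literal`
// when the column statistics record only the count of true values?
public class BooleanPpdSketch {
  enum Result { NO /* cannot match, skip the group */, YES_NO /* may match */ }

  static Result evaluate(boolean literal, long trueCount, long valueCount) {
    if (literal && trueCount == 0) {
      return Result.NO;          // no true values, so `col = true` cannot match
    }
    if (!literal && trueCount == valueCount) {
      return Result.NO;          // all values true, so `col = false` cannot match
    }
    return Result.YES_NO;        // the stats cannot rule the group out
  }
}
```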
[jira] [Commented] (HIVE-6109) Support customized location for EXTERNAL tables created by Dynamic Partitioning
[ https://issues.apache.org/jira/browse/HIVE-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128873#comment-14128873 ] karthik commented on HIVE-6109: --- Satish Mittal, Your participation is phenomenal in this forum and very helpful for new users like me. I need to use dynamic partitioning with a custom pattern, and I am missing something very obvious. Do I need to set hcat.dynamic.partitioning.custom.pattern in the Hive CLI, as both HCatalog and Hive are integrated together? My path for a partition in the external table is, as usual, like data/year=2013/month=jan/, but I need data/year/month. Do I need to amend the location for this external table? Please accept my apologies if I sound very basic. Thanks in advance Support customized location for EXTERNAL tables created by Dynamic Partitioning --- Key: HIVE-6109 URL: https://issues.apache.org/jira/browse/HIVE-6109 Project: Hive Issue Type: Improvement Components: HCatalog Reporter: Satish Mittal Assignee: Satish Mittal Fix For: 0.13.0 Attachments: HIVE-6109.1.patch.txt, HIVE-6109.2.patch.txt, HIVE-6109.3.patch.txt, HIVE-6109.pdf Currently when dynamic partitions are created by HCatalog, the underlying directories for the partitions are created in a fixed 'Hive-style' format, i.e. root_dir/key1=value1/key2=value2/ and so on. However, in the case of an external table, the user should be able to control the format of the directories created for dynamic partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
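As an aside for readers with the same question: the property is set on the job or session configuration, along the lines of the hedged sketch below. The ${column} substitution syntax is an assumption based on this JIRA's design doc; see the attached HIVE-6109.pdf for the authoritative grammar.

```java
import org.apache.hadoop.conf.Configuration;

public class CustomPartitionPatternSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed syntax: partition column values are substituted into the
    // pattern, so data lands under .../2013/jan/ instead of the default
    // .../year=2013/month=jan/ layout.
    conf.set("hcat.dynamic.partitioning.custom.pattern", "${year}/${month}");
  }
}
```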
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128871#comment-14128871 ] Pankit Thapar commented on HIVE-8038: - Hi, Thanks for the feedback. 1. The use case where the split may span more than one block would be when Math.min(MAX_BLOCK_SIZE, 2 * stripeSize) returns MAX_BLOCK_SIZE as the size of the block for the file. Example: with a stripe size of 512MB and a block size of 400MB, a split would span more than one block. 2. I see that HDFS wants to support variable-length blocks, but what I meant was to remove the usage of the blockSize variable altogether, as a fixed block size does not hold for all FileSystems. We want to generalize the usage for FileSystems apart from HDFS. Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Debugging Hive frontend in eclipse
Hey Thejas, It seems that hive --debug is also not smooth. I ran the script build/dist/bin/hive --debug after a clean build. It gives this error: ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options. Error occurred during initialization of VM agent library failed to init: jdwp Am I missing something here? -- Saumitra S. Shahapure
Re: Review Request 25178: Add DROP TABLE PURGE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25178/#review52911 --- metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/25178/#comment92121 Nit: should we just pass ifPurge as boolean to the method unless envContext is also used for something else. This seemingly makes the called method cleaner. - Xuefu Zhang On Sept. 9, 2014, 6:51 p.m., david seraf wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25178/ --- (Updated Sept. 9, 2014, 6:51 p.m.) Review request for hive and Xuefu Zhang. Repository: hive-git Description --- Add PURGE option to DROP TABLE command to skip saving table data to the trash Diffs - hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java be7134f hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java af952f2 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2.java da51a55 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9489949 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java a94a7a3 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java cff0718 metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java cbdba30 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreFS.java a141793 metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 613b709 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e387b8f ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f31a409 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 32db0c7 ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 406aae9 ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveRemote.java 1a5ba87 ql/src/test/queries/clientpositive/drop_table_purge.q PRE-CREATION ql/src/test/results/clientpositive/drop_table_purge.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25178/diff/ Testing --- added code test and added QL test. Tests passed in CI, but other, unrelated tests failed. Thanks, david seraf
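Xuefu's nit, sketched with hypothetical signatures (not the actual HiveMetaStore code), amounts to the following before/after:

```java
import java.util.Map;

// Sketch of the suggested cleanup: pass ifPurge as a boolean instead of
// tunneling it through an environment-context property map.
public class DropTableApiSketch {
  // Before: the callee must know the magic property name.
  void dropTable(String db, String table, Map<String, String> envProperties) {
    String v = envProperties.get("ifPurge"); // hypothetical property key
    doDrop(db, table, v != null && Boolean.parseBoolean(v));
  }

  // After: the flag is explicit in the signature.
  void dropTable(String db, String table, boolean ifPurge) {
    doDrop(db, table, ifPurge);
  }

  private void doDrop(String db, String table, boolean skipTrash) {
    // skipTrash == true: delete the data directly instead of moving it to trash.
  }
}
```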
[jira] [Updated] (HIVE-8037) CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions
[ https://issues.apache.org/jira/browse/HIVE-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8037: - Attachment: HIVE-8037.1.patch CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions - Key: HIVE-8037 URL: https://issues.apache.org/jira/browse/HIVE-8037 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8037.1.patch, HIVE-8037.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Debugging Hive frontend in eclipse
I have never seen that before. Maybe you have some env setting (hadoop or hive) that is messing with it? Edit the shell script to print the 'java' command it is running and see if you can figure out what is wrong.
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128907#comment-14128907 ] Xuefu Zhang commented on HIVE-7100: --- Patch looks good to me. I have a minor comment on RB. BTW, could you please fill in the review request with a title reflecting this JIRA, and also the JIRA number in the Bugs field, for easy navigation? Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they then have to manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128909#comment-14128909 ] Hive QA commented on HIVE-7936: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667771/HIVE-7936.2.patch {color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 6193 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_case_sensitivity org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnarserde_create_shortcut org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_columnarserde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_dynamicserde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_lazyserde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input_testxpath4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_thrift org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_case_thrift org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_coalesce org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_isnull_isnotnull org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_size org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union21 org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udf_example_arraymapstruct org.apache.hadoop.hive.ql.parse.TestParse.testParse_case_sensitivity org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5 org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath org.apache.hadoop.hive.ql.parse.TestParse.testParse_input_testxpath2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/727/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/727/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-727/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 22 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667771 Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.1.patch, HIVE-7936.2.patch, HIVE-7936.patch, complex.seq Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-7100: - Assignee: david serafini (was: Jayesh) Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: david serafini Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8037) CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions
[ https://issues.apache.org/jira/browse/HIVE-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8037: - Attachment: HIVE-8037.2.patch CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions - Key: HIVE-8037 URL: https://issues.apache.org/jira/browse/HIVE-8037 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8037.1.patch, HIVE-8037.2.patch, HIVE-8037.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8039) [CBO] Handle repeated alias
Ashutosh Chauhan created HIVE-8039: -- Summary: [CBO] Handle repeated alias Key: HIVE-8039 URL: https://issues.apache.org/jira/browse/HIVE-8039 Project: Hive Issue Type: Bug Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Relax the CBO restriction that disallows repeated aliases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 25513: Handle repeated alias.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25513/ --- Review request for hive and John Pullokkaran. Bugs: HIVE-8039 https://issues.apache.org/jira/browse/HIVE-8039 Repository: hive-git Description --- Handle repeated alias. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java 22295a1 Diff: https://reviews.apache.org/r/25513/diff/ Testing --- limit_pushdown.q Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-8037) CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions
[ https://issues.apache.org/jira/browse/HIVE-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8037: - Attachment: (was: HIVE-8037.3.patch) CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions - Key: HIVE-8037 URL: https://issues.apache.org/jira/browse/HIVE-8037 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8037.1.patch, HIVE-8037.2.patch, HIVE-8037.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8037) CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions
[ https://issues.apache.org/jira/browse/HIVE-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-8037: - Attachment: HIVE-8037.3.patch CBO: Refactor Join condn gen code, loosen restrictions on Join Conditions - Key: HIVE-8037 URL: https://issues.apache.org/jira/browse/HIVE-8037 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-8037.1.patch, HIVE-8037.2.patch, HIVE-8037.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7859) Tune zlib compression in ORC to account for the encoding strategy
[ https://issues.apache.org/jira/browse/HIVE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128942#comment-14128942 ] Prasanth J commented on HIVE-7859: -- I like the patch. LGTM +1. Pending unit test runs. Under the COMPRESSION strategy, have you tried using zlib.BEST_COMPRESSION instead of zlib.DEFAULT_COMPRESSION to see the changes to space vs time? Tune zlib compression in ORC to account for the encoding strategy - Key: HIVE-7859 URL: https://issues.apache.org/jira/browse/HIVE-7859 Project: Hive Issue Type: Bug Components: File Formats Reporter: Gopal V Assignee: Gopal V Attachments: HIVE-7859.1.patch, HIVE-7859.2.patch Currently ORC zlib is slow because several of the compression strategies zlib uses are already done by ORC itself (dictionary, RLE, bit-packing). We need to pick between Z_FILTERED, Z_HUFFMAN_ONLY, Z_RLE, Z_FIXED and Z_DEFAULT_STRATEGY according to the column stream type. For instance, an RLE_V2 stream could use Z_FILTERED compression without invoking the rest of the strategies. The string streams can use Z_FIXED compression strategies and so on. The core constraint is to retain compatibility with the default decompressor, so that these changes are automatically backward compatible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
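For a concrete picture of the strategy mapping, here is a hedged java.util.zip sketch. Note that the JDK Deflater exposes only FILTERED, HUFFMAN_ONLY, and DEFAULT_STRATEGY (Z_RLE and Z_FIXED require native zlib), and the per-stream mapping below is illustrative rather than the patch's actual table.

```java
import java.util.zip.Deflater;

public class ZlibStrategySketch {
  // Illustrative stream kinds; ORC's real stream taxonomy is richer.
  enum StreamKind { RLE_DATA, TEXT_DATA, OTHER }

  static Deflater deflaterFor(StreamKind kind) {
    // nowrap = true gives raw deflate; the assumption here is that the
    // matching inflater is configured the same way, as ORC's codec does.
    Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    // Prasanth's question maps to d.setLevel(Deflater.BEST_COMPRESSION),
    // which trades CPU time for smaller output.
    switch (kind) {
      case RLE_DATA:
        // RLE/bit-packed bytes rarely repeat as strings; bias toward Huffman.
        d.setStrategy(Deflater.FILTERED);
        break;
      default:
        d.setStrategy(Deflater.DEFAULT_STRATEGY);
    }
    return d;
  }
}
```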