[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Status: In Progress (was: Patch Available) Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 15449: session/operation timeout for hiveserver2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51951 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90645 Shouldn't this have a TimeValidator? common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90646 Again, no TimeValidator. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90647 No TimeValidator. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90648 No TimeValidator. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90649 Lack of TimeValidator here is deliberate, right? - Lefty Leverenz On Aug. 29, 2014, 9:05 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 29, 2014, 9:05 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility for preventing resource leakages from unstable or bad clients. 
Diffs - common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9e3481a metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4e76236 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 3211759 ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java f34b5ad ql/src/test/results/clientnegative/set_hiveconf_validation2.q.out 33f9360 service/src/java/org/apache/hadoop/hive/service/HiveServer.java 32729f2 
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595 service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 7668904 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 86ed4b4 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 21d1563 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819 Diff: https://reviews.apache.org/r/15449/diff/ Testing --- Confirmed in the local environment. Thanks, Navis Ryu
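The review comments above repeatedly ask whether time-valued config properties should carry a TimeValidator. As a rough illustration of what such a validator does, here is a minimal, self-contained sketch that parses time strings with a unit suffix into milliseconds. The class name, accepted units, and error handling are assumptions for illustration only, not Hive's actual Validator.TimeValidator API.

```java
import java.util.concurrent.TimeUnit;

// Illustrative only: a minimal time-string validator in the spirit of the
// TimeValidator discussed in the review. Names and behavior are hypothetical.
public class SimpleTimeValidator {
    // Parses values like "30s", "5m", "2h" into milliseconds.
    // Returns -1 for values it cannot parse.
    public static long toMillis(String value) {
        if (value == null || value.length() < 2) {
            return -1;
        }
        char unit = value.charAt(value.length() - 1);
        long amount;
        try {
            amount = Long.parseLong(value.substring(0, value.length() - 1));
        } catch (NumberFormatException e) {
            return -1;
        }
        switch (unit) {
            case 's': return TimeUnit.SECONDS.toMillis(amount);
            case 'm': return TimeUnit.MINUTES.toMillis(amount);
            case 'h': return TimeUnit.HOURS.toMillis(amount);
            default:  return -1;
        }
    }
}
```

A validator like this lets a config layer reject a malformed session-timeout value at set time instead of failing later.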
[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Status: Patch Available (was: In Progress) Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-5760: --- Attachment: HIVE-5760.7.patch Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116664#comment-14116664 ] Hive QA commented on HIVE-5760: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665598/HIVE-5760.7.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 6156 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_coalesce org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_elt org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_casts org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_date_funcs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_timestamp_funcs org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/581/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/581/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-581/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12665598 Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch, HIVE-5760.3.patch, HIVE-5760.4.patch, HIVE-5760.5.patch, HIVE-5760.7.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7925) extend current partition status extrapolation to support all DBs
[ https://issues.apache.org/jira/browse/HIVE-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116833#comment-14116833 ] Ashutosh Chauhan commented on HIVE-7925: +1 extend current partition status extrapolation to support all DBs Key: HIVE-7925 URL: https://issues.apache.org/jira/browse/HIVE-7925 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7925.1.patch The current partition status extrapolation only supports Derby. That is why we got errors such as https://hortonworks.jira.com/browse/BUG-21983 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7921) Fix confusing dead assignment in return statement (JavaHiveVarcharObjectInspector)
[ https://issues.apache.org/jira/browse/HIVE-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7921: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks Lars for your contribution. Fix confusing dead assignment in return statement (JavaHiveVarcharObjectInspector) -- Key: HIVE-7921 URL: https://issues.apache.org/jira/browse/HIVE-7921 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7921.1.patch There are multiple instances of something like this {{return o = new HiveVarchar(value, getMaxLength());}} in this class. That's not only confusing but also useless as it doesn't do anything. I've removed those assignments and cleaned up the class a bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
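The dead-assignment pattern HIVE-7921 removes can be sketched in a few lines. This is an illustrative standalone class, not the actual JavaHiveVarcharObjectInspector code: the assignment to the parameter in the return statement is invisible to the caller, so the two methods behave identically.

```java
// Illustrative sketch of the pattern removed in HIVE-7921. The assignment to
// the parameter 'o' in the return statement has no effect the caller can see:
// 'o' goes out of scope immediately, so the two methods are equivalent.
public class DeadAssignmentDemo {
    static String withDeadAssignment(String o, String value) {
        return o = new String(value);  // assignment to 'o' is dead
    }

    static String withoutAssignment(String o, String value) {
        return new String(value);      // equivalent, and clearer
    }
}
```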
[jira] [Updated] (HIVE-7923) populate stats for test tables
[ https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7923: --- Status: Open (was: Patch Available) [~pxiong] We also need basic stats (# of rows) which are collected via {code} analyze table T compute statistics; {code} I think you also need to include this for all tables. Also, did you analyze why we can't compute statistics for thrift and primitive tables? populate stats for test tables -- Key: HIVE-7923 URL: https://issues.apache.org/jira/browse/HIVE-7923 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7923.1.patch Current q_test only generates tables, e.g., src, but does not create stats. All the test cases will fail in CBO because CBO depends on the stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6123) Implement checkstyle in maven
[ https://issues.apache.org/jira/browse/HIVE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116874#comment-14116874 ] Ashutosh Chauhan commented on HIVE-6123: +1 Implement checkstyle in maven - Key: HIVE-6123 URL: https://issues.apache.org/jira/browse/HIVE-6123 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Lars Francke Attachments: HIVE-6123.1.patch, HIVE-6123.2.patch ant had a checkstyle target, we should do something similar for maven -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6123) Implement checkstyle in maven
[ https://issues.apache.org/jira/browse/HIVE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116887#comment-14116887 ] Lars Francke commented on HIVE-6123: Thanks Ashutosh. We could also easily run Checkstyle during every build but that'd make the build slightly longer. It'd be great to extend the Jenkins bot to run checkstyle and to diff previous checkstyle results to new ones and do a -1 when new issues are introduced. I think Hadoop or HBase do this. Probably better to do this in a new JIRA Implement checkstyle in maven - Key: HIVE-6123 URL: https://issues.apache.org/jira/browse/HIVE-6123 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Lars Francke Attachments: HIVE-6123.1.patch, HIVE-6123.2.patch ant had a checkstyle target, we should do something similar for maven -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7543) Cleanup of org.apache.hive.service.auth package
[ https://issues.apache.org/jira/browse/HIVE-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7543: --- Status: Open (was: Patch Available) I think we lost the chance to name the {{PasswdAuthenticationProvider}} interface or its methods correctly. It's a public interface which has been released for a while now, and it seems folks are using it as well (HIVE-4778). So I suggest undoing any changes to it to avoid backward-compat issues. Other changes look good. Cleanup of org.apache.hive.service.auth package --- Key: HIVE-7543 URL: https://issues.apache.org/jira/browse/HIVE-7543 Project: Hive Issue Type: Improvement Components: Authentication Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-7543.1.patch While trying to understand Hive's Thrift and Auth code I found some inconsistencies and complaints using Hive's own Checkstyle rules. My IDE and Sonar complained as well so I've taken the opportunity to clean this package up. I'll follow up with a list of important changes tomorrow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7543) Cleanup of org.apache.hive.service.auth package
[ https://issues.apache.org/jira/browse/HIVE-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116924#comment-14116924 ] Lars Francke commented on HIVE-7543: Thanks for your comments and thank you very much for taking the time to look at this, I know these clean up patches can be annoying. While it pains me to leave the {{PasswdAuthenticationProvider}} like it is I agree that it'd break backwards-compatibility and probably isn't worth it. Maybe I'll try another time :) I'll provide a new patch hopefully this week. Cleanup of org.apache.hive.service.auth package --- Key: HIVE-7543 URL: https://issues.apache.org/jira/browse/HIVE-7543 Project: Hive Issue Type: Improvement Components: Authentication Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-7543.1.patch While trying to understand Hive's Thrift and Auth code I found some inconsistencies and complaints using Hive's own Checkstyle rules. My IDE and Sonar complained as well so I've taken the opportunity to clean this package up. I'll follow up with a list of important changes tomorrow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7753) Same operand appears on both sides of > in DataType#compareByteArray()
[ https://issues.apache.org/jira/browse/HIVE-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116939#comment-14116939 ] Ashutosh Chauhan commented on HIVE-7753: +1 Same operand appears on both sides of > in DataType#compareByteArray() -- Key: HIVE-7753 URL: https://issues.apache.org/jira/browse/HIVE-7753 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Attachments: hive-7753-v1.txt Around line 227: {code} if (o1[i] > o1[i]) { return 1; {code} The above comparison would never be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
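Since the bug above compares `o1[i]` with itself, the comparison can never be true. The intended behavior is an element-by-element comparison against the other array. The following is an illustrative standalone sketch of that comparison pattern, not Hive's actual DataType#compareByteArray code (for simplicity it compares bytes as signed values).

```java
// Illustrative sketch of a correct element-wise byte-array comparison, the
// behavior the HIVE-7753 fix restores: compare o1[i] against o2[i], never
// o1[i] against itself. Bytes are compared as signed values here.
public class ByteArrayCompare {
    static int compareByteArray(byte[] o1, byte[] o2) {
        int n = Math.min(o1.length, o2.length);
        for (int i = 0; i < n; i++) {
            if (o1[i] > o2[i]) {   // o2, not o1, on the right-hand side
                return 1;
            }
            if (o1[i] < o2[i]) {
                return -1;
            }
        }
        // All shared elements equal: the shorter array sorts first.
        return Integer.compare(o1.length, o2.length);
    }
}
```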
[jira] [Updated] (HIVE-7683) Test TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx is still failing
[ https://issues.apache.org/jira/browse/HIVE-7683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7683: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Test TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx is still failing -- Key: HIVE-7683 URL: https://issues.apache.org/jira/browse/HIVE-7683 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7683.1.patch.txt NO PRECOMMIT TESTS As commented in HIVE-7415, counter stat fails sometimes in the test (see http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/257/testReport/org.apache.hadoop.hive.cli/TestMinimrCliDriver/testCliDriver_ql_rewrite_gbtoidx). Let's try other stat collector and see the test result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
[ https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116943#comment-14116943 ] Ashutosh Chauhan commented on HIVE-7645: +1 Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang Attachments: HIVE-7645.patch code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should change to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
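The one-line fix above matters because the two getters answer different questions. The sketch below uses a hypothetical stand-in for the metastore StorageDescriptor (names are assumptions, not Hive's API) to show that the number of bucketing columns and the number of buckets are independent values.

```java
import java.util.List;

// Hypothetical stand-in for the metastore StorageDescriptor, illustrating the
// HIVE-7645 bug: the number of bucketing *columns* and the number of *buckets*
// are independent, so passing one where the other is expected misconfigures
// the compaction job.
public class BucketDemo {
    final List<String> bucketCols;
    final int numBuckets;

    BucketDemo(List<String> bucketCols, int numBuckets) {
        this.bucketCols = bucketCols;
        this.numBuckets = numBuckets;
    }

    int getBucketColsSize() { return bucketCols.size(); } // how many columns bucket the table
    int getNumBuckets()     { return numBuckets; }        // how many buckets the data is split into
}
```

For example, a table bucketed by a single column into 32 buckets has `getBucketColsSize() == 1` but `getNumBuckets() == 32`.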
[jira] [Commented] (HIVE-7923) populate stats for test tables
[ https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116944#comment-14116944 ] pengcheng xiong commented on HIVE-7923: --- [~ashutoshc] The src_thrift table is: aint int (from deserializer), astring string (from deserializer), lint array<int> (from deserializer), lstring array<string> (from deserializer), lintstring array<struct<myint:int,mystring:string,underscore_int:int>> (from deserializer), mstringstring map<string,string> (from deserializer). When I run the query ANALYZE TABLE src_thrift COMPUTE STATISTICS, it threw the exception FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask. The main reason is that in ObjectStore.java, in the validateTableCols function, table.getSd().getCols() returns null. The primitive table was there after data/scripts/q_test_init.sql is executed. But the primitive table (and the dest1,2,3,4 tables) disappeared right before I ran any q test. The partition column stats of the primitive table are there. I could not find the code where the primitive table is dropped/deleted. populate stats for test tables -- Key: HIVE-7923 URL: https://issues.apache.org/jira/browse/HIVE-7923 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7923.1.patch Current q_test only generates tables, e.g., src, but does not create stats. All the test cases will fail in CBO because CBO depends on the stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Status: In Progress (was: Patch Available) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7599) NPE in MergeTask#main() when -format is absent
[ https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116945#comment-14116945 ] Ashutosh Chauhan commented on HIVE-7599: +1 NPE in MergeTask#main() when -format is absent -- Key: HIVE-7599 URL: https://issues.apache.org/jira/browse/HIVE-7599 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7599.patch When '-format' is absent from the command line, the following call would result in an NPE (format is initialized to null): {code} if (format.equals("rcfile")) { mergeWork = new MergeWork(inputPaths, new Path(outputDir), RCFileInputFormat.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
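The NPE pattern above has a standard remedy: call equals() on the string constant rather than on the possibly-null variable. This standalone sketch shows both forms (it illustrates the idiom, not the actual MergeTask patch):

```java
// Illustrative sketch of the NPE in HIVE-7599 and the usual fix: invoking
// equals() on the constant makes a null 'format' compare false instead of
// throwing NullPointerException.
public class FormatCheck {
    static boolean isRcfileUnsafe(String format) {
        return format.equals("rcfile");   // throws NPE when '-format' was not given
    }

    static boolean isRcfileSafe(String format) {
        return "rcfile".equals(format);   // null-safe: simply returns false for null
    }
}
```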
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Status: Patch Available (was: In Progress) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Attachment: HIVE-7405.96.patch tez_join_hash and dynpart_sort_opt_vectorization do not fail on my laptop. Re-submit same patch again... Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7923) populate stats for test tables
[ https://issues.apache.org/jira/browse/HIVE-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116947#comment-14116947 ] pengcheng xiong commented on HIVE-7923: --- Sorry, it should be ANALYZE TABLE src_thrift COMPUTE STATISTICS FOR COLUMNS aint,astring; rather than ANALYZE TABLE src_thrift COMPUTE STATISTICS; populate stats for test tables -- Key: HIVE-7923 URL: https://issues.apache.org/jira/browse/HIVE-7923 Project: Hive Issue Type: Improvement Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7923.1.patch Current q_test only generates tables, e.g., src, but does not create stats. All the test cases will fail in CBO because CBO depends on the stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7531) auxpath parameter does not handle paths relative to current working directory.
[ https://issues.apache.org/jira/browse/HIVE-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7531: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Abhishek! auxpath parameter does not handle paths relative to current working directory. --- Key: HIVE-7531 URL: https://issues.apache.org/jira/browse/HIVE-7531 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.1 Reporter: Abhishek Agarwal Assignee: Abhishek Agarwal Fix For: 0.14.0 Attachments: HIVE-7531.patch NO PRECOMMIT TESTS If I were to specify the auxpath value as a relative path {noformat} hive --auxpath lib {noformat} I get the following error {noformat} java.lang.IllegalArgumentException: Wrong FS: file://lib/Test.jar, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:625) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:464) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:380) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:231) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:183) at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:715) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:818) at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:743) at org.apache.hadoop.mapred.JobClient.access$400(JobClient.java:174) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:960) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) at 
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:919) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420){noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
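The "Wrong FS: file://lib/Test.jar" error above is a URI-parsing artifact: in `file://lib/Test.jar`, the segment after `//` is parsed as the URI authority (a host named "lib"), not a directory. The sketch below demonstrates this with plain java.net.URI and shows that making the path absolute first yields a well-formed file URI with no authority; it illustrates the failure mode, not the actual auxpath fix in the patch.

```java
import java.io.File;
import java.net.URI;

// Illustrative sketch of why a relative auxpath breaks (HIVE-7531): in
// "file://lib/Test.jar" the "lib" after "//" becomes the URI *authority*,
// which is why Hadoop complains it expected "file:///". Resolving the
// relative path to an absolute file first avoids the malformed URI.
public class AuxPathDemo {
    static String brokenAuthority() {
        // "lib" is parsed as a host, not a directory
        return URI.create("file://lib/Test.jar").getAuthority();
    }

    static boolean absoluteUriIsWellFormed(String relative) {
        URI uri = new File(relative).getAbsoluteFile().toURI(); // e.g. file:/home/user/lib
        return "file".equals(uri.getScheme()) && uri.getAuthority() == null;
    }
}
```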
[jira] [Commented] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
[ https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116951#comment-14116951 ] Ashutosh Chauhan commented on HIVE-7399: +1 Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject - Key: HIVE-7399 URL: https://issues.apache.org/jira/browse/HIVE-7399 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt, HIVE-7399.3.patch.txt Most primitive types are immutable, so copyToStandardObject returns the input object as-is. But Timestamp objects are used as something like mutable wrappers whose values Hive changes, so copyToStandardObject should really copy them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
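The mutability problem described above is easy to demonstrate with java.sql.Timestamp directly: returning the input object aliases it, so a later setTime() on the original silently changes the "copy". The sketch below shows a defensive copy; it illustrates the issue, not the actual ObjectInspectorUtils patch.

```java
import java.sql.Timestamp;

// Illustrative sketch of the HIVE-7399 problem: java.sql.Timestamp is mutable,
// so returning the input as-is aliases it. A real copy clones both the
// millisecond time and the nanos field.
public class TimestampCopyDemo {
    static Timestamp realCopy(Timestamp ts) {
        Timestamp copy = new Timestamp(ts.getTime());
        copy.setNanos(ts.getNanos());   // getTime() alone can lose sub-millisecond nanos
        return copy;
    }
}
```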
[jira] [Updated] (HIVE-7352) Queries without tables fail under Tez
[ https://issues.apache.org/jira/browse/HIVE-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7352: --- Status: Open (was: Patch Available) Failed tests needs to be looked at. Queries without tables fail under Tez - Key: HIVE-7352 URL: https://issues.apache.org/jira/browse/HIVE-7352 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.13.1, 0.13.0 Reporter: Craig Condit Assignee: Gunther Hagleitner Attachments: HIVE-7352.1.patch.txt, HIVE-7352.2.patch Hive 0.13.0 added support for queries that do not reference tables (such as 'SELECT 1'). These queries fail under Tez: {noformat} Vertex failed as one or more tasks failed. failedTasks:1] 14/07/07 09:54:42 ERROR tez.TezJobMonitor: Vertex failed, vertexName=Map 1, vertexId=vertex_1404652697071_4487_1_00, diagnostics=[Task failed, taskId=task_1404652697071_4487_1_00_00, diagnostics=[AttemptID:attempt_1404652697071_4487_1_00_00_0 Info:Error: java.lang.RuntimeException: java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:174) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.init(TezGroupedSplitsInputFormat.java:113) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:79) at org.apache.tez.mapreduce.input.MRInput.setupOldRecordReader(MRInput.java:205) at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:362) at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:341) at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:99) at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:68) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:141) at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.<init>(Path.java:135) at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:110) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:228) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:171) ... 14 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7366: --- Status: Open (was: Patch Available) getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.2.patch, HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6978) beeline always exits with 0 status, should exit with non-zero status on error
[ https://issues.apache.org/jira/browse/HIVE-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116955#comment-14116955 ] Ashutosh Chauhan commented on HIVE-6978: +1 beeline always exits with 0 status, should exit with non-zero status on error - Key: HIVE-6978 URL: https://issues.apache.org/jira/browse/HIVE-6978 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Gwen Shapira Assignee: Navis Attachments: HIVE-6978.1.patch.txt Was supposed to be fixed in Hive 0.12 (HIVE-4364). Doesn't look fixed from here. [i@p sqoop]$ beeline -u 'jdbc:hive2://p:1/k;principal=hive/p@L' -e select * from MEMBERS --outputformat=vertical scan complete in 3ms Connecting to jdbc:hive2://p:1/k;principal=hive/p@L SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/avro/avro-tools-1.7.5-cdh5.0.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Connected to: Apache Hive (version 0.12.0-cdh5.0.0) Driver: Hive JDBC (version 0.12.0-cdh5.0.0) Transaction isolation: TRANSACTION_REPEATABLE_READ -hiveconf (No such file or directory) hive.aux.jars.path=[redacted] Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'MEMBERS' (state=42S02,code=10001) Beeline version 0.12.0-cdh5.0.0 by Apache Hive Closing: org.apache.hive.jdbc.HiveConnection [inter@p sqoop]$ echo $? 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
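The behavior the report asks for can be sketched as a small hedged example (not the actual beeline code): an error raised while running a statement must be mapped to a nonzero process status instead of being swallowed.

```java
// Hypothetical sketch of the desired exit-code contract: return 0 only on
// success, and a nonzero status when the statement fails, so that shell
// scripts checking $? can detect errors like the SemanticException above.
public class ExitStatus {
    static int runAndReport(Runnable statement) {
        try {
            statement.run();
            return 0;                                   // success -> exit status 0
        } catch (RuntimeException e) {
            System.err.println("Error: " + e.getMessage());
            return 1;                                   // any error -> nonzero status
        }
    }

    public static void main(String[] args) {
        int ok = runAndReport(() -> {});
        int bad = runAndReport(() -> {
            throw new RuntimeException("Table not found 'MEMBERS'");
        });
        System.out.println("ok=" + ok + " bad=" + bad); // prints ok=0 bad=1
    }
}
```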
[jira] [Commented] (HIVE-6923) Use slf4j For Logging Everywhere
[ https://issues.apache.org/jira/browse/HIVE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116956#comment-14116956 ] Ashutosh Chauhan commented on HIVE-6923: Okies, in that case slf4j is indeed the better choice. This patch will need a rebase if someone is still interested in pursuing it further. Use slf4j For Logging Everywhere Key: HIVE-6923 URL: https://issues.apache.org/jira/browse/HIVE-6923 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Nick White Assignee: Nick White Fix For: 0.14.0 Attachments: HIVE-6923.patch Hive uses a mixture of slf4j (backed by log4j) and commons-logging. I've attached a patch to tidy this up by just using slf4j for all loggers. This means that applications using the JDBC driver can make Hive log through their own slf4j implementation consistently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6963) Beeline logs are printing on the console
[ https://issues.apache.org/jira/browse/HIVE-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6963: --- Status: Open (was: Patch Available) Failed tests need to be looked at. Beeline logs are printing on the console Key: HIVE-6963 URL: https://issues.apache.org/jira/browse/HIVE-6963 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-6963.patch Beeline logs are not redirected to the log file. If logs are redirected to a log file, only the required information will be printed on the console, making the output easier to read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6978) beeline always exits with 0 status, should exit with non-zero status on error
[ https://issues.apache.org/jira/browse/HIVE-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116959#comment-14116959 ] Gwen Shapira commented on HIVE-6978: Thanks for fixing my bug :) I may be missing something, but it looks like the only error condition covered by unit tests is an error involving unmatched args. Can we also add a test that validates that we get an error code when the query fails (for example as a result of SemanticException)? Otherwise this issue may return in the future and we won't know about it. beeline always exits with 0 status, should exit with non-zero status on error - Key: HIVE-6978 URL: https://issues.apache.org/jira/browse/HIVE-6978 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Gwen Shapira Assignee: Navis Attachments: HIVE-6978.1.patch.txt Was supposed to be fixed in Hive 0.12 (HIVE-4364). Doesn't look fixed from here. [i@p sqoop]$ beeline -u 'jdbc:hive2://p:1/k;principal=hive/p@L' -e select * from MEMBERS --outputformat=vertical scan complete in 3ms Connecting to jdbc:hive2://p:1/k;principal=hive/p@L SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/avro/avro-tools-1.7.5-cdh5.0.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Connected to: Apache Hive (version 0.12.0-cdh5.0.0) Driver: Hive JDBC (version 0.12.0-cdh5.0.0) Transaction isolation: TRANSACTION_REPEATABLE_READ -hiveconf (No such file or directory) hive.aux.jars.path=[redacted] Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'MEMBERS' (state=42S02,code=10001) Beeline version 0.12.0-cdh5.0.0 by Apache Hive Closing: org.apache.hive.jdbc.HiveConnection [inter@p sqoop]$ echo $? 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116962#comment-14116962 ] Ashutosh Chauhan commented on HIVE-5857: +1 LGTM, unless [~appodictic] has some suggestion on how to achieve what he suggested. Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch, HIVE-5857.3.patch, HIVE-5857.4.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. The NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns null due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully because their configuration is not changed and still remains "yarn". 
{code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check whether we are running a reduce task in uber mode, and then look for the plan file in a different location. *Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner]
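The reporter's suggested quick fix can be sketched as follows. This is a hedged illustration, not the committed patch: when a reduce task runs uberized, the launcher has already flipped mapreduce.framework.name to "local" purely to bypass shuffle, so plan lookup should not trust that setting.

```java
// Hypothetical sketch of the suggested if-branch: ignore the launcher's
// "local" framework override when deciding where the plan file lives,
// because for an uberized reduce task reduce.xml is still on HDFS.
public class UberModeCheck {
    static boolean useLocalPlanPath(String frameworkName, boolean isUberReduceTask) {
        if (isUberReduceTask) {
            return false;   // "local" was set only to bypass shuffle, not plan lookup
        }
        return "local".equals(frameworkName);
    }

    public static void main(String[] args) {
        // Map task under YARN: framework name still "yarn" -> remote plan path.
        System.out.println(useLocalPlanPath("yarn", false));
        // Uberized reduce task: framework name flipped to "local",
        // but the plan still lives on HDFS -> remote plan path.
        System.out.println(useLocalPlanPath("local", true));
    }
}
```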
[jira] [Commented] (HIVE-7622) Semi-automated cleanup of code
[ https://issues.apache.org/jira/browse/HIVE-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116965#comment-14116965 ] Ashutosh Chauhan commented on HIVE-7622: For this to have a chance of getting committed, I would suggest splitting this patch along either of the following lines: * Do cleanup per module (ql, metastore, etc.) * Do cleanup for a kind of fixup (removing redundant modifiers, converting all tabs to spaces, etc.) I think the second option will be more convenient for review, but if you choose the first, that's fine too. Semi-automated cleanup of code -- Key: HIVE-7622 URL: https://issues.apache.org/jira/browse/HIVE-7622 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-7622.1-noprefix.patch This patch fixes the following issues across the whole Hive codebase. I realize it's huge but these are all things that slipped through past reviews and pop up in Checkstyle, SonarQube, IDEs, etc.: * Remove redundant modifiers (e.g. {{public}} modifiers in interfaces) * Convert all tabs to spaces * Remove all redundant semicolons * Minor issues -- This message was sent by Atlassian JIRA (v6.3.4#6332)
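The "redundant modifiers" category of fixups can be shown with a before/after sketch; the interface below is an invented example, not code from the patch.

```java
// Before cleanup (kept as a comment so the file compiles):
// interface Reader {
//     public abstract String read();;           // redundant 'public abstract' and ';'
//     public static final int BUFFER_SIZE = 64; // redundant 'public static final'
// }

// After cleanup: identical semantics, since interface methods are implicitly
// public abstract and interface fields are implicitly public static final.
interface Reader {
    String read();
    int BUFFER_SIZE = 64;
}

public class CleanupDemo implements Reader {
    public String read() { return "data"; }

    public static void main(String[] args) {
        System.out.println(new CleanupDemo().read() + " " + Reader.BUFFER_SIZE);
    }
}
```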
[jira] [Updated] (HIVE-6123) Implement checkstyle in maven
[ https://issues.apache.org/jira/browse/HIVE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6123: --- Component/s: Build Infrastructure Implement checkstyle in maven - Key: HIVE-6123 URL: https://issues.apache.org/jira/browse/HIVE-6123 Project: Hive Issue Type: Sub-task Components: Build Infrastructure Reporter: Brock Noland Assignee: Lars Francke Fix For: 0.14.0 Attachments: HIVE-6123.1.patch, HIVE-6123.2.patch ant had a checkstyle target, we should do something similar for maven -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6123) Implement checkstyle in maven
[ https://issues.apache.org/jira/browse/HIVE-6123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6123: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Lars! I think it's a good idea to enhance the ptest framework to fail the build if a patch increases checkstyle warnings; that way Hive QA will refuse to run the build. It would be awesome if someone took that up. [~brocknoland] / [~szehon] might provide pointers on how to make that happen. Implement checkstyle in maven - Key: HIVE-6123 URL: https://issues.apache.org/jira/browse/HIVE-6123 Project: Hive Issue Type: Sub-task Components: Build Infrastructure Reporter: Brock Noland Assignee: Lars Francke Fix For: 0.14.0 Attachments: HIVE-6123.1.patch, HIVE-6123.2.patch ant had a checkstyle target, we should do something similar for maven -- This message was sent by Atlassian JIRA (v6.3.4#6332)
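For readers unfamiliar with wiring checkstyle into Maven, a generic plugin stanza looks like the sketch below. The version and configLocation are placeholders, not what HIVE-6123 actually committed.

```xml
<!-- Sketch of a maven-checkstyle-plugin configuration (illustrative only;
     the version and config path here are assumptions, not the Hive values). -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <version>2.12.1</version>
  <configuration>
    <configLocation>checkstyle/checkstyle.xml</configLocation>
    <failOnViolation>false</failOnViolation>
  </configuration>
</plugin>
```

With such a stanza in place, `mvn checkstyle:check` reports violations against the configured rules.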
[jira] [Updated] (HIVE-7925) extend current partition status extrapolation to support all DBs
[ https://issues.apache.org/jira/browse/HIVE-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7925: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Pengcheng! extend current partition status extrapolation to support all DBs Key: HIVE-7925 URL: https://issues.apache.org/jira/browse/HIVE-7925 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7925.1.patch The current partition status extrapolation only supports Derby, which is why we got errors such as https://hortonworks.jira.com/browse/BUG-21983 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7925) extend current partition status extrapolation to support all DBs
[ https://issues.apache.org/jira/browse/HIVE-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7925: --- Affects Version/s: 0.14.0 extend current partition status extrapolation to support all DBs Key: HIVE-7925 URL: https://issues.apache.org/jira/browse/HIVE-7925 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7925.1.patch The current partition status extrapolation only supports Derby, which is why we got errors such as https://hortonworks.jira.com/browse/BUG-21983 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7925) extend current partition status extrapolation to support all DBs
[ https://issues.apache.org/jira/browse/HIVE-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7925: --- Component/s: Metastore extend current partition status extrapolation to support all DBs Key: HIVE-7925 URL: https://issues.apache.org/jira/browse/HIVE-7925 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7925.1.patch The current partition status extrapolation only supports Derby, which is why we got errors such as https://hortonworks.jira.com/browse/BUG-21983 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7753) Same operand appears on both sides of > in DataType#compareByteArray()
[ https://issues.apache.org/jira/browse/HIVE-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7753: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Ted! Same operand appears on both sides of > in DataType#compareByteArray() -- Key: HIVE-7753 URL: https://issues.apache.org/jira/browse/HIVE-7753 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.14.0 Attachments: hive-7753-v1.txt Around line 227: {code} if (o1[i] > o1[i]) { return 1; {code} The above comparison would never be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
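The intended fix is almost certainly to compare the two arrays element-wise rather than an element with itself. The sketch below is a hedged reconstruction of such a comparator, not the actual HCatalog method, and it assumes a simple signed, length-breaking byte comparison:

```java
// Hedged sketch of an element-wise byte-array comparison; the buggy line
// compared o1[i] with itself, which can never be true.
public class CompareBytes {
    static int compareByteArray(byte[] o1, byte[] o2) {
        int n = Math.min(o1.length, o2.length);
        for (int i = 0; i < n; i++) {
            if (o1[i] > o2[i]) {      // was: if (o1[i] > o1[i]) { return 1;
                return 1;
            }
            if (o1[i] < o2[i]) {
                return -1;
            }
        }
        return Integer.compare(o1.length, o2.length); // shorter prefix sorts first
    }

    public static void main(String[] args) {
        System.out.println(compareByteArray(new byte[]{1, 2}, new byte[]{1, 3})); // prints -1
    }
}
```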
[jira] [Updated] (HIVE-7753) Same operand appears on both sides of > in DataType#compareByteArray()
[ https://issues.apache.org/jira/browse/HIVE-7753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7753: --- Component/s: HCatalog Same operand appears on both sides of > in DataType#compareByteArray() -- Key: HIVE-7753 URL: https://issues.apache.org/jira/browse/HIVE-7753 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.14.0 Attachments: hive-7753-v1.txt Around line 227: {code} if (o1[i] > o1[i]) { return 1; {code} The above comparison would never be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
[ https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7645: --- Assignee: Xiaoyu Wang Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang Assignee: Xiaoyu Wang Attachments: HIVE-7645.patch code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should change to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
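The two getters are easy to confuse but mean different things, as a toy stand-in for the metastore StorageDescriptor illustrates (the field values here are invented for the example):

```java
import java.util.List;

// Toy stand-in for the metastore StorageDescriptor: a table bucketed into
// 32 buckets on a single column. getBucketColsSize() counts bucketing
// columns; getNumBuckets() counts buckets — the compactor needs the latter.
public class BucketDemo {
    static class StorageDescriptor {
        List<String> bucketCols = List.of("user_id"); // one bucketing column
        int numBuckets = 32;                          // thirty-two buckets

        int getBucketColsSize() { return bucketCols.size(); }
        int getNumBuckets() { return numBuckets; }
    }

    public static void main(String[] args) {
        StorageDescriptor sd = new StorageDescriptor();
        System.out.println("cols=" + sd.getBucketColsSize()
                + " buckets=" + sd.getNumBuckets()); // prints cols=1 buckets=32
    }
}
```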
[jira] [Updated] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
[ https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7645: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Xiaoyu! Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang Assignee: Xiaoyu Wang Fix For: 0.14.0 Attachments: HIVE-7645.patch code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should change to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7599) NPE in MergeTask#main() when -format is absent
[ https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7599: --- Assignee: DJ Choi NPE in MergeTask#main() when -format is absent -- Key: HIVE-7599 URL: https://issues.apache.org/jira/browse/HIVE-7599 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: DJ Choi Priority: Minor Attachments: HIVE-7599.patch When '-format' is absent from the command line, the following call would result in an NPE (format is initialized to null): {code} if (format.equals("rcfile")) { mergeWork = new MergeWork(inputPaths, new Path(outputDir), RCFileInputFormat.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
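The usual null-safe pattern for this class of NPE is to put the constant on the left of equals(), or to check for null first. The sketch below is illustrative only; the method name and the non-rcfile branch are invented, not the MergeTask code:

```java
// Sketch of the null-safe fix: "rcfile".equals(format) is false (not an NPE)
// when format is null, so a missing -format flag falls through gracefully.
public class FormatCheck {
    static String merge(String format) {
        if ("rcfile".equalsIgnoreCase(format)) {   // safe even when format == null
            return "rcfile-merge";
        }
        return "default-merge";                    // hypothetical fallback branch
    }

    public static void main(String[] args) {
        System.out.println(merge("rcfile"));       // prints rcfile-merge
        System.out.println(merge(null));           // no NPE: prints default-merge
    }
}
```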
[jira] [Updated] (HIVE-7599) NPE in MergeTask#main() when -format is absent
[ https://issues.apache.org/jira/browse/HIVE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7599: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, DJ! NPE in MergeTask#main() when -format is absent -- Key: HIVE-7599 URL: https://issues.apache.org/jira/browse/HIVE-7599 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: DJ Choi Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7599.patch When '-format' is absent from the command line, the following call would result in an NPE (format is initialized to null): {code} if (format.equals("rcfile")) { mergeWork = new MergeWork(inputPaths, new Path(outputDir), RCFileInputFormat.class); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7399) Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject
[ https://issues.apache.org/jira/browse/HIVE-7399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7399: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Timestamp type is not copied by ObjectInspectorUtils.copyToStandardObject - Key: HIVE-7399 URL: https://issues.apache.org/jira/browse/HIVE-7399 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7399.1.patch.txt, HIVE-7399.2.patch.txt, HIVE-7399.3.patch.txt Most primitive types are immutable, so copyToStandardObject returns the input object as-is. But Timestamp objects are used as a kind of mutable wrapper whose value Hive changes, so copyToStandardObject should make a real copy of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
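The mutability hazard is easy to demonstrate: java.sql.Timestamp exposes setTime/setNanos, so returning the same reference from a "copy" lets later mutations corrupt the copied value. A hedged sketch of a real copy (not the committed Hive code):

```java
import java.sql.Timestamp;

// java.sql.Timestamp is mutable, so a deep copy is needed; getTime() alone
// loses sub-millisecond precision, hence the explicit setNanos().
public class TimestampCopy {
    static Timestamp copy(Timestamp ts) {
        Timestamp c = new Timestamp(ts.getTime());
        c.setNanos(ts.getNanos());   // restore full nanosecond precision
        return c;
    }

    public static void main(String[] args) {
        Timestamp original = new Timestamp(1000L);
        original.setNanos(123456789);
        Timestamp copied = copy(original);
        original.setTime(9999L);     // mutate the original afterwards
        // The copy is unaffected by the later mutation:
        System.out.println(copied.getTime() != original.getTime()); // prints true
    }
}
```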
[jira] [Updated] (HIVE-6978) beeline always exits with 0 status, should exit with non-zero status on error
[ https://issues.apache.org/jira/browse/HIVE-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6978: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! beeline always exits with 0 status, should exit with non-zero status on error - Key: HIVE-6978 URL: https://issues.apache.org/jira/browse/HIVE-6978 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Gwen Shapira Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-6978.1.patch.txt Was supposed to be fixed in Hive 0.12 (HIVE-4364). Doesn't look fixed from here. [i@p sqoop]$ beeline -u 'jdbc:hive2://p:1/k;principal=hive/p@L' -e select * from MEMBERS --outputformat=vertical scan complete in 3ms Connecting to jdbc:hive2://p:1/k;principal=hive/p@L SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/avro/avro-tools-1.7.5-cdh5.0.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Connected to: Apache Hive (version 0.12.0-cdh5.0.0) Driver: Hive JDBC (version 0.12.0-cdh5.0.0) Transaction isolation: TRANSACTION_REPEATABLE_READ -hiveconf (No such file or directory) hive.aux.jars.path=[redacted] Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'MEMBERS' (state=42S02,code=10001) Beeline version 0.12.0-cdh5.0.0 by Apache Hive Closing: org.apache.hive.jdbc.HiveConnection [inter@p sqoop]$ echo $? 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6978) beeline always exits with 0 status, should exit with non-zero status on error
[ https://issues.apache.org/jira/browse/HIVE-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116976#comment-14116976 ] Ashutosh Chauhan commented on HIVE-6978: [~gwenshap] Sorry missed your comment. [~navis] Gwen's request is legit, it will be good to add such a test case. Can be done in a follow-up. beeline always exits with 0 status, should exit with non-zero status on error - Key: HIVE-6978 URL: https://issues.apache.org/jira/browse/HIVE-6978 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Gwen Shapira Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-6978.1.patch.txt Was supposed to be fixed in Hive 0.12 (HIVE-4364). Doesn't look fixed from here. [i@p sqoop]$ beeline -u 'jdbc:hive2://p:1/k;principal=hive/p@L' -e select * from MEMBERS --outputformat=vertical scan complete in 3ms Connecting to jdbc:hive2://p:1/k;principal=hive/p@L SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/avro/avro-tools-1.7.5-cdh5.0.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Connected to: Apache Hive (version 0.12.0-cdh5.0.0) Driver: Hive JDBC (version 0.12.0-cdh5.0.0) Transaction isolation: TRANSACTION_REPEATABLE_READ -hiveconf (No such file or directory) hive.aux.jars.path=[redacted] Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'MEMBERS' (state=42S02,code=10001) Beeline version 0.12.0-cdh5.0.0 by Apache Hive Closing: org.apache.hive.jdbc.HiveConnection [inter@p sqoop]$ echo $? 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6963) Beeline logs are printing on the console
[ https://issues.apache.org/jira/browse/HIVE-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116977#comment-14116977 ] Ashutosh Chauhan commented on HIVE-6963: Test failures are probably unrelated. But having a different file name, a configurable path for the log file, and a different namespace for the log4j properties is a good idea. Otherwise, this patch has limited usability. Beeline logs are printing on the console Key: HIVE-6963 URL: https://issues.apache.org/jira/browse/HIVE-6963 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-6963.patch Beeline logs are not redirected to the log file. If logs are redirected to a log file, only the required information will be printed on the console, making the output easier to read. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
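The kind of log4j configuration being discussed looks roughly like the fragment below. This is a hypothetical sketch: the file name, log path, and namespace are placeholders that the patch would need to decide, not values from Hive.

```properties
# Hypothetical beeline-log4j.properties sketch: route beeline's own logging
# to a file so only query output reaches the console.
log4j.rootLogger=WARN, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=${user.home}/beeline.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p %c: %m%n
```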
[jira] [Commented] (HIVE-7869) Long running tests (1) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116979#comment-14116979 ] Ashutosh Chauhan commented on HIVE-7869: This looks pretty useful. I wonder if we shall do this directly on trunk, seems like there is nothing spark specific here. [~vaibhavgumashta] You may find this useful. Long running tests (1) [Spark Branch] - Key: HIVE-7869 URL: https://issues.apache.org/jira/browse/HIVE-7869 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish Attachments: HIVE-7869-spark.patch, HIVE-7869.2-spark.patch I have noticed when running the full test suite locally that the test JVM eventually crashes. We should do some testing (not part of the unit tests) which starts up a HS2 and runs queries on it continuously for 24 hours or so. In this JIRA let's create a stand alone java program which connects to a HS2 over JDBC, creates a bunch of tables (say 100) and then runs queries until the JDBC client is killed. This will allow us to run long running tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116993#comment-14116993 ] Hive QA commented on HIVE-7405: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665688/HIVE-7405.96.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6132 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/582/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/582/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-582/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665688 Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch Vectorize the basic case that does not have any count distinct aggregation. 
Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
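The single-key-per-batch property is what makes the reduce-side mode fast: no per-row hash lookup is needed, just a fold over the batch. A minimal sketch of that idea (invented method names, standing in for VectorGroupByOperator's batch aggregation):

```java
// Sketch of reduce-side vectorized aggregation: every batch holds values
// for exactly one key, so the whole batch folds into one running aggregate
// without any per-row key lookup.
public class ReduceSideGroupBy {
    static long aggregateSingleKeyBatch(long[] batchValues, long runningSum) {
        for (long v : batchValues) {   // one key per batch: just accumulate
            runningSum += v;
        }
        return runningSum;
    }

    public static void main(String[] args) {
        long sum = 0;
        sum = aggregateSingleKeyBatch(new long[]{1, 2, 3}, sum); // batch 1 for key K
        sum = aggregateSingleKeyBatch(new long[]{4, 5}, sum);    // batch 2 for key K
        System.out.println(sum); // prints 15
    }
}
```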
[jira] [Commented] (HIVE-7869) Long running tests (1) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117006#comment-14117006 ] Brock Noland commented on HIVE-7869: Thank you Suhas! This looks good. We might add more queries, but we can do that later. Ashutosh, we can commit this to trunk and merge to spark. Long running tests (1) [Spark Branch] - Key: HIVE-7869 URL: https://issues.apache.org/jira/browse/HIVE-7869 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish Attachments: HIVE-7869-spark.patch, HIVE-7869.2-spark.patch I have noticed when running the full test suite locally that the test JVM eventually crashes. We should do some testing (not part of the unit tests) which starts up a HS2 and runs queries on it continuously for 24 hours or so. In this JIRA let's create a stand alone java program which connects to a HS2 over JDBC, creates a bunch of tables (say 100) and then runs queries until the JDBC client is killed. This will allow us to run long running tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7869) Long running tests (1) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117007#comment-14117007 ] Brock Noland commented on HIVE-7869: +` Long running tests (1) [Spark Branch] - Key: HIVE-7869 URL: https://issues.apache.org/jira/browse/HIVE-7869 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish Attachments: HIVE-7869-spark.patch, HIVE-7869.2-spark.patch I have noticed when running the full test suite locally that the test JVM eventually crashes. We should do some testing (not part of the unit tests) which starts up a HS2 and runs queries on it continuously for 24 hours or so. In this JIRA let's create a stand alone java program which connects to a HS2 over JDBC, creates a bunch of tables (say 100) and then runs queries until the JDBC client is killed. This will allow us to run long running tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7869) Long running tests (1) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117007#comment-14117007 ] Brock Noland edited comment on HIVE-7869 at 9/1/14 3:56 AM: +1 was (Author: brocknoland): +` Long running tests (1) [Spark Branch] - Key: HIVE-7869 URL: https://issues.apache.org/jira/browse/HIVE-7869 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish Attachments: HIVE-7869-spark.patch, HIVE-7869.2-spark.patch I have noticed when running the full test suite locally that the test JVM eventually crashes. We should do some testing (not part of the unit tests) which starts up a HS2 and runs queries on it continuously for 24 hours or so. In this JIRA let's create a stand alone java program which connects to a HS2 over JDBC, creates a bunch of tables (say 100) and then runs queries until the JDBC client is killed. This will allow us to run long running tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang reopened HIVE-7730: -- Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get an instance of HiveSemanticAnalyzerHookContext from the configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns into ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. 
Here is a quick implementation in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5799: Attachment: HIVE-5799.17.patch.txt session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.13.patch.txt, HIVE-5799.14.patch.txt, HIVE-5799.15.patch.txt, HIVE-5799.16.patch.txt, HIVE-5799.17.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from instable or bad clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 31, 2014, 6:24 a.m., Lefty Leverenz wrote: All my bad. I hate meetings. - Navis --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51951 --- On Aug. 29, 2014, 9:05 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 29, 2014, 9:05 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility for preventing resource leakages from instable or bad clients. Diffs - common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9e3481a metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4e76236 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 
11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 3211759 ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java f34b5ad ql/src/test/results/clientnegative/set_hiveconf_validation2.q.out 33f9360 service/src/java/org/apache/hadoop/hive/service/HiveServer.java 32729f2 service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595 service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 7668904 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 86ed4b4 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 21d1563 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819 Diff: https://reviews.apache.org/r/15449/diff/ Testing --- Confirmed in the local environment. Thanks, Navis Ryu
Re: Review Request 15449: session/operation timeout for hiveserver2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Sept. 1, 2014, 5:14 a.m.) Review request for hive. Changes --- Fixed missing TimeValidators Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility for preventing resource leakages from instable or bad clients. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 hcatalog/core/src/test/java/org/apache/hive/hcatalog/cli/TestPermsGrp.java bf2b24e hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java be7134f itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreAuthorization.java a6a038a itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9ae6d7a metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java a94a7a37 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java b9cf701 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java 5410b45 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf 
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 3211759 ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java f34b5ad ql/src/test/results/clientpositive/show_conf.q.out a3c814a service/src/java/org/apache/hadoop/hive/service/HiveServer.java 32729f2 service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595 service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 7668904 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java e5ce72f service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 86ed4b4 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 21d1563 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819 Diff: https://reviews.apache.org/r/15449/diff/ Testing --- Confirmed in the local environment. Thanks, Navis Ryu
[jira] [Updated] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaomeng Huang updated HIVE-7730: - Attachment: HIVE-7730-fix-NP-issue.patch Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730-fix-NP-issue.patch, HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get an instance of HiveSemanticAnalyzerHookContext from the configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns into ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. 
Here is a quick implementation in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7730) Extend ReadEntity to add accessed columns from query
[ https://issues.apache.org/jira/browse/HIVE-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117038#comment-14117038 ] Xiaomeng Huang commented on HIVE-7730: -- Hi [~szehon], there is a null pointer issue in the latest patch: entity.getAccessedColumns().addAll(tableToColumnAccessMap.get(entity.getTable().getCompleteName())); If tableToColumnAccessMap.get(entity.getTable().getCompleteName()) is null, addAll(null) will throw a NullPointerException. I attached a patch to fix it; could you help to review it? Thanks! Extend ReadEntity to add accessed columns from query Key: HIVE-7730 URL: https://issues.apache.org/jira/browse/HIVE-7730 Project: Hive Issue Type: Bug Reporter: Xiaomeng Huang Assignee: Xiaomeng Huang Fix For: 0.14.0 Attachments: HIVE-7730-fix-NP-issue.patch, HIVE-7730.001.patch, HIVE-7730.002.patch, HIVE-7730.003.patch, HIVE-7730.004.patch -Now what we get from HiveSemanticAnalyzerHookContextImpl is limited. If we have a hook of HiveSemanticAnalyzerHook, we may want to get more things from hookContext (e.g. the needed columns from the query).- -So we should get an instance of HiveSemanticAnalyzerHookContext from the configuration, extend HiveSemanticAnalyzerHookContext with a new implementation, override HiveSemanticAnalyzerHookContext.update() and put what you want into the class.- Hive should store accessed columns into ReadEntity when HIVE_STATS_COLLECT_SCANCOLS (or a new confVar we could add) is set to true. Then an external authorization model can get the accessed columns when doing authorization at compile time, before execution. Maybe we will remove columnAccessInfo from BaseSemanticAnalyzer; the old authorization and AuthorizationModeV2 can get accessed columns from ReadEntity too. 
Here is a quick implementation in SemanticAnalyzer.analyzeInternal() below:
{code}
boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
    && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
if (isColumnInfoNeedForAuth
    || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
  setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess());
}
compiler.compile(pCtx, rootTasks, inputs, outputs);
// TODO: after compile, we can put the accessed column list into the ReadEntity
// obtained from columnAccessInfo if HIVE_AUTHORIZATION_ENABLED is set to true
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
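The fix described in the comment above amounts to a null guard around the map lookup. A minimal standalone sketch (hypothetical method and variable names, not the actual patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnAccessMerge {
    // Null-safe version of:
    //   accessed.addAll(tableToColumns.get(tableName));
    // If the table has no recorded columns, get() returns null and
    // addAll(null) throws a NullPointerException.
    static void mergeColumns(List<String> accessed,
                             Map<String, List<String>> tableToColumns,
                             String tableName) {
        List<String> cols = tableToColumns.get(tableName);
        if (cols != null) {
            accessed.addAll(cols);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> byTable = new HashMap<>();
        byTable.put("default.t1", Arrays.asList("c1", "c2"));
        List<String> accessed = new ArrayList<>();
        mergeColumns(accessed, byTable, "default.t1");
        mergeColumns(accessed, byTable, "default.missing"); // no NPE now
        System.out.println(accessed); // [c1, c2]
    }
}
```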
[jira] [Created] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
Sergey Shelukhin created HIVE-7926: -- Summary: long-lived daemons for query fragment execution, I/O and caching Key: HIVE-7926 URL: https://issues.apache.org/jira/browse/HIVE-7926 Project: Hive Issue Type: New Feature Reporter: Sergey Shelukhin We are proposing a new execution model for Hive that is a combination of existing process-based tasks and long-lived daemons running on worker nodes. These nodes can take care of efficient I/O, caching and query fragment execution, while heavy lifting like most joins, ordering, etc. can be handled by tasks. The proposed model is not a 2-system solution for small and large queries; nor is it a separate execution engine like MR or Tez. It can be used by any Hive execution engine, if support is added; in the future even external products (e.g. Pig) could use it. The document with the high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7926: --- Attachment: LLAPdesigndocument.pdf Attaching the design document. Please feel free to provide comments. We will post on Hive wiki shortly. long-lived daemons for query fragment execution, I/O and caching Key: HIVE-7926 URL: https://issues.apache.org/jira/browse/HIVE-7926 Project: Hive Issue Type: New Feature Reporter: Sergey Shelukhin Attachments: LLAPdesigndocument.pdf We are proposing a new execution model for Hive that is a combination of existing process-based tasks and long-lived daemons running on worker nodes. These nodes can take care of efficient I/O, caching and query fragment execution, while heavy lifting like most joins, ordering, etc. can be handled by tasks. The proposed model is not a 2-system solution for small and large queries; neither it is a separate execution engine like MR or Tez. It can be used by any Hive execution engine, if support is added; in future even external products (e.g. Pig) can use it. The document with high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7669: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks Szehon and Lefty, for your precious comments. parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Fix For: 0.14.0 Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. 
{code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 
10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
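The underlying cause is generic to sampling a low-cardinality sort key: TotalOrderPartitioner requires the split points in its partitions file to be strictly increasing, and sampling a column with only 4 distinct values across 600 million rows for many reducers necessarily produces duplicates. A standalone sketch of that invariant (a simplification, not Hadoop's actual code; the sample values are illustrative):

```java
public class SplitPointCheck {
    // Mirrors the invariant TotalOrderPartitioner enforces when it reads the
    // partitions file: split points must be strictly increasing, otherwise it
    // fails with "Split points are out of order".
    static boolean splitPointsAreOrdered(String[] splits) {
        for (int i = 1; i < splits.length; i++) {
            if (splits[i].compareTo(splits[i - 1]) <= 0) {
                return false; // duplicate or out-of-order split point
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // With only 4 distinct key values, a sample sized for many reducers
        // must repeat values, so the sorted sample has duplicate split points.
        String[] lowCardinalitySample = {"COLLECT COD", "COLLECT COD", "NONE"};
        String[] highCardinalitySample = {"k1", "k2", "k3"};
        System.out.println(splitPointsAreOrdered(lowCardinalitySample));  // false
        System.out.println(splitPointsAreOrdered(highCardinalitySample)); // true
    }
}
```

This is why the fix involves deduplicating or otherwise adjusting the sampled split points rather than changing the partitioner itself.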
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117041#comment-14117041 ] Sergey Shelukhin commented on HIVE-7926: Actually I have no permissions to create pages on wiki, or there's no create button on toolbar for whatever reason. Can I have access? long-lived daemons for query fragment execution, I/O and caching Key: HIVE-7926 URL: https://issues.apache.org/jira/browse/HIVE-7926 Project: Hive Issue Type: New Feature Reporter: Sergey Shelukhin Attachments: LLAPdesigndocument.pdf We are proposing a new execution model for Hive that is a combination of existing process-based tasks and long-lived daemons running on worker nodes. These nodes can take care of efficient I/O, caching and query fragment execution, while heavy lifting like most joins, ordering, etc. can be handled by tasks. The proposed model is not a 2-system solution for small and large queries; neither it is a separate execution engine like MR or Tez. It can be used by any Hive execution engine, if support is added; in future even external products (e.g. Pig) can use it. The document with high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117045#comment-14117045 ] Navis commented on HIVE-7669: - I forgot to mention that the license header was added. parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Fix For: 0.14.0 Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 million rows and has a String column l_shipinstruct with 4 unique values (i.e. these 4 values are repeated across the 600 million rows). We are sorting it on this string column l_shipinstruct as shown in the HiveQL below, with the following parameters. 
{code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 
10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7916) Snappy-java error when running hive query on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117050#comment-14117050 ] Rui Li commented on HIVE-7916: -- Hi [~xuefuz], I tried on my cluster but cannot reproduce the problem. I removed the spark jars from local maven repo before building hive, so that the jars are downloaded from the AWS server we maintain. After hive is built, I linked the spark-assembly jar to {{lib}} of the hive home directory. The spark-assembly jar is built with {{mvn -Pyarn -Phadoop-2.4 -DskipTests clean package}} of the spark 1.1 branch. Could you provide more info about your environment, e.g. the spark jars you used or if the table is snappy compressed? Snappy-java error when running hive query on spark [Spark Branch] - Key: HIVE-7916 URL: https://issues.apache.org/jira/browse/HIVE-7916 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 Recently spark branch upgraded its dependency on Spark to 1.1.0-SNAPSHOT. While the new version addressed some lib conflicts (such as guava), I'm afraid that it also introduced new problems. 
The following might be one, seen when I set the master URL to a Spark standalone cluster:
{code}
hive> set hive.execution.engine=spark;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
hive> set spark.master=spark://xzdt:7077;
hive> select name, avg(value) from dec group by name;
14/08/28 16:41:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 333.0 KB, free 128.0 MB)
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
    at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
    at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:124)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
    at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:116)
    at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:541)
    at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:318)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateRDD(SparkPlanGenerator.java:160)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:88)
    at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:156)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:52)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:77)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1537)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1304)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1116)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:940)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:930)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
[jira] [Updated] (HIVE-7613) Research optimization of auto convert join to map join [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7613: Attachment: HIve on Spark Map join background.docx I'm not going to be able to look at this for two weeks, so others can take a look, or I'll take it back at that time. Attaching a doc with some background about existing map joins in MR and Tez to give an idea; hopefully it will help. Unfortunately, I didn't get a chance to get a design working yet for Spark. Research optimization of auto convert join to map join [Spark branch] - Key: HIVE-7613 URL: https://issues.apache.org/jira/browse/HIVE-7613 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Szehon Ho Priority: Minor Attachments: HIve on Spark Map join background.docx ConvertJoinMapJoin is an optimization that replaces a common join (aka shuffle join) with a map join (aka broadcast or fragment replicate join) when possible. We need to research how to make it workable with Hive on Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
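For readers new to the distinction this issue is about: a common (shuffle) join redistributes both tables by join key, while a map join loads the small table into an in-memory hash table and streams the big table through it, skipping the shuffle entirely. The sketch below illustrates only that probe idea; the class and method names are illustrative and are not Hive's actual ConvertJoinMapJoin operator API.

```java
import java.util.*;

public class MapJoinSketch {
    // A map join avoids the shuffle phase of a common join: the small
    // table is materialized as an in-memory hash table and every row of
    // the big table probes it locally on the mapper.
    public static List<String> mapJoin(Map<String, String> smallTable,
                                       List<String[]> bigTable) {
        List<String> out = new ArrayList<>();
        for (String[] row : bigTable) {            // row = {joinKey, value}
            String match = smallTable.get(row[0]); // hash-table probe
            if (match != null) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> small = new HashMap<>();
        small.put("k1", "a");
        List<String[]> big = Arrays.asList(new String[]{"k1", "x"},
                                           new String[]{"k2", "y"});
        System.out.println(mapJoin(small, big)); // prints [k1,x,a]
    }
}
```

In Hive the interesting part is not the probe (which is trivial) but deciding when the small side fits in memory and how to distribute the hash table to tasks, which is what differs between MR, Tez, and Spark.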
[jira] [Commented] (HIVE-7916) Snappy-java error when running hive query on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117057#comment-14117057 ] Rui Li commented on HIVE-7916: -- I noted this may be related to SPARK-2881. Snappy-java is bumped to 1.0.5.3 in the 1.1 branch and to 1.1.1.3 in the master branch. Hadoop-2.4.0 seems to use snappy-java-1.0.4.1. While the snappy-java versions are different, I don't see any conflicts on my side. [~xuefuz], I found the following in the description of SPARK-2881: {quote} The issue was that someone else had run with snappy and it created /tmp/snappy-*.so but it had restrictive permissions so I was not able to use it or remove it. This caused my spark job to not start. {quote} Could you check if this is the case in your environment? Snappy-java error when running hive query on spark [Spark Branch] - Key: HIVE-7916 URL: https://issues.apache.org/jira/browse/HIVE-7916 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 Recently the spark branch upgraded its dependency on Spark to 1.1.0-SNAPSHOT. While the new version addressed some lib conflicts (such as guava), I'm afraid that it also introduced new problems. 
The following might be one, when I set the master URL to be a spark standalone cluster:
{code}
hive> set hive.execution.engine=spark;
hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
hive> set spark.master=spark://xzdt:7077;
hive> select name, avg(value) from dec group by name;
14/08/28 16:41:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 333.0 KB, free 128.0 MB)
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
    at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
    at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:124)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
    at org.apache.spark.rdd.HadoopRDD.<init>(HadoopRDD.scala:116)
    at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:541)
    at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:318)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateRDD(SparkPlanGenerator.java:160)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:88)
    at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:156)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:52)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:77)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1537)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1304)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1116)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:940)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:930)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
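Rui Li's suggestion above (check for another user's stale Snappy natives in /tmp, per SPARK-2881) can be checked mechanically. This is a hedged sketch, not part of Hive or snappy-java: the class name is made up, and the `/tmp/snappy-*.so` file pattern is an assumption based on what snappy-java extracts on Linux.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SnappyTmpCheck {
    // Flag Snappy native libraries in a directory that this process can
    // neither read nor overwrite -- the SPARK-2881 symptom where another
    // user's /tmp/snappy-*.so blocks snappy-java from loading.
    public static List<String> unusable(File dir) {
        List<String> bad = new ArrayList<>();
        File[] libs = dir.listFiles(
                (d, name) -> name.startsWith("snappy-") && name.endsWith(".so"));
        if (libs != null) {
            for (File f : libs) {
                if (!f.canRead() || !f.canWrite()) {
                    bad.add(f.getAbsolutePath());
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Defaults to /tmp, where snappy-java extracts its native library.
        File dir = new File(args.length > 0 ? args[0] : "/tmp");
        System.out.println(unusable(dir));
    }
}
```

If this prints any paths, removing them (as the owning user or root) and re-running the query would confirm or rule out the SPARK-2881 scenario.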
Re: Review Request 15449: session/operation timeout for hiveserver2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51969 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90685 Mismatch between the default value's unit ("1800s") and TimeValidator(TimeUnit.MILLISECONDS). - Lefty Leverenz On Sept. 1, 2014, 5:14 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Sept. 1, 2014, 5:14 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility to prevent resource leaks caused by unstable or misbehaving clients. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 hcatalog/core/src/test/java/org/apache/hive/hcatalog/cli/TestPermsGrp.java bf2b24e hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java be7134f itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreAuthorization.java a6a038a itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9ae6d7a metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java a94a7a37 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java b9cf701 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java 5410b45 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java 
d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 3211759 ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java f34b5ad ql/src/test/results/clientpositive/show_conf.q.out a3c814a service/src/java/org/apache/hadoop/hive/service/HiveServer.java 32729f2 service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595 service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 7668904 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java e5ce72f service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
86ed4b4 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 21d1563 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819 Diff: https://reviews.apache.org/r/15449/diff/ Testing --- Confirmed in the local environment. Thanks, Navis Ryu
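The unit-mismatch comment above is easiest to see with a toy parser: a default of "1800s" declared against TimeUnit.MILLISECONDS only gives 30 minutes if the suffix is actually parsed; read as a bare number in the declared unit it would be 1.8 seconds. This is an illustrative sketch of that pitfall, not HiveConf's actual TimeValidator code; the method name and supported suffixes are assumptions.

```java
import java.util.concurrent.TimeUnit;

public class TimeConfSketch {
    // Parse a config value like "1800s", "500ms", or a bare "1800" into
    // milliseconds. A bare number falls back to the declared default
    // unit, which is why a default of "1800s" paired with
    // TimeValidator(TimeUnit.MILLISECONDS) is confusing at best:
    // the suffix and the declared unit disagree.
    public static long toMillis(String value, TimeUnit defaultUnit) {
        String v = value.trim();
        if (v.endsWith("ms")) {            // check "ms" before bare "s"
            return Long.parseLong(v.substring(0, v.length() - 2).trim());
        } else if (v.endsWith("s")) {
            return TimeUnit.SECONDS.toMillis(
                    Long.parseLong(v.substring(0, v.length() - 1).trim()));
        }
        return defaultUnit.toMillis(Long.parseLong(v));
    }

    public static void main(String[] args) {
        System.out.println(toMillis("1800s", TimeUnit.MILLISECONDS)); // prints 1800000
        System.out.println(toMillis("1800", TimeUnit.MILLISECONDS));  // prints 1800
    }
}
```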
[jira] [Updated] (HIVE-6179) OOM occurs when query spans to a large number of partitions
[ https://issues.apache.org/jira/browse/HIVE-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] perry wang updated HIVE-6179: - Description: When executing a query against a large number of partitions, such as select count(*) from table, an OOM error may occur because Hive fetches the metadata for all partitions involved and tries to store it in memory.
{code}
2014-01-09 13:14:17,090 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(141)) - java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
    at java.lang.StringBuffer.append(StringBuffer.java:237)
    at org.apache.derby.impl.sql.conn.GenericStatementContext.appendErrorInfo(Unknown Source)
    at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:191)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:379)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.loopJoinOrderedResult(MetaStoreDirectSql.java:641)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:410)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:205)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1433)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1420)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122)
    at com.sun.proxy.$Proxy7.getPartitions(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
{code}
The above error happened when executing select count(*) on a table with 40K partitions.

was: When executing a query against a large number of partitions, such as select count(*) from table, an OOM error may occur because Hive fetches the metadata for all partitions involved and tries to store it in memory.
{code}
2014-01-09 13:14:17,090 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(141)) - java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
    at java.lang.StringBuffer.append(StringBuffer.java:237)
    at org.apache.derby.impl.sql.conn.GenericStatementContext.appendErrorInfo(Unknown Source)
    at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown Source)
    at
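A common mitigation for this class of OOM, sketched under assumptions: fetch the (cheap) partition names first, then hydrate full Partition objects in bounded batches rather than all at once, so heap usage stays flat instead of growing with the partition count. In Hive this would wrap metastore calls such as listPartitionNames and getPartitionsByNames; the helper below shows only the self-contained batching step, and its class name and batch size are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatcher {
    // Split a (potentially huge) list of partition names into fixed-size
    // batches so each metastore round trip returns a bounded amount of
    // metadata, instead of materializing all 40K partitions at once.
    public static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            result.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) names.add(i);
        // Each batch would be passed to one metastore call, processed,
        // and released before the next batch is fetched.
        System.out.println(batches(names, 4).size()); // prints 3
    }
}
```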