[jira] [Created] (HIVE-5061) Row sampling throws NPE when used in sub-query
Navis created HIVE-5061:
---
Summary: Row sampling throws NPE when used in sub-query
Key: HIVE-5061
URL: https://issues.apache.org/jira/browse/HIVE-5061
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor

select * from (select * from src TABLESAMPLE (1 ROWS)) x;
{noformat}
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.parse.SplitSample.getTargetSize(SplitSample.java:103)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:487)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:405)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1025)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1017)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:855)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1424)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1204)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
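The failing frame is SplitSample.getTargetSize. As a purely hypothetical illustration (this is not Hive's actual SplitSample code, and the attached patch fixes the problem in SemanticAnalyzer.java, not here), the failure mode is a sample object whose size information was never propagated when the TABLESAMPLE clause sat inside a sub-query:

```java
// Hypothetical minimal sketch of the NPE, not Hive's actual class: a row sample
// declared in a sub-query never gets its size field populated, so the later
// dereference during split computation throws NullPointerException.
public class SplitSampleSketch {
    private final Integer totalRows; // null when the sample came from a sub-query

    public SplitSampleSketch(Integer totalRows) {
        this.totalRows = totalRows;
    }

    public long getTargetSize(long totalSize) {
        // auto-unboxing of totalRows throws NPE when it is null
        return totalSize / totalRows;
    }

    public static void main(String[] args) {
        System.out.println(new SplitSampleSketch(10).getTargetSize(100)); // 10
        try {
            new SplitSampleSketch(null).getTargetSize(100);
        } catch (NullPointerException expected) {
            System.out.println("NPE, as in the stack trace above");
        }
    }
}
```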
[jira] [Updated] (HIVE-5061) Row sampling throws NPE when used in sub-query
[ https://issues.apache.org/jira/browse/HIVE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5061:
Status: Patch Available (was: Open)

Row sampling throws NPE when used in sub-query
--
Key: HIVE-5061
URL: https://issues.apache.org/jira/browse/HIVE-5061
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
[jira] [Updated] (HIVE-5061) Row sampling throws NPE when used in sub-query
[ https://issues.apache.org/jira/browse/HIVE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-5061:
--
Attachment: HIVE-5061.D12165.1.patch

navis requested code review of HIVE-5061 [jira] Row sampling throws NPE when used in sub-query.
Reviewers: JIRA

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D12165

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/test/queries/clientpositive/split_sample.q
ql/src/test/results/clientpositive/split_sample.q.out

MANAGE HERALD RULES
https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
https://reviews.facebook.net/herald/transcript/29061/

To: JIRA, navis

Row sampling throws NPE when used in sub-query
--
Key: HIVE-5061
URL: https://issues.apache.org/jira/browse/HIVE-5061
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5061.D12165.1.patch
[jira] [Created] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows
Navis created HIVE-5062:
---
Summary: Insert + orderby + limit does not need additional RS for limiting rows
Key: HIVE-5062
URL: https://issues.apache.org/jira/browse/HIVE-5062
Project: Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial

The query
{noformat}
insert overwrite table dummy select * from src order by key limit 10;
{noformat}
runs two MR jobs, but a single MR job is enough.
[jira] [Updated] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows
[ https://issues.apache.org/jira/browse/HIVE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5062:
Status: Patch Available (was: Open)
Attachments: HIVE-5062.D12171.1.patch
[jira] [Updated] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows
[ https://issues.apache.org/jira/browse/HIVE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-5062:
--
Attachment: HIVE-5062.D12171.1.patch

navis requested code review of HIVE-5062 [jira] Insert + orderby + limit does not need additional RS for limiting rows.
Reviewers: JIRA

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D12171

AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/test/results/clientpositive/insert1_overwrite_partitions.q.out
ql/src/test/results/clientpositive/insert2_overwrite_partitions.q.out

To: JIRA, navis
[jira] [Commented] (HIVE-4513) disable hivehistory logs by default
[ https://issues.apache.org/jira/browse/HIVE-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736661#comment-13736661 ]

Thejas M Nair commented on HIVE-4513:
-
Regarding the pre-commit test result, testNegativeCliDriver_mapreduce_stack_trace_hadoop20 is a flaky test tracked in HIVE-4851. Reviewboard has the latest patch.

disable hivehistory logs by default
---
Key: HIVE-4513
URL: https://issues.apache.org/jira/browse/HIVE-4513
Project: Hive
Issue Type: Bug
Components: Configuration, Logging
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: HIVE-4513.1.patch, HIVE-4513.2.patch, HIVE-4513.3.patch, HIVE-4513.4.patch, HIVE-4513.5.patch, HIVE-4513.6.patch

HiveHistory log files (hive_job_log_hive_*.txt files) store information about a Hive query, such as the query string, plan, counters, and MR job progress. There is no mechanism to delete these files, so they accumulate over time, using up a lot of disk space. I don't think this is used by most people, so I think it would be better to turn this off by default. Jobtracker logs already capture most of this information, though it is not as structured as history logs.
[jira] [Commented] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736667#comment-13736667 ]

Thejas M Nair commented on HIVE-5009:
-
Based on the comments in HIVE-3739, it might be related to your JDK version. If you are using JDK 7, you might want to check if JDK 6 helps. Please let us know on the jira if you find a way out.

Fix minor optimization issues
-
Key: HIVE-5009
URL: https://issues.apache.org/jira/browse/HIVE-5009
Project: Hive
Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
Fix For: 0.12.0
Attachments: AbstractBucketJoinProc.java
Original Estimate: 48h
Remaining Estimate: 48h

I have found some minor optimization issues in the codebase, which I would like to rectify and contribute. Specifically, the optimizations that could be applied to Hive's code base are as follows:

1. Use StringBuffer when appending strings - In 184 instances, the concatenation operator (+=) was used when appending strings. This is inherently inefficient; Java's StringBuffer or StringBuilder class should be used instead. 12 instances of this optimization can be applied to the GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver uses the + operator inside a loop, as do the column projection utilities class (ColumnProjectionUtils) and the aforementioned skew-join processor. Tests showed that using the StringBuilder when appending strings is 57% faster than using the + operator (using the StringBuffer took 122 milliseconds whilst the + operator took 284 milliseconds). The reason the StringBuffer class is preferred over the + operator is that

String third = first + second;

gets compiled to:

StringBuilder builder = new StringBuilder( first );
builder.append( second );
third = builder.toString();

Therefore, building complex strings that, for example, involve loops requires many instantiations (and, as discussed below, creating new objects inside loops is inefficient).

2. Use arrays instead of List - Java's java.util.Arrays class asList method is more efficient at creating lists from arrays than using loops to manually iterate over the elements (using asList is computationally very cheap, O(1), as it merely creates a wrapper object around the array; looping through the list however has a complexity of O(n) since a new list is created and every element in the array is added to this new list). As confirmed by the experiment detailed in Appendix D, the Java compiler does not automatically optimize and replace tight-loop copying with asList: the loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is instant. Four instances of this optimization can be applied to Hive's codebase (two of these should be applied to the Map-Join container - MapJoinRowContainer) - lines 92 to 98:

for (obj = other.first(); obj != null; obj = other.next()) {
  ArrayList<Object> ele = new ArrayList<Object>(obj.length);
  for (int i = 0; i < obj.length; i++) {
    ele.add(obj[i]);
  }
  list.add((Row) ele);
}

3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation could be avoided by simply using the provided static conversion methods. As noted in the PMD documentation, using these avoids the cost of creating objects that also need to be garbage-collected later. For example, line 587 of the SemanticAnalyzer class could be replaced by the more efficient parseDouble method call:

// Inefficient:
Double percent = Double.valueOf(value).doubleValue();
// To be replaced by:
Double percent = Double.parseDouble(value);

Our test case in Appendix D confirms this: converting 10,000 strings into integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an unnecessary wrapper object) took 119 milliseconds on average; using parseInt() took only 38. Therefore creating even just one unnecessary wrapper object can make your code up to 68% slower.

4. Converting literals to strings using + - Converting literals to strings using + is quite inefficient (see Appendix D) and should be done by calling the toString() method instead: converting 1,000,000 integers to strings using + took, on average, 1340 milliseconds whilst using the toString() method only required 1183 milliseconds (hence adding empty strings takes nearly 12% more time). 89 instances of using + when converting literals were found in Hive's codebase - one of these is found in JoinUtil.

5. Avoid manual copying of arrays - Instead of copying arrays as is done in GroupByOperator on line 1040 (see below), the more efficient System.arraycopy can be used (arraycopy is a native method, meaning that the entire memory block is copied using memcpy or memmove).
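The first and third optimizations above can be reproduced with a small self-contained sketch; the class and variable names here are illustrative, not taken from Hive's codebase, and the timings quoted in the comment are the reporter's own measurements:

```java
// Sketch of optimizations 1 and 3: build a string in a loop with a single
// StringBuilder instead of +=, and parse straight to a primitive instead of
// going through a boxed wrapper object.
public class OptSketch {
    static String concatSlow(String[] parts) {
        String s = "";
        for (String p : parts) {
            s += p; // allocates a new StringBuilder and String every iteration
        }
        return s;
    }

    static String concatFast(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p); // reuses one internal buffer across iterations
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(concatSlow(parts).equals(concatFast(parts))); // true
        // Optimization 3: no boxed Double is created here
        double percent = Double.parseDouble("0.5");
        System.out.println(percent == 0.5); // true
    }
}
```

Both helpers produce identical results; the difference is purely in how many temporary objects the loop allocates.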
[jira] [Commented] (HIVE-5059) Meaningless warning message from TypeCheckProcFactory
[ https://issues.apache.org/jira/browse/HIVE-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736672#comment-13736672 ]

Hive QA commented on HIVE-5059:
---
{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597406/HIVE-5059.D12159.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2789 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/398/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/398/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

Meaningless warning message from TypeCheckProcFactory
-
Key: HIVE-5059
URL: https://issues.apache.org/jira/browse/HIVE-5059
Project: Hive
Issue Type: Task
Components: Logging
Reporter: Navis
Assignee: Navis
Priority: Trivial
Attachments: HIVE-5059.D12159.1.patch

Regression from HIVE-3849: Hive logs meaningless messages as warnings, like the one below.
{noformat}
WARN parse.TypeCheckProcFactory (TypeCheckProcFactory.java:convert(180)) - Invalid type entry TOK_TABLE_OR_COL=null
{noformat}
[jira] [Created] (HIVE-5063) Fix some non-deterministic or not-updated tests
Navis created HIVE-5063:
---
Summary: Fix some non-deterministic or not-updated tests
Key: HIVE-5063
URL: https://issues.apache.org/jira/browse/HIVE-5063
Project: Hive
Issue Type: Sub-task
Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Minor

update result: auto_join14.q, input12.q, join14.q, union_remove_19.q
fix non-deterministic: partition_date.q, partition_date2.q, ppd_vc.q, nonblock_op_deduplicate.q
[jira] [Updated] (HIVE-5063) Fix some non-deterministic or not-updated tests
[ https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5063:
Status: Patch Available (was: Open)
Attachments: HIVE-5063.D12177.1.patch
[jira] [Updated] (HIVE-5063) Fix some non-deterministic or not-updated tests
[ https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-5063:
--
Attachment: HIVE-5063.D12177.1.patch

navis requested code review of HIVE-5063 [jira] Fix some non-deterministic or not-updated tests.
Reviewers: JIRA

DPAL-2107
update result: auto_join14.q, input12.q, join14.q, union_remove_19.q
fix non-deterministic: partition_date.q, partition_date2.q, ppd_vc.q, nonblock_op_deduplicate.q

TEST PLAN
EMPTY

REVISION DETAIL
https://reviews.facebook.net/D12177

AFFECTED FILES
ql/src/test/queries/clientpositive/nonblock_op_deduplicate.q
ql/src/test/queries/clientpositive/partition_date.q
ql/src/test/queries/clientpositive/partition_date2.q
ql/src/test/queries/clientpositive/ppd_vc.q
ql/src/test/results/clientpositive/auto_join14.q.out
ql/src/test/results/clientpositive/input12.q.out
ql/src/test/results/clientpositive/join14.q.out
ql/src/test/results/clientpositive/nonblock_op_deduplicate.q.out
ql/src/test/results/clientpositive/partition_date.q.out
ql/src/test/results/clientpositive/partition_date2.q.out
ql/src/test/results/clientpositive/ppd_vc.q.out
ql/src/test/results/clientpositive/union_remove_19.q.out

To: JIRA, navis
[jira] [Commented] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736707#comment-13736707 ]

Benjamin Jakobus commented on HIVE-5009:
-
Thanks. Yes, that did the trick!

Fix minor optimization issues
-
Key: HIVE-5009
URL: https://issues.apache.org/jira/browse/HIVE-5009
Project: Hive
Issue Type: Improvement
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
Fix For: 0.12.0
Attachments: AbstractBucketJoinProc.java
[jira] [Commented] (HIVE-494) Select columns by index instead of name
[ https://issues.apache.org/jira/browse/HIVE-494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736716#comment-13736716 ]

Hive QA commented on HIVE-494:
--
{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597403/HIVE-494.D12153.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2790 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/399/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/399/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

Select columns by index instead of name
---
Key: HIVE-494
URL: https://issues.apache.org/jira/browse/HIVE-494
Project: Hive
Issue Type: Wish
Components: Clients, Query Processor
Reporter: Adam Kramer
Assignee: Navis
Priority: Minor
Labels: SQL
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-494.D1641.1.patch, HIVE-494.D12153.1.patch

SELECT mytable[0], mytable[2] FROM some_table_name mytable;

...should return the first and third columns, respectively, from mytable regardless of their column names. The need for names specifically is kind of silly when they just get translated into numbers anyway.
[jira] [Updated] (HIVE-4945) Make RLIKE/REGEXP run end-to-end by updating VectorizationContext
[ https://issues.apache.org/jira/browse/HIVE-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-4945:
-
Attachment: HIVE-4945.1.patch.txt

Review request on https://reviews.apache.org/r/13494/

Make RLIKE/REGEXP run end-to-end by updating VectorizationContext
-
Key: HIVE-4945
URL: https://issues.apache.org/jira/browse/HIVE-4945
Project: Hive
Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Attachments: HIVE-4945.1.patch.txt
[jira] [Commented] (HIVE-4945) Make RLIKE/REGEXP run end-to-end by updating VectorizationContext
[ https://issues.apache.org/jira/browse/HIVE-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736761#comment-13736761 ] Teddy Choi commented on HIVE-4945: -- While writing this code, I found that even if the query was not run with vectorized expressions, it produces the same .q.out file. So it is hard to check whether vectorized expressions were used or not. There is a way to check it: if vectorized expressions were used, ExecDriver#job#getMapperClass() returns VectorExecMapper#class after calling ExecDriver#execute(); otherwise, getMapperClass() returns ExecMapper#class. Make RLIKE/REGEXP run end-to-end by updating VectorizationContext - Key: HIVE-4945 URL: https://issues.apache.org/jira/browse/HIVE-4945 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Eric Hanson Attachments: HIVE-4945.1.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5039) Support autoReconnect at JDBC
[ https://issues.apache.org/jira/browse/HIVE-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5039: -- Attachment: HIVE-5039.D12183.1.patch azrael requested code review of HIVE-5039 [jira] Support autoReconnect at JDBC. Reviewers: JIRA HIVE-5039 : Support autoReconnect at JDBC If HiveServer2 is shut down, the connection is broken. Let the connection reconnect automatically after HiveServer2 is restarted. jdbc:hive2://localhost:1/default?autoReconnect=true TEST PLAN unit test and manual test REVISION DETAIL https://reviews.facebook.net/D12183 AFFECTED FILES jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java jdbc/src/java/org/apache/hive/jdbc/HivePreparedStatement.java jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java jdbc/src/test/org/apache/hive/jdbc/TestJdbcDriver2Connection.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/29079/ To: JIRA, azrael Support autoReconnect at JDBC -- Key: HIVE-5039 URL: https://issues.apache.org/jira/browse/HIVE-5039 Project: Hive Issue Type: New Feature Components: JDBC Affects Versions: 0.11.0 Reporter: Azrael Park Assignee: Azrael Park Priority: Trivial Attachments: HIVE-5039.D12183.1.patch, HIVE-5039.patch If HiveServer2 is shut down, the connection is broken. Let the connection reconnect automatically after HiveServer2 is restarted. {noformat} jdbc:hive2://localhost:1/default?autoReconnect=true {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736773#comment-13736773 ] Hive QA commented on HIVE-4123: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12597402/HIVE-4123.patch.txt {color:green}SUCCESS:{color} +1 2848 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/400/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/400/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
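The delta-encoding improvement listed in the issue can be illustrated with a toy encoder (plain Java; this is only an illustration of the idea, not the actual ORC RLE implementation — class and method names here are made up):

```java
import java.util.Arrays;

// Toy delta encoding: store the first value, then successive differences.
// Monotonic runs such as 100, 101, 102, ... collapse into streams of small
// deltas, which can then be bit-packed tightly and expressed as long runs.
public class ToyDeltaEncoder {
    static long[] encode(long[] values) {
        long[] out = new long[values.length];
        if (values.length == 0) return out;
        out[0] = values[0];
        for (int i = 1; i < values.length; i++) {
            out[i] = values[i] - values[i - 1]; // delta from previous value
        }
        return out;
    }

    static long[] decode(long[] deltas) {
        long[] out = new long[deltas.length];
        if (deltas.length == 0) return out;
        out[0] = deltas[0];
        for (int i = 1; i < deltas.length; i++) {
            out[i] = out[i - 1] + deltas[i]; // running sum restores values
        }
        return out;
    }

    public static void main(String[] args) {
        long[] run = {100, 101, 102, 103, 104};
        long[] deltas = encode(run);
        System.out.println(Arrays.toString(deltas));         // [100, 1, 1, 1, 1]
        System.out.println(Arrays.toString(decode(deltas))); // round-trips
    }
}
```

After delta encoding, a run of constant deltas needs only a base value, a delta, and a length, which is where the "longer runs" and "tighter bit packing" gains come from.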
[jira] [Commented] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736792#comment-13736792 ] Benjamin Jakobus commented on HIVE-5009: Mhh, another silly question: my changes don't seem to take effect after compiling. 1) Edit file (e.g. add console.printInfo("DEBUG: exec time: " + ((end - offset) / 1000)); ) 2) ant -Dhadoop.version=1.2.1 clean package 3) Run test script. But no output is written to log or console. Any advice? Fix minor optimization issues - Key: HIVE-5009 URL: https://issues.apache.org/jira/browse/HIVE-5009 Project: Hive Issue Type: Improvement Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Attachments: AbstractBucketJoinProc.java Original Estimate: 48h Remaining Estimate: 48h I have found some minor optimization issues in the codebase, which I would like to rectify and contribute. Specifically, these are: The optimizations that could be applied to Hive's code base are as follows: 1. Use StringBuffer when appending strings - In 184 instances, the concatenation operator (+=) was used when appending strings. This is inherently inefficient - instead Java's StringBuffer or StringBuilder class should be used. 12 instances of this optimization can be applied to the GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver uses the + operator inside a loop, so does the column projection utilities class (ColumnProjectionUtils) and the aforementioned skew-join processor. Tests showed that using the StringBuilder when appending strings is 57% faster than using the + operator (using the StringBuilder took 122 milliseconds whilst the + operator took 284 milliseconds). 
The reason why the StringBuffer class is preferred over the + operator is that String third = first + second; gets compiled to: StringBuilder builder = new StringBuilder( first ); builder.append( second ); third = builder.toString(); Therefore, building complex strings that, for example, involve loops requires many instantiations (and, as discussed below, creating new objects inside loops is inefficient). 2. Use arrays instead of List - Java's java.util.Arrays class asList method is more efficient at creating lists from arrays than using loops to manually iterate over the elements (using asList is computationally very cheap, O(1), as it merely creates a wrapper object around the array; looping through the list however has a complexity of O(n) since a new list is created and every element in the array is added to this new list). As confirmed by the experiment detailed in Appendix D, the Java compiler does not automatically optimize and replace tight-loop copying with asList: the loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is instant. Four instances of this optimization can be applied to Hive's codebase (two of these should be applied to the Map-Join container - MapJoinRowContainer) - lines 92 to 98: for (obj = other.first(); obj != null; obj = other.next()) { ArrayList<Object> ele = new ArrayList<Object>(obj.length); for (int i = 0; i < obj.length; i++) { ele.add(obj[i]); } list.add((Row) ele); } 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation could be avoided by simply using the provided static conversion methods. As noted in the PMD documentation, using these avoids the cost of creating objects that also need to be garbage-collected later. 
For example, line 587 of the SemanticAnalyzer class could be replaced by the more efficient parseDouble method call: // Inefficient: Double percent = Double.valueOf(value).doubleValue(); // To be replaced by: Double percent = Double.parseDouble(value); Our test case in Appendix D confirms this: converting 10,000 strings into integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an unnecessary wrapper object) took 119 milliseconds on average; using Integer.parseInt() took only 38. Therefore creating even just one unnecessary wrapper object can make your code up to 68% slower. 4. Converting literals to strings using + - Converting literals to strings using + is quite inefficient (see Appendix D) and should be done by calling the toString() method instead: converting 1,000,000 integers to strings using + took, on average, 1340 milliseconds whilst using the toString() method only required 1183 milliseconds (hence adding empty strings takes nearly 12% more time). 89 instances of using + to convert literals were found in Hive's codebase - one of these is found in the JoinUtil. 5. Avoid manual copying of arrays - Instead of copying arrays manually, as is done in GroupByOperator on line 1040, the more efficient System.arraycopy can be used (arraycopy is a native method, meaning that the entire memory block is copied using memcpy or memmove).
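The five patterns above can be sketched in one small, self-contained example (plain Java, not Hive code; class and method names are illustrative only):

```java
import java.util.Arrays;
import java.util.List;

public class MicroOptimizations {
    // 1. Build a string with StringBuilder instead of += inside a loop.
    static String joinColumns(int n) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < n; i++) {
            builder.append("col").append(i).append(',');
        }
        return builder.toString();
    }

    // 2. Arrays.asList wraps the array in O(1); no per-element copy loop.
    static List<Integer> wrap(Integer[] values) {
        return Arrays.asList(values);
    }

    // 3. parseDouble returns a primitive directly; valueOf(...).doubleValue()
    //    would create and immediately discard a Double wrapper object.
    static double parsePercent(String value) {
        return Double.parseDouble(value);
    }

    // 5. System.arraycopy copies the block natively instead of element by element.
    static int[] copy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        System.out.println(joinColumns(3));                   // col0,col1,col2,
        System.out.println(wrap(new Integer[]{1, 2, 3}));     // [1, 2, 3]
        System.out.println(parsePercent("0.75"));             // 0.75
        // 4. Integer.toString(42) instead of "" + 42.
        System.out.println(Integer.toString(42));             // 42
        System.out.println(copy(new int[]{10, 20, 30})[2]);   // 30
    }
}
```

Note that for single-threaded code StringBuilder is preferable to StringBuffer, since StringBuffer pays for synchronization the caller does not need.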
[jira] [Commented] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736809#comment-13736809 ] Benjamin Jakobus commented on HIVE-5009: Never mind - resolved. Problem was me being an idiot. Fix minor optimization issues - Key: HIVE-5009 URL: https://issues.apache.org/jira/browse/HIVE-5009 Project: Hive Issue Type: Improvement Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Attachments: AbstractBucketJoinProc.java Original Estimate: 48h Remaining Estimate: 48h
[jira] [Commented] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736811#comment-13736811 ] Benjamin Jakobus commented on HIVE-5009: However, is there a faster way to compile - or do I need to rely on ivy, maven etc. every time? ant -Dhadoop.version=1.2.1 clean package takes about 3 minutes every time. Fix minor optimization issues - Key: HIVE-5009 URL: https://issues.apache.org/jira/browse/HIVE-5009 Project: Hive Issue Type: Improvement Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Attachments: AbstractBucketJoinProc.java Original Estimate: 48h Remaining Estimate: 48h
[jira] [Commented] (HIVE-5061) Row sampling throws NPE when used in sub-query
[ https://issues.apache.org/jira/browse/HIVE-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736841#comment-13736841 ] Hive QA commented on HIVE-5061: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12597417/HIVE-5061.D12165.1.patch {color:green}SUCCESS:{color} +1 2789 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/401/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/401/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Row sampling throws NPE when used in sub-query -- Key: HIVE-5061 URL: https://issues.apache.org/jira/browse/HIVE-5061 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5061.D12165.1.patch select * from (select * from src TABLESAMPLE (1 ROWS)) x; {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.parse.SplitSample.getTargetSize(SplitSample.java:103) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:487) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:405) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1025) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1017) at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:928) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:881) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:881) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:855) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1424) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1204) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:878) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4979) If any compiling error exists, test-shims should stop
[ https://issues.apache.org/jira/browse/HIVE-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4979: --- Status: Open (was: Patch Available) If any compiling error exists, test-shims should stop - Key: HIVE-4979 URL: https://issues.apache.org/jira/browse/HIVE-4979 Project: Hive Issue Type: Sub-task Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4979.4980.failedTest.txt, HIVE-4979.D11931.1.patch, HIVE-4979.D11931.2.patch, HIVE-4979.D11931.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4569) GetQueryPlan api in Hive Server2
[ https://issues.apache.org/jira/browse/HIVE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736885#comment-13736885 ] Henry Robinson commented on HIVE-4569: -- Although {{executeStatement}} is implemented synchronously in Hive, was it meant to be synchronous from the outset? The comment in the Thrift definition suggests otherwise: {code} // ExecuteStatement() // // Execute a statement. // The returned OperationHandle can be used to check on the // status of the statement, and to fetch results once the // statement has finished executing. {code} GetQueryPlan api in Hive Server2 Key: HIVE-4569 URL: https://issues.apache.org/jira/browse/HIVE-4569 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Amareshwari Sriramadasu Assignee: Jaideep Dhok Attachments: git-4569.patch, HIVE-4569.D10887.1.patch, HIVE-4569.D11469.1.patch It would nice to have GetQueryPlan as thrift api. I do not see GetQueryPlan api available in HiveServer2, though the wiki https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API contains, not sure why it was not added. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5062) Insert + orderby + limit does not need additional RS for limiting rows
[ https://issues.apache.org/jira/browse/HIVE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736913#comment-13736913 ] Hive QA commented on HIVE-5062: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12597421/HIVE-5062.D12171.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2789 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/402/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/402/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. Insert + orderby + limit does not need additional RS for limiting rows -- Key: HIVE-5062 URL: https://issues.apache.org/jira/browse/HIVE-5062 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-5062.D12171.1.patch The query, {noformat} insert overwrite table dummy select * from src order by key limit 10; {noformat} runs two MR jobs, but a single MR job is enough. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5063) Fix some non-deterministic or not-updated tests
[ https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736922#comment-13736922 ] Brock Noland commented on HIVE-5063: +1 LGTM We'll see what the automated tests say and then I'll run the affected tests on hadoop2. Fix some non-deterministic or not-updated tests --- Key: HIVE-5063 URL: https://issues.apache.org/jira/browse/HIVE-5063 Project: Hive Issue Type: Sub-task Components: Tests Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5063.D12177.1.patch update result auto_join14.q,input12.q,join14.q,union_remove_19.q fix non-deterministic tests partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-4123: Resolution: Fixed Release Note: I just committed this. Thanks, Prasanth! Status: Resolved (was: Patch Available) The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736931#comment-13736931 ] Brock Noland commented on HIVE-4123: [~owen.omalley] looks like your comment was accidentally put in the Release Notes section. The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5063) Fix some non-deterministic or not-updated tests
[ https://issues.apache.org/jira/browse/HIVE-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737007#comment-13737007 ] Hive QA commented on HIVE-5063: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12597426/HIVE-5063.D12177.1.patch {color:green}SUCCESS:{color} +1 2789 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/403/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/403/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Fix some non-deterministic or not-updated tests --- Key: HIVE-5063 URL: https://issues.apache.org/jira/browse/HIVE-5063 Project: Hive Issue Type: Sub-task Components: Tests Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5063.D12177.1.patch update result auto_join14.q,input12.q,join14.q,union_remove_19.q fix non-deterministic tests partition_date.q,partition_date2.q,ppd_vc.q,nonblock_op_deduplicate.q -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5046) Hcatalog's bin/hcat script doesn't respect HIVE_HOME
[ https://issues.apache.org/jira/browse/HIVE-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737008#comment-13737008 ] Mark Grover commented on HIVE-5046: --- Thanks Brock! Hcatalog's bin/hcat script doesn't respect HIVE_HOME Key: HIVE-5046 URL: https://issues.apache.org/jira/browse/HIVE-5046 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Reporter: Mark Grover Assignee: Mark Grover Fix For: 0.12.0 Attachments: HIVE-5046.1.patch https://github.com/apache/hive/blob/trunk/hcatalog/bin/hcat#L81 The quoted snippet (see below) intends to set HIVE_HOME if it's not set (i.e. HIVE_HOME is currently null). {code} if [ -n ${HIVE_HOME} ]; then {code} However, {{-n}} checks if the variable is _not_ null. So, the above code ends up setting HIVE_HOME to the default value if it is actually set already, overriding the set value. This condition needs to be negated. Moreover, the {{-n}} check requires the string being tested to be enclosed in quotes. Reference: http://tldp.org/LDP/abs/html/comparison-ops.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5056) MapJoinProcessor ignores order of values in removing RS
[ https://issues.apache.org/jira/browse/HIVE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737038#comment-13737038 ] Xuefu Zhang commented on HIVE-5056: --- Could anyone give a concise description with enough details for other people to understand the bug? Abbreviations sometimes cause confusion too. RS? Sorry if this is obvious. MapJoinProcessor ignores order of values in removing RS --- Key: HIVE-5056 URL: https://issues.apache.org/jira/browse/HIVE-5056 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Attachments: HIVE-5056.D12147.1.patch, HIVE-5056.D12147.2.patch http://www.mail-archive.com/user@hive.apache.org/msg09073.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737060#comment-13737060 ] Prasanth J commented on HIVE-4123: -- Thanks [~owen.omalley] for committing the patch! The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
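The delta-encoding idea listed in the issue can be illustrated with a toy sketch: a monotonically increasing run is stored as (base, delta, length) instead of the literal values. This is a minimal illustration only; ORC's actual RLE v2 format (bit packing, patched base, etc.) is considerably more involved, and the names here are made up.

```java
import java.util.Arrays;

// Toy delta run: five increasing values reconstructed from three ints.
class DeltaRun {
    static int[] decode(int base, int delta, int length) {
        int[] out = new int[length];
        for (int i = 0; i < length; i++) {
            out[i] = base + i * delta; // reconstruct each value of the run
        }
        return out;
    }

    public static void main(String[] args) {
        // (100, 3, 5) encodes the run 100, 103, 106, 109, 112
        System.out.println(Arrays.toString(decode(100, 3, 5)));
        // prints [100, 103, 106, 109, 112]
    }
}
```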
Re: [jira] [Commented] (HIVE-5056) MapJoinProcessor ignores order of values in removing RS
From the source code, it looks like RS indicates Reduce Sink. Sent from my iPad On 12-Aug-2013, at 10:33 pm, Xuefu Zhang (JIRA) j...@apache.org wrote: [quoted HIVE-5056 comment and issue description trimmed; identical to the message above]
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737106#comment-13737106 ] Mark Grover commented on HIVE-4388: --- Brock, thanks for looking into this. I was reviewing the patch and saw that you have several references to {{getFamilyMap()}}. This method's return type was changed in newer versions of HBase. Even though HBASE-9142 introduces the original method back in 0.95.2, it's deprecated. Do you think it makes more sense to use {{getFamilyCellMap()}} here instead? HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5047) Hive client filters partitions incorrectly via pushdown in certain cases involving or
[ https://issues.apache.org/jira/browse/HIVE-5047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737109#comment-13737109 ] Sergey Shelukhin commented on HIVE-5047: [~ashutoshc] Ping? Hive client filters partitions incorrectly via pushdown in certain cases involving or --- Key: HIVE-5047 URL: https://issues.apache.org/jira/browse/HIVE-5047 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5047.D12141.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well (yet)
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737108#comment-13737108 ] Sergey Shelukhin commented on HIVE-5029: [~ashutoshc] Wdyt? I am running the test to see what's wrong now; this could be one of the examples of working SQL masking non-working JDO, as this query was added fairly recently direct SQL perf optimization cannot be tested well (yet) Key: HIVE-5029 URL: https://issues.apache.org/jira/browse/HIVE-5029 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-5029.patch HIVE-4051 introduced perf optimization that involves getting partitions directly via SQL in metastore. Given that SQL queries might not work on all datastores (and will not work on non-SQL ones), JDO fallback is in place. Given that perf improvement is very large for short queries, it's on by default. However, there's a problem with tests with regard to that. If SQL code is broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might allow tests to pass. We are going to disable SQL by default before the testing problem is resolved. There are several possible solutions: 1) Separate build for this setting. Seems like overkill... 2) Enable by default; disable by default in tests, create a clone of TestCliDriver with a subset of queries that will exercise the SQL path. 3) Have some sort of test hook inside metastore that will run both ORM and SQL and compare. 3') Or make a subclass of ObjectStore that will do that. ObjectStore is already pluggable. 4) Write unit tests for one of the modes (JDO, as non-default?) and declare that they are sufficient; disable fallback in tests. 3' seems like the easiest. For now we will disable SQL by default. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
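Option 3' above can be sketched generically: a verifying wrapper that runs both the direct-SQL path and the JDO/ORM path, compares the results, and fails loudly on any mismatch. This sketch assumes nothing about Hive's real ObjectStore API; every name below is hypothetical, and the two suppliers stand in for the two metastore lookup paths.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical verifying store: runs both lookup paths and compares.
class VerifyingStore {
    static <T> T getVerified(Supplier<T> sqlPath, Supplier<T> jdoPath) {
        T fromSql = sqlPath.get();
        T fromJdo = jdoPath.get();
        if (!Objects.equals(fromSql, fromJdo)) {
            throw new IllegalStateException(
                "SQL and JDO paths disagree: " + fromSql + " vs " + fromJdo);
        }
        return fromSql; // the two paths agree; return either result
    }

    public static void main(String[] args) {
        // Stand-ins for the direct-SQL and JDO partition lookups
        List<String> parts = getVerified(
            () -> List.of("ds=2013-08-12"),
            () -> List.of("ds=2013-08-12"));
        System.out.println(parts); // prints [ds=2013-08-12]
    }
}
```

The appeal of this shape is exactly what the comment notes: a broken SQL path can no longer silently fall back to JDO (or vice versa) during tests, because any disagreement throws.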
[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working
[ https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737112#comment-13737112 ] Sergey Shelukhin commented on HIVE-3926: It does filter non-partition columns in some cases, in HIVE-5047 there's a related problem. PPD on virtual column of partitioned table is not working - Key: HIVE-3926 URL: https://issues.apache.org/jira/browse/HIVE-3926 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.12.0 Attachments: HIVE-3926.6.patch, HIVE-3926.D8121.1.patch, HIVE-3926.D8121.2.patch, HIVE-3926.D8121.3.patch, HIVE-3926.D8121.4.patch, HIVE-3926.D8121.5.patch {code} select * from src where BLOCK__OFFSET__INSIDE__FILE < 100; {code} is working, but {code} select * from srcpart where BLOCK__OFFSET__INSIDE__FILE < 100; {code} throws SemanticException. Disabling PPD makes it work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3926) PPD on virtual column of partitioned table is not working
[ https://issues.apache.org/jira/browse/HIVE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737113#comment-13737113 ] Sergey Shelukhin commented on HIVE-3926: Let me double check... (I am making changes in HIVE-4985 and HIVE-4914, they are not quite ready yet though) PPD on virtual column of partitioned table is not working - Key: HIVE-3926 URL: https://issues.apache.org/jira/browse/HIVE-3926 [issue description trimmed; identical to the message above] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5039) Support autoReconnect at JDBC
[ https://issues.apache.org/jira/browse/HIVE-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737137#comment-13737137 ] Hive QA commented on HIVE-5039: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12597453/HIVE-5039.D12183.1.patch {color:green}SUCCESS:{color} +1 2849 tests passed Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/404/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/404/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. Support autoReconnect at JDBC -- Key: HIVE-5039 URL: https://issues.apache.org/jira/browse/HIVE-5039 Project: Hive Issue Type: New Feature Components: JDBC Affects Versions: 0.11.0 Reporter: Azrael Park Assignee: Azrael Park Priority: Trivial Attachments: HIVE-5039.D12183.1.patch, HIVE-5039.patch If HiveServer2 is shut down, the connection is broken. Let the connection reconnect automatically after HiveServer2 is restarted. {noformat} jdbc:hive2://localhost:1/default?autoReconnect=true {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
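The client-side behavior the proposed autoReconnect flag would provide can be sketched as a generic retry loop: instead of failing on the first broken connection, retry the connect attempt a few times with a backoff while the server restarts. This is an illustrative helper only, not HiveConnection's actual implementation; it assumes at least one attempt is requested.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper sketching what an autoReconnect option might do.
class Reconnect {
    static <T> T withRetries(Callable<T> connect, int attempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;                // server may still be restarting
                Thread.sleep(backoffMs); // back off before the next attempt
            }
        }
        throw last; // all attempts exhausted; rethrow the last failure
    }
}
```

In the JDBC setting, the `connect` callable would wrap the `DriverManager.getConnection` call for the hive2 URL shown in the issue.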
[jira] [Updated] (HIVE-5003) Localize hive exec jar for tez
[ https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5003: - Attachment: HIVE-5003.4.patch.txt Updated to better re-use code. Localize hive exec jar for tez -- Key: HIVE-5003 URL: https://issues.apache.org/jira/browse/HIVE-5003 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and added to vertices and the dag itself as needed. For hive we need to localize the hive-exec.jar. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez
[ https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737198#comment-13737198 ] Edward Capriolo commented on HIVE-5003: --- I think we are repeating a semi-disturbing trend of writing a lot of code we have little direct coverage for. For example, take a method like: {code} private static Path getDefaultDestDir(Configuration conf) throws LoginException, IOException { {code} or {code} private static String getExecJarPathLocal () { {code} I think we should have direct junit-style tests around these methods. The code is clean (for its development state) and well documented. But I think we have the chance to do it better. Right now, for our current code and this code, we are totally reliant on our end-to-end system to validate every minor change. If we have smaller unit tests on things like this, we can have more coverage and enhance our ability to make changes to the project without as many worries about side effects that will not manifest until final end-to-end tests. I think we should draw a line in the sand here and attempt to write unit tests and design code in a testable way, not just write it and worry about unit tests later. What do you think? Localize hive exec jar for tez -- Key: HIVE-5003 URL: https://issues.apache.org/jira/browse/HIVE-5003 [issue description trimmed; identical to the previous message] -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
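The testability point above can be sketched concretely: extract the environment-independent computation out of a method like `getDefaultDestDir` so it can be covered by a plain JUnit-style assertion, no cluster or end-to-end run required. The method and path layout below are hypothetical, not Hive's actual signatures.

```java
// Hypothetical pure helper: the path computation is separated from the
// Configuration/UGI lookups, so it can be asserted on directly in a unit test.
class DestDirLogic {
    static String defaultDestDir(String scratchDir, String userName) {
        // Normalize the trailing slash, then append the per-user subdirectory.
        String base = scratchDir.endsWith("/") ? scratchDir : scratchDir + "/";
        return base + userName + "/_hive_exec";
    }

    public static void main(String[] args) {
        System.out.println(defaultDestDir("/tmp/hive", "vikram"));
        // prints /tmp/hive/vikram/_hive_exec
    }
}
```

A unit test then pins the behavior of the pure part, while the thin wrapper that reads the real Configuration stays small enough to be validated implicitly.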
[jira] [Updated] (HIVE-3189) cast ( string type as bigint) returning null values
[ https://issues.apache.org/jira/browse/HIVE-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiu updated HIVE-3189: -- Attachment: Hive-3189.patch.txt cast ( string type as bigint) returning null values - Key: HIVE-3189 URL: https://issues.apache.org/jira/browse/HIVE-3189 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: N Campbell Attachments: Hive-3189.patch.txt select rnum, c1, cast(c1 as bigint) from cert.tsdchar tsdchar where rnum in (0,1,2) create table if not exists CERT.TSDCHAR ( RNUM int , C1 string) row format sequencefile rnum c1 _c2 0 -1 null 1 0 null 2 10 null -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3189) cast ( string type as bigint) returning null values
[ https://issues.apache.org/jira/browse/HIVE-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiu updated HIVE-3189: -- Status: Patch Available (was: Open) This is fixed in Hive 0.9.0+ as well as on current Hive trunk; the defect could not be reproduced. A patch with new test cases is attached to verify the fix. cast ( string type as bigint) returning null values - Key: HIVE-3189 URL: https://issues.apache.org/jira/browse/HIVE-3189 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: N Campbell Attachments: Hive-3189.patch.txt select rnum, c1, cast(c1 as bigint) from cert.tsdchar tsdchar where rnum in (0,1,2) create table if not exists CERT.TSDCHAR ( RNUM int , C1 string) row format sequencefile rnum c1 _c2 0 -1 null 1 0 null 2 10 null -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Jakobus updated HIVE-5009: --- Attachment: (was: AbstractBucketJoinProc.java) Fix minor optimization issues - Key: HIVE-5009 URL: https://issues.apache.org/jira/browse/HIVE-5009 Project: Hive Issue Type: Improvement Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Original Estimate: 48h Remaining Estimate: 48h I have found some minor optimization issues in the codebase, which I would like to rectify and contribute. Specifically, these are: The optimizations that could be applied to Hive's code base are as follows: 1. Use StringBuffer when appending strings - In 184 instances, the concatenation operator (+=) was used when appending strings. This is inherently inefficient - instead Java's StringBuffer or StringBuilder class should be used. 12 instances of this optimization can be applied to the GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver uses the + operator inside a loop, so does the column projection utilities class (ColumnProjectionUtils) and the aforementioned skew-join processor. Tests showed that using StringBuilder when appending strings is 57% faster than using the + operator (using the StringBuffer took 122 milliseconds whilst the + operator took 284 milliseconds). The reason the StringBuffer class is preferred over the + operator is that String third = first + second; gets compiled to: StringBuilder builder = new StringBuilder( first ); builder.append( second ); third = builder.toString(); Therefore, building complex strings that, for example, involve loops requires many instantiations (and as discussed below, creating new objects inside loops is inefficient). 2.
Use arrays instead of List - Java's java.util.Arrays class asList method is more efficient at creating lists from arrays than using loops to manually iterate over the elements (using asList is computationally very cheap, O(1), as it merely creates a wrapper object around the array; looping through the list however has a complexity of O(n) since a new list is created and every element in the array is added to this new list). As confirmed by the experiment detailed in Appendix D, the Java compiler does not automatically optimize and replace tight-loop copying with asList: the loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is instant. Four instances of this optimization can be applied to Hive's codebase (two of these should be applied to the Map-Join container - MapJoinRowContainer) - lines 92 to 98: for (obj = other.first(); obj != null; obj = other.next()) { ArrayList<Object> ele = new ArrayList<Object>(obj.length); for (int i = 0; i < obj.length; i++) { ele.add(obj[i]); } list.add((Row) ele); } 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation could be avoided by simply using the provided static conversion methods. As noted in the PMD documentation, using these avoids the cost of creating objects that also need to be garbage-collected later. For example, line 587 of the SemanticAnalyzer class could be replaced by the more efficient parseDouble method call: // Inefficient: Double percent = Double.valueOf(value).doubleValue(); // To be replaced by: Double percent = Double.parseDouble(value); Our test case in Appendix D confirms this: converting 10,000 strings into integers while creating an unnecessary wrapper object took 119 milliseconds on average; using parseInt() took only 38. Therefore creating even just one unnecessary wrapper object can make your code up to 68% slower. 4.
Converting literals to strings using + - Converting literals to strings using + is quite inefficient (see Appendix D) and should be done by calling the toString() method instead: converting 1,000,000 integers to strings using + took, on average, 1340 milliseconds whilst using the toString() method only required 1183 milliseconds (hence adding empty strings takes nearly 12% more time). 89 instances of using + when converting literals were found in Hive's codebase - one of these is found in JoinUtil. 5. Avoid manual copying of arrays - Instead of copying arrays as is done in GroupByOperator on line 1040 (see below), the more efficient System.arraycopy can be used (arraycopy is a native method, meaning that the entire memory block is copied using memcpy or memmove). // Line 1040 of the GroupByOperator for (int i = 0; i < keys.length; i++) { forwardCache[i] = keys[i]; } Using
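Patterns 1, 3, and 5 from the list above can be shown side by side in a small sketch. The quoted timings are the reporter's measurements; this snippet only demonstrates the idiomatic forms.

```java
import java.util.Arrays;

class OptPatterns {
    public static void main(String[] args) {
        // 1. Accumulate with StringBuilder instead of += in a loop:
        //    one buffer is reused instead of a new String per iteration.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            sb.append(i).append(',');
        }
        System.out.println(sb); // prints 0,1,2,3,4,

        // 3. parseDouble yields a primitive directly; no boxed Double
        //    is created and later garbage-collected.
        double percent = Double.parseDouble("0.25");

        // 5. System.arraycopy instead of a manual element-by-element loop.
        int[] keys = {1, 2, 3};
        int[] forwardCache = new int[keys.length];
        System.arraycopy(keys, 0, forwardCache, 0, keys.length);

        System.out.println(percent + " " + Arrays.toString(forwardCache));
        // prints 0.25 [1, 2, 3]
    }
}
```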
[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez
[ https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737221#comment-13737221 ] Gunther Hagleitner commented on HIVE-5003: -- [~appodictic] I agree with you and I don't mind the tez integration being a test bed for this. I'll open a tez blocker jira for this - the reason being that we need to put some serious thought into this first. For instance: - private methods should not have unit tests, but public ones should, I think (private should be covered implicitly; I don't want to make everything package private for testing and I don't want reflection to call those) - Mocking: we have to investigate how we do this cleanly - Dependency injection: we might have to redesign parts of the code to allow proper testing easily Localize hive exec jar for tez -- Key: HIVE-5003 URL: https://issues.apache.org/jira/browse/HIVE-5003 [issue description trimmed; identical to the earlier HIVE-5003 messages] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-1511: --- Attachment: HIVE-1511-wip3.patch More progress. Some testcases still failing. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Ning Zhang Attachments: HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip.patch As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Jakobus updated HIVE-5018: --- Description: Object instantiation inside loops is very expensive. Where possible, object references should be created outside the loop so that they can be reused. (was: java/org/apache/hadoop/hive/ql/Context.java java/org/apache/hadoop/hive/ql/Driver.java java/org/apache/hadoop/hive/ql/QueryPlan.java java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java java/org/apache/hadoop/hive/ql/exec/DDLTask.java java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java java/org/apache/hadoop/hive/ql/exec/ExplainTask.java java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java java/org/apache/hadoop/hive/ql/exec/FetchOperator.java java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java java/org/apache/hadoop/hive/ql/exec/JoinUtil.java java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java java/org/apache/hadoop/hive/ql/exec/MapOperator.java java/org/apache/hadoop/hive/ql/exec/MoveTask.java java/org/apache/hadoop/hive/ql/exec/MuxOperator.java java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java java/org/apache/hadoop/hive/ql/exec/SkewJoinHandler.java java/org/apache/hadoop/hive/ql/exec/StatsTask.java java/org/apache/hadoop/hive/ql/exec/TaskFactory.java java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java 
java/org/apache/hadoop/hive/ql/exec/UnionOperator.java java/org/apache/hadoop/hive/ql/exec/Utilities.java java/org/apache/hadoop/hive/ql/exec/errors/RegexErrorHeuristic.java java/org/apache/hadoop/hive/ql/exec/errors/ScriptErrorHeuristic.java java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java java/org/apache/hadoop/hive/ql/exec/mr/JobDebugger.java java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java java/org/apache/hadoop/hive/ql/exec/mr/Throttle.java java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinRowContainer.java java/org/apache/hadoop/hive/ql/history/HiveHistory.java java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java java/org/apache/hadoop/hive/ql/io/NonSyncDataInputBuffer.java java/org/apache/hadoop/hive/ql/io/RCFile.java java/org/apache/hadoop/hive/ql/io/RCFileInputFormat.java java/org/apache/hadoop/hive/ql/io/SequenceFileInputFormatChecker.java java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java java/org/apache/hadoop/hive/ql/io/orc/DynamicByteArray.java java/org/apache/hadoop/hive/ql/io/orc/DynamicIntArray.java java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 
java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanTask.java java/org/apache/hadoop/hive/ql/io/rcfile/truncate/ColumnTruncateMapper.java java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java java/org/apache/hadoop/hive/ql/metadata/Hive.java java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java java/org/apache/hadoop/hive/ql/metadata/formatting/JsonMetaDataFormatter.java java/org/apache/hadoop/hive/ql/metadata/formatting/TextMetaDataFormatter.java java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java
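The hoisting pattern the HIVE-5018 description recommends can be sketched briefly: create a reusable object once outside the loop and reset it each iteration instead of allocating a fresh one. The class and method names below are illustrative, not taken from the files listed.

```java
import java.util.Arrays;

// Hypothetical example: one StringBuilder is allocated and reused by
// clearing it, rather than constructing a new builder per iteration.
class HoistAllocation {
    static String[] render(int[] values) {
        String[] out = new String[values.length];
        StringBuilder buf = new StringBuilder(); // created once, outside the loop
        for (int i = 0; i < values.length; i++) {
            buf.setLength(0);                    // reset instead of re-allocating
            buf.append("value=").append(values[i]);
            out[i] = buf.toString();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(render(new int[]{1, 2})));
        // prints [value=1, value=2]
    }
}
```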
[jira] [Updated] (HIVE-5054) Remove unused property submitviachild
[ https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5054: --- Attachment: HIVE-5054.patch Previous patch has some unwanted changes. Uploading the correct patch. Remove unused property submitviachild - Key: HIVE-5054 URL: https://issues.apache.org/jira/browse/HIVE-5054 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-5054.patch, HIVE-5054.patch This property only exists in HiveConf and is always set to false. Let's get rid of dead code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5054) Remove unused property submitviachild
[ https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5054: --- Status: Open (was: Patch Available) Uploaded incorrect patch. Remove unused property submitviachild - Key: HIVE-5054 URL: https://issues.apache.org/jira/browse/HIVE-5054 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-5054.patch, HIVE-5054.patch This property only exists in HiveConf and is always set to false. Let's get rid of dead code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737225#comment-13737225 ] Edward Capriolo commented on HIVE-5009: --- Our build process is slow. Technically you do not need clean 'every' time; mostly you only need it when changing the hadoop version or updating one of the libs. However, the build is still 'slow' regardless of running clean first. It's just something we have to deal with for a bit until we refactor everything. Fix minor optimization issues - Key: HIVE-5009 URL: https://issues.apache.org/jira/browse/HIVE-5009 Project: Hive Issue Type: Improvement Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Original Estimate: 48h Remaining Estimate: 48h I have found some minor optimization issues in the codebase, which I would like to rectify and contribute. Specifically, the optimizations that could be applied to Hive's code base are as follows: 1. Use StringBuffer when appending strings - In 184 instances, the concatenation operator (+=) was used when appending strings. This is inherently inefficient - instead, Java's StringBuffer or StringBuilder class should be used. 12 instances of this optimization can be applied to the GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver uses the + operator inside a loop, as do the column projection utilities class (ColumnProjectionUtils) and the aforementioned skew-join processor. Tests showed that using the StringBuilder when appending strings is 57% faster than using the + operator (using the StringBuffer took 122 milliseconds whilst the + operator took 284 milliseconds).
The reason why using the StringBuffer class is preferred over the + operator is that String third = first + second; gets compiled to: StringBuilder builder = new StringBuilder( first ); builder.append( second ); third = builder.toString(); Therefore, building complex strings that, for example, involve loops requires many instantiations (and, as discussed below, creating new objects inside loops is inefficient). 2. Use arrays instead of List - The asList method of Java's java.util.Arrays class is more efficient at creating lists from arrays than using loops to manually iterate over the elements (using asList is computationally very cheap, O(1), as it merely creates a wrapper object around the array; looping through the list however has a complexity of O(n) since a new list is created and every element in the array is added to this new list). As confirmed by the experiment detailed in Appendix D, the Java compiler does not automatically optimize and replace tight-loop copying with asList: the loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is instant. Four instances of this optimization can be applied to Hive's codebase (two of these should be applied to the Map-Join container - MapJoinRowContainer) - lines 92 to 98: for (obj = other.first(); obj != null; obj = other.next()) { ArrayList<Object> ele = new ArrayList<Object>(obj.length); for (int i = 0; i < obj.length; i++) { ele.add(obj[i]); } list.add((Row) ele); } 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation could be avoided by simply using the provided static conversion methods. As noted in the PMD documentation, using these avoids the cost of creating objects that also need to be garbage-collected later.
For example, line 587 of the SemanticAnalyzer class could be replaced by the more efficient parseDouble method call: // Inefficient: Double percent = Double.valueOf(value).doubleValue(); // To be replaced by: Double percent = Double.parseDouble(value); Our test case in Appendix D confirms this: converting 10,000 strings into integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an unnecessary wrapper object) took 119 milliseconds on average; using parseInt() took only 38. Therefore creating even just one unnecessary wrapper object can make your code up to 68% slower. 4. Converting literals to strings using + - Converting literals to strings using + is quite inefficient (see Appendix D) and should be done by calling the toString() method instead: converting 1,000,000 integers to strings using + took, on average, 1340 milliseconds whilst using the toString() method only required 1183 milliseconds (hence adding empty strings takes nearly 12% more time). 89 instances of using + when converting literals were found in Hive's codebase - one of these is found in the JoinUtil. 5. Avoid manual copying of arrays - Instead of copying arrays as is done in
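The StringBuilder point above (optimization 1) can be sketched with a small, self-contained example; the class and method names here are illustrative, not taken from the Hive codebase:

```java
// Optimization 1 sketched: repeated += on a String allocates a fresh
// StringBuilder and String on every iteration, whereas a single
// StringBuilder reuses one growable buffer. Names are illustrative.
public class ConcatDemo {
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i; // each pass compiles to: new StringBuilder(s).append(i).toString()
        }
        return s;
    }

    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i); // appends into the same internal buffer
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // both produce the same string; only the allocation behaviour differs
        System.out.println(withPlus(5).equals(withBuilder(5)));
    }
}
```

Both loops yield the same result; the difference is purely in how many temporary objects the garbage collector later has to reclaim.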
[jira] [Commented] (HIVE-5009) Fix minor optimization issues
[ https://issues.apache.org/jira/browse/HIVE-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737229#comment-13737229 ] Benjamin Jakobus commented on HIVE-5009: OK, thanks. Fix minor optimization issues - Key: HIVE-5009 URL: https://issues.apache.org/jira/browse/HIVE-5009 Project: Hive Issue Type: Improvement Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Original Estimate: 48h Remaining Estimate: 48h I have found some minor optimization issues in the codebase, which I would like to rectify and contribute. Specifically, the optimizations that could be applied to Hive's code base are as follows: 1. Use StringBuffer when appending strings - In 184 instances, the concatenation operator (+=) was used when appending strings. This is inherently inefficient - instead, Java's StringBuffer or StringBuilder class should be used. 12 instances of this optimization can be applied to the GenMRSkewJoinProcessor class and another three to the optimizer. CliDriver uses the + operator inside a loop, as do the column projection utilities class (ColumnProjectionUtils) and the aforementioned skew-join processor. Tests showed that using the StringBuilder when appending strings is 57% faster than using the + operator (using the StringBuffer took 122 milliseconds whilst the + operator took 284 milliseconds). The reason why using the StringBuffer class is preferred over the + operator is that String third = first + second; gets compiled to: StringBuilder builder = new StringBuilder( first ); builder.append( second ); third = builder.toString(); Therefore, building complex strings that, for example, involve loops requires many instantiations (and, as discussed below, creating new objects inside loops is inefficient). 2.
Use arrays instead of List - The asList method of Java's java.util.Arrays class is more efficient at creating lists from arrays than using loops to manually iterate over the elements (using asList is computationally very cheap, O(1), as it merely creates a wrapper object around the array; looping through the list however has a complexity of O(n) since a new list is created and every element in the array is added to this new list). As confirmed by the experiment detailed in Appendix D, the Java compiler does not automatically optimize and replace tight-loop copying with asList: the loop-copying of 1,000,000 items took 15 milliseconds whilst using asList is instant. Four instances of this optimization can be applied to Hive's codebase (two of these should be applied to the Map-Join container - MapJoinRowContainer) - lines 92 to 98: for (obj = other.first(); obj != null; obj = other.next()) { ArrayList<Object> ele = new ArrayList<Object>(obj.length); for (int i = 0; i < obj.length; i++) { ele.add(obj[i]); } list.add((Row) ele); } 3. Unnecessary wrapper object creation - In 31 cases, wrapper object creation could be avoided by simply using the provided static conversion methods. As noted in the PMD documentation, using these avoids the cost of creating objects that also need to be garbage-collected later. For example, line 587 of the SemanticAnalyzer class could be replaced by the more efficient parseDouble method call: // Inefficient: Double percent = Double.valueOf(value).doubleValue(); // To be replaced by: Double percent = Double.parseDouble(value); Our test case in Appendix D confirms this: converting 10,000 strings into integers using Integer.valueOf(gen.nextSessionId()) (i.e. creating an unnecessary wrapper object) took 119 milliseconds on average; using parseInt() took only 38. Therefore creating even just one unnecessary wrapper object can make your code up to 68% slower. 4.
Converting literals to strings using + - Converting literals to strings using + is quite inefficient (see Appendix D) and should be done by calling the toString() method instead: converting 1,000,000 integers to strings using + took, on average, 1340 milliseconds whilst using the toString() method only required 1183 milliseconds (hence adding empty strings takes nearly 12% more time). 89 instances of using + when converting literals were found in Hive's codebase - one of these is found in the JoinUtil. 5. Avoid manual copying of arrays - Instead of copying arrays as is done in GroupByOperator on line 1040 (see below), the more efficient System.arraycopy can be used (arraycopy is a native method, meaning that the entire memory block is copied using memcpy or memmove). // Line 1040 of the GroupByOperator for (int i = 0; i < keys.length; i++) { forwardCache[i] = keys[i]; }
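The manual-copy point (optimization 5) reduces to a one-line System.arraycopy call. This sketch mirrors the keys/forwardCache names from the quoted GroupByOperator snippet but is otherwise illustrative:

```java
import java.util.Arrays;

// Optimization 5 sketched: System.arraycopy replaces an element-by-element
// copy loop with a single native bulk copy. Names mirror the quoted snippet.
public class CopyDemo {
    static Object[] copy(Object[] keys) {
        Object[] forwardCache = new Object[keys.length];
        // instead of: for (int i = 0; i < keys.length; i++) forwardCache[i] = keys[i];
        System.arraycopy(keys, 0, forwardCache, 0, keys.length);
        return forwardCache;
    }

    public static void main(String[] args) {
        Object[] keys = {"a", "b", "c"};
        // the copy holds the same elements as the source
        System.out.println(Arrays.equals(keys, copy(keys)));
    }
}
```

Note that System.arraycopy performs a shallow copy, exactly like the original loop: the same element references end up in the destination array.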
[jira] [Created] (HIVE-5064) TestParse fails on JDK7
Brock Noland created HIVE-5064: -- Summary: TestParse fails on JDK7 Key: HIVE-5064 URL: https://issues.apache.org/jira/browse/HIVE-5064 Project: Hive Issue Type: Bug Components: Tests Reporter: Brock Noland Assignee: Brock Noland TestParse fails on JDK 7 because of the order of XML attributes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez
[ https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737235#comment-13737235 ] Gunther Hagleitner commented on HIVE-5003: -- [~vikram.dixit] Some review comments (RB would be great): - CANNOT_FIND_EXEC_JAR isn't used/ please remove - hive.jar.directory should be /user/hive/... - DagUtils class level javadoc comment is already there (seems you're adding a second one) - e.printStackTrace is not good as a debugging method. Either log or pass it on to the caller - // java magic isn't a great comment, be good to say what the magic is achieving - getResourceVersion comment doesn't match the logic. it will return the basename not the version string - There's a system.out.println in the code Localize hive exec jar for tez -- Key: HIVE-5003 URL: https://issues.apache.org/jira/browse/HIVE-5003 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and added to vertices and the dag itself as needed. For hive we need to localize the hive-exec.jar. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5065: - Priority: Blocker (was: Major) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask Key: HIVE-5065 URL: https://issues.apache.org/jira/browse/HIVE-5065 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Priority: Blocker Fix For: tez-branch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez
[ https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737245#comment-13737245 ] Edward Capriolo commented on HIVE-5003: --- I understand what you are saying. I am ok with the package private idea and dependency injection, I generally prefer that to a heavy solution like mocking. I would not call this a blocker, but I think we need to design more with testing in mind. Lets talk it over elsewhere. Localize hive exec jar for tez -- Key: HIVE-5003 URL: https://issues.apache.org/jira/browse/HIVE-5003 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HiveLocalizationDesign.txt Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and added to vertices and the dag itself as needed. For hive we need to localize the hive-exec.jar. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
Gunther Hagleitner created HIVE-5065: Summary: Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask Key: HIVE-5065 URL: https://issues.apache.org/jira/browse/HIVE-5065 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Fix For: tez-branch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5064) TestParse fails on JDK7
[ https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737253#comment-13737253 ] Xuefu Zhang commented on HIVE-5064: --- It seems multiple tickets are raised to address the same issue. HIVE-1551 and HIVE-4885 were talking about different serialization mechanisms. TestParse fails on JDK7 --- Key: HIVE-5064 URL: https://issues.apache.org/jira/browse/HIVE-5064 Project: Hive Issue Type: Bug Components: Tests Reporter: Brock Noland Assignee: Brock Noland TestParse fails on JDK 7 because of the order of XML attributes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5064) TestParse fails on JDK7
[ https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737237#comment-13737237 ] Brock Noland commented on HIVE-5064: I propose before doing the diff we turn the xml into [Canonical XML|http://en.wikipedia.org/wiki/Canonical_XML]. For example: {noformat}
[brock@bigboy ~]$ cat test.xml
<root>
  <attr z="value" k = "value" a= "value" />
</root>
[brock@bigboy ~]$
[brock@bigboy ~]$ xmllint test.xml
<?xml version="1.0"?>
<root>
  <attr z="value" k="value" a="value"/>
</root>
[brock@bigboy ~]$ xmllint --c14n test.xml ; echo
<root>
  <attr a="value" k="value" z="value"></attr>
</root>
{noformat} TestParse fails on JDK7 --- Key: HIVE-5064 URL: https://issues.apache.org/jira/browse/HIVE-5064 Project: Hive Issue Type: Bug Components: Tests Reporter: Brock Noland Assignee: Brock Noland TestParse fails on JDK 7 because of the order of XML attributes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5064) TestParse fails on JDK7
[ https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737258#comment-13737258 ] Brock Noland commented on HIVE-5064: Thanks [~xuefuz], I had forgotten about HIVE-4885. I'll mark this a dup. TestParse fails on JDK7 --- Key: HIVE-5064 URL: https://issues.apache.org/jira/browse/HIVE-5064 Project: Hive Issue Type: Bug Components: Tests Reporter: Brock Noland Assignee: Brock Noland TestParse fails on JDK 7 because of the order of XML attributes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing
[ https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737257#comment-13737257 ] Brock Noland commented on HIVE-4885: Hey guys, since this patch fixes the tests on JDK7 how about we commit and open a follow-on JIRA about a different way of doing the serialization? Alternative object serialization for execution plan in hive testing Key: HIVE-4885 URL: https://issues.apache.org/jira/browse/HIVE-4885 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-4885.patch Currently there are a lot of test cases that involve comparing execution plans, such as those in the TestParse suite. XmlEncoder is used to serialize the plan generated by hive and store it in a file for diff comparison. However, XmlEncoder is tied to the Java compiler, whose implementation may change from version to version. Thus, upgrading the compiler can generate a lot of fake test failures.
The following is an example of diff generated when running hive with JDK7: {code}
Begin query: case_sensitivity.q
diff -a /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out
diff -a -b /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml
3c3
< <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask0">
---
> <object id="MapRedTask0" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
12c12
< <object class="java.util.ArrayList" id="ArrayList0">
---
> <object id="ArrayList0" class="java.util.ArrayList">
14c14
< <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask0">
---
> <object id="MoveTask0" class="org.apache.hadoop.hive.ql.exec.MoveTask">
18c18
< <object class="org.apache.hadoop.hive.ql.exec.MoveTask" id="MoveTask1">
---
> <object id="MoveTask1" class="org.apache.hadoop.hive.ql.exec.MoveTask">
22c22
< <object class="org.apache.hadoop.hive.ql.exec.StatsTask" id="StatsTask0">
---
> <object id="StatsTask0" class="org.apache.hadoop.hive.ql.exec.StatsTask">
60c60
< <object class="org.apache.hadoop.hive.ql.exec.MapRedTask" id="MapRedTask1">
---
> <object id="MapRedTask1" class="org.apache.hadoop.hive.ql.exec.MapRedTask">
{code} As can be seen, the only difference is the order of the attributes in the serialized XML doc, yet it brings 50+ test failures in Hive. We need a better plan comparison, or object serialization, to improve the situation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
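Since the diff above differs only in attribute order, a comparison that parses the XML instead of diffing text sidesteps the JDK dependence. Below is a minimal sketch using the JDK's DOM parser; the class and method names are made up for illustration and this compares only the root element's attributes, not whole plans:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Sketch: attribute order is irrelevant to XML semantics, so comparing
// parsed attribute maps (or a canonical form) is order-insensitive.
public class AttrOrderDemo {
    static boolean sameAttrs(String a, String b) throws Exception {
        NamedNodeMap ma = parse(a).getAttributes();
        NamedNodeMap mb = parse(b).getAttributes();
        if (ma.getLength() != mb.getLength()) return false;
        for (int i = 0; i < ma.getLength(); i++) {
            String name = ma.item(i).getNodeName();
            // every attribute of a must exist in b with the same value
            if (mb.getNamedItem(name) == null
                || !ma.item(i).getNodeValue().equals(mb.getNamedItem(name).getNodeValue())) {
                return false;
            }
        }
        return true;
    }

    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
            .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // the same element with attributes in a different order, as in the diff
        System.out.println(sameAttrs(
            "<object class=\"MapRedTask\" id=\"MapRedTask0\"/>",
            "<object id=\"MapRedTask0\" class=\"MapRedTask\"/>"));
    }
}
```

Canonicalizing both files (as with xmllint --c14n) achieves the same effect at the file level and keeps the existing text-diff workflow intact.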
[jira] [Updated] (HIVE-4899) Hive returns non-meanful error message for ill-formed fs.default.name
[ https://issues.apache.org/jira/browse/HIVE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4899: --- Resolution: Fixed Status: Resolved (was: Patch Available) I committed this to trunk. Thanks Xuefu for the patch and Ashutosh for the review! Hive returns non-meanful error message for ill-formed fs.default.name - Key: HIVE-4899 URL: https://issues.apache.org/jira/browse/HIVE-4899 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4899.patch For query in test case fs_default_name1.q: {code} set fs.default.name='http://www.example.com; show tables; {code} The following error message is returned: {code} FAILED: IllegalArgumentException null {code} The message is not very meaningful, and has null in it. It would be better if we can provide detailed error message. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7
[ https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737266#comment-13737266 ] Brock Noland commented on HIVE-3630: Xuefu, I don't follow your last comment. It seems this test is now passing on JDK7 and this JIRA can be resolved, is that what you are saying? udf_substr.q fails when using JDK7 -- Key: HIVE-3630 URL: https://issues.apache.org/jira/browse/HIVE-3630 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.9.1, 0.10.0, 0.11.0 Reporter: Chris Drome Assignee: Chris Drome Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, HIVE-3630-trunk.patch Internal error: Cannot find ConstantObjectInspector for BINARY This exception has two causes. JDK7 iterators do not return values in the same order as JDK6, which selects a different implementation of this UDF when the first argument is null. With JDK7 this happens to be the binary version. The binary version is not implemented properly which ultimately causes the exception when the method is called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-5064) TestParse fails on JDK7
[ https://issues.apache.org/jira/browse/HIVE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland resolved HIVE-5064. Resolution: Duplicate TestParse fails on JDK7 --- Key: HIVE-5064 URL: https://issues.apache.org/jira/browse/HIVE-5064 Project: Hive Issue Type: Bug Components: Tests Reporter: Brock Noland Assignee: Brock Noland TestParse fails on JDK 7 because of the order of XML attributes -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737282#comment-13737282 ] Hudson commented on HIVE-4123: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #124 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/124/]) HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via omalley) (omalley: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1513155) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerReader.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerWriter.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.orig * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java * /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitPack.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java * /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out * 
/hive/trunk/ql/src/test/resources/orc-file-dump.out The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4579) Create a SARG interface for RecordReaders
[ https://issues.apache.org/jira/browse/HIVE-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737283#comment-13737283 ] Hudson commented on HIVE-4579: -- SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #124 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/124/]) HIVE-4579: Create a SARG interface for RecordReaders (Owen O'Malley via Gunther Hagleitner) (gunther: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1513029) * /hive/trunk/ivy/libraries.properties * /hive/trunk/ql/ivy.xml * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/sarg * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestSearchArgumentImpl.java Create a SARG interface for RecordReaders - Key: HIVE-4579 URL: https://issues.apache.org/jira/browse/HIVE-4579 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.12.0 Attachments: h-4579.patch, HIVE-4579.4.patch, HIVE-4579.D11409.1.patch, HIVE-4579.D11409.2.patch, HIVE-4579.D11409.3.patch, pushdown.pdf I think we should create a SARG (http://en.wikipedia.org/wiki/Sargable) interface for RecordReaders. For a first pass, I'll create an API that uses the value stored in hive.io.filter.expr.serialized. The desire is to define an simpler interface that the direct AST expression that is provided by hive.io.filter.expr.serialized so that the code to evaluate expressions can be generalized instead of put inside a particular RecordReader. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type
[ https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737287#comment-13737287 ] Mark Wagner commented on HIVE-2482: --- bq. I think we should not do this ^ lets make another UDF, or overload the parameters of this one. Is there any way to deprecate a UDF that will move people away from the current 'unhex'? The only difference from the updated version is that the current one wraps the output in Text, so that it could be used by Hive before the binary support. Now that there is binary support it doesn't make any sense for unhex to wrap its output. Convenience UDFs for binary data type - Key: HIVE-2482 URL: https://issues.apache.org/jira/browse/HIVE-2482 Project: Hive Issue Type: New Feature Reporter: Ashutosh Chauhan Assignee: Mark Wagner Fix For: 0.12.0 Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, HIVE-2482.4.patch HIVE-2380 introduced binary data type in Hive. It will be good to have following udfs to make it more useful: * UDF's to convert to/from hex string * UDF's to convert to/from string using a specific encoding * UDF's to convert to/from base64 string -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well (yet)
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737294#comment-13737294 ] Sergey Shelukhin commented on HIVE-5029: The test passes on my test machine on recent trunk direct SQL perf optimization cannot be tested well (yet) Key: HIVE-5029 URL: https://issues.apache.org/jira/browse/HIVE-5029 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-5029.patch HIVE-4051 introduced perf optimization that involves getting partitions directly via SQL in metastore. Given that SQL queries might not work on all datastores (and will not work on non-SQL ones), JDO fallback is in place. Given that perf improvement is very large for short queries, it's on by default. However, there's a problem with tests with regard to that. If SQL code is broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might allow tests to pass. We are going to disable SQL by default before the testing problem is resolved. There are several possible solutions: 1) Separate build for this setting. Seems like overkill... 2) Enable by default; disable by default in tests, create a clone of TestCliDriver with a subset of queries that will exercise the SQL path. 3) Have some sort of test hook inside metastore that will run both ORM and SQL and compare. 3') Or make a subclass of ObjectStore that will do that. ObjectStore is already pluggable. 4) Write unit tests for one of the modes (JDO, as non-default?) and declare that they are sufficient; disable fallback in tests. 3' seems like the easiest. For now we will disable SQL by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type
[ https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737295#comment-13737295 ] Ashutosh Chauhan commented on HIVE-2482: I agree with [~mwagner] analysis. Convenience UDFs for binary data type - Key: HIVE-2482 URL: https://issues.apache.org/jira/browse/HIVE-2482 Project: Hive Issue Type: New Feature Reporter: Ashutosh Chauhan Assignee: Mark Wagner Fix For: 0.12.0 Attachments: HIVE-2482.1.patch, HIVE-2482.2.patch, HIVE-2482.3.patch, HIVE-2482.4.patch HIVE-2380 introduced binary data type in Hive. It will be good to have following udfs to make it more useful: * UDF's to convert to/from hex string * UDF's to convert to/from string using a specific encoding * UDF's to convert to/from base64 string -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing
[ https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737297#comment-13737297 ] Ashutosh Chauhan commented on HIVE-4885: I am fine with moving forward on this one. Don't know if [~appodictic] has some concerns or other suggestions for this issue. Alternative object serialization for execution plan in hive testing Key: HIVE-4885 URL: https://issues.apache.org/jira/browse/HIVE-4885 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.12.0 Attachments: HIVE-4885.patch Currently there are a lot of test cases that involve comparing execution plans, such as those in the TestParse suite. XmlEncoder is used to serialize the plan generated by hive and store it in a file for diff comparison. However, XmlEncoder is tied to the Java compiler, whose implementation may change from version to version. Thus, upgrading the compiler can generate a lot of fake test failures.
The following is an example of the diff generated when running Hive with JDK 7: {code} Begin query: case_sensitivity.q diff -a /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.out /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/parse/case_sensitivity.q.out diff -a -b /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/build/ql/test/logs/positive/case_sensitivity.q.xml /data/4/hive-local/a2307.halxg.cloudera.com-hiveptest-2/cdh-source/ql/src/test/results/compiler/plan/case_sensitivity.q.xml 3c3 object class=org.apache.hadoop.hive.ql.exec.MapRedTask id=MapRedTask0 --- object id=MapRedTask0 class=org.apache.hadoop.hive.ql.exec.MapRedTask 12c12 object class=java.util.ArrayList id=ArrayList0 --- object id=ArrayList0 class=java.util.ArrayList 14c14 object class=org.apache.hadoop.hive.ql.exec.MoveTask id=MoveTask0 --- object id=MoveTask0 class=org.apache.hadoop.hive.ql.exec.MoveTask 18c18 object class=org.apache.hadoop.hive.ql.exec.MoveTask id=MoveTask1 --- object id=MoveTask1 class=org.apache.hadoop.hive.ql.exec.MoveTask 22c22 object class=org.apache.hadoop.hive.ql.exec.StatsTask id=StatsTask0 --- object id=StatsTask0 class=org.apache.hadoop.hive.ql.exec.StatsTask 60c60 object class=org.apache.hadoop.hive.ql.exec.MapRedTask id=MapRedTask1 --- object id=MapRedTask1 class=org.apache.hadoop.hive.ql.exec.MapRedTask {code} As can be seen, the only difference is the order of the attributes in the serialized XML doc, yet it causes 50+ test failures in Hive. We need better plan comparison, or better object serialization, to improve the situation.
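The fragility described above can be reproduced without any Hive code at all. The sketch below (assumed names, not from the patch) shows the java.beans.XMLEncoder usage pattern; the key point is that attribute order such as `id=... class=...` vs `class=... id=...` in its output is a JDK implementation detail, so byte-for-byte diffs of the XML break across JDK versions even when the object graph is identical:

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

// Serialize an object graph the way Hive's plan tests do, then inspect the XML.
public class XmlEncoderOrderDemo {
    public static String serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        XMLEncoder enc = new XMLEncoder(buf);
        enc.writeObject(o);
        enc.close();
        return buf.toString();
    }

    public static void main(String[] args) {
        ArrayList<String> plan = new ArrayList<String>();
        plan.add("MapRedTask0");
        plan.add("MoveTask0");
        String xml = serialize(plan);
        // The class name always appears, but where it sits relative to other
        // attributes depends on the JDK -- hence the spurious diffs above.
        System.out.println(xml.contains("java.util.ArrayList"));
    }
}
```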
[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing
[ https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737307#comment-13737307 ] Edward Capriolo commented on HIVE-4885: --- +1 move forward.
[jira] [Commented] (HIVE-4885) Alternative object serialization for execution plan in hive testing
[ https://issues.apache.org/jira/browse/HIVE-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737311#comment-13737311 ] Brock Noland commented on HIVE-4885: Sounds good. I am +1 on the patch as well.
[jira] [Comment Edited] (HIVE-2482) Convenience UDFs for binary data type
[ https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737319#comment-13737319 ] Edward Capriolo edited comment on HIVE-2482 at 8/12/13 8:45 PM: I am ok with it as well, but remember everything you change breaks someone's workflow. was (Author: appodictic): I am ok with it as well, but temember everything you change breaks someones workflow.
[jira] [Commented] (HIVE-2482) Convenience UDFs for binary data type
[ https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737319#comment-13737319 ] Edward Capriolo commented on HIVE-2482: --- I am ok with it as well, but temember everything you change breaks someones workflow.
[jira] [Commented] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737324#comment-13737324 ] Ashutosh Chauhan commented on HIVE-1511: Had a brief chat with [~kamrul] who expressed interest in working on this. Assigning it to him. Hive plan serialization is slow --- Key: HIVE-1511 URL: https://issues.apache.org/jira/browse/HIVE-1511 Project: Hive Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Ning Zhang Attachments: HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, HIVE-1511-wip.patch As reported by Edward Capriolo: For reference I did this as a test case SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-1511: --- Assignee: Mohammad Kamrul Islam
[jira] [Updated] (HIVE-5066) [WebHCat] Other code fixes for Windows
[ https://issues.apache.org/jira/browse/HIVE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-5066: - Summary: [WebHCat] Other code fixes for Windows (was: Other code fixes for Windows) [WebHCat] Other code fixes for Windows -- Key: HIVE-5066 URL: https://issues.apache.org/jira/browse/HIVE-5066 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 This is equivalent to HCATALOG-526, but updated to sync with latest trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5066) [WebHCat] Other code fixes for Windows
[ https://issues.apache.org/jira/browse/HIVE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-5066: - Attachment: HIVE-5034-1.patch
[jira] [Created] (HIVE-5067) Add bzip compressor for ORC
Owen O'Malley created HIVE-5067: --- Summary: Add bzip compressor for ORC Key: HIVE-5067 URL: https://issues.apache.org/jira/browse/HIVE-5067 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley It would be good to add a bzip compressor for ORC. Bzip does very well for long term/cold storage. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5068) Some queries fail due to xml encoder error
Brock Noland created HIVE-5068: -- Summary: Some queries fail due to xml ecoder error Key: HIVE-5068 URL: https://issues.apache.org/jira/browse/HIVE-5068 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Looks like something snuck in that breaks the JDK 7 build: {noformat} Caused by: java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(ASTNode); ... 106 more Caused by: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:238) at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeExpression(Encoder.java:330) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeObject1(Encoder.java:258) at java.beans.Encoder.cloneStatement(Encoder.java:271) at java.beans.Encoder.writeStatement(Encoder.java:301) at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400) ... 
105 more Caused by: java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598) at java.beans.Encoder.getValue(Encoder.java:108) at java.beans.Encoder.get(Encoder.java:252) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:112) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeExpression(Encoder.java:330) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454) at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115) at java.beans.Encoder.writeObject(Encoder.java:74) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327) at java.beans.Encoder.writeExpression(Encoder.java:330) at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454) at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:232) ... 118 more Caused by: java.lang.InstantiationException: org.antlr.runtime.CommonToken at java.lang.Class.newInstance(Class.java:359) at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at java.beans.Statement.invokeInternal(Statement.java:292) at java.beans.Statement.access$000(Statement.java:58) at java.beans.Statement$2.run(Statement.java:185) at java.security.AccessController.doPrivileged(Native Method) at java.beans.Statement.invoke(Statement.java:182) at java.beans.Expression.getValue(Expression.java:153) at java.beans.Encoder.getValue(Encoder.java:105) ... 
130 more {noformat} and {noformat} java.lang.RuntimeException: Cannot serialize object at org.apache.hadoop.hive.ql.exec.Utilities$1.exceptionThrown(Utilities.java:598) at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:426) at java.beans.XMLEncoder.writeObject(XMLEncoder.java:330) at org.apache.hadoop.hive.ql.exec.Utilities.serializeObject(Utilities.java:611) at org.apache.hadoop.hive.ql.plan.MapredWork.toXML(MapredWork.java:88) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:505) at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:182) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139) at
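The root cause in the traces above is the `InstantiationException: org.antlr.runtime.CommonToken`: XMLEncoder can only persist bean-style classes with a public no-arg constructor, and it reports failures through an ExceptionListener (the same hook Hive's Utilities uses to raise "Cannot serialize object"). The stand-in class below is hypothetical; the real offender is ANTLR's CommonToken inside the ASTNode graph:

```java
import java.beans.ExceptionListener;
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;

// Demonstrates that XMLEncoder discards statements for objects whose class
// lacks a public no-arg constructor, notifying the ExceptionListener instead.
public class XmlEncoderFailureDemo {
    // Hypothetical stand-in for org.antlr.runtime.CommonToken: no no-arg ctor.
    public static class NoDefaultCtor {
        private final int value;
        public NoDefaultCtor(int value) { this.value = value; }
        public int getValue() { return value; }
    }

    public static boolean failsToSerialize(Object o) {
        final boolean[] failed = {false};
        XMLEncoder enc = new XMLEncoder(new ByteArrayOutputStream());
        enc.setExceptionListener(new ExceptionListener() {
            public void exceptionThrown(Exception e) { failed[0] = true; }
        });
        ArrayList<Object> list = new ArrayList<Object>();
        list.add(o);
        enc.writeObject(list);   // like "ArrayList.add(ASTNode)" in the trace
        enc.close();
        return failed[0];
    }

    public static void main(String[] args) {
        // Expect a failure for the ctor-less class, none for a plain String.
        System.out.println(failsToSerialize(new NoDefaultCtor(42)));
        System.out.println(failsToSerialize("plain string"));
    }
}
```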
[jira] [Created] (HIVE-5066) Other code fixes for Windows
Daniel Dai created HIVE-5066: Summary: Other code fixes for Windows Key: HIVE-5066 URL: https://issues.apache.org/jira/browse/HIVE-5066 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.12.0 This is equivalent to HCATALOG-526, but updated to sync with latest trunk.
[jira] [Updated] (HIVE-5003) Localize hive exec jar for tez
[ https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-5003: - Attachment: HIVE-5003.5.patch.txt Addressed Gunther's comments. Localize hive exec jar for tez -- Key: HIVE-5003 URL: https://issues.apache.org/jira/browse/HIVE-5003 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-5003.1.patch.txt, HIVE-5003.2.patch.txt, HIVE-5003.3.patch.txt, HIVE-5003.4.patch.txt, HIVE-5003.5.patch.txt, HiveLocalizationDesign.txt Tez doesn't expose a distributed cache. JARs are localized via yarn APIs and added to vertices and the dag itself as needed. For hive we need to localize the hive-exec.jar. NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request 13507: HIVE-5003: Localize hive exec jar for tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13507/ --- Review request for hive. Bugs: HIVE-5003 and HIVE-5004 https://issues.apache.org/jira/browse/HIVE-5003 https://issues.apache.org/jira/browse/HIVE-5004 Repository: hive-git Description --- Tez localization of exec and additional jars. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 79c38c1 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 12e9334 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java faa99f7 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ac536e2 Diff: https://reviews.apache.org/r/13507/diff/ Testing --- Thanks, Vikram Dixit Kumaraswamy
[jira] [Commented] (HIVE-5003) Localize hive exec jar for tez
[ https://issues.apache.org/jira/browse/HIVE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737384#comment-13737384 ] Vikram Dixit K commented on HIVE-5003: -- RB entry: https://reviews.apache.org/r/13507/
[jira] [Commented] (HIVE-4899) Hive returns non-meanful error message for ill-formed fs.default.name
[ https://issues.apache.org/jira/browse/HIVE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737399#comment-13737399 ] Hudson commented on HIVE-4899: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #55 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/55/]) HIVE-4899 - Hive returns non-meanful error message for ill-formed fs.default.name (Xuefu Zhang, Reviewed By: Ashutosh Chauhan) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1513229) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java * /hive/trunk/ql/src/test/results/clientnegative/fs_default_name1.q.out * /hive/trunk/ql/src/test/results/clientnegative/fs_default_name2.q.out Hive returns non-meanful error message for ill-formed fs.default.name - Key: HIVE-4899 URL: https://issues.apache.org/jira/browse/HIVE-4899 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.10.0, 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4899.patch For query in test case fs_default_name1.q: {code} set fs.default.name='http://www.example.com; show tables; {code} The following error message is returned: {code} FAILED: IllegalArgumentException null {code} The message is not very meaningful, and has null in it. It would be better if we can provide detailed error message. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
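The kind of fix HIVE-4899 describes can be sketched with standard java.net.URI validation. This is not Hive's actual patch (the class and message wording below are assumptions); it only shows how an up-front parse turns "FAILED: IllegalArgumentException null" into a message that names the bad setting:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Validate a filesystem URI before use and produce a descriptive error.
public class FsDefaultNameCheck {
    // Returns null when the value parses, otherwise a human-readable complaint.
    public static String validate(String fsDefaultName) {
        try {
            new URI(fsDefaultName);
            return null;
        } catch (URISyntaxException e) {
            return "Bad value for fs.default.name: '" + fsDefaultName
                    + "' is not a valid URI (" + e.getReason() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("hdfs://namenode:8020"));         // null: well-formed
        System.out.println(validate("http://bad host.example.com"));  // names the illegal character
    }
}
```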
[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737398#comment-13737398 ] Hudson commented on HIVE-4123: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #55 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/55/]) HIVE-4123 Improved ORC integer RLE version 2. (Prasanth Jayachandran via omalley) (omalley: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1513155) * /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java * /hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerReader.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/IntegerWriter.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.orig * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java * /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitPack.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java * /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out * 
/hive/trunk/ql/src/test/resources/orc-file-dump.out The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: orcfile Fix For: 0.12.0 Attachments: HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123-8.patch, HIVE-4123.8.txt, HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
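The "delta encoding" idea behind HIVE-4123 can be illustrated in a few lines. This is not the ORC RLEv2 wire format (which also does tight bit packing of the literals); it is only the core observation that a constant-step run like 100, 105, 110, ... collapses to a (base, delta, length) triple instead of one literal per value:

```java
import java.util.ArrayList;
import java.util.List;

// Toy delta run-length encoder: emits [base, delta, runLength] triples.
public class DeltaRleSketch {
    public static List<long[]> encode(long[] values) {
        List<long[]> runs = new ArrayList<long[]>();
        int i = 0;
        while (i < values.length) {
            long base = values[i];
            int len = 1;
            long delta = (i + 1 < values.length) ? values[i + 1] - values[i] : 0;
            // Extend the run while successive differences stay constant.
            while (i + len < values.length && values[i + len] - values[i + len - 1] == delta) {
                len++;
            }
            runs.add(new long[]{base, delta, len});
            i += len;
        }
        return runs;
    }

    public static long[] decode(List<long[]> runs) {
        List<Long> out = new ArrayList<Long>();
        for (long[] run : runs) {
            for (int j = 0; j < run[2]; j++) {
                out.add(run[0] + j * run[1]);
            }
        }
        long[] result = new long[out.size()];
        for (int k = 0; k < result.length; k++) result[k] = out.get(k);
        return result;
    }

    public static void main(String[] args) {
        long[] values = {100, 105, 110, 115, 120, 7, 7, 7};
        // Two runs -- (100, +5, 5) and (7, 0, 3) -- instead of 8 literals.
        System.out.println(encode(values).size());
    }
}
```

The real RLEv2 adds the other two improvements from the issue on top of this: literals are bit-packed to the minimum width, and run headers allow much longer runs.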
[jira] [Updated] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-2599: --- Attachment: HIVE-2599.2.patch.txt Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key and make data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I cant find any follow up. How feasible is this in the HBaseStorageHandler? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737401#comment-13737401 ] Swarnim Kulkarni commented on HIVE-2599: This should be ready for review. If someone has a chance to take a look, that would be great!
[jira] [Updated] (HIVE-5058) Fix NPE issue with DAG submission in TEZ
[ https://issues.apache.org/jira/browse/HIVE-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5058: - Attachment: HIVE-5058.2.patch Fixed one more issue resulting in NPE (localization of reduce plans was incorrect) Fix NPE issue with DAG submission in TEZ Key: HIVE-5058 URL: https://issues.apache.org/jira/browse/HIVE-5058 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-5058.1.patch, HIVE-5058.2.patch Submitting dag caused NPE on execution. Multiple issues: - Some configs weren't set right - Key desc/Table desc weren't set properly - parallelism was left at -1 NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7
[ https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737427#comment-13737427 ] Xuefu Zhang commented on HIVE-3630: --- [~brocknoland] No. I meant HIVE-3630 is needed to allow JDK7 to pass. HIVE-3840 addresses a different issue. The patch here probably needs to rebase because of changes introduced by HIVE-3840. udf_substr.q fails when using JDK7 -- Key: HIVE-3630 URL: https://issues.apache.org/jira/browse/HIVE-3630 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.9.1, 0.10.0, 0.11.0 Reporter: Chris Drome Assignee: Chris Drome Attachments: HIVE-3630-0.10.patch, HIVE-3630-0.9.patch, HIVE-3630-trunk.patch Internal error: Cannot find ConstantObjectInspector for BINARY This exception has two causes. JDK7 iterators do not return values in the same order as JDK6, which selects a different implementation of this UDF when the first argument is null. With JDK7 this happens to be the binary version. The binary version is not implemented properly which ultimately causes the exception when the method is called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5023) Hive get wrong result when partition has the same path but different schema or authority
[ https://issues.apache.org/jira/browse/HIVE-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737426#comment-13737426 ] Sushanth Sowmyan commented on HIVE-5023: --- +1 on intent from looking at what the patch fixes. Haven't explicitly tested it myself.
Hive get wrong result when partition has the same path but different schema or authority
Key: HIVE-5023 URL: https://issues.apache.org/jira/browse/HIVE-5023 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5023.1.patch, HIVE-5023.2.patch
Hive does not differentiate scheme and authority in file URIs, which causes wrong results when partitions have the same path but different scheme or authority. Here is a simple repro. Partition file paths:
asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00/text1.txt with content "2013-08-05 00:00:00"
asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00/text2.txt with content "2013-08-05 00:00:20"
{noformat}
CREATE EXTERNAL TABLE IF NOT EXISTS T1 (t STRING) PARTITIONED BY (ProcessDate STRING, Hour STRING, ClusterName STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
ALTER TABLE T1 DROP IF EXISTS PARTITION(processDate='2013-08-05', Hour='00', clusterName='ClusterA');
ALTER TABLE T1 ADD IF NOT EXISTS PARTITION(processDate='2013-08-05', Hour='00', clusterName='ClusterA') LOCATION 'asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00';
ALTER TABLE T1 DROP IF EXISTS PARTITION(processDate='2013-08-05', Hour='00', clusterName='ClusterB');
ALTER TABLE T1 ADD IF NOT EXISTS PARTITION(processDate='2013-08-05', Hour='00', clusterName='ClusterB') LOCATION 'asv://contain...@secondary1.blob.core.windows.net/2013-08-05/00';
{noformat}
The expected output of the hive query
{noformat}
SELECT ClusterName, t FROM T1 WHERE ProcessDate='2013-08-05' AND Hour='00';
{noformat}
should be
{noformat}
ClusterA	2013-08-05 00:00:00
ClusterB	2013-08-05 00:00:20
{noformat}
However it is
{noformat}
ClusterA	2013-08-05 00:00:00
ClusterA	2013-08-05 00:00:20
{noformat}
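The underlying pitfall can be reproduced outside Hive with `java.net.URI`: comparing only the path component conflates two locations that differ in scheme or authority, while comparing full URIs keeps them distinct. The URIs below are hypothetical examples, not the reporter's actual storage accounts:

```java
import java.net.URI;

public class UriCompare {
    public static void main(String[] args) {
        URI a = URI.create("asv://containerA@account1.example.net/2013-08-05/00");
        URI b = URI.create("wasb://containerB@account2.example.net/2013-08-05/00");

        // Comparing only the path conflates the two locations, which is the
        // kind of collapse that made both partitions resolve to one here:
        System.out.println(a.getPath().equals(b.getPath())); // true

        // Comparing the full URI (scheme + authority + path) keeps them distinct:
        System.out.println(a.equals(b)); // false
    }
}
```

Any map or cache keyed by path alone would return the same entry for both partitions; keying by the full URI avoids that.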
[jira] [Assigned] (HIVE-4778) hive.server2.authentication CUSTOM not working
[ https://issues.apache.org/jira/browse/HIVE-4778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Azrael Park reassigned HIVE-4778: - Assignee: Azrael Park
hive.server2.authentication CUSTOM not working
Key: HIVE-4778 URL: https://issues.apache.org/jira/browse/HIVE-4778 Project: Hive Issue Type: Bug Components: Authentication Affects Versions: 0.11.0 Environment: CentOS release 6.2 x86_64, java version 1.6.0_31, Java(TM) SE Runtime Environment (build 1.6.0_31-b04), Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Reporter: Zdenek Ott Assignee: Azrael Park
I have created my own class PamAuthenticationProvider that implements the PasswdAuthenticationProvider interface. I have put the jar into the hive lib directory and have configured hive-site.xml in the following way:
{noformat}
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>
<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>com.avast.ff.hive.PamAuthenticationProvider</value>
</property>
{noformat}
I use SQuirreL and JDBC drivers to connect to Hive.
During authentication Hive throws the following exception:
{noformat}
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hive.service.auth.PasswdAuthenticationProvider.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
	at org.apache.hive.service.auth.CustomAuthenticationProviderImpl.<init>(CustomAuthenticationProviderImpl.java:20)
	at org.apache.hive.service.auth.AuthenticationProviderFactory.getAuthenticationProvider(AuthenticationProviderFactory.java:57)
	at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:61)
	at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:127)
	at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:509)
	at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:264)
	at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
	at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NoSuchMethodException: org.apache.hive.service.auth.PasswdAuthenticationProvider.<init>()
	at java.lang.Class.getConstructor0(Class.java:2706)
	at java.lang.Class.getDeclaredConstructor(Class.java:1985)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
	... 12 more
{noformat}
I have done a small patch for org.apache.hive.service.auth.CustomAuthenticationProviderImpl that has solved my problem, but I'm not sure if it's the best solution.
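The NoSuchMethodException for `<init>()` is the signature of reflectively instantiating a type that has no usable constructor: here the misread config fell back to the PasswdAuthenticationProvider *interface* itself, which cannot be constructed. A minimal sketch of this failure mode, using illustrative names rather than Hive's classes:

```java
import java.lang.reflect.Constructor;

public class ReflectiveInit {
    interface PasswdAuth {
        boolean authenticate(String user, String pass);
    }

    // A concrete provider with an accessible no-arg constructor can be
    // instantiated reflectively, which is what ReflectionUtils-style
    // factories require.
    static class PamProvider implements PasswdAuth {
        public PamProvider() {}
        public boolean authenticate(String u, String p) { return true; }
    }

    public static void main(String[] args) throws Exception {
        Constructor<PamProvider> c = PamProvider.class.getDeclaredConstructor();
        PasswdAuth auth = c.newInstance();
        System.out.println(auth.authenticate("user", "secret"));

        // Asking the *interface* for a constructor throws the same
        // NoSuchMethodException seen in the report, because interfaces
        // have no constructors at all.
        try {
            PasswdAuth.class.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException: " + e.getMessage());
        }
    }
}
```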
Here is the patch:
{noformat}
--- CustomAuthenticationProviderImpl.java	2013-06-20 14:55:22.473995184 +0200
+++ CustomAuthenticationProviderImpl.java.new	2013-06-20 14:57:36.549012966 +0200
@@ -33,7 +33,7 @@
     HiveConf conf = new HiveConf();
     this.customHandlerClass = (Class<? extends PasswdAuthenticationProvider>) conf.getClass(
-        HiveConf.ConfVars.HIVE_SERVER2_CUSTOM_AUTHENTICATION_CLASS.name(),
+        HiveConf.ConfVars.HIVE_SERVER2_CUSTOM_AUTHENTICATION_CLASS.varname,
         PasswdAuthenticationProvider.class);
     this.customProvider = ReflectionUtils.newInstance(this.customHandlerClass, conf);
{noformat}
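The one-character-class fix above hinges on the difference between a Java enum constant's name and the configuration key it stands for. A self-contained sketch of why `name()` misses the configured value while a `varname`-style field finds it (the enum and key below are illustrative stand-ins, not Hive's actual ConfVars):

```java
import java.util.Properties;

public class ConfVarDemo {
    // Hypothetical stand-in for a ConfVars-style enum: the Java constant name
    // (HIVE_SERVER2_...) differs from the config key it represents.
    enum ConfVars {
        HIVE_SERVER2_CUSTOM_AUTHENTICATION_CLASS("hive.server2.custom.authentication.class");
        final String varname;
        ConfVars(String varname) { this.varname = varname; }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // hive-site.xml entries are keyed by the dotted config key:
        conf.setProperty("hive.server2.custom.authentication.class",
                         "com.example.PamAuthenticationProvider");

        ConfVars v = ConfVars.HIVE_SERVER2_CUSTOM_AUTHENTICATION_CLASS;

        // Enum.name() returns the Java constant name, which never matches
        // the key, so the lookup falls back to the default (here: null):
        System.out.println(conf.getProperty(v.name()));

        // The varname field holds the real key, so the configured class is found:
        System.out.println(conf.getProperty(v.varname));
    }
}
```

With `name()` the lookup silently returned the default value (the interface class), which is exactly what then blew up under reflective instantiation.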
[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7
[ https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737448#comment-13737448 ] Brock Noland commented on HIVE-3630: --- [~xuefuz] udf_substr.q does not fail on JDK7 for me. I think we can close this.
[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7
[ https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737450#comment-13737450 ] Xuefu Zhang commented on HIVE-3630: --- [~brocknoland] Okay. Feel free to close it if it's no longer reproducible. It was there a couple of months back.
Re: Review Request 13507: HIVE-5003: Localize hive exec jar for tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13507/#review25040 --- Ship it! Ship It! - Gunther Hagleitner On Aug. 12, 2013, 9:38 p.m., Vikram Dixit Kumaraswamy wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13507/ --- (Updated Aug. 12, 2013, 9:38 p.m.) Review request for hive. Bugs: HIVE-5003 and HIVE-5004 https://issues.apache.org/jira/browse/HIVE-5003 https://issues.apache.org/jira/browse/HIVE-5004 Repository: hive-git Description --- Tez localization of exec and additional jars. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 79c38c1 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 12e9334 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java faa99f7 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ac536e2 Diff: https://reviews.apache.org/r/13507/diff/ Testing --- Thanks, Vikram Dixit Kumaraswamy
Re: Review Request 13507: HIVE-5003: Localize hive exec jar for tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13507/#review25041 --- Ship it! Ship It! - Gunther Hagleitner On Aug. 12, 2013, 9:38 p.m., Vikram Dixit Kumaraswamy wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13507/ --- (Updated Aug. 12, 2013, 9:38 p.m.) Review request for hive. Bugs: HIVE-5003 and HIVE-5004 https://issues.apache.org/jira/browse/HIVE-5003 https://issues.apache.org/jira/browse/HIVE-5004 Repository: hive-git Description --- Tez localization of exec and additional jars. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 79c38c1 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 12e9334 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java faa99f7 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java ac536e2 Diff: https://reviews.apache.org/r/13507/diff/ Testing --- Thanks, Vikram Dixit Kumaraswamy
[jira] [Resolved] (HIVE-3630) udf_substr.q fails when using JDK7
[ https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland resolved HIVE-3630. Resolution: Cannot Reproduce. I am unable to reproduce this despite repeated efforts; it seems something else fixed it, so I am marking this resolved. Please re-open if required.
[jira] [Commented] (HIVE-3630) udf_substr.q fails when using JDK7
[ https://issues.apache.org/jira/browse/HIVE-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737467#comment-13737467 ] Chris Drome commented on HIVE-3630: --- Sorry for jumping into the discussion late. Feel free to close this if it is no longer reproducible ([~ashutoshc] thought that would be the case after HIVE-3840).
[jira] [Commented] (HIVE-3688) Various tests failing in TestNegativeCliDriver, TestParseNegative, TestParse when using JDK7
[ https://issues.apache.org/jira/browse/HIVE-3688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737469#comment-13737469 ] Chris Drome commented on HIVE-3688: --- [~brocknoland] that would be great. I'll remove the TestParse parts of this patch and resubmit for the TestNegativeCliDriver cases only. Thanks. Various tests failing in TestNegativeCliDriver, TestParseNegative, TestParse when using JDK7 Key: HIVE-3688 URL: https://issues.apache.org/jira/browse/HIVE-3688 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.9.1, 0.10.0 Reporter: Chris Drome Assignee: Chris Drome Attachments: HIVE-3688-0.9.patch, HIVE-3688-trunk.patch The following tests are failing when using JDK7. TestNegativeCliDriver: case_sensitivity.q cast1.q groupby1.q groupby2.q groupby3.q groupby4.q groupby5.q groupby6.q input1.q input2.q input20.q input3.q input4.q input5.q input6.q input7.q input8.q input9.q input_part1.q input_testsequencefile.q input_testxpath.q input_testxpath2.q join1.q join2.q join3.q join4.q join5.q join6.q join7.q join8.q sample1.q sample2.q sample3.q sample4.q sample5.q sample6.q sample7.q subq.q udf1.q udf4.q udf6.q udf_case.q udf_when.q union.q TestParseNegative: invalid_function_param2.q TestNegativeCliDriver: fs_default_name1.q.out_0.23_1.7 fs_default_name2.q.out_0.23_1.7 invalid_cast_from_binary_1.q.out_0.23_1.7 invalid_cast_from_binary_2.q.out_0.23_1.7 invalid_cast_from_binary_3.q.out_0.23_1.7 invalid_cast_from_binary_4.q.out_0.23_1.7 invalid_cast_from_binary_5.q.out_0.23_1.7 invalid_cast_from_binary_6.q.out_0.23_1.7 wrong_column_type.q.out_0.23_1.7 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737471#comment-13737471 ] Ashutosh Chauhan commented on HIVE-4838: --- Good work, Brock. Left some comments on Phabricator. Another question: it seems like there are a few file mvs? To preserve history, how shall we proceed with applying this patch on trunk?
Refactor MapJoin HashMap code to improve testability and readability
Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch
MapJoin is an essential component for high-performance joins in Hive and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues:
* Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key and Row classes.
* The API of a logical table container is not defined, so it's unclear which APIs HashMapWrapper needs to publicize. Additionally, HashMapWrapper has many unused public methods.
* HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units could be separated.
* HashTableSinkObjectCtx has unused fields and unused methods.
* CommonJoinOperator and children use ArrayList on the left-hand side when only List is required.
* There are unused classes (MRU, DCLLItem) and classes which duplicate functionality (MapJoinSingleKey and MapJoinDoubleKeys).
[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737482#comment-13737482 ] Brock Noland commented on HIVE-4838: --- Sounds good, I will address them. In regards to the moves, I don't believe there are any true mv's. MapJoinObjectKey -> MapJoinKey is kind of a move, but I'd say it's more of a complete re-implementation.
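One of the cleanup items above, declaring `List` on the left-hand side instead of `ArrayList`, is the standard "program to the interface" idiom: callers can then pass any List implementation. A small sketch under illustrative names (this is not Hive's CommonJoinOperator code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class InterfaceTyping {
    // Declaring the parameter and locals as List rather than ArrayList means
    // this method works with any implementation the caller chooses.
    static int totalSize(List<List<String>> rows) {
        int n = 0;
        for (List<String> row : rows) {
            n += row.size();
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("k1", "v1"));                          // fixed-size List
        rows.add(new LinkedList<>(Arrays.asList("k2", "v2", "v3"))); // a different impl
        System.out.println(totalSize(rows)); // 5
    }
}
```

Had `totalSize` demanded `ArrayList<ArrayList<String>>`, neither of the two row types above could be passed in, which is the coupling the refactoring removes.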