[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126674#comment-14126674 ] Rui Li commented on HIVE-8017: -- [~xuefuz], Spark doesn't guarantee order for group-by queries. And the hash codes of BytesWritable and HiveKey are computed differently, so the records produced by the mappers are likely to be partitioned to different reducers. I believe that's why we got different results (differing only in order) after the change. I don't quite get the point of your last comment. I did try adding {{-- SORT_BEFORE_DIFF}} in the q file. It just seems the test framework is a little particular about where we should put it (the test failed to run if I put it before the {{set}} commands in the q file). [~brocknoland] do you have any comments on how to use the {{-- SORT_BEFORE_DIFF}} label? Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
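To make the partitioning point above concrete, here is a rough, self-contained sketch (plain Java, not the Spark branch code; the key bytes and the carried hash value are made up for illustration) of why two key types that hash the same bytes differently can send the same rows to different reducers, yielding identical result sets in a different order:
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitioningSketch {
  // The usual hash-partitioner formula: reducer = (hash & MAX_VALUE) % numReducers.
  static int partitionFor(int hash, int numReducers) {
    return (hash & Integer.MAX_VALUE) % numReducers;
  }

  public static void main(String[] args) {
    byte[] keyBytes = "key_42".getBytes(StandardCharsets.UTF_8);
    int numReducers = 4;

    // BytesWritable-style hash: derived directly from the raw key bytes.
    int bytesStyleHash = Arrays.hashCode(keyBytes);

    // HiveKey-style hash: computed by the sender (e.g. from the join/bucket columns)
    // and carried alongside the bytes; the value here is arbitrary for the demo.
    int hiveKeyStyleHash = 12345;

    System.out.println("BytesWritable-style hash -> reducer "
        + partitionFor(bytesStyleHash, numReducers));
    System.out.println("HiveKey-style hash       -> reducer "
        + partitionFor(hiveKeyStyleHash, numReducers));
  }
}
{code}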
[jira] [Created] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
Chengxiang Li created HIVE-8029: --- Summary: Remove reducers number configure in SparkTask[Spark Branch] Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li We do not need duplicated logic to configure the number of reducers in SparkTask, as SetSparkReduceParallelism always sets the number of reducers in the compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126682#comment-14126682 ] Lefty Leverenz commented on HIVE-5871: -- Actually you _can_ edit comments, although that's discouraged because preserving the history is important. (A good workaround is to append Edit date: to the comment instead of changing the original text.) The edit function is via a pencil icon in the upper right corner of each comment. But updating the description is also good. If you have wiki update permission, each Hive wiki page has an Edit link with a pencil icon in the upper right corner, next to Share and Tools. If not, you can request permission as described in AboutThisWiki, which you can find on the left side of the Home page. A preliminary guide to editing the wiki is in a comment on HIVE-7142. By the way, that's an edited comment. Here are the links: * [AboutThisWiki -- How to get permission to edit | https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit] * [HIVE-7142 comment about how to edit the wiki | https://issues.apache.org/jira/browse/HIVE-7142?focusedCommentId=14096756page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14096756] Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, Hive only allows users to use a single character as the field delimiter. Although there's RegexSerDe for specifying a multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to a typical table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
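As a rough illustration of how the new SerDe described above might be exercised programmatically — a minimal sketch that assumes the contrib package name and the {{field.delim}} table property; the column names, types and the 3-character delimiter are made up for the example:
{code:java}
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.io.Text;

public class MultiDelimitSketch {
  public static void main(String[] args) throws Exception {
    // Table properties a CREATE TABLE ... WITH SERDEPROPERTIES would normally supply.
    Properties tbl = new Properties();
    tbl.setProperty(serdeConstants.LIST_COLUMNS, "id,name");
    tbl.setProperty(serdeConstants.LIST_COLUMN_TYPES, "int,string");
    tbl.setProperty("field.delim", "[|]");   // multi-character field delimiter

    MultiDelimitSerDe serde = new MultiDelimitSerDe();
    serde.initialize(new Configuration(), tbl);

    // One input line whose two columns are separated by the multi-character delimiter.
    Object row = serde.deserialize(new Text("7[|]hive"));
    System.out.println(serde.getObjectInspector().getTypeName() + " -> " + row);
  }
}
{code}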
Re: Review Request 24627: HIVE-7704: Create tez task for fast file merging
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/#review52679 --- ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91637 All of this was existing code taken from MergeMapper.java. Anyway, I rewrote the comment in the new patch. Also merged the fixTmpPath and fixTmpPathConcatenate methods into a single method in the new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91638 Updated in new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91639 Updated comment in new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91640 This is all gone in the new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/24627/#comment91641 That's Eclipse. I used IntelliJ. Perhaps the two just do the opposite. :) ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/24627/#comment91642 Fixed it. ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/24627/#comment91643 If I use the Map interface here, then I need to cast it to LinkedHashMap when I set aliasToWork(). To avoid the cast I am using LinkedHashMap on the LHS. ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91644 Fixed it. ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java https://reviews.apache.org/r/24627/#comment91645 This is a dummy output format which is set in DagUtils/MergeFileTask. I am just using it to make sure the operator pipeline for fast file merge is initialized properly. If the operator pipeline is wrongly initialized with, say, TS -> FS, then FS will get a record writer from this output format, throwing a RuntimeException. If the OFM or RFM operators are initialized then this will never be called. The RCFile and ORC file writers handle output file opening and closing themselves. They do not use the standard record writer interfaces for writing the output. Both RCFile and ORC use custom interfaces for block-level and stripe-level writing respectively. - Prasanth_J On Sept. 6, 2014, 2:03 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/ --- (Updated Sept. 6, 2014, 2:03 a.m.) Review request for hive and Gunther Hagleitner. Bugs: HIVE-7704 https://issues.apache.org/jira/browse/HIVE-7704 Repository: hive-git Description --- Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 54e2b18 itests/src/test/resources/testconfiguration.properties 99049ca ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 6f23575 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java e076683 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8946221 ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/RCFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 3d74459 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2d9b9c3 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 4ff568d1 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 994721f ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java 831e6a5 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileMapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java PRE-CREATION
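The MergeFileOutputFormat.java entry above is the "dummy output format" discussed in the review comments: the merge operators write their output files themselves, so any request for a record writer can only mean the pipeline was wired incorrectly. A simplified, hypothetical sketch of that fail-fast guard (not the actual patch, which implements Hive's output format interface):
{code:java}
public class DummyMergeOutputFormatSketch {
  // The real class would implement HiveOutputFormat; this sketch only keeps the guard.
  public Object getHiveRecordWriter(Object jobConf, Object finalOutPath) {
    // ORC/RCFile merge operators open and close their output files themselves, so any
    // request for a record writer means the pipeline was set up as TS -> FS instead of
    // TS -> OFM/RFM; failing fast surfaces the misconfiguration immediately.
    throw new RuntimeException(
        "MergeFileOutputFormat should never be asked for a record writer: "
            + "the merge operators manage their own output files");
  }
}
{code}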
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126690#comment-14126690 ] Rui Li commented on HIVE-5871: -- Thanks [~leftylev] that's very helpful. However I don't see a pencil icon in the upper right corner of my comments. (There is one in the description though). Wonder if I'm still missing something? Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7818) Support boolean PPD for ORC
[ https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126696#comment-14126696 ] Hive QA commented on HIVE-7818: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667118/HIVE-7818.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6186 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/702/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/702/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-702/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667118 Support boolean PPD for ORC --- Key: HIVE-7818 URL: https://issues.apache.org/jira/browse/HIVE-7818 Project: Hive Issue Type: Improvement Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.14.0 Attachments: HIVE-7818.1.patch Currently ORC does collect stats for boolean fields. However, the boolean stats are not range based; instead, they collect counts of true records. RecordReaderImpl.evaluatePredicate currently only deals with range-based stats; we need to improve it to deal with the boolean stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
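To spell out the reasoning in the description above, a hedged sketch (hypothetical helper, not RecordReaderImpl itself) of how a true-count statistic can still drive predicate push-down for boolean columns: a stripe or row group can be skipped whenever the count shows no row can satisfy the predicate.
{code:java}
public class BooleanPpdSketch {
  // Simplified stand-ins for the SARG truth values.
  enum TruthValue { NO, YES_NO }

  // If no row in the stripe/row group can satisfy "col = predicateValue",
  // the reader may skip it; otherwise it must be read.
  static TruthValue evaluate(boolean predicateValue, long trueCount, long rowCount) {
    long matching = predicateValue ? trueCount : rowCount - trueCount;
    return matching == 0 ? TruthValue.NO : TruthValue.YES_NO;
  }

  public static void main(String[] args) {
    System.out.println(evaluate(true, 0, 1000));   // NO     -> skip: no true rows at all
    System.out.println(evaluate(false, 0, 1000));  // YES_NO -> read: every row is false
  }
}
{code}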
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: hive-7950-tez-WIP.diff I took a look at the tez branch to see if I could add more resources to an existing session as you described, [~sershe]. Looking at the javadoc, I feel like this patch should work, but the query still errors out when the map inside the dag fails due to missing classes. I can see that the dag does get the extra jars localized: {noformat} 2014-09-08 23:20:34,823 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Added additional resources : [[file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-fate-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-core-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-trace-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-start-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/zookeeper-3.4.6.jar]] to classpath {noformat} But I'm still getting a NoClassDefFoundException on a class which is in accumulo-core.jar: {noformat} Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:183) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:384) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:281) at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:73) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) ... 12 more Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at
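A rough way to see the failure mode above (a hypothetical check, not part of the attached WIP patch): the jars are localized into the container's working directory, but unless they are also added to the task's classloader, Kryo cannot resolve the class named in the serialized MapWork (inputFileFormatClass) and plan deserialization fails.
{code:java}
public class ClasspathSketch {
  public static void main(String[] args) {
    String cls = "org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat";
    try {
      Thread.currentThread().getContextClassLoader().loadClass(cls);
      System.out.println(cls + " is visible; plan deserialization should succeed");
    } catch (ClassNotFoundException e) {
      // The jar can sit in the container's local directory (as the DAGImpl log shows)
      // and still be invisible here; Kryo's FieldSerializer then fails exactly as above.
      System.out.println(cls + " is NOT on the task classpath; Kryo will fail as above");
    }
  }
}
{code}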
[jira] [Commented] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126719#comment-14126719 ] Prasanth J commented on HIVE-7704: -- Addressed Vikram's review comments. Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7704: - Attachment: HIVE-7704.8.patch Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 24627: HIVE-7704: Create tez task for fast file merging
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/ --- (Updated Sept. 9, 2014, 7:32 a.m.) Review request for hive and Gunther Hagleitner. Changes --- Addressed Vikram's review comments. Bugs: HIVE-7704 https://issues.apache.org/jira/browse/HIVE-7704 Repository: hive-git Description --- Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 31aeba9 itests/src/test/resources/testconfiguration.properties 99049ca ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 6f23575 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java e076683 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8946221 ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/RCFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 3d74459 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5bbf3f6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 4ff568d1 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 994721f ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java 831e6a5 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileMapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeInputFormat.java 4651920 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java 6c691b1 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeOutputFormat.java a3ce699 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeTask.java c30476b ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeWork.java 9efee3c ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java 13ec642 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileStripeMergeInputFormat.java a6c92fb ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java c391b0e ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 195d60e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeInputFormat.java 6809c79 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java dee6b1c ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 7129ed8 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 11a9419 ql/src/java/org/apache/hadoop/hive/ql/plan/FileMergeDesc.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/OrcFileMergeDesc.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/RCFileMergeDesc.java PRE-CREATION ql/src/test/queries/clientpositive/list_bucket_dml_8.q 9e81b8d ql/src/test/queries/clientpositive/orc_merge1.q ee65b98 ql/src/test/queries/clientpositive/orc_merge5.q PRE-CREATION 
ql/src/test/queries/clientpositive/orc_merge6.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge7.q PRE-CREATION ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out 11c7578 ql/src/test/results/clientpositive/list_bucket_dml_10.q.out 8de452f ql/src/test/results/clientpositive/list_bucket_dml_4.q.out b1c060e ql/src/test/results/clientpositive/list_bucket_dml_6.q.out 3450d63 ql/src/test/results/clientpositive/list_bucket_dml_7.q.out f6a4cb5 ql/src/test/results/clientpositive/list_bucket_dml_9.q.out 796c7af ql/src/test/results/clientpositive/merge_dynamic_partition4.q.out 0899648 ql/src/test/results/clientpositive/merge_dynamic_partition5.q.out 0653469 ql/src/test/results/clientpositive/orc_createas1.q.out 993c853 ql/src/test/results/clientpositive/orc_merge1.q.out 7f88125 ql/src/test/results/clientpositive/orc_merge3.q.out 258f538 ql/src/test/results/clientpositive/orc_merge5.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge6.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge7.q.out PRE-CREATION ql/src/test/results/clientpositive/rcfile_createas1.q.out cdfa036
[jira] [Commented] (HIVE-7777) add CSV support for Serde
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126737#comment-14126737 ] Hive QA commented on HIVE-7777: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667327/HIVE-7777.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6188 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/703/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/703/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-703/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667327 add CSV support for Serde - Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7777.patch, csv-serde-master.zip There is no official support for a CSV SerDe in Hive, while there is an open source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very frequently used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Timeline for release of Hive 0.14
Please include https://issues.apache.org/jira/browse/HIVE-7694 as well. It is currently under review by Amareshwari and should be done in the next couple of days. Thanks Suma On Mon, Sep 8, 2014 at 5:44 PM, Alan Gates ga...@hortonworks.com wrote: I'll review that. I just need the time to test it against mysql, oracle, and hopefully sqlserver. But I think we can do this post branch if we need to, as it's a bug fix rather than a feature. Alan. Damien Carol dca...@blitzbs.com September 8, 2014 at 3:19 Same request for https://issues.apache.org/jira/browse/HIVE-7689 I already provided a patch, re-based it many times and I'm waiting for a review. Regards, On 08/09/2014 12:08, amareshwarisr . wrote: amareshwarisr . amareshw...@gmail.com September 8, 2014 at 3:08 Would like to include https://issues.apache.org/jira/browse/HIVE-2390 and https://issues.apache.org/jira/browse/HIVE-7936. I can review and merge them. Thanks Amareshwari Vikram Dixit vik...@hortonworks.com September 5, 2014 at 17:53 Hi Folks, I am going to start consolidating the items mentioned in this list and create a wiki page to track it. I will wait till the end of next week to create the branch taking into account Ashutosh's request. Thanks Vikram. On Fri, Sep 5, 2014 at 5:39 PM, Ashutosh Chauhan hashut...@apache.org wrote: Ashutosh Chauhan hashut...@apache.org September 5, 2014 at 17:39 Vikram, Some of us are working on stabilizing the cbo branch and trying to get it merged into trunk. We feel we are close. May I request to defer cutting the branch for a few more days? Folks interested in this can track our progress here: https://issues.apache.org/jira/browse/HIVE-7946 Thanks, Ashutosh On Fri, Aug 22, 2014 at 4:09 PM, Lars Francke lars.fran...@gmail.com wrote: Lars Francke lars.fran...@gmail.com August 22, 2014 at 16:09 Thank you for volunteering to do the release. I think a 0.14 release is a good idea. I have a couple of issues I'd like to get in too: * Either HIVE-7107[0] (Fix an issue in the HiveServer1 JDBC driver) or HIVE-6977[1] (Delete HiveServer1). The former needs a review, the latter a patch. * HIVE-6123[2] Checkstyle in Maven needs a review. HIVE-7622[3] and HIVE-7543[4] are waiting for any reviews or comments on my previous thread[5]. I'd still appreciate any helpers for reviews or even just comments. I'd feel very sad if I had done all that work for nothing. Hoping this thread gives me a wider audience. Both patches fix up issues that should have been caught in earlier reviews as they are almost all Checkstyle or other style violations, but they make for huge patches. I could also create hundreds of small issues or stop doing these things entirely [0] https://issues.apache.org/jira/browse/HIVE-7107 [1] https://issues.apache.org/jira/browse/HIVE-6977 [2] https://issues.apache.org/jira/browse/HIVE-6123 [3] https://issues.apache.org/jira/browse/HIVE-7622 [4] https://issues.apache.org/jira/browse/HIVE-7543 On Fri, Aug 22, 2014 at 11:01 PM, John Pullokkaran
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- Looks good apart from minor comments. Maybe add a test for the Serialization part? https://issues.apache.org/jira/browse/HIVE-5976 integration might be nice: STORED AS CSV. Unfortunately there's no documentation yet so I'm not sure if it's feasible. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91646 This comment doesn't add value so I suggest removing it. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91647 * Constants is deprecated. Use serdeConstants instead * Exceeds maximum line length (100 chars) serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91648 Unused serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91651 Missing spaces around operators serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91650 2 x Unnecessary this serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91653 Missing spaces around operators serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91669 I suggest moving these properties to Constants somewhere serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91649 Method declared final in final class serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91657 long line serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91656 Missing spaces serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91659 I don't quite get this comment. Looking at the two CSVReader constructors they seem to do the same in this case. From how I understand it this if-statement is not needed. Same for the newWriter method. Maybe I'm missing something? serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91660 Missing @Override annotation serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java https://reviews.apache.org/r/25468/#comment91661 Can be private too serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java https://reviews.apache.org/r/25468/#comment91662 Properties.put should not be used. Use setProperty instead. Also Constants == deprecated - Lars Francke On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
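For context on the newReader question above, a hedged sketch of the pattern being discussed (method and parameter names are assumed from the csv-serde project the patch is based on, not taken from the patch itself): the branch only matters if a configured escape character should override OpenCSV's default.
{code:java}
import java.io.Reader;
import au.com.bytecode.opencsv.CSVReader;

public class CsvReaderSketch {
  // Pick a CSVReader constructor depending on whether an escape char was configured.
  // The review's point: if the default escape is acceptable, the branch collapses
  // to a single constructor call.
  static CSVReader newReader(Reader in, char separator, char quote, Character escape) {
    return escape == null
        ? new CSVReader(in, separator, quote)          // fall back to OpenCSV defaults
        : new CSVReader(in, separator, quote, escape); // honour the configured escape
  }
}
{code}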
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126766#comment-14126766 ] Damien Carol commented on HIVE-7689: My bad. It's a new method {{findColumnsWithStats}} added in {{CompactionTxnHandler}}. I wrote: {code:java} if (ci.partName == null) s += " AND " + TxnDbUtil.getEscape("PARTITION_NAME", identifierQuoteString) + "='" + ci.partName + "'"; {code} Instead of {code:java} if (ci.partName != null) s += " AND " + TxnDbUtil.getEscape("PARTITION_NAME", identifierQuoteString) + "='" + ci.partName + "'"; {code} This produces: {noformat} 2014-09-09 10:34:12,818 ERROR [nc-h04-22]: compactor.Worker (Worker.java:run(165)) - Caught an exception in the main loop of compactor worker nc-h04-22, exiting MetaException(message:Unable to connect to transaction database org.postgresql.util.PSQLException: ERROR: column PARTITION_NAME does not exist Position: 104 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:372) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:252) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findColumnsWithStats(CompactionTxnHandler.java:628) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:140) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findColumnsWithStats(CompactionTxnHandler.java:645) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:140) {noformat} This error breaks all compactions. Version 4 of the patch was OK, but version 5 introduced a new bug. Fixed with the new patch now. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS, COMPACTION and fixes errors in STATS on a Postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126772#comment-14126772 ] Damien Carol commented on HIVE-7689: Verified with our streaming benchmark on real hardware and with : {noformat} mvn -B -o test -Phadoop-2 -Dtest=TestWorker {noformat} Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7689.6.patch Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24602/ --- (Updated Sept. 9, 2014, 8:55 a.m.) Review request for hive. Changes --- Rebased on the latest trunk and fixed a test failure. Bugs: HIVE-7689 https://issues.apache.org/jira/browse/HIVE-7689 Repository: hive-git Description --- I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables these features: * LOCKS on the Postgres metastore * COMPACTION on the Postgres metastore * TRANSACTION on the Postgres metastore * fix for the metastore update script for Postgres Diffs (updated) - metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 2ebd3b0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java d3aa66f metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java df183a0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java b074ca9 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 5e317ab ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 Diff: https://reviews.apache.org/r/24602/diff/ Testing --- Using the patched version in production. Enables concurrency with DbTxnManager. Thanks, Damien Carol
[jira] [Updated] (HIVE-2390) Expand support for union types
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-2390: -- Resolution: Fixed Release Note: Adds UnionType support in LazyBinarySerde Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Suma! Expand support for union types -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126785#comment-14126785 ] Amareshwari Sriramadasu commented on HIVE-7694: --- +1 Code changes look fine to me. [~suma.shivaprasad], Can you rebase the patch? Also run tests once again as the last test build was having test failures. Make sure failed tests are not failing on your local machine before submitting again SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
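To make the failure above easier to follow: checkSortColsAndJoinCols walks the join columns and indexes into a table's sort columns, so a table sorted by fewer columns than its join partner trips the IndexOutOfBoundsException. Below is a simplified, hypothetical guard (not the actual AbstractSMBJoinProc code) capturing the intended check — the join keys only need to be a common prefix of each table's sort columns, and the size comparison also avoids indexing past the shorter list.
{code:java}
import java.util.Arrays;
import java.util.List;

public class SortPrefixSketch {
  // True when the join keys form a prefix of the table's sort columns.
  static boolean joinKeysArePrefixOfSortCols(List<String> sortCols, List<String> joinCols) {
    if (sortCols.size() < joinCols.size()) {
      return false;   // cannot be a prefix; also prevents the IndexOutOfBoundsException
    }
    for (int i = 0; i < joinCols.size(); i++) {
      if (!sortCols.get(i).equalsIgnoreCase(joinCols.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> joinCols = Arrays.asList("a");
    // T1 sorted by (a, b, c) and T2 sorted by (a), joined on a: both pass the check.
    System.out.println(joinKeysArePrefixOfSortCols(Arrays.asList("a", "b", "c"), joinCols));
    System.out.println(joinKeysArePrefixOfSortCols(Arrays.asList("a"), joinCols));
  }
}
{code}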
Re: Review Request 24630: HIVE-7694 - SMB joins on tables differing by number of sorted by columns but same sort prefix and join keys fail
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24630/#review52695 --- Ship it! Ship It! - Amareshwari Sriramadasu On Sept. 8, 2014, 5:25 p.m., Suma Shivaprasad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24630/ --- (Updated Sept. 8, 2014, 5:25 p.m.) Review request for hive, Amareshwari Sriramadasu, Brock Noland, Gunther Hagleitner, and Navis Ryu. Bugs: HIVE-7694 https://issues.apache.org/jira/browse/HIVE-7694 Repository: hive-git Description --- For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, an exception is seen as reported in https://issues.apache.org/jira/browse/HIVE-7694 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java 0b7b1a3 ql/src/test/queries/clientpositive/sort_merge_join_desc_8.q PRE-CREATION ql/src/test/results/clientpositive/sort_merge_join_desc_8.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24630/diff/ Testing --- sort_merge_join_desc_8.q added for testing the above cases Thanks, Suma Shivaprasad
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8017: - Attachment: HIVE-8017.2-spark.patch This patch fixes some failed qfile tests caused by the last patch. Two qtests are not fixed: {{optimize_nullscan.q}} and {{union_remove_25.q}}. For {{optimize_nullscan.q}}, I checked the corresponding MR output and found the operator tree in the new output file is more similar to the one in the MR version output. Besides, this failure is of age 2, so I guess it's not related to the patch here. For {{union_remove_25.q}}, the only diff is the total size of {{outputTbl2}} (6812 vs. 6826). I checked the MR version and the total size is also 6812. I'm not sure what causes this difference. Maybe we need to do more tests for partitioned tables. [~xuefuz] do you have any idea on this? Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8029: Attachment: HIVE-8029.1-spark.patch Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8029: Status: Patch Available (was: Open) Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
[ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126795#comment-14126795 ] Hive QA commented on HIVE-7156: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667341/HIVE-7156.1.patch {color:red}ERROR:{color} -1 due to 178 failed/errored test(s), 6186 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126833#comment-14126833 ] Hive QA commented on HIVE-8017: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667378/HIVE-8017.2-spark.patch {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 6343 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_merge1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_merge2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25 org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/119/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/119/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-119/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667378 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated HIVE-7694: --- Attachment: HIVE-7694.2.patch SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Mittal updated HIVE-7892: Attachment: HIVE-7892.patch.txt Attaching patch that resolves the issue. The approach taken here is to essentially map thrift Set type to hive Array type (thrift List type already maps to hive Array). Since both List and Set are essentially collections, we can simply leverage the existing Array type, instead of exposing a new complex type at hive level. Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required set<i32> ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive> describe settable; OK ids struct from deserializer name string from deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive> select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct' but '' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
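A minimal sketch of the mapping idea described in the comment above, assuming nothing about the actual patch: when reflecting over the Thrift-generated class, a java.util.Set field can be routed through the same object-inspector path as java.util.List, so it surfaces as a Hive ARRAY.

{code}
// Hedged illustration only (hypothetical helper, not the attached HIVE-7892.patch.txt):
// treat Set like List when deciding which Hive type a reflected field maps to.
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Set;

public class ThriftCollectionMapping {
  /** True if the reflected field type should surface as a Hive ARRAY. */
  static boolean mapsToHiveArray(Type fieldType) {
    if (fieldType instanceof ParameterizedType) {
      Type raw = ((ParameterizedType) fieldType).getRawType();
      // set<i32> and list<i32> both become ARRAY<int>; only the element type matters.
      return raw == List.class || raw == Set.class;
    }
    return false;
  }
}
{code}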
Review Request 25473: Thrift Set type not working with Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25473/ --- Review request for hive, Amareshwari Sriramadasu, Ashutosh Chauhan, and Navis Ryu. Bugs: HIVE-7892 https://issues.apache.org/jira/browse/HIVE-7892 Repository: hive-git Description --- Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't get mapped to any Hive type, and hence doesn't work with ThriftDeserializer serde. Diffs - serde/if/test/complex.thrift 308b64c serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 9a226b3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardListObjectInspector.java 6eb8803 serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestThriftObjectInspectors.java 5f692fb Diff: https://reviews.apache.org/r/25473/diff/ Testing --- 1) Added Unit test along with the fix. 2) Manually tested by creating a table with ThriftDeserializer serde and having thrift set columns: a) described the table b) issued query to select the set column Thanks, Satish Mittal
[jira] [Commented] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126866#comment-14126866 ] Satish Mittal commented on HIVE-7892: - Review: https://reviews.apache.org/r/25473/ Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required seti32 ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive describe settable; OK ids structfrom deserializer namestringfrom deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct' but '' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Mittal updated HIVE-7892: Status: Patch Available (was: Open) Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required seti32 ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive describe settable; OK ids structfrom deserializer namestringfrom deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct' but '' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126899#comment-14126899 ] Hive QA commented on HIVE-8029: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667379/HIVE-8029.1-spark.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6343 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/120/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/120/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-120/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667379 Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8030) NullPointerException on getSchemas
Shiv Prakash created HIVE-8030: -- Summary: NullPointerException on getSchemas Key: HIVE-8030 URL: https://issues.apache.org/jira/browse/HIVE-8030 Project: Hive Issue Type: Bug Components: Database/Schema, JDBC Affects Versions: 0.13.1 Environment: Linux (Ubuntu 12.04) Reporter: Shiv Prakash Fix For: 0.13.1 java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:151) at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.init(HiveMetaDataResultSet.java:32) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.init(HiveDatabaseMetaData.java:482) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:481) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$DatabaseMetaDataInvocationHandler.invoke(DriverProxyInvocationChain.java:368) at com.sun.proxy.$Proxy20.getSchemas(Unknown Source) at org.pentaho.di.core.database.Database.getSchemas(Database.java:3857) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.getSchemaNames(TableOutputDialog.java:1036) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.access$2400(TableOutputDialog.java:94) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog$24.widgetSelected(TableOutputDialog.java:863) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.open(TableOutputDialog.java:884) at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:124) at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8648) at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3020) at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:737) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1297) at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7801) at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9130) at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.commons.launcher.Launcher.main(Launcher.java:151) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126921#comment-14126921 ] Hive QA commented on HIVE-7946: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667347/HIVE-7946.5.patch {color:red}ERROR:{color} -1 due to 261 failed/errored test(s), 5557 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_predicate_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_cast org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constprog2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_count org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_genericudaf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_union_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_distinct_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_file_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
[jira] [Commented] (HIVE-8030) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126924#comment-14126924 ] Lars Francke commented on HIVE-8030: This looks very similar to HIVE-8030. I'm looking at the current code and all the ArrayList creations are guarded by null checks. Are you sure that you are using Hive 0.13.1? The line numbers don't seem to match up either. NullPointerException on getSchemas -- Key: HIVE-8030 URL: https://issues.apache.org/jira/browse/HIVE-8030 Project: Hive Issue Type: Bug Components: Database/Schema, JDBC Affects Versions: 0.13.1 Environment: Linux (Ubuntu 12.04) Reporter: Shiv Prakash Labels: hadoop Fix For: 0.13.1 java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:151) at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.init(HiveMetaDataResultSet.java:32) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.init(HiveDatabaseMetaData.java:482) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:481) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$DatabaseMetaDataInvocationHandler.invoke(DriverProxyInvocationChain.java:368) at com.sun.proxy.$Proxy20.getSchemas(Unknown Source) at org.pentaho.di.core.database.Database.getSchemas(Database.java:3857) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.getSchemaNames(TableOutputDialog.java:1036) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.access$2400(TableOutputDialog.java:94) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog$24.widgetSelected(TableOutputDialog.java:863) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.open(TableOutputDialog.java:884) at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:124) at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8648) at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3020) at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:737) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1297) at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7801) at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9130) at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.commons.launcher.Launcher.main(Launcher.java:151) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
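For context on the null-check observation above, this is the pattern being referred to, sketched with made-up names: java.util.ArrayList's copy constructor throws NullPointerException when handed a null collection, so the metadata result list has to be guarded before construction.

{code}
// Hedged sketch with hypothetical names; not the actual HiveMetaDataResultSet code.
import java.util.ArrayList;
import java.util.List;

public class SchemaListGuard {
  static List<String> toResultList(List<String> schemasFromServer) {
    // new ArrayList<>(null) is exactly the NPE in the stack trace above,
    // so fall back to an empty list when the server returns nothing.
    return schemasFromServer == null
        ? new ArrayList<String>()
        : new ArrayList<String>(schemasFromServer);
  }
}
{code}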
[jira] [Assigned] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-7776: --- Assignee: Chengxiang Li enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li sample10.q contains a dynamic partition operation; this qtest should be enabled once Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126985#comment-14126985 ] Hive QA commented on HIVE-7704: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667365/HIVE-7704.8.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6193 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/707/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/707/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-707/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667365 Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch Currently tez falls back to MR task for merge file task. It will beneficial to convert the merge file tasks to tez task to make use of the performance gains from tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7834) Use min, max and NDV from the stats to better estimate many to many vs one to many inner joins
[ https://issues.apache.org/jira/browse/HIVE-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar resolved HIVE-7834. --- Resolution: Duplicate Use min, max and NDV from the stats to better estimate many to many vs one to many inner joins -- Key: HIVE-7834 URL: https://issues.apache.org/jira/browse/HIVE-7834 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 I noticed that the estimate number of rows in Map joins is higher after the join than before the join that is with column stats fetch ON or OFF. TPC-DS Q55 was a good example for that, the issue is that the current statistics provide us enough information that we can estimate with strong confidence that the joins are one to many and not many to many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min and max values for both join columns match while the row counts are different this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K we estimate a many to many join between the filtered item and store_sales and as a result the estimate number of rows coming out of the join is off by several orders of magnitude. Available information from the stats {code} Table Join column NDV from describe NDV actual min max item i_item_sk 439,501 462,000 1 462,000 date_dim d_date_sk 65,332 73,049 2,415,022 2,488,070 store_sales ss_item_sk 439,501 462,000 1 462,000 store_sales ss_sold_date_sk 2,226 1,823 2,450,816 2,452,642 {code} Same thing applies to store_sales and date_dim but with a caveat that the NDV , min and max values don't match where date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on on ss_item_sk = i_item_sk since both columns have the same NDV, min and max values we can safely conclude that selectivity on item will translate to similar selectivity on store_sales. This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk since the domain of d_date_sk is much bigger than that of ss_sold_date_sk, differences in domain need to be taken into account when inferring selectivity onto store_sales. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
Mostafa Mokhtar created HIVE-8031: - Summary: CBO should use per column join selectivity not NDV when applying exponential backoff. Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Simplify predicates for disjunctive predicates so that can get pushed down to the scan. For TPC-DS query 13 we push down predicates in the following form where c_martial_status in ('M','D','U') etc.. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value 
expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Assignee: Harish Butani (was: Laljo John Pullokkaran) CBO should use per column join selectivity not NDV when applying exponential backoff. - Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Simplify predicates for disjunctive predicates so that can get pushed down to the scan. For TPC-DS query 13 we push down predicates in the following form where c_martial_status in ('M','D','U') etc.. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: 
COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Description: (was: Simplify predicates for disjunctive predicates so that can get pushed down to the scan. For TPC-DS query 13 we push down predicates in the following form where c_martial_status in ('M','D','U') etc.. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 
2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: +
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Description: Currently CBO uses NDV not join selectivity in computeInnerJoinSelectivity which results in in-accurate estimate number of rows. I looked at the plan for TPC-DS Q17 after the latest set of changes and I am concerned that the estimate of rows for the join of store_sales and store_returns is so low, as you can see the estimate is 8461 rows for joining 1.2795706667449066E8 with 1.2922108035889767E7. {code} HiveJoinRel(condition=[AND(=($130, $3), =($129, $15))], joinType=[inner]): rowcount = 1079.1345153548855, cumulative cost = {8.271845957931738E10 rows, 0.0 cpu, 0.0 io}, id = 517 HiveJoinRel(condition=[=($0, $38)], joinType=[inner]): rowcount = 6.669190301841249E7, cumulative cost = {4.300510912631623E10 rows, 0.0 cpu, 0.0 io}, id = 402 HiveTableScanRel(table=[[catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 2 HiveFilterRel(condition=[in($15, '2000Q1', '2000Q2', '2000Q3')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 181 HiveTableScanRel(table=[[d3]]): rowcount = 73049.0, cumulative cost = {0}, id = 3 HiveJoinRel(condition=[AND(AND(=($3, $61), =($2, $60)), =($9, $67))], joinType=[inner]): rowcount = 8461.27236667537, cumulative cost = {8.26517592150266E10 rows, 0.0 cpu, 0.0 io}, id = 515 HiveJoinRel(condition=[=($27, $0)], joinType=[inner]): rowcount = 1.2795706667449066E8, cumulative cost = {8.251088004031622E10 rows, 0.0 cpu, 0.0 io}, id = 417 HiveTableScanRel(table=[[store_sales]]): rowcount = 8.2510879939E10, cumulative cost = {0}, id = 5 HiveFilterRel(condition=[=($15, '2000Q1')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 173 HiveTableScanRel(table=[[d1]]): rowcount = 73049.0, cumulative cost = {0}, id = 0 HiveJoinRel(condition=[=($0, $24)], joinType=[inner]): rowcount = 1.2922108035889767E7, cumulative cost = {8.332595810316228E9 rows, 0.0 cpu, 0.0 io}, id = 424 HiveTableScanRel(table=[[store_returns]]): rowcount = 8.332595709E9, cumulative cost = {0}, id = 7 HiveFilterRel(condition=[in($15, '2000Q1', '2000Q2', '2000Q3')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 177 HiveTableScanRel(table=[[d2]]): rowcount = 73049.0, cumulative cost = {0}, id = 1 {code} CBO should use per column join selectivity not NDV when applying exponential backoff. - Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Currently CBO uses NDV not join selectivity in computeInnerJoinSelectivity which results in in-accurate estimate number of rows. I looked at the plan for TPC-DS Q17 after the latest set of changes and I am concerned that the estimate of rows for the join of store_sales and store_returns is so low, as you can see the estimate is 8461 rows for joining 1.2795706667449066E8 with 1.2922108035889767E7. 
{code} HiveJoinRel(condition=[AND(=($130, $3), =($129, $15))], joinType=[inner]): rowcount = 1079.1345153548855, cumulative cost = {8.271845957931738E10 rows, 0.0 cpu, 0.0 io}, id = 517 HiveJoinRel(condition=[=($0, $38)], joinType=[inner]): rowcount = 6.669190301841249E7, cumulative cost = {4.300510912631623E10 rows, 0.0 cpu, 0.0 io}, id = 402 HiveTableScanRel(table=[[catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 2 HiveFilterRel(condition=[in($15, '2000Q1', '2000Q2', '2000Q3')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 181 HiveTableScanRel(table=[[d3]]): rowcount = 73049.0, cumulative cost = {0}, id = 3 HiveJoinRel(condition=[AND(AND(=($3, $61), =($2, $60)), =($9, $67))], joinType=[inner]): rowcount = 8461.27236667537, cumulative cost = {8.26517592150266E10 rows, 0.0 cpu, 0.0 io}, id = 515 HiveJoinRel(condition=[=($27, $0)], joinType=[inner]): rowcount = 1.2795706667449066E8, cumulative cost = {8.251088004031622E10 rows, 0.0 cpu, 0.0 io}, id = 417 HiveTableScanRel(table=[[store_sales]]): rowcount = 8.2510879939E10, cumulative cost = {0}, id = 5
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Affects Version/s: 0.14.0 CBO should use per column join selectivity not NDV when applying exponential backoff. - Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
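To see why the 8461-row estimate quoted in the HIVE-8031 description looks suspicious, here is the textbook inner-join estimate sketched with illustrative numbers. This is a hedged simplification, not Hive's computeInnerJoinSelectivity; the JIRA's point is precisely that per-column join selectivity, rather than raw NDV with exponential backoff, should drive the calculation.

{code}
// Hedged sketch, not Hive's estimator: the classic estimate divides the cross
// product by the larger NDV of the join columns. Numbers below are illustrative.
public class JoinEstimateSketch {
  static double innerJoinRows(double leftRows, double rightRows,
                              double leftNdv, double rightNdv) {
    double selectivity = 1.0 / Math.max(leftNdv, rightNdv);
    return leftRows * rightRows * selectivity;
  }

  public static void main(String[] args) {
    // Roughly the store_sales x store_returns shape discussed above (~1.28E8 x ~1.29E7 rows).
    // With a single join key and NDVs near the row counts, the estimate stays in the
    // millions rather than collapsing to a few thousand rows.
    System.out.printf("%.3e%n", innerJoinRows(1.28e8, 1.29e7, 1.28e8, 1.29e7));
  }
}
{code}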
[jira] [Commented] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127051#comment-14127051 ] Brock Noland commented on HIVE-8029: +1 Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8032) Fix TestSparkCliDriver = optimize_nullscan.q
Brock Noland created HIVE-8032: -- Summary: Fix TestSparkCliDriver = optimize_nullscan.q Key: HIVE-8032 URL: https://issues.apache.org/jira/browse/HIVE-8032 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8032) Fix TestSparkCliDriver = optimize_nullscan.q
[ https://issues.apache.org/jira/browse/HIVE-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8032: --- Description: It's been failing lately, perhaps since the last merge from trunk. Fix TestSparkCliDriver = optimize_nullscan.q - Key: HIVE-8032 URL: https://issues.apache.org/jira/browse/HIVE-8032 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland It's been failing lately, perhaps since the last merge from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127054#comment-14127054 ] Brock Noland commented on HIVE-8017: I don't think nullscan is related as it's been failing for other runs. I created HIVE-8032 to fix that. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127055#comment-14127055 ] Brock Noland commented on HIVE-8029: I don't think nullscan is related as it's been failing for other runs. I created HIVE-8032 to fix that. Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8029: --- Summary: Remove reducers number configure in SparkTask [Spark Branch] (was: Remove reducers number configure in SparkTask[Spark Branch]) Remove reducers number configure in SparkTask [Spark Branch] Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8029: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you Chengxiang! I have committed this to spark. Note in the commit I actually said Rui since I was just reviewing HIVE-8017. I apologize for this mistake, but since the JIRA is assigned to you, you will still get the appropriate accreditation for the patch. Remove reducers number configure in SparkTask [Spark Branch] Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Fix For: spark-branch Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8023) Code in HIVE-6380 eats exceptions
[ https://issues.apache.org/jira/browse/HIVE-8023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8023: --- Resolution: Fixed Fix Version/s: 0.14.0 Assignee: Jason Dere Status: Resolved (was: Patch Available) Thank you Jason! I have committed this to trunk. Code in HIVE-6380 eats exceptions - Key: HIVE-8023 URL: https://issues.apache.org/jira/browse/HIVE-8023 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-8023.1.patch This code eats the stack trace {noformat} LOG.error("Unable to load resources for " + dbName + "." + fName + ": " + e); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
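The problem with the quoted call is that concatenating the exception into the message logs only e.toString() and discards the stack trace. A minimal sketch of the fix pattern follows, reusing the names from the snippet above; the surrounding class is hypothetical, only the logging idiom is the point.

{code}
// Hedged sketch of the fix pattern; the class and method are made up.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ResourceLoadLogging {
  private static final Log LOG = LogFactory.getLog(ResourceLoadLogging.class);

  void reportFailure(String dbName, String fName, Exception e) {
    // Before: LOG.error("Unable to load resources for " + dbName + "." + fName + ": " + e);
    // After: pass the exception as the throwable argument so the stack trace is kept.
    LOG.error("Unable to load resources for " + dbName + "." + fName, e);
  }
}
{code}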
[jira] [Updated] (HIVE-8012) TestHiveServer2Concurrency is not implemented
[ https://issues.apache.org/jira/browse/HIVE-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8012: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you Jason! I have committed to trunk. TestHiveServer2Concurrency is not implemented - Key: HIVE-8012 URL: https://issues.apache.org/jira/browse/HIVE-8012 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-8012.1.patch {code} @Test public void test() { fail("Not yet implemented"); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52723 --- Great work! serde/pom.xml https://reviews.apache.org/r/25468/#comment91700 These should only be indented by two spaces, not four. Have you tried submitting an MR job on a cluster with this patch? The reason I ask is that I think the serde must be in here: https://github.com/apache/hive/blob/trunk/ql/pom.xml#L563 for it to be available to MR jobs. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91701 I think we should call this OpenCSVSerde since it's based on OpenCSV and I believe we might see multiple implementations of CSVSerde. I think we should extend AbstractSerDe as that is what all the new Serdes are supposed to be doing. - Brock Noland On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
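A rough skeleton of what the two review suggestions would look like together is sketched below. This is hypothetical, not the patch under review; the method bodies are placeholders, and it assumes the AbstractSerDe contract of that era (initialize/serialize/deserialize/getObjectInspector/getSerializedClass/getSerDeStats).

{code}
// Hypothetical skeleton only: rename CSVSerde to OpenCSVSerde and extend
// AbstractSerDe. Bodies are stubs; a real implementation would wrap OpenCSV.
package org.apache.hadoop.hive.serde2;

import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class OpenCSVSerde extends AbstractSerDe {

  @Override
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    // read column names plus separator/quote/escape SerDe properties here
  }

  @Override
  public Class<? extends Writable> getSerializedClass() {
    return Text.class;             // CSV rows travel as text lines
  }

  @Override
  public Writable serialize(Object obj, ObjectInspector oi) throws SerDeException {
    return new Text();             // placeholder: format the row with the OpenCSV writer
  }

  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    return null;                   // placeholder: parse the line with the OpenCSV reader
  }

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return null;                   // placeholder: a struct inspector over string columns
  }

  @Override
  public SerDeStats getSerDeStats() {
    return null;                   // no statistics tracked in this sketch
  }
}
{code}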
[jira] [Comment Edited] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127076#comment-14127076 ] Brock Noland edited comment on HIVE-5871 at 9/9/14 3:11 PM: I think for Hive, you need committer privs to get the ability to edit comments. Let me see if we can relax this. was (Author: brocknoland): I think for Hive, you need committer privs to get the ability to edit comments. Let me see if we can relax this. TEST EDIT. Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127076#comment-14127076 ] Brock Noland commented on HIVE-5871: I think for Hive, you need committer privs to get the ability to edit comments. Let me see if we can relax this. TEST EDIT. Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127079#comment-14127079 ] Hive QA commented on HIVE-7689: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667375/HIVE-7689.6.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 6185 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDDLExclusive org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDDLShared org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDelete org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testJoin org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testReadWrite org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testRollback org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadMultiPartition org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadPartition org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadTable org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleWritePartition org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleWriteTable org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/708/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/708/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-708/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667375 Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127087#comment-14127087 ] Brock Noland commented on HIVE-8017: bq. do you have any comments as how to use the {{-- SORT_BEFORE_DIFF}} label? I am surprised the query failed if that was placed before the set commands. I think the big item is to ensure that the comment flag {{--}} is before the text {{SORT_BEFORE_DIFF}}. The code which implements this is here: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L416 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
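A minimal sketch of such a label check, assuming the label is recognized as a {{--}} comment line anywhere in the .q file (illustrative only, not the QTestUtil code linked above; the class and method names are hypothetical):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class QFileLabelCheck {
  // Returns true if any line of the .q file is a "--" comment containing the label,
  // e.g. "-- SORT_BEFORE_DIFF". This mirrors the idea of the check linked above,
  // but is only an illustrative sketch, not the real implementation.
  static boolean hasLabel(String qFile, String label) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(qFile));
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.startsWith("--") && trimmed.contains(label)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(hasLabel(args[0], "SORT_BEFORE_DIFF"));
  }
}
{code}
A check of this shape would pick up the label regardless of where it sits relative to the {{set}} commands; if the real check is position-sensitive, that would explain the failure Rui observed.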
[jira] [Updated] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6147: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you Swarnim! I have committed this to trunk! Support avro data stored in HBase columns - Key: HIVE-6147 URL: https://issues.apache.org/jira/browse/HIVE-6147 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0, 0.13.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Fix For: 0.14.0 Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt Presently, the HBase Hive integration supports querying only primitive data types in columns. It would be nice to be able to store and query Avro objects in HBase columns by making them visible as structs to Hive. This will allow Hive to perform ad hoc analysis of HBase data which can be deeply structured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8033) StorageBasedAuthorizationProvider too restrictive on insert/select
Alan Gates created HIVE-8033: Summary: StorageBasedAuthorizationProvider too restrictive on insert/select Key: HIVE-8033 URL: https://issues.apache.org/jira/browse/HIVE-8033 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Alan Gates When doing {code} insert into table foo select * from bar {code} StorageBasedAuth checks that the user has write permissions on bar. It only needs read permission on bar for this operation. To reproduce: # As user1, create a table bar with file permissions set to world readable but not writable. # As user2, create table foo. # Confirm that user2 can read from bar: select count(*) from bar; # As user2: insert into foo select * from bar; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
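The expected behavior can be summarized as mapping how a table is used in the query to the HDFS permission that should be demanded; a rough sketch using Hadoop's {{FsAction}} (the {{TableUsage}} enum and helper below are hypothetical, not the actual StorageBasedAuthorizationProvider code):
{code}
import org.apache.hadoop.fs.permission.FsAction;

public class RequiredAccess {
  enum TableUsage { INPUT, OUTPUT }   // hypothetical: how the table appears in the query

  // For "insert into table foo select * from bar", bar is INPUT (should need READ only)
  // and foo is OUTPUT (needs WRITE). The reported bug is that WRITE is also demanded
  // on the INPUT table.
  static FsAction requiredAction(TableUsage usage) {
    return usage == TableUsage.INPUT ? FsAction.READ : FsAction.WRITE;
  }

  public static void main(String[] args) {
    System.out.println("bar -> " + requiredAction(TableUsage.INPUT));   // READ
    System.out.println("foo -> " + requiredAction(TableUsage.OUTPUT));  // WRITE
  }
}
{code}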
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127157#comment-14127157 ] Xuefu Zhang commented on HIVE-8017: --- Hi Rui, {quote} Have you tried – SORT_BEFORE_DIFF, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} This comment indeeded made no sense. Sorry for the confusion, I meant -- SORT_QUERY_RESULTS for the first -- SORT_BEFORE_DIFF. So my comment should really go like this: {bold} Have you tried – SORT_QUERY_RESULTS, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {bold} SORT_BEFORE_DIFF sorts the output files before making diff. It's less reliable, and sometimes the diff doesn't tell what's wrong. Thus, I think we should prefer SORT_QUERY_RESULTS when query output can diff in order. Neither do I know why SORT_BEFORE_DIFF has to come after set commands in your case, but it seems the usage of it in the .q files is not consistent on that front. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127157#comment-14127157 ] Xuefu Zhang edited comment on HIVE-8017 at 9/9/14 4:17 PM: --- Hi Rui, {quote} Have you try – SORT_BEFORE_DIFF, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} This comment indeeded made no sense. Sorry for the confusion, I meant -- SORT_QUERY_RESULTS for the first -- SORT_BEFORE_DIFF. So my comment should really go like this: {quote} Have you tried – SORT_QUERY_RESULTS, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} SORT_BEFORE_DIFF sorts the output files before making diff. It's less reliable, and sometimes the diff doesn't tell what's wrong. Thus, I think we should prefer SORT_QUERY_RESULTS when query output can diff in order. Neither do I know why SORT_BEFORE_DIFF has to come after set commands in your case, but it seems the usage of it in the .q files is not consistent on that front. was (Author: xuefuz): Hi Rui, {quote} Have you tried – SORT_BEFORE_DIFF, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} This comment indeeded made no sense. Sorry for the confusion, I meant -- SORT_QUERY_RESULTS for the first -- SORT_BEFORE_DIFF. So my comment should really go like this: {bold} Have you tried – SORT_QUERY_RESULTS, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {bold} SORT_BEFORE_DIFF sorts the output files before making diff. It's less reliable, and sometimes the diff doesn't tell what's wrong. Thus, I think we should prefer SORT_QUERY_RESULTS when query output can diff in order. Neither do I know why SORT_BEFORE_DIFF has to come after set commands in your case, but it seems the usage of it in the .q files is not consistent on that front. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127184#comment-14127184 ] Xuefu Zhang commented on HIVE-8017: --- {quote} For union_remove_25.q, the only diff is the total size of outputTbl2 (6812 - 6826). I checked the MR version and the total size is also 6812. I'm not sure what causes this difference. {quote} I have no clue either, but I had the same observation before. Nevertheless, this should be okay since the output data matches. It doesn't seem worth the time to drill down, at least for now. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5545) HCatRecord getInteger method returns String when used on Partition columns of type INT
[ https://issues.apache.org/jira/browse/HIVE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127198#comment-14127198 ] Rishav Rohit commented on HIVE-5545: [~eugene.koifman] Attached is the error stack- {quote} 13/10/11 21:06:03 INFO mapred.JobClient: Task Id : attempt_201310112040_0005_m_00_2, Status : FAILED java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.hcatalog.data.HCatRecord.getInteger(HCatRecord.java:84) at com.test.hcatalog.testMapper.map(testMapper.java:25) at com.test.hcatalog.testMapper.map(testMapper.java:1) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) {quote} HCatRecord getInteger method returns String when used on Partition columns of type INT -- Key: HIVE-5545 URL: https://issues.apache.org/jira/browse/HIVE-5545 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Environment: hadoop-1.0.3 Reporter: Rishav Rohit HCatRecord getInteger method returns String when used on Partition columns of type INT. java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127199#comment-14127199 ] Hive QA commented on HIVE-7892: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667390/HIVE-7892.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6186 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/709/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/709/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-709/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667390 Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required set<i32> ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive> describe settable; OK ids struct<> from deserializer name string from deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive> select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5545) HCatRecord getInteger method returns String when used on Partition columns of type INT
[ https://issues.apache.org/jira/browse/HIVE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127202#comment-14127202 ] Rishav Rohit commented on HIVE-5545: testMapper.java:25 is {quote} Integer year = new Integer(value.getInteger(year, schema)); {quote} HCatRecord getInteger method returns String when used on Partition columns of type INT -- Key: HIVE-5545 URL: https://issues.apache.org/jira/browse/HIVE-5545 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Environment: hadoop-1.0.3 Reporter: Rishav Rohit HCatRecord getInteger method returns String when used on Partition columns of type INT. java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
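One defensive workaround until the underlying bug is fixed is to read the partition column through the untyped accessor and convert explicitly; a sketch assuming the usual {{HCatRecord.get(String, HCatSchema)}} accessor (the helper class below is hypothetical, and the field name "year" is taken from the snippet above):
{code}
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;

public class PartitionColumnRead {
  // Reads an int-typed column that may be handed back as a String when it is
  // a partition column (the behavior described in this issue). Illustrative only.
  static int readIntColumn(HCatRecord record, String field, HCatSchema schema)
      throws Exception {
    Object raw = record.get(field, schema);
    if (raw instanceof Number) {
      return ((Number) raw).intValue();
    }
    return Integer.parseInt(raw.toString());
  }
}
{code}
In the mapper above this would be used as {{int year = readIntColumn(value, "year", schema);}} instead of calling {{getInteger}} directly.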
[jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127225#comment-14127225 ] Ashutosh Chauhan commented on HIVE-7405: Do we really need AggregreateMapReduceUsage enum? Seems like GroupbyDesc.Mode can be used instead as follows: AggregreateMapReduceUsage.MAP - Mode.Hash AggregreateMapReduceUsage.REDUCE - Mode.MergePartial AggregreateMapReduceUsage.MAP_REDUCE - Mode.all_other If possible, we should reuse GroupbyDesc.Mode, otherwise these modes can be mixed and matched and will lead to explosion of combinations. Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
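A toy sketch of the suggested direction, deriving the map/reduce usage from a single mode enum instead of carrying a second one (the enums below are local stand-ins for illustration, not the actual {{GroupByDesc.Mode}} or the enum from the patch):
{code}
public class GroupByModeMapping {
  // Local stand-ins for the GroupByDesc.Mode values named in the comment above;
  // the real enum lives in org.apache.hadoop.hive.ql.plan.GroupByDesc.
  enum Mode { HASH, MERGEPARTIAL, PARTIAL1, FINAL, COMPLETE }

  enum Usage { MAP, REDUCE, MAP_REDUCE }   // the parallel enum the comment suggests dropping

  // Deriving the usage from the existing mode avoids two enums that could be
  // mixed and matched inconsistently.
  static Usage usageFor(Mode mode) {
    switch (mode) {
      case HASH:          return Usage.MAP;
      case MERGEPARTIAL:  return Usage.REDUCE;
      default:            return Usage.MAP_REDUCE;
    }
  }

  public static void main(String[] args) {
    for (Mode m : Mode.values()) {
      System.out.println(m + " -> " + usageFor(m));
    }
  }
}
{code}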
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127242#comment-14127242 ] Lefty Leverenz commented on HIVE-5871: -- Thanks for that explanation, [~brocknoland]. Live and learn. [~lirui], since I have edit permission you can tell me what to change and I'll do it for you. That will help avoid confusion for JIRA surfers. Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7689.7.patch Fix error in unit tests Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7989) Optimize Windowing function performance for row frames
[ https://issues.apache.org/jira/browse/HIVE-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127282#comment-14127282 ] Ankit Kamboj commented on HIVE-7989: Looks like the tests that failed are not due to the patch itself (ptf-windowing tests are part of ql module). Could somebody take a quick look and advise? Optimize Windowing function performance for row frames -- Key: HIVE-7989 URL: https://issues.apache.org/jira/browse/HIVE-7989 Project: Hive Issue Type: Improvement Components: PTF-Windowing Affects Versions: 0.13.0 Reporter: Ankit Kamboj Attachments: HIVE-7989.patch To find aggregate value for each row, current windowing function implementation creates a new aggregation buffer for each row, iterates over all the rows in respective window frame, puts them in buffer and then finds the aggregated value. This causes bottleneck for partitions with huge number of rows because this process runs in n-square complexity (n being rows in a partition) for each partition. So, if there are multiple partitions in a dataset, each with millions of rows, aggregation for all rows will take days to finish. There is scope of optimization for row frames, for following cases: a) For UNBOUNDED PRECEDING start and bounded end: Instead of iterating on window frame again for each row, we can slide the end one row at a time and aggregate, since we know the start is fixed for each row. This will have running time linear to the size of partition. b) For bounded start and UNBOUNDED FOLLOWING end: Instead of iterating on window frame again for each row, we can slide the start one row at a time and aggregate in reverse, since we know the end is fixed for each row. This will have running time linear to the size of partition. Also, In general for both row and value frames, we don't need to iterate over the range and re-create aggregation buffer if the start as well as end remain same. Instead, can re-use the previously created aggregation buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
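The proposed optimization for UNBOUNDED PRECEDING frames amounts to keeping one running aggregation buffer and extending it by a single row per output row, rather than re-aggregating the whole frame; a self-contained sketch contrasting the quadratic recompute with the linear sliding version for SUM (plain Java, not the PTF operator code):
{code}
public class RunningFrameSum {
  // O(n^2): re-aggregate the frame [0..i] for every row, as the current
  // implementation effectively does.
  static long[] naive(long[] col) {
    long[] out = new long[col.length];
    for (int i = 0; i < col.length; i++) {
      long sum = 0;
      for (int j = 0; j <= i; j++) {
        sum += col[j];
      }
      out[i] = sum;
    }
    return out;
  }

  // O(n): slide the frame end by one row and reuse the running buffer, which is
  // the optimization proposed for UNBOUNDED PRECEDING ... CURRENT ROW frames.
  static long[] sliding(long[] col) {
    long[] out = new long[col.length];
    long sum = 0;
    for (int i = 0; i < col.length; i++) {
      sum += col[i];
      out[i] = sum;
    }
    return out;
  }

  public static void main(String[] args) {
    long[] col = {3, 1, 4, 1, 5};
    System.out.println(java.util.Arrays.toString(naive(col)));    // [3, 4, 8, 9, 14]
    System.out.println(java.util.Arrays.toString(sliding(col)));  // same values, one pass
  }
}
{code}
The bounded-start/UNBOUNDED FOLLOWING case is symmetric: fix the end, slide the start backwards, and aggregate in reverse.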
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 31 https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line31 This comment doesn't add value so I suggest removing it. Or you could expand the comment. - Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
[jira] [Updated] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated HIVE-7936: --- Attachment: HIVE-7936.patch Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Attachments: HIVE-7936.patch Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated HIVE-7936: --- Fix Version/s: 0.14.0 Status: Patch Available (was: In Progress) Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.patch Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127298#comment-14127298 ] Mithun Radhakrishnan commented on HIVE-7100: [~dbsalti], [~xuefuz], I agree. Another JIRA for dropPartitions(). Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127310#comment-14127310 ] Hive QA commented on HIVE-7694: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667389/HIVE-7694.2.patch {color:green}SUCCESS:{color} +1 6193 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/710/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/710/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-710/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12667389 SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127313#comment-14127313 ] Xuefu Zhang commented on HIVE-7100: --- [~dbsalti] Could you please update RB with your latest patch? Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 23352: Support non-constant expressions for MAP type indices.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23352/#review52751 --- ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java https://reviews.apache.org/r/23352/#comment91752 I think you can use FunctionRegistry.implicitConvertable() here rather than having to create a new method. ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java https://reviews.apache.org/r/23352/#comment91809 could we also use implicitConvertable() here? - Jason Dere On July 9, 2014, 6:57 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23352/ --- (Updated July 9, 2014, 6:57 a.m.) Review request for hive. Bugs: HIVE-7325 https://issues.apache.org/jira/browse/HIVE-7325 Repository: hive-git Description --- Here is my sample: {code} CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,D:BatchDate,D:Country) TBLPROPERTIES (hbase.table.name = RECORD); CREATE TABLE KEY_RECORD(KeyValue String, RecordId mapstring,string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key, K:) TBLPROPERTIES (hbase.table.name = KEY_RECORD); {code} The following join statement doesn't work. {code} SELECT a.*, b.* from KEY_RECORD a join RECORD b WHERE a.RecordId[b.RecordID] is not null; {code} FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID' Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9889cfe ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java e44f5ae ql/src/test/queries/clientpositive/array_map_access_nonconstant.q PRE-CREATION ql/src/test/queries/negative/invalid_list_index.q c40f079 ql/src/test/queries/negative/invalid_list_index2.q 99d0b3d ql/src/test/queries/negative/invalid_map_index2.q 5828f07 ql/src/test/results/clientpositive/array_map_access_nonconstant.q.out PRE-CREATION ql/src/test/results/compiler/errors/invalid_list_index.q.out a4179cd ql/src/test/results/compiler/errors/invalid_list_index2.q.out aaa9455 ql/src/test/results/compiler/errors/invalid_map_index2.q.out edc9bda serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java 5ccacf1 Diff: https://reviews.apache.org/r/23352/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-7325) Support non-constant expressions for MAP type indices.
[ https://issues.apache.org/jira/browse/HIVE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127330#comment-14127330 ] Jason Dere commented on HIVE-7325: -- Couple of comments on RB Support non-constant expressions for MAP type indices. -- Key: HIVE-7325 URL: https://issues.apache.org/jira/browse/HIVE-7325 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mala Chikka Kempanna Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7325.1.patch.txt, HIVE-7325.2.patch.txt Here is my sample: {code} CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,D:BatchDate,D:Country) TBLPROPERTIES (hbase.table.name = RECORD); CREATE TABLE KEY_RECORD(KeyValue String, RecordId mapstring,string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key, K:) TBLPROPERTIES (hbase.table.name = KEY_RECORD); {code} The following join statement doesn't work. {code} SELECT a.*, b.* from KEY_RECORD a join RECORD b WHERE a.RecordId[b.RecordID] is not null; {code} FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7818) Support boolean PPD for ORC
[ https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127361#comment-14127361 ] Prasanth J commented on HIVE-7818: -- [~daijy] I will take a look at it sometime today and will post review. Meanwhile, can you please post the patch in review board? Support boolean PPD for ORC --- Key: HIVE-7818 URL: https://issues.apache.org/jira/browse/HIVE-7818 Project: Hive Issue Type: Improvement Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.14.0 Attachments: HIVE-7818.1.patch Currently ORC does collect stats for boolean field. However, the boolean stats is not range based, instead, it collects counts of true records. RecordReaderImpl.evaluatePredicate currently only deals with range based stats, we need to improve it to deal with the boolean stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
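Since the boolean column statistics carry a count of true values rather than a min/max range, the predicate decision reduces to comparing that count with the number of values; a hedged sketch of that decision (simplified stand-ins, not the actual RecordReaderImpl.evaluatePredicate code):
{code}
public class BooleanPpdSketch {
  // Simplified stand-ins: NO means the row group can be skipped, YES_NO means it must be read.
  enum TruthValue { YES_NO, NO }

  // trueCount: number of true values recorded in the boolean statistics;
  // valueCount: total non-null values; literal: the constant in "col = literal".
  static TruthValue evaluateEquals(long trueCount, long valueCount, boolean literal) {
    if (literal && trueCount == 0) {
      return TruthValue.NO;               // no trues at all -> "col = true" matches nothing
    }
    if (!literal && trueCount == valueCount) {
      return TruthValue.NO;               // all trues -> "col = false" matches nothing
    }
    return TruthValue.YES_NO;             // otherwise the row group may contain matches
  }

  public static void main(String[] args) {
    System.out.println(evaluateEquals(0, 1000, true));     // NO: safe to skip
    System.out.println(evaluateEquals(1000, 1000, false)); // NO: safe to skip
    System.out.println(evaluateEquals(42, 1000, true));    // YES_NO: must read
  }
}
{code}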
Re: Review Request 25178: Add DROP TABLE PURGE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25178/ --- (Updated Sept. 9, 2014, 6:51 p.m.) Review request for hive and Xuefu Zhang. Changes --- latest patch from HIVE-7100 includes documentation updates and responses to RB comments Repository: hive-git Description --- Add PURGE option to DROP TABLE command to skip saving table data to the trash Diffs (updated) - hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java be7134f hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java af952f2 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2.java da51a55 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9489949 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java a94a7a3 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java cff0718 metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java cbdba30 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreFS.java a141793 metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 613b709 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e387b8f ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f31a409 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 32db0c7 ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 406aae9 ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveRemote.java 1a5ba87 ql/src/test/queries/clientpositive/drop_table_purge.q PRE-CREATION ql/src/test/results/clientpositive/drop_table_purge.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25178/diff/ Testing --- added code test and added QL test. Tests passed in CI, but other, unrelated tests failed. Thanks, david seraf
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127387#comment-14127387 ] david serafini commented on HIVE-7100: -- done. HIVE-7100.8.patch uploaded to RB. Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127407#comment-14127407 ] Josh Elser commented on HIVE-7950: -- Ok, I figured a bit more out here. I believe that the AM *is* correctly getting the extra jars from the storage handler as expected. The subsequent errors are coming from the containers that are started to actually run the DAG (rather than the coordination from the tez AM). The interesting part is that the patch (HIVE-7950-1.diff) which starts a brand new Session will result in a successful query. It seems like maybe Tez isn't passing along the extra resources we added to the running session (AM) in Hive along to the DAG containers to actually run the query. I have no idea at this point if this is a problem in how hive is using tez or if it's a bug in tez itself... StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127417#comment-14127417 ] Gopal V commented on HIVE-7950: --- Can you post the sequence of things you are doing? As in, how is the JARs getting added - is it an explicit ADD JAR? Tez-0.5.0-rc1 had an issue we tackled with an API change to ship JARs differently between the AM and tasks. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127440#comment-14127440 ] Josh Elser commented on HIVE-7950: -- Sure thing, [~gopalv]. I don't actually have to do any extra {{ADD JAR}} commands. The AccumuloStorageHandler constructs a list of jars that need to be passed along to the execution engine (via tmpjars in the Hadoop configuration). With the 'yarn' execution.engine, this works just fine -- the resources are localized and added to the Map/Reduce containers and things are great. When I try to run with 'tez', there are a few issues. The first is that, if there is already a TezSessionState that was already open'ed (e.g. like what is done when I just open the hive shell), it will have been started without those extra 'tmpjars' resources from the StorageHandler and the query will fail because we need those jars. Sergey mentioned that Tez 0.5.0 had a new method that would allow more resources to be added to an already started TezClient ({{TezClient#addAppMasterLocalFiles(Map<String, LocalResource>)}}). Implementing this (in the hive-7950-tez-WIP.diff attachment) appears to have successfully added the extra jars from the StorageHandler to the DAGAppMaster, but the containers started to actually run the query are missing those extra jars. Does that make sense? StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6550) SemanticAnalyzer.reset() doesn't clear all the state
[ https://issues.apache.org/jira/browse/HIVE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6550: --- Status: Patch Available (was: Open) forgot to submit patch SemanticAnalyzer.reset() doesn't clear all the state Key: HIVE-6550 URL: https://issues.apache.org/jira/browse/HIVE-6550 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Sergey Shelukhin Attachments: HIVE-6550.01.patch, HIVE-6550.02.patch, HIVE-6550.03.patch, HIVE-6550.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127442#comment-14127442 ] Hive QA commented on HIVE-7689: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667443/HIVE-7689.7.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6192 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/711/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/711/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-711/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667443 Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127447#comment-14127447 ] Sergey Shelukhin commented on HIVE-7946: stats_noscan_1 and about 10-15 more tests might be fixed by HIVE-6550 CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127447#comment-14127447 ] Sergey Shelukhin edited comment on HIVE-7946 at 9/9/14 7:36 PM: stats_noscan_1 and about 10-15 more tests might be fixed by HIVE-6550 (when that is in) was (Author: sershe): stats_noscan_1 and about 10-15 more tests might be fixed by HIVE-6550 CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6866) Hive server2 jdbc driver connection leak with namenode
[ https://issues.apache.org/jira/browse/HIVE-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127451#comment-14127451 ] Ankita Bakshi commented on HIVE-6866: - We are facing same issue in production. We are using CDH4.4 with Apache Hive 0.12. Is there a workaround for this issue other than restarting hiveserver2? Hive server2 jdbc driver connection leak with namenode -- Key: HIVE-6866 URL: https://issues.apache.org/jira/browse/HIVE-6866 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Shengjun Xin 1. Set 'ipc.client.connection.maxidletime' to 360 in core-site.xml and start hive-server2. 2. Connect hive server2 repetitively in a while true loop. 3. The tcp connection number will increase until out of memory, it seems that hive server2 will not close the connection until the time out, the error message is as the following: {code} 2014-03-18 23:30:36,873 ERROR ql.Driver (SessionState.java:printError(386)) - FAILED: RuntimeException java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:190) at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:231) at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:288) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1274) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8676) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:95) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:181) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:40) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:37) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524) at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:37) at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
[jira] [Created] (HIVE-8034) Don't add colon when no port is specified
Brock Noland created HIVE-8034: -- Summary: Don't add colon when no port is specified Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8034) Don't add colon when no port is specified
[ https://issues.apache.org/jira/browse/HIVE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8034: --- Description: In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. Don't add colon when no port is specified - Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8034) Don't add colon when no port is specified
[ https://issues.apache.org/jira/browse/HIVE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8034: --- Assignee: Brock Noland Status: Patch Available (was: Open) Don't add colon when no port is specified - Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8034.1.patch In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8034) Don't add colon when no port is specified
[ https://issues.apache.org/jira/browse/HIVE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8034: --- Attachment: HIVE-8034.1.patch Don't add colon when no port is specified - Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland Attachments: HIVE-8034.1.patch In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Status: In Progress (was: Patch Available) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Attachment: HIVE-7405.996.patch Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch, HIVE-7405.996.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Status: Patch Available (was: In Progress) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch, HIVE-7405.996.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127479#comment-14127479 ] Gopal V commented on HIVE-7950: --- This is related to TEZ-1469, which should provide guidance. For reference, take a look at some of my example code: https://github.com/t3rmin4t0r/tez-broadcast-example/blob/master/src/main/java/org/notmysock/tez/BroadcastTest.java#L203 (lines 203 and 195). StorageHandler resources aren't added to Tez Session if Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Noticed that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127485#comment-14127485 ] Josh Elser commented on HIVE-7950: -- You're the man. That was exactly what I needed. I completely missed that I needed to add the resources to the DAG as well. I'll clean up my changes and post an updated patch here later today after I poke/prod it some more. StorageHandler resources aren't added to Tez Session if Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Noticed that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
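For readers following along: the gist of the pointer above is that the storage handler jars need to be registered as task local resources on the DAG itself, not only with the already-open Tez session. The sketch below illustrates that idea against the Tez 0.5-era client API as best understood here ({{DAG.addTaskLocalFiles}} plus YARN {{LocalResource}}); it is not the actual HIVE-7950 patch, and exact method names should be verified against the Tez version in use.
{code}
// Hedged sketch: attach jars to a Tez DAG as task local resources so that
// containers localize them even when the Tez session was opened before the
// storage handler jars were known. Assumes the Tez 0.5-era client API
// (DAG.addTaskLocalFiles) and Hadoop 2.x YARN records; verify against the
// versions actually in use.
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.tez.dag.api.DAG;

public class DagJarResources {

    // Build a LocalResource entry for a jar that is already on HDFS.
    static LocalResource jarResource(FileSystem fs, Path jar) throws Exception {
        FileStatus status = fs.getFileStatus(jar);
        return LocalResource.newInstance(
            ConverterUtils.getYarnUrlFromPath(jar),
            LocalResourceType.FILE,
            LocalResourceVisibility.APPLICATION,
            status.getLen(),
            status.getModificationTime());
    }

    // Register every jar with the DAG, keyed by file name, before submission.
    static void addJarsToDag(DAG dag, FileSystem fs, Path... jars) throws Exception {
        Map<String, LocalResource> resources = new HashMap<>();
        for (Path jar : jars) {
            resources.put(jar.getName(), jarResource(fs, jar));
        }
        dag.addTaskLocalFiles(resources);
    }
}
{code}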
[jira] [Commented] (HIVE-8024) Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127504#comment-14127504 ] Chao commented on HIVE-8024: There's a problem again. Suppose the operator tree is the following:
{code}
TS_0   TS_2
   \   /
  UNION_3
     |
   SEL_4
{code}
After removing the UNION operator, it will look like this:
{code}
TS_0   TS_2
   \   /
   SEL_4
{code}
(Again, I ignored some operators, but you get the idea.) Then, we could have MapWork 1 start from {{TS_0}} and MapWork 2 start from {{TS_2}}. Now, when MapWork 1 initializes itself, it will initialize the operator tree, starting from {{TS_0}} and going down the tree. When it gets to {{SEL_4}}, it will not be able to initialize it, because not all of {{SEL_4}}'s parents are initialized at that point. Hence, the execution will fail. Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch] - Key: HIVE-8024 URL: https://issues.apache.org/jira/browse/HIVE-8024 Project: Hive Issue Type: Task Components: Spark Reporter: Chao Assignee: Chao Currently, after operator tree is processed, the generated works with union operators will go through {{GenSparkUtils::removeUnionOperators}}, which will clone the original operator plan associated with the work, and remove union operators in it. This caused some issues as seen, for example, in HIVE-7870. This JIRA is created to find out whether it's possible to just remove the union operators in the original plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
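To make the initialization problem above concrete, here is a self-contained toy model in plain Java (placeholder classes, not Hive's Operator API). With the union removed, {{SEL_4}} has two parents that live in different MapWorks, so walking down from {{TS_0}} alone can never satisfy the rule that an operator initializes only after all of its parents have.
{code}
// Toy model of the failure mode described above; all classes are placeholders.
import java.util.ArrayList;
import java.util.List;

public class InitOrderSketch {

    static class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();
        boolean initialized;

        Op(String name) { this.name = name; }

        void connectTo(Op child) {
            children.add(child);
            child.parents.add(this);
        }

        // The rule the comment relies on: an operator may initialize
        // only once every one of its parents has already been initialized.
        void initialize() {
            for (Op p : parents) {
                if (!p.initialized) {
                    throw new IllegalStateException(
                        name + " cannot initialize: parent " + p.name + " not ready");
                }
            }
            initialized = true;
            for (Op c : children) {
                c.initialize();
            }
        }
    }

    public static void main(String[] args) {
        Op ts0 = new Op("TS_0"), ts2 = new Op("TS_2"), sel4 = new Op("SEL_4");
        ts0.connectTo(sel4);
        ts2.connectTo(sel4);   // second parent, owned by a different MapWork

        // MapWork 1 walks down from TS_0 only; TS_2 belongs to MapWork 2 and is
        // never touched here, so the walk fails when it reaches SEL_4.
        ts0.initialize();      // throws IllegalStateException at SEL_4
    }
}
{code}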
[jira] [Commented] (HIVE-8024) Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127509#comment-14127509 ] Chao commented on HIVE-8024: Closing this JIRA, as this approach doesn't seem to work, or at least is hard to implement. [~xuefuz] is writing a design doc and we'll continue the discussion there. Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch] - Key: HIVE-8024 URL: https://issues.apache.org/jira/browse/HIVE-8024 Project: Hive Issue Type: Task Components: Spark Reporter: Chao Assignee: Chao Currently, after operator tree is processed, the generated works with union operators will go through {{GenSparkUtils::removeUnionOperators}}, which will clone the original operator plan associated with the work, and remove union operators in it. This caused some issues as seen, for example, in HIVE-7870. This JIRA is created to find out whether it's possible to just remove the union operators in the original plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8024) Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao resolved HIVE-8024. Resolution: Done Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch] - Key: HIVE-8024 URL: https://issues.apache.org/jira/browse/HIVE-8024 Project: Hive Issue Type: Task Components: Spark Reporter: Chao Assignee: Chao Currently, after operator tree is processed, the generated works with union operators will go through {{GenSparkUtils::removeUnionOperators}}, which will clone the original operator plan associated with the work, and remove union operators in it. This caused some issues as seen, for example, in HIVE-7870. This JIRA is created to find out whether it's possible to just remove the union operators in the original plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8019) Missing commit from trunk: `export/import statement update`
[ https://issues.apache.org/jira/browse/HIVE-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8019: Status: Patch Available (was: Open) Missing commit from trunk: `export/import statement update` Key: HIVE-8019 URL: https://issues.apache.org/jira/browse/HIVE-8019 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.14.0 Reporter: Mohit Sabharwal Assignee: Thejas M Nair Priority: Blocker Attachments: HIVE-8019.1.patch Noticed that commit 1882de7810fc55a2466dd4cbe74ed67bb41cb667 exists in the 0.13 branch, but not in trunk. https://github.com/apache/hive/commit/1882de7810fc55a2466dd4cbe74ed67bb41cb667
{code}
(trunk) $ git branch -a --contains 1882de7810fc55a2466dd4cbe74ed67bb41cb667
remotes/origin/branch-0.13
{code}
I looked through some of the changes in this commit and don't see those in trunk. Nor do I see a commit that reverts these changes in trunk. [~thejas], should we port this over to trunk? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)