[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved
[ https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090350#comment-14090350 ] Prasanth J commented on HIVE-4123: -- Please go ahead and update the original description. At this point the only possible valid values are 0.11 and 0.12. As you had mentioned if the parameter is not defined or defined wrongly it will use the default 0.12 encoding. bq. Is that accurate? Can releases be specified as 0.12.0 or 0.13.1? Yes. Accurate. HIVE-6002 was trying to add patch number to the write version so that numbers can be specified as 0.12.1. But I don't think it will be committed until next major change to ORC writer. The RLE encoding for ORC can be improved Key: HIVE-4123 URL: https://issues.apache.org/jira/browse/HIVE-4123 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Prasanth J Labels: TODOC12, orcfile Fix For: 0.12.0 Attachments: HIVE-4123-8.patch, HIVE-4123.1.git.patch.txt, HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123.8.txt, HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx The run length encoding of integers can be improved: * tighter bit packing * allow delta encoding * allow longer runs -- This message was sent by Atlassian JIRA (v6.2#6252)
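The three improvements listed in the description (tighter bit packing, delta encoding, longer runs) can be sketched with a toy encoder. This is purely illustrative and not the actual ORC RLEv2 writer: a run of integers with a fixed difference is collapsed to {base, delta, length}, which is the core idea behind the delta sub-encoding; all class and method names below are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaRleSketch {
    // Encode a sequence of longs as runs. A run of 3+ values with a fixed
    // delta [base, base+d, base+2d, ...] becomes {base, delta, length};
    // anything else is emitted as a literal {value, 0, 1}.
    public static List<long[]> encode(long[] values) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            int j = i + 1;
            long delta = (j < values.length) ? values[j] - values[i] : 0;
            // Extend the run while the difference stays constant.
            while (j + 1 < values.length && values[j + 1] - values[j] == delta) {
                j++;
            }
            if (j - i >= 2) {
                // 3 or more values share the same delta: one compact run.
                runs.add(new long[] { values[i], delta, j - i + 1 });
                i = j + 1;
            } else {
                // Literal single value.
                runs.add(new long[] { values[i], 0, 1 });
                i++;
            }
        }
        return runs;
    }
}
```

A real writer would then bit-pack the base and delta with the minimum bit width, which is where the "tighter bit packing" gain comes from.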
[jira] [Commented] (HIVE-7629) Problem in SMB Joins between two Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090365#comment-14090365 ] Hive QA commented on HIVE-7629: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660393/HIVE-7629.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5887 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/220/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/220/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-220/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660393 Problem in SMB Joins between two Parquet tables --- Key: HIVE-7629 URL: https://issues.apache.org/jira/browse/HIVE-7629 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Suma Shivaprasad Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-7629.patch The issue is clearly seen when two bucketed and sorted parquet tables with different number of columns are involved in the join . The following exception is seen Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65) -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24497: HIVE-7629 - Map joins between two parquet tables failing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24497/ --- Review request for hive. Bugs: HIVE-7629 https://issues.apache.org/jira/browse/HIVE-7629 Repository: hive-git Description --- Map Joins between 2 parquet tables are failing since the Mapper is trying to access the columns of the first table(bigger table) while trying to load the second table(smaller map join table). Fixed this by adding a guard on the column indexes passed by hive Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 2f155f6 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java d6be4bd ql/src/test/queries/clientpositive/parquet_join.q PRE-CREATION ql/src/test/results/clientpositive/parquet_join.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24497/diff/ Testing --- parquet_join.q covers most types of joins between 2 parquet tables - Normal, Map join, SMB join Thanks, Suma Shivaprasad
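The "guard on the column indexes passed by hive" described above can be sketched as follows. This is not the actual DataWritableReadSupport patch; the method and names are hypothetical stand-ins showing the idea: when projection pushdown hands the reader indexes computed for a wider table, indexes outside the current file's schema are skipped instead of letting ArrayList.get() throw IndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnIndexGuard {
    // Return the requested columns that actually exist in this file's
    // schema; out-of-range indexes (left over from the other join table)
    // are silently skipped rather than causing IndexOutOfBoundsException.
    public static List<String> projectColumns(List<String> fileColumns,
                                              int[] requestedIndexes) {
        List<String> projected = new ArrayList<>();
        for (int idx : requestedIndexes) {
            if (idx >= 0 && idx < fileColumns.size()) {  // the guard
                projected.add(fileColumns.get(idx));
            }
        }
        return projected;
    }
}
```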
[jira] [Commented] (HIVE-7629) Problem in SMB Joins between two Parquet tables
[ https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090371#comment-14090371 ] Suma Shivaprasad commented on HIVE-7629: Reviewboard request - https://reviews.apache.org/r/24497/ Problem in SMB Joins between two Parquet tables --- Key: HIVE-7629 URL: https://issues.apache.org/jira/browse/HIVE-7629 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Suma Shivaprasad Labels: Parquet Fix For: 0.14.0 Attachments: HIVE-7629.patch The issue is clearly seen when two bucketed and sorted parquet tables with different number of columns are involved in the join . The following exception is seen Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65) -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23674: Handle db qualified names consistently across all HiveQL statements
On Aug. 5, 2014, 5:09 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java, line 607 https://reviews.apache.org/r/23674/diff/2/?file=647687#file647687line607 doesn't the default authorization mode support columns in show-grants ? It is there in ShowGrantDesc Navis Ryu wrote: I've moved columns in ShowGrantDesc to PrivilegeObjectDesc, which seemed more neat, imho. Isn't it? Yes, that's certainly better.

On Aug. 5, 2014, 5:09 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java, line 2110 https://reviews.apache.org/r/23674/diff/2/?file=647694#file647694line2110 how about using BaseSemanticAnalyzer.getQualifiedTableName and having this check for 'duplicate declaration' there ? Navis Ryu wrote: It's not TOK_TABNAME (which is TOK_FROM identifier? (identifier|StringLiteral)?) and seemed not replaced with getQualifiedTableName() Thanks for clarifying!

On Aug. 5, 2014, 5:09 a.m., Thejas Nair wrote: ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java, line 94 https://reviews.apache.org/r/23674/diff/2/?file=647707#file647707line94 Isn't it better to represent the columns as a set instead of list, as multiple columns with same name in this object does not make sense ? Same for other places in this patch where columns has been changed from a set to list. Navis Ryu wrote: HivePrivilegeObject compares columns by iteration. If columns is not ordered somehow, it seemed not a valid comparison. I didn't have an idea to compare two column sets, I just replaced it with a sorted list, which felt easier than that. Any idea? Sounds fine. Maybe we can do a copy and sort as part of the constructor of this HivePrivilegeObject, instead of relying on the argument being sorted. That is likely to avoid potential bugs. But that can be done as part of a separate jira. - Thejas --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/23674/#review49498 --- On Aug. 1, 2014, 1:55 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23674/ --- (Updated Aug. 1, 2014, 1:55 a.m.) Review request for hive and Thejas Nair. Bugs: HIVE-4064 https://issues.apache.org/jira/browse/HIVE-4064 Repository: hive-git Description --- Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, other such as CREATE INDEX doesn't. Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java c91b15c itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/CheckColumnAccessHook.java 14fc430 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java b74868b metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5a56ced metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4f186f4 ql/src/java/org/apache/hadoop/hive/ql/Driver.java cba5cfa ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 40d910c ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 4cf4522 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java a7e50ad ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java ae87aac ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 11a6d07 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 22945e3 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java 939dc65 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 67a3aa7 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ab1188a ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java 856ec2f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 51838ae 
ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java 826bdf3 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java 0318e4b ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableAlterPartDesc.java cf67e16 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableSimpleDesc.java 541675c ql/src/java/org/apache/hadoop/hive/ql/plan/PrivilegeObjectDesc.java 9417220 ql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java 1b5fb9e ql/src/java/org/apache/hadoop/hive/ql/plan/ShowColumnsDesc.java fe6a91e ql/src/java/org/apache/hadoop/hive/ql/plan/ShowGrantDesc.java aa88153
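The reviewer's suggestion above, copying and sorting the column list inside the constructor rather than relying on callers to pass a pre-sorted list, can be sketched like this. The class and field names are illustrative, not the actual HivePrivilegeObject code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PrivilegeColumns {
    private final List<String> columns;

    public PrivilegeColumns(List<String> columns) {
        // Defensive copy plus sort: comparison-by-iteration then works
        // regardless of the order the caller supplied.
        List<String> copy = new ArrayList<>(columns);
        Collections.sort(copy);
        this.columns = Collections.unmodifiableList(copy);
    }

    public List<String> getColumns() {
        return columns;
    }
}
```

Normalizing in the constructor localizes the invariant in one place, which is the "avoid potential bugs" benefit Thejas mentions.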
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090422#comment-14090422 ] Thejas M Nair commented on HIVE-4064: - [~navis] Can you also please upload the new patch to reviewboard ? Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Navis Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, other such as CREATE INDEX doesn't. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24498: A method to extrapolate the missing column status for the partitions.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24498/ --- Review request for hive. Repository: hive-git Description --- We propose a method to extrapolate the missing column stats for the partitions. Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 43c412d Diff: https://reviews.apache.org/r/24498/diff/ Testing --- Thanks, pengcheng xiong
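The review request gives no algorithm details, but one simple form of extrapolation for additive stats (e.g. numNulls) when only some partitions carry statistics is linear scaling by partition count. The method below is purely hypothetical, included only to make the idea of "extrapolating missing partition column stats" concrete; it is not the MetaStoreDirectSql change under review.

```java
public class StatsExtrapolationSketch {
    // Estimate the total of an additive column stat (e.g. numNulls) across
    // all `total` partitions when only `known` partitions have stats:
    // scale the known sum linearly by total/known.
    public static long extrapolateAdditiveStat(long knownSum, int known, int total) {
        if (known <= 0) {
            return 0; // nothing to extrapolate from
        }
        return Math.round((double) knownSum * total / known);
    }
}
```

Non-additive stats (min, max, NDV) would need different rules, e.g. taking the min of known mins and max of known maxes.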
[jira] [Created] (HIVE-7657) Nullable union of 3 or more types is not recognized nullable
Arkadiusz Gasior created HIVE-7657: -- Summary: Nullable union of 3 or more types is not recognized nullable Key: HIVE-7657 URL: https://issues.apache.org/jira/browse/HIVE-7657 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Arkadiusz Gasior Handling nullable union of 3 types or more is causing serialization issues, as [null,long,string] is not recognized as nullable. Potential code causing issues might be AvroSerdeUtils.java: {code}
public static boolean isNullableType(Schema schema) {
  return schema.getType().equals(Schema.Type.UNION) &&
      schema.getTypes().size() == 2 &&
      (schema.getTypes().get(0).getType().equals(Schema.Type.NULL) ||
       schema.getTypes().get(1).getType().equals(Schema.Type.NULL));
  // [null, null] not allowed, so this check is ok.
}
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
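The snippet above only accepts two-branch unions as nullable. A fix along the lines the reporter implies is to scan every branch of the union for a NULL type, so [null, long, string] also counts. The sketch below is self-contained (the enum stands in for Avro's Schema.Type so it runs without the Avro jar); the real fix would iterate schema.getTypes():

```java
import java.util.List;

public class NullableUnionSketch {
    // Stand-in for org.apache.avro.Schema.Type.
    public enum Type { NULL, LONG, STRING }

    // A union is nullable if ANY branch is NULL, regardless of how many
    // branches it has (this is what the size() == 2 check got wrong).
    public static boolean isNullableUnion(List<Type> branches) {
        for (Type t : branches) {
            if (t == Type.NULL) {
                return true;
            }
        }
        return false;
    }
}
```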
[jira] [Updated] (HIVE-7553) avoid the scheduling maintenance window for every jar change
[ https://issues.apache.org/jira/browse/HIVE-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-7553: --- Attachment: HIVE-7553.pdf Since I do not have post privilege in hive space from wiki, I have to attach the original design document here. Can you please help review my design? Thanks in advance! avoid the scheduling maintenance window for every jar change Key: HIVE-7553 URL: https://issues.apache.org/jira/browse/HIVE-7553 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7553.pdf When user needs to refresh existing or add a new jar to HS2, it needs to restart it. As HS2 is service exposed to clients, this requires scheduling maintenance window for every jar change. It would be great if we could avoid that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7624: - Attachment: HIVE-7624.3-spark.patch Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: 
java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7624: - Status: Patch Available (was: Open) Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: 
java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090452#comment-14090452 ] Rui Li commented on HIVE-7624: -- Finally I found this is because we don't set output collector for RS in ExecReducer. While this is natural for MR where ExecReducer shouldn't contain RS, we have to do it for spark. The added code just looks for RS and sets collector for it, so there shouldn't be any regression. Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
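The fix described in the comment, walking the reducer's operator tree and setting an output collector on each ReduceSink, can be sketched with simplified stand-ins for Hive's Operator classes. Nothing here is the actual HIVE-7624 patch; it only illustrates the "look for RS and set its collector" traversal:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorWiringSketch {
    public interface Collector { void collect(Object key, Object value); }

    // Minimal stand-in for Hive's Operator tree node.
    public static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Collector collector;              // only meaningful for "RS" ops
        public Op(String name) { this.name = name; }
    }

    // Depth-first walk: attach the collector to every ReduceSink ("RS")
    // operator found in the tree; return how many were wired.
    public static int wireCollectors(Op root, Collector c) {
        int wired = 0;
        if ("RS".equals(root.name)) {
            root.collector = c;
            wired++;
        }
        for (Op child : root.children) {
            wired += wireCollectors(child, c);
        }
        return wired;
    }
}
```

Since the traversal only touches RS operators, plans without a ReduceSink in the reducer (the normal MR case) are untouched, matching the "no regression" claim.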
[jira] [Created] (HIVE-7658) Hive search order for hive-site.xml when using --config option
James Spurin created HIVE-7658: -- Summary: Hive search order for hive-site.xml when using --config option Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: -bash-3.2$ cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Priority: Minor

When using the hive cl, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. Here's an example -

/home/spurija/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example1</value>
  </property>
</configuration>

/tmp/hive/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example2</value>
  </property>
</configuration>

-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>

{ check the value of scratchdir, should be example 1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ run with a specified config, check the value of scratchdir, should be example2 … still reported as example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ remove the local config, check the value of scratchdir, should be example2 … now correct }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2

Is this expected behavior or should it use the directory supplied with --config as the preferred configuration? -- This message was sent by Atlassian JIRA (v6.2#6252)
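The behavior reported here is consistent with first-match classpath resource lookup: Hadoop's Configuration loads hive-site.xml via the classloader, and the classloader returns the first matching entry in classpath order, so a copy in the current working directory can shadow the --config directory. The mini model below (illustrative only, not Hive's actual startup code) shows that first-match rule:

```java
import java.util.List;

public class ClasspathSearchSketch {
    // Return the path of the file as resolved by a first-match search over
    // the classpath entries, or null if no entry contains it.
    public static String resolve(List<String> classpathDirs, String fileName,
                                 List<String> dirsContainingFile) {
        for (String dir : classpathDirs) {
            if (dirsContainingFile.contains(dir)) {
                return dir + "/" + fileName;   // first match wins
            }
        }
        return null;
    }
}
```

With both directories holding a hive-site.xml and the home directory first on the classpath, the home copy wins, matching the /tmp/example1 result in the transcript.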
[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Spurin updated HIVE-7658: --- Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 was: -bash-3.2$ cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Priority: Minor When using the hive cl, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. 
Here's an example -

/home/spurija/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example1</value>
  </property>
</configuration>

/tmp/hive/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example2</value>
  </property>
</configuration>

-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>

{ check the value of scratchdir, should be example 1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ run with a specified config, check the value of scratchdir, should be example2 … still reported as example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ remove the local config, check the value of scratchdir, should be example2 … now correct }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2

Is this expected behavior or should it use the directory supplied with --config as the preferred configuration? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Spurin updated HIVE-7658:
-------------------------------
Description: When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. Here's an example -

/home/spurija/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example1</value>
  </property>
</configuration>

/tmp/hive/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example2</value>
  </property>
</configuration>

-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>

{ check the value of scratchdir, should be example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ run with a specified config, check the value of scratchdir, should be example2 … still reported as example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ remove the local config, check the value of scratchdir, should be example2 … now correct }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2

Is this expected behavior or should it use the directory supplied with --config as the preferred configuration?

Hive search order for hive-site.xml when using --config option
--------------------------------------------------------------
Key: HIVE-7658
URL: https://issues.apache.org/jira/browse/HIVE-7658
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.13.0
Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga)
Hive 0.13.0-mapr-1406
Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc
Compiled by root on Tue Jul 1 14:18:09 PDT 2014
From source with checksum 208afc25260342b51aefd2e0edf4c9d6
Reporter: James Spurin
Priority: Minor
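The precedence question above comes down to classpath ordering: hive-site.xml is located through the class loader, and ClassLoader.getResource returns the match from the first classpath entry that contains the file. The following is a minimal, self-contained sketch of that mechanic only — it is not Hive's actual startup code, and the temporary directories stand in for the working directory and the --config directory:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ConfigOrder {
    public static void main(String[] args) throws Exception {
        // Two directories, each with its own hive-site.xml (contents differ).
        Path cwd = Files.createTempDirectory("cwd");
        Path confDir = Files.createTempDirectory("conf");
        Files.write(cwd.resolve("hive-site.xml"), "example1".getBytes());
        Files.write(confDir.resolve("hive-site.xml"), "example2".getBytes());

        // Classpath order: working directory first, then the --config
        // directory -- mirroring the ordering the bug report suggests.
        URLClassLoader loader = new URLClassLoader(new URL[] {
                cwd.toUri().toURL(), confDir.toUri().toURL()
        }, null);

        // getResource scans entries in order, so the cwd copy shadows
        // the --config copy.
        URL found = loader.getResource("hive-site.xml");
        System.out.println(new String(
                Files.readAllBytes(Paths.get(found.toURI()))));
    }
}
```

Run standalone, this prints the contents of the working-directory copy, which is the shadowing behaviour the report describes; fixing the issue would amount to placing the --config directory earlier in the search order.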
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090499#comment-14090499 ] Hive QA commented on HIVE-4629: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660420/HIVE-4629.6.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5875 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/221/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/221/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-221/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660420 HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7624: - Attachment: HIVE-7624.4-spark.patch Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote}
14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
java.lang.RuntimeException: Reduce operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
…
{quote}
I suspect we're applying the reduce functions in the wrong order.

-- This message was sent by Atlassian JIRA (v6.2#6252)
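The "cannot find field reducesinkkey0 from [0:_col0]" cause in the trace above can be shown in miniature. The helper below is a hypothetical stand-in for the ObjectInspector field lookup, not Hive's implementation: the second reduce stage asks for its shuffle key column by name, but if the stages are wired in the wrong order it receives rows carrying only the first stage's output column:

```java
import java.util.Arrays;
import java.util.List;

public class FieldLookup {
    // Hypothetical stand-in for getStandardStructFieldRef:
    // resolve a struct field by name, or fail the way the trace shows.
    static int getFieldRef(String name, List<String> fields) {
        int idx = fields.indexOf(name);
        if (idx < 0) {
            throw new RuntimeException(
                    "cannot find field " + name + " from " + fields);
        }
        return idx;
    }

    public static void main(String[] args) {
        // Row schema produced by the first reduce stage: one value column.
        List<String> firstStageOutput = Arrays.asList("_col0");
        try {
            // The second stage expects its shuffle key to be present.
            getFieldRef("reducesinkkey0", firstStageOutput);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This reproduces the shape of the failure (a key-column lookup against a schema that never contained it), consistent with the suspicion that the reduce functions are applied in the wrong order.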
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090509#comment-14090509 ] Rui Li commented on HIVE-7624: -- Some change may bypass HIVE-7597. Remove it.
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090525#comment-14090525 ] Hive QA commented on HIVE-7624: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660582/HIVE-7624.3-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5843 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/23/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-23/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660582
[jira] [Commented] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types
[ https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090577#comment-14090577 ] Hive QA commented on HIVE-5760: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660430/HIVE-5760.2.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5894 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_simple org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/222/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/222/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-222/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660430 Add vectorized support for CHAR/VARCHAR data types -- Key: HIVE-5760 URL: https://issues.apache.org/jira/browse/HIVE-5760 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Matt McCline Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch Add support to allow queries referencing VARCHAR columns and expression results to run efficiently in vectorized mode. This should re-use the code for the STRING type to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized string operations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with -e
[ https://issues.apache.org/jira/browse/HIVE-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090650#comment-14090650 ] Hive QA commented on HIVE-7647: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660466/HIVE-7647.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5886 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/223/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/223/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-223/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660466

Beeline does not honor --headerInterval and --color when executing with -e
Key: HIVE-7647
URL: https://issues.apache.org/jira/browse/HIVE-7647
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.14.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor
Fix For: 0.14.0
Attachments: HIVE-7647.1.patch

--showHeader is being honored:

[root@localhost ~]# beeline --showHeader=false -u 'jdbc:hive2://localhost:1/default' -n hive -d org.apache.hive.jdbc.HiveDriver -e select * from sample_07 limit 10;
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
-hiveconf (No such file or directory)
+----------+--------------------------------------+------------+---------+
| 00-      | All Occupations                      | 135185230  | 42270   |
| 11-      | Management occupations               | 6152650    | 100310  |
| 11-1011  | Chief executives                     | 301930     | 160440  |
| 11-1021  | General and operations managers      | 1697690    | 107970  |
| 11-1031  | Legislators                          | 64650      | 37980   |
| 11-2011  | Advertising and promotions managers  | 36100      | 94720   |
| 11-2021  | Marketing managers                   | 166790     | 118160  |
| 11-2022  | Sales managers                       | 333910     | 110390  |
| 11-2031  | Public relations managers            | 51730      | 101220  |
| 11-3011  | Administrative services managers     | 246930     | 79500   |
+----------+--------------------------------------+------------+---------+
10 rows selected (0.838 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection

--outputFormat is being honored:

[root@localhost ~]# beeline --outputFormat=csv -u 'jdbc:hive2://localhost:1/default' -n hive -d org.apache.hive.jdbc.HiveDriver -e select * from sample_07 limit 10;
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
'code','description','total_emp','salary'
'00-','All Occupations','135185230','42270'
'11-','Management occupations','6152650','100310'
'11-1011','Chief executives','301930','160440'
'11-1021','General and operations managers','1697690','107970'
'11-1031','Legislators','64650','37980'
'11-2011','Advertising and promotions managers','36100','94720'
'11-2021','Marketing managers','166790','118160'
'11-2022','Sales managers','333910','110390'
'11-2031','Public relations managers','51730','101220'
'11-3011','Administrative services managers','246930','79500'
10 rows selected (0.664 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection

both --color --headerInterval are being honored when executing using -f option (reads query from a file rather than the commandline) (cannot really see the color here but use the terminal colors)

[root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 -u
[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090656#comment-14090656 ] Hive QA commented on HIVE-7223: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660470/HIVE-7223.2.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/224/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/224/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-224/ Messages: {noformat}
This message was trimmed, see log for full details
[ERROR] location: class org.apache.hadoop.hive.metastore.ObjectStore
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java:[24,55] package org.apache.hadoop.hive.metastore.partition.spec does not exist
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java:[34,11] cannot find symbol
[ERROR] symbol: class PartitionSpecProxy
[ERROR] location: class org.apache.hadoop.hive.metastore.events.PreAddPartitionEvent
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java:[50,44] cannot find symbol
[ERROR] symbol: class PartitionSpecProxy
[ERROR] location: class org.apache.hadoop.hive.metastore.events.PreAddPartitionEvent
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[323,43] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: interface org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncIface
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[4786,43] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[4794,20] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.add_partitions_pspec_call
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[4795,45] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.add_partitions_pspec_call
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[5484,19] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.get_partitions_pspec_call
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[5733,19] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.AsyncClient.get_part_specs_by_filter_call
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1294,42] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1300,48] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1853,17] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
[ERROR] /data/hive-ptest/working/apache-svn-trunk-source/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java:[1868,17] cannot find symbol
[ERROR] symbol: class PartitionSpec
[ERROR] location: class
[jira] [Commented] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090657#comment-14090657 ] Hive QA commented on HIVE-7649: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660474/HIVE-7649.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/225/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/225/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-225/ Messages: {noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-225/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java'
Reverted 'metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/events/AddPartitionEvent.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/events/PreAddPartitionEvent.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java'
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java'
Reverted 'metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py'
Reverted 'metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py'
Reverted 'metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote'
Reverted 'metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp'
Reverted 'metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp'
Reverted 'metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h'
Reverted 'metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h'
Reverted 'metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp'
Reverted 'metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb'
Reverted 'metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AggrStats.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ColumnStatistics.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PartitionsStatsRequest.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ShowCompactResponse.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/RequestPartsSpec.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddPartitionsRequest.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/HeartbeatTxnRangeResponse.java'
Reverted 'metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ShowLocksResponse.java'
Reverted
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090689#comment-14090689 ] Hive QA commented on HIVE-7624: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660590/HIVE-7624.4-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5828 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/24/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-24/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660590 Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in the wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
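The "cannot find field" failure mode can be illustrated with a toy field lookup. This is a hypothetical sketch, not Hive code: it only mimics the shape of the failure, where the second reducer's initialization looks for the reduce-sink key field in rows that still carry the previous stage's output schema.

```java
import java.util.Arrays;
import java.util.List;

public class FieldLookupDemo {
    // Toy stand-in for a struct-field lookup: find a field by name, or
    // throw the same kind of RuntimeException seen in the stack trace.
    static int findField(List<String> fields, String name) {
        int idx = fields.indexOf(name);
        if (idx < 0) {
            throw new RuntimeException("cannot find field " + name + " from " + fields);
        }
        return idx;
    }

    public static void main(String[] args) {
        // The reducer expects its reduce-sink key schema...
        String expected = "reducesinkkey0";
        // ...but (per the report) receives the prior stage's output schema,
        // so initialization fails before any row is processed.
        List<String> incoming = Arrays.asList("_col0");
        try {
            findField(incoming, expected);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```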
[jira] [Created] (HIVE-7659) Unnecessary sort in query plan
Rui Li created HIVE-7659: Summary: Unnecessary sort in query plan Key: HIVE-7659 URL: https://issues.apache.org/jira/browse/HIVE-7659 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li For Hive on Spark. Currently we rely on the sort order in RS to decide whether we need a sortByKey transformation. However, a simple group-by query will also have the sort order set to '+'. Consider the query: select key from table group by key. The RS in the map work will have its sort order set to '+', thus requiring a sortByKey shuffle. To avoid the unnecessary sort, we should either find another way to decide whether a sort shuffle is required, or set the sort order only when a sort is really needed. -- This message was sent by Atlassian JIRA (v6.2#6252)
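The ambiguity described above can be sketched as follows. This is purely illustrative (the names are invented, not the actual Hive-on-Spark planner API): since a plain group-by also emits a '+' sort-order string on its ReduceSink, the order string alone cannot tell the planner whether a total sort is needed — the query's actual ordering requirement must be tracked separately.

```java
public class ShuffleChoice {
    // Hypothetical helper: decide whether a sortByKey shuffle is needed.
    // sortOrder is an RS-style order string, e.g. "+" per key column.
    // Because a plain group-by also sets "+", the string alone is not a
    // reliable signal; orderingRequired must come from the query itself.
    static boolean needsSortByKey(String sortOrder, boolean orderingRequired) {
        return orderingRequired && sortOrder != null && !sortOrder.isEmpty();
    }

    public static void main(String[] args) {
        // select key from t group by key -> RS order "+", but no ordering needed
        System.out.println(needsSortByKey("+", false)); // hash shuffle suffices
        // select key from t order by key -> ordering genuinely required
        System.out.println(needsSortByKey("+", true));  // sortByKey shuffle
    }
}
```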
[jira] [Commented] (HIVE-7620) Hive metastore fails to start in secure mode due to java.lang.NoSuchFieldError: SASL_PROPS error
[ https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090713#comment-14090713 ] Hive QA commented on HIVE-7620: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660498/HIVE-7620.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5886 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/226/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/226/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-226/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660498 Hive metastore fails to start in secure mode due to java.lang.NoSuchFieldError: SASL_PROPS error -- Key: HIVE-7620 URL: https://issues.apache.org/jira/browse/HIVE-7620 Project: Hive Issue Type: Bug Components: Metastore Environment: Hadoop 2.5-snapshot with kerberos authentication on Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7620.1.patch, HIVE-7620.2.patch When the Hive metastore is started in a Hadoop 2.5 cluster, it fails to start with the following error: {code} 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift Server threw an exception... java.lang.NoSuchFieldError: SASL_PROPS at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483) at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225) at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152) {code} The changes in HADOOP-10451 that remove SaslRpcServer.SASL_PROPS are causing this error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake
[ https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090728#comment-14090728 ] Xiaoyu Wang commented on HIVE-7645: --- This error should not be caused by this patch! Hive CompactorMR job set NUM_BUCKETS mistake Key: HIVE-7645 URL: https://issues.apache.org/jira/browse/HIVE-7645 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.13.1 Reporter: Xiaoyu Wang Attachments: HIVE-7645.patch The code: job.setInt(NUM_BUCKETS, sd.getBucketColsSize()); should be changed to: job.setInt(NUM_BUCKETS, sd.getNumBuckets()); -- This message was sent by Atlassian JIRA (v6.2#6252)
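The difference between the two getters is easy to see with a toy stand-in for the metastore StorageDescriptor (the real class is Thrift-generated; this sketch only mirrors the two fields in question):

```java
import java.util.Arrays;
import java.util.List;

public class BucketCountDemo {
    // Toy stand-in for a table defined as CLUSTERED BY (id) INTO 32 BUCKETS:
    // one bucketing column, thirty-two buckets.
    static class Sd {
        List<String> bucketCols = Arrays.asList("id");
        int numBuckets = 32;

        int getBucketColsSize() { return bucketCols.size(); }
        int getNumBuckets() { return numBuckets; }
    }

    public static void main(String[] args) {
        Sd sd = new Sd();
        System.out.println(sd.getBucketColsSize()); // 1  (what the buggy code set)
        System.out.println(sd.getNumBuckets());     // 32 (what the compactor needs)
    }
}
```

Whenever a table is bucketed into more buckets than it has bucketing columns — the common case — the two values diverge, which is why the patch matters.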
[jira] [Commented] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization
[ https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090761#comment-14090761 ] Hive QA commented on HIVE-6959: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660482/HIVE-6959.5.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5885 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_cast_constant org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_14 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_15 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_9 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/227/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/227/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-227/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660482 Enable Constant propagation optimizer for Hive Vectorization Key: HIVE-6959 URL: https://issues.apache.org/jira/browse/HIVE-6959 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, HIVE-6959.5.patch HIVE-5771 covers the Constant propagation optimizer for Hive. Now that HIVE-5771 is committed, we should remove any vectorization-related code which duplicates this feature. For example, a function to be cleaned up is VectorizationContext::foldConstantsForUnaryExprs(). In addition to this change, constant propagation should kick in when vectorization is enabled, i.e., we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside ConstantPropagate::transform(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7653) Hive AvroSerDe does not support circular references in Schema
[ https://issues.apache.org/jira/browse/HIVE-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090766#comment-14090766 ] Hive QA commented on HIVE-7653: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660500/HIVE-7653.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/228/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/228/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-228/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-228/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'ql/src/test/results/clientpositive/vectorization_9.q.out' Reverted 'ql/src/test/results/clientpositive/vectorization_14.q.out' Reverted 'ql/src/test/results/clientpositive/vector_decimal_math_funcs.q.out' Reverted 'ql/src/test/results/clientpositive/vectorization_short_regress.q.out' Reverted 'ql/src/test/results/clientpositive/vectorization_16.q.out' Reverted 'ql/src/test/results/clientpositive/vectorized_parquet.q.out' Reverted 'ql/src/test/results/clientpositive/vector_cast_constant.q.out' Reverted 'ql/src/test/results/clientpositive/vector_elt.q.out' Reverted 'ql/src/test/results/clientpositive/vectorization_div0.q.out' Reverted 'ql/src/test/results/clientpositive/vector_coalesce.q.out' Reverted 'ql/src/test/results/clientpositive/vectorization_15.q.out' Reverted 'ql/src/test/results/clientpositive/vector_decimal_mapjoin.q.out' Reverted 'ql/src/test/results/clientpositive/vector_between_in.q.out' Reverted 'ql/src/test/results/clientpositive/vectorized_math_funcs.q.out' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java' Reverted 'ql/src/test/queries/clientpositive/vector_coalesce.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ 
svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java.orig + svn update Fetching external item into
[jira] [Updated] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-7373: -- Attachment: HIVE-7373.1.patch Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Attachments: HIVE-7373.1.patch Currently Hive blindly removes trailing zeros from a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In a decimal context, the number 3.140 (with a trailing zero) has a different semantic meaning from the number 3.14. Removing trailing zeros loses that meaning. 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing zeros, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeros (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-7373: -- Status: Patch Available (was: In Progress) Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.1, 0.13.0 Reporter: Xuefu Zhang Assignee: Sergio Peña Attachments: HIVE-7373.1.patch Currently Hive blindly removes trailing zeros from a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In a decimal context, the number 3.140 (with a trailing zero) has a different semantic meaning from the number 3.14. Removing trailing zeros loses that meaning. 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing zeros, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeros (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090818#comment-14090818 ] Sergio Peña commented on HIVE-7373: --- RB: https://reviews.apache.org/r/24467/ Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Attachments: HIVE-7373.1.patch Currently Hive blindly removes trailing zeros from a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In a decimal context, the number 3.140 (with a trailing zero) has a different semantic meaning from the number 3.14. Removing trailing zeros loses that meaning. 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing zeros, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeros (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-7373 started by Sergio Peña. Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Currently Hive blindly removes trailing zeros from a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In a decimal context, the number 3.140 (with a trailing zero) has a different semantic meaning from the number 3.14. Removing trailing zeros loses that meaning. 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing zeros, and the number then becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeros (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
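java.math.BigDecimal makes exactly the distinction the description relies on: scale is part of a decimal value's identity, and stripping trailing zeros changes it. A short demonstration:

```java
import java.math.BigDecimal;

public class TrailingZerosDemo {
    public static void main(String[] args) {
        // 3.140 and 3.14 are numerically equal but not the same decimal value.
        BigDecimal a = new BigDecimal("3.140");
        BigDecimal b = new BigDecimal("3.14");
        System.out.println(a.compareTo(b) == 0); // true: same numeric value
        System.out.println(a.equals(b));         // false: scales differ (3 vs 2)

        // 0.0 has precision 1 and scale 1; stripping the trailing zero yields
        // a value with scale 0 -- no longer representable as decimal(1,1).
        BigDecimal zero = new BigDecimal("0.0");
        System.out.println(zero.scale());                      // 1
        System.out.println(zero.stripTrailingZeros().scale()); // 0 (on Java 8+)
    }
}
```

This mirrors the JIRA's point 2: once the trailing zero is gone, the value's (precision, scale) no longer fits the declared column type.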
[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
[ https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090834#comment-14090834 ] Hive QA commented on HIVE-7446: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660511/HIVE-7446.1.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5889 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-229/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660511 Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables -- Key: HIVE-7446 URL: https://issues.apache.org/jira/browse/HIVE-7446 Project: Hive Issue Type: New Feature Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7446.1.patch, HIVE-7446.patch HIVE-6806 adds native support for creating a Hive table stored as Avro. 
It would be good to add ALTER TABLE .. ADD COLUMN support for Avro-backed tables. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()
[ https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7229: --- Assignee: KangHS String is compared using equal in HiveMetaStore#HMSHandler#init() - Key: HIVE-7229 URL: https://issues.apache.org/jira/browse/HIVE-7229 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: KangHS Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7229.1.patch, HIVE-7229.patch, HIVE-7229.patch Around line 423: {code} if (partitionValidationRegex != null && partitionValidationRegex != "") { partitionValidationPattern = Pattern.compile(partitionValidationRegex); {code} partitionValidationRegex.isEmpty() can be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7229) String is compared using equal in HiveMetaStore#HMSHandler#init()
[ https://issues.apache.org/jira/browse/HIVE-7229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7229: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) String is compared using equal in HiveMetaStore#HMSHandler#init() - Key: HIVE-7229 URL: https://issues.apache.org/jira/browse/HIVE-7229 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: KangHS Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7229.1.patch, HIVE-7229.patch, HIVE-7229.patch Around line 423: {code} if (partitionValidationRegex != null && partitionValidationRegex != "") { partitionValidationPattern = Pattern.compile(partitionValidationRegex); {code} partitionValidationRegex.isEmpty() can be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
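The underlying pitfall: `!=` on Strings compares references, not contents, so it only works by accident when both sides happen to be the same interned object. A minimal demonstration:

```java
public class StringCompareDemo {
    public static void main(String[] args) {
        String empty = "";
        // StringBuilder.toString() returns a new String object,
        // so its contents equal "" but its reference does not.
        String built = new StringBuilder().toString();

        System.out.println(built != empty);      // true: different references
        System.out.println(built.equals(empty)); // true: same contents
        System.out.println(built.isEmpty());     // true: the suggested check
    }
}
```

This is why the JIRA recommends `isEmpty()` (or `equals("")`) over `!= ""`.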
[jira] [Updated] (HIVE-7600) ConstantPropagateProcFactory uses reference equality on Boolean
[ https://issues.apache.org/jira/browse/HIVE-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7600: --- Assignee: KangHS ConstantPropagateProcFactory uses reference equality on Boolean --- Key: HIVE-7600 URL: https://issues.apache.org/jira/browse/HIVE-7600 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: KangHS Attachments: HIVE-7600.patch shortcutFunction() has the following code: {code} if (c.getValue() == Boolean.FALSE) { {code} Boolean.FALSE.equals() should be used. There're a few other occurrences of using reference equality on Boolean in this class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7600) ConstantPropagateProcFactory uses reference equality on Boolean
[ https://issues.apache.org/jira/browse/HIVE-7600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7600: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, KangHS! ConstantPropagateProcFactory uses reference equality on Boolean --- Key: HIVE-7600 URL: https://issues.apache.org/jira/browse/HIVE-7600 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: KangHS Fix For: 0.14.0 Attachments: HIVE-7600.patch shortcutFunction() has the following code: {code} if (c.getValue() == Boolean.FALSE) { {code} Boolean.FALSE.equals() should be used. There're a few other occurrences of using reference equality on Boolean in this class. -- This message was sent by Atlassian JIRA (v6.2#6252)
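The same reference-equality trap applies to `Boolean`: `==` against `Boolean.FALSE` fails for any `Boolean` instance that is not the canonical cached one, which is why `equals()` is the safe comparison:

```java
public class BooleanCompareDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        Boolean cached = Boolean.valueOf(false);  // canonical Boolean.FALSE
        Boolean distinct = new Boolean(false);    // a separate instance

        System.out.println(cached == Boolean.FALSE);        // true: same cached object
        System.out.println(distinct == Boolean.FALSE);      // false: reference check fails
        System.out.println(Boolean.FALSE.equals(distinct)); // true: value check works
    }
}
```

Autoboxing and `Boolean.valueOf` return the cached constants, but any value deserialized or constructed via `new Boolean(...)` breaks the `==` check — exactly the hazard in shortcutFunction().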
[jira] [Updated] (HIVE-7636) cbo fails when no projection is required from aggregate function
[ https://issues.apache.org/jira/browse/HIVE-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7636: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to cbo branch. cbo fails when no projection is required from aggregate function Key: HIVE-7636 URL: https://issues.apache.org/jira/browse/HIVE-7636 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: h-7636.patch select count (*) from t1 join t2 on t1.c1=t2.c2 fails -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7660) Hive to support qualify analytic filtering
Viji created HIVE-7660: -- Summary: Hive to support qualify analytic filtering Key: HIVE-7660 URL: https://issues.apache.org/jira/browse/HIVE-7660 Project: Hive Issue Type: New Feature Reporter: Viji Priority: Trivial Currently, Hive does not support QUALIFY analytic filtering. It would be useful if this feature were added in the future. As a workaround, since QUALIFY is just a filter, it can be replaced with a subquery and a filter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances
[ https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7341: --- Attachment: HIVE-7341.3.patch Added documentation for MetadataSerializer, and subclass. Support for Table replication across HCatalog instances --- Key: HIVE-7341 URL: https://issues.apache.org/jira/browse/HIVE-7341 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 0.14.0 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch The HCatClient currently doesn't provide very much support for replicating HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) instances. Systems similar to Apache Falcon might find the need to replicate partition data between 2 clusters, and keep the HCatalog metadata in sync between the two. This poses a couple of problems: # The definition of the source table might change (in column schema, I/O formats, record-formats, serde-parameters, etc.) The system will need a way to diff 2 tables and update the target-metastore with the changes. E.g. {code} targetTable.resolve( sourceTable, targetTable.diff(sourceTable) ); hcatClient.updateTableSchema(dbName, tableName, targetTable); {code} # The current {{HCatClient.addPartitions()}} API requires that the partition's schema be derived from the table's schema, thereby requiring that the table-schema be resolved *before* partitions with the new schema are added to the table. This is problematic, because it introduces race conditions when 2 partitions with differing column-schemas (e.g. right after a schema change) are copied in parallel. This can be avoided if each HCatAddPartitionDesc kept track of the partition's schema, in flight. # The source and target metastores might be running different/incompatible versions of Hive. The impending patch attempts to address these concerns (with some caveats). 
# {{HCatTable}} now has ## a {{diff()}} method, to compare against another HCatTable instance ## a {{resolve(diff)}} method to copy over specified table-attributes from another HCatTable ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in other class-loaders may be used for comparison # {{HCatPartition}} now provides finer-grained control over a Partition's column-schema, StorageDescriptor settings, etc. This allows partitions to be copied completely from source, with the ability to override specific properties if required (e.g. location). # {{HCatClient.updateTableSchema()}} can now update the entire table-definition, not just the column schema. # I've cleaned up and removed most of the redundancy between the HCatTable, HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to separate the table-attributes from the add-table-operation's attributes. By providing fluent-interfaces in HCatTable, and composing an HCatTable instance in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are deprecated, in favour of those in HCatTable. Likewise, HCatPartition and HCatAddPartitionDesc. I'll post a patch for trunk shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)
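The diff-then-resolve flow from the {code} snippet above can be sketched with toy classes. This is a hypothetical illustration of the pattern only — the real HCatTable API's method names, attribute model, and return types may differ:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TableSyncDemo {
    // Toy table definition: a flat map of attribute name -> value.
    static class TableDef {
        final Map<String, String> attrs = new HashMap<>();

        // Names of attributes whose values differ from the source table.
        Set<String> diff(TableDef source) {
            Set<String> changed = new HashSet<>();
            for (Map.Entry<String, String> e : source.attrs.entrySet()) {
                if (!e.getValue().equals(attrs.get(e.getKey()))) {
                    changed.add(e.getKey());
                }
            }
            return changed;
        }

        // Copy only the listed attributes over from the source table.
        void resolve(TableDef source, Set<String> attributes) {
            for (String name : attributes) {
                attrs.put(name, source.attrs.get(name));
            }
        }
    }

    public static void main(String[] args) {
        TableDef source = new TableDef();
        source.attrs.put("serde", "orc");
        source.attrs.put("location", "/warehouse/src");

        TableDef target = new TableDef();
        target.attrs.put("serde", "text");
        target.attrs.put("location", "/warehouse/src");

        // Compute what changed, then pull only those attributes across --
        // the shape of targetTable.resolve(sourceTable, targetTable.diff(sourceTable)).
        Set<String> changes = target.diff(source);
        System.out.println(changes);                   // [serde]
        target.resolve(source, changes);
        System.out.println(target.attrs.get("serde")); // orc
    }
}
```

The key design point the patch description makes is that the *target* computes the diff and selectively absorbs changes, rather than being overwritten wholesale — which is what lets a replication system preserve target-specific properties such as location.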
[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances
[ https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7341: --- Status: Open (was: Patch Available) Support for Table replication across HCatalog instances --- Key: HIVE-7341 URL: https://issues.apache.org/jira/browse/HIVE-7341 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 0.14.0 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch The HCatClient currently doesn't provide very much support for replicating HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) instances. Systems similar to Apache Falcon might find the need to replicate partition data between 2 clusters, and keep the HCatalog metadata in sync between the two. This poses a couple of problems: # The definition of the source table might change (in column schema, I/O formats, record-formats, serde-parameters, etc.) The system will need a way to diff 2 tables and update the target-metastore with the changes. E.g. {code} targetTable.resolve( sourceTable, targetTable.diff(sourceTable) ); hcatClient.updateTableSchema(dbName, tableName, targetTable); {code} # The current {{HCatClient.addPartitions()}} API requires that the partition's schema be derived from the table's schema, thereby requiring that the table-schema be resolved *before* partitions with the new schema are added to the table. This is problematic, because it introduces race conditions when 2 partitions with differing column-schemas (e.g. right after a schema change) are copied in parallel. This can be avoided if each HCatAddPartitionDesc kept track of the partition's schema, in flight. # The source and target metastores might be running different/incompatible versions of Hive. The impending patch attempts to address these concerns (with some caveats). 
# {{HCatTable}} now has ## a {{diff()}} method, to compare against another HCatTable instance ## a {{resolve(diff)}} method to copy over specified table-attributes from another HCatTable ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in other class-loaders may be used for comparison # {{HCatPartition}} now provides finer-grained control over a Partition's column-schema, StorageDescriptor settings, etc. This allows partitions to be copied completely from source, with the ability to override specific properties if required (e.g. location). # {{HCatClient.updateTableSchema()}} can now update the entire table-definition, not just the column schema. # I've cleaned up and removed most of the redundancy between the HCatTable, HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to separate the table-attributes from the add-table-operation's attributes. By providing fluent-interfaces in HCatTable, and composing an HCatTable instance in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are deprecated, in favour of those in HCatTable. Likewise, HCatPartition and HCatAddPartitionDesc. I'll post a patch for trunk shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances
[ https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7341: --- Status: Patch Available (was: Open) Support for Table replication across HCatalog instances --- Key: HIVE-7341 URL: https://issues.apache.org/jira/browse/HIVE-7341 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 0.14.0 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7661) Observed performance issues while sorting using Hive's Parallel Order by clause while retaining pre-existing sort order.
Vishal Kamath created HIVE-7661: --- Summary: Observed performance issues while sorting using Hive's Parallel Order by clause while retaining pre-existing sort order. Key: HIVE-7661 URL: https://issues.apache.org/jira/browse/HIVE-7661 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.12.0 Environment: Cloudera 5.0 hive-0.12.0-cdh5.0.0 Red Hat Linux Reporter: Vishal Kamath Fix For: 0.12.1 Improve Hive's sampling logic to accommodate use cases that require retaining the pre-existing sort order of the underlying source table. To support the parallel ORDER BY clause, Hive samples the source table based on the values of hive.optimize.sampling.orderby.number and hive.optimize.sampling.orderby.percent. This works with reasonable performance when sorting on columns with a random distribution of data, but it has severe performance issues when retaining a pre-existing sort order. Let us try to understand this with an example.
insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_orderkey, l_partkey, l_suppkey;
Sample data set for the lineitem table. The first column represents l_orderkey and is sorted:
l_orderkey|l_partkey|l_suppkey|l_linenumber|l_quantity|l_extendedprice|l_discount|l_tax|l_returnflag|l_linestatus|l_shipdate|l_commitdate|l_receiptdate|l_shipinstruct|l_shipmode|l_comment
197|1771022|96040|2|8|8743.52|0.09|0.02|A|F|1995-04-17|1995-07-01|1995-04-27|DELIVER IN PERSON|SHIP|y blithely even deposits. blithely fina|
197|1558290|83306|3|17|22919.74|0.06|0.02|N|O|1995-08-02|1995-06-23|1995-08-03|COLLECT COD|REG AIR|ts. careful|
197|179355|29358|4|25|35858.75|0.04|0.01|N|F|1995-06-13|1995-05-23|1995-06-24|TAKE BACK RETURN|FOB|s-- quickly final accounts|
197|414653|39658|5|14|21946.82|0.09|0.01|R|F|1995-05-08|1995-05-24|1995-05-12|TAKE BACK RETURN|RAIL|use slyly slyly silent depo|
197|1058800|8821|6|1|1758.75|0.07|0.05|N|O|1995-07-15|1995-06-21|1995-08-11|COLLECT COD|RAIL| even, thin dependencies sno|
198|560609|60610|1|33|55096.14|0.07|0.02|N|O|1998-01-05|1998-03-20|1998-01-10|TAKE BACK RETURN|TRUCK|carefully caref|
198|152287|77289|2|20|26785.60|0.03|0.00|N|O|1998-01-15|1998-03-31|1998-01-25|DELIVER IN PERSON|FOB|carefully final escapades a|
224|1899665|74720|3|41|68247.37|0.07|0.04|A|F|1994-09-01|1994-09-15|1994-09-02|TAKE BACK RETURN|SHIP|after the furiou|
When we either sort on a presorted column or do a multi-column sort while retaining the sort order of the source table, we don't see an equal distribution of data across the reducers. The source table lineitem has 600 million rows; out of 100 reducers, 99 complete in less than 40 seconds, and the last reducer does the bulk of the work, processing nearly 570 million rows. So, let us understand what is going wrong here: on a table having 600 million records with the orderkey column sorted, I created a temp table with 10% sampling: insert overwrite table sampTempTbl (select * from lineitem tablesample (10 percent) t); Then select min(l_orderkey), max(l_orderkey) from sampTempTbl; returns 12306309,142321700 whereas on the source table the orderkey range (select min(l_orderkey), max(l_orderkey) from lineitem) is 1 and 6 So naturally the bulk of the records will be directed toward a single reducer. 
One way to work around this problem is to increase hive.optimize.sampling.orderby.number to a larger value (as close as possible to the number of rows in the input source table). But then we would have to give Hive a larger heap (in hive-env.sh); otherwise it fails while creating the sampling data. With larger data volumes, it is not practical to sample the entire data set. -- This message was sent by Atlassian JIRA (v6.2#6252)
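The skew described above can be reproduced with a toy model of boundary-based range partitioning. This is a deliberate simplification (Hive's actual sampling logic differs): it only shows that when the sample is drawn from the low end of a presorted key space, the computed reducer boundaries cover only that low end, and nearly every row lands in the last reducer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of sample-based range partitioning for a parallel sort.
class SamplingSkew {
    // Pick (numReducers - 1) boundary keys from a sample, by rank.
    static long[] boundaries(List<Long> sample, int numReducers) {
        List<Long> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        long[] bounds = new long[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            bounds[i - 1] = sorted.get(i * sorted.size() / numReducers);
        }
        return bounds;
    }

    // With keys uniformly 1..totalKeys, count how many fall above the
    // highest boundary, i.e. into the last reducer's range.
    static long lastReducerCount(long totalKeys, long[] bounds) {
        return totalKeys - bounds[bounds.length - 1];
    }
}
```

With keys 1..1,000,000 and a sample taken from the first 10% of the sorted input, the highest boundary sits near key 90,000, so roughly 91% of the rows go to the last reducer, mirroring the 570-million-row straggler above.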
[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7223: --- Status: Open (was: Patch Available) Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.13.0, 0.12.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch Currently, the functions in the HiveMetaStore API that handle multiple partitions do so using {{List<Partition>}}. E.g. {code} public List<Partition> listPartitions(String db_name, String tbl_name, short max_parts); public List<Partition> listPartitionsByFilter(String db_name, String tbl_name, String filter, short max_parts); public int add_partitions(List<Partition> new_parts); {code} Partition objects are fairly heavyweight, since each Partition carries its own copy of a StorageDescriptor, partition values, etc. Tables with tens of thousands of partitions take so long to have their partitions listed that the client times out with the default hive.metastore.client.socket.timeout. There is the additional expense of serializing and deserializing metadata for large sets of partitions, w.r.t. time and heap space. Reducing the Thrift traffic should help in this regard. In a date-partitioned table, all sub-partitions for a particular date are *likely* (but not guaranteed) to have: # The same base directory (e.g. {{/feeds/search/20140601/}}) # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}}) # The same SerDe/StorageHandler/IOFormat classes # Sorting/Bucketing/SkewInfo settings In this “most likely” scenario (henceforth termed “normal”), it’s possible to represent the partition list (for a date) in a more condensed form: a list of LighterPartition instances, all sharing a common StorageDescriptor whose location points to the root directory. We can go one better for the {{add_partitions()}} case: when adding all partitions for a given date, the “normal” case affords us the ability to specify the top-level date directory, where sub-partitions can be inferred from the HDFS directory path. These extensions are hard to introduce at the metastore level, since partition-functions explicitly specify {{List<Partition>}} arguments. I wonder if a {{PartitionSpec}} interface might help: {code} public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... ; public int add_partitions( PartitionSpec new_parts ) throws ... ; {code} where the PartitionSpec looks like: {code} public interface PartitionSpec { public List<Partition> getPartitions(); public List<String> getPartNames(); public Iterator<Partition> getPartitionIter(); public Iterator<String> getPartNameIter(); } {code} For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement {{PartitionSpec}}, store a top-level directory, and return Partition instances from sub-directory names, while storing a single StorageDescriptor for all of them. Similarly, list_partitions() could return a {{List<PartitionSpec>}}, where each PartitionSpec corresponds to a set of partitions that can share a StorageDescriptor. By exposing iterator semantics, neither the client nor the metastore need instantiate all partitions at once. That should help with memory requirements. In case no smart grouping is possible, we could just fall back on a {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse than the status quo. 
PartitionSpec abstracts away how a set of partitions may be represented. A tighter representation allows us to communicate metadata for a larger number of Partitions, with less Thrift traffic. Given that Thrift doesn’t support polymorphism, we’d have to implement the PartitionSpec as a Thrift union of supported implementations. (We could convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec sub-class.) Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
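The shared-StorageDescriptor idea above can be sketched in miniature. DirBasedPartitionSpec and getPartitionLocationIter are illustrative names modelled on the HDFSDirBasedPartitionSpec proposal, not the actual Hive/HCatalog API; a single root location stands in for the common StorageDescriptor.

```java
import java.util.Iterator;
import java.util.List;

// Miniature sketch of a directory-based PartitionSpec: one shared root
// location for every partition, with partitions materialized lazily
// from sub-directory names.
class DirBasedPartitionSpec {
    final String rootLocation;    // shared by every partition in the spec
    final List<String> subDirs;   // e.g. ["US", "UK", "IN"]

    DirBasedPartitionSpec(String rootLocation, List<String> subDirs) {
        this.rootLocation = rootLocation;
        this.subDirs = subDirs;
    }

    // Iterator semantics: partition locations are built on demand, so
    // the full partition list never has to be held in memory at once.
    Iterator<String> getPartitionLocationIter() {
        Iterator<String> names = subDirs.iterator();
        return new Iterator<String>() {
            public boolean hasNext() { return names.hasNext(); }
            public String next() { return rootLocation + "/" + names.next(); }
        };
    }
}
```

The condensed form carries one location string plus N short names instead of N full StorageDescriptors, which is where the Thrift-traffic savings come from.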
Re: Review Request 24085: HIVE-7446: Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24085/#review50049 --- Ship it! Ship It! - Tom White On Aug. 8, 2014, midnight, Ashish Singh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24085/ --- (Updated Aug. 8, 2014, midnight) Review request for hive. Bugs: HIVE-7446 https://issues.apache.org/jira/browse/HIVE-7446 Repository: hive-git Description --- HIVE-7446: Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables Diffs - ql/src/test/queries/clientpositive/avro_add_column.q PRE-CREATION ql/src/test/queries/clientpositive/avro_add_column2.q PRE-CREATION ql/src/test/queries/clientpositive/avro_add_column3.q PRE-CREATION ql/src/test/results/clientpositive/avro_add_column.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_add_column2.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_add_column3.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 915f01679183904d0d93b9b8a88dc1a64ac2af78 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 722bdf9f8452fe8632db7d9167182310e467281d serde/src/test/resources/avro-nested-struct.avsc 785af83cd01fe91626741b3d7659d8f515854774 serde/src/test/resources/avro-struct.avsc 313c74f6140615d2737ef1a49a2777656f35f4e3 Diff: https://reviews.apache.org/r/24085/diff/ Testing --- qTests Thanks, Ashish Singh
[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7223: --- Attachment: (was: HIVE-7223.2.patch) Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7223.1.patch 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7223: --- Status: Patch Available (was: Open) Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.13.0, 0.12.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions
[ https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-7223: --- Attachment: HIVE-7223.2.patch Updated patch, with Thrift definitions updated, etc. Support generic PartitionSpecs in Metastore partition-functions --- Key: HIVE-7223 URL: https://issues.apache.org/jira/browse/HIVE-7223 Project: Hive Issue Type: Improvement Components: HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
[ https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090993#comment-14090993 ] Ashish Kumar Singh commented on HIVE-7446: -- The test errors above are not related to this patch. Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables -- Key: HIVE-7446 URL: https://issues.apache.org/jira/browse/HIVE-7446 Project: Hive Issue Type: New Feature Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7446.1.patch, HIVE-7446.patch HIVE-6806 adds native support for creating Hive tables stored as Avro. It would be good to add support for ALTER TABLE .. ADD COLUMN to Avro-backed tables. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7654) A method to extrapolate columnStats for partitions of a table
[ https://issues.apache.org/jira/browse/HIVE-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091001#comment-14091001 ] Hive QA commented on HIVE-7654: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660487/HIVE-7654.0.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5871 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/230/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/230/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-230/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660487 A method to extrapolate columnStats for partitions of a table - Key: HIVE-7654 URL: https://issues.apache.org/jira/browse/HIVE-7654 Project: Hive Issue Type: New Feature Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: Extrapolate the Column Status.docx, HIVE-7654.0.patch In a PARTITIONED table, there are many partitions. 
For example: create table if not exists loc_orc ( state string, locid int, zip bigint ) partitioned by(year string) stored as orc; We assume there are 4 partitions: partition(year='2000'), partition(year='2001'), partition(year='2002') and partition(year='2003'). We can use the following command to compute statistics for columns state and locid of partition(year='2001'): analyze table loc_orc partition(year='2001') compute statistics for columns state,locid; We need to know the “aggregated” column statistics for the whole table loc_orc. However, we may not have the column statistics for some partitions, e.g., partition(year='2002'), and we may not have the column statistics for some columns, e.g., zip bigint for partition(year='2001'). We propose a method to extrapolate the missing column statistics for the partitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
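One simple way to extrapolate a count-like column statistic to partitions that lack stats is to scale linearly by partition coverage. The linear rule below is an assumption for illustration only; the actual method proposed for HIVE-7654 is described in the attached design document.

```java
// Naive extrapolation of a per-partition column statistic (e.g. a null
// count) to partitions missing stats: scale by partition coverage.
// ASSUMPTION: uncovered partitions behave like the covered ones.
class StatsExtrapolation {
    static long extrapolate(long observedValue, int partsWithStats, int totalParts) {
        if (partsWithStats == 0) {
            throw new IllegalArgumentException("no partitions have stats");
        }
        return observedValue * totalParts / partsWithStats;
    }
}
```

E.g., if 3 of 4 year-partitions have stats and together report 30 nulls in a column, the table-level estimate becomes 40.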
RE: Key is null in map when OrcNewInputFormat is used as Input Format Class
Any update from anybody? Should I file a bug? Thanks -Original Message- From: John Zeng [mailto:john.z...@dataguise.com] Sent: Wednesday, August 6, 2014 10:17 AM To: dev@hive.apache.org Subject: Key is null in map when OrcNewInputFormat is used as Input Format Class Dear OrcNewInputFormat owner, When using OrcNewInputFormat as the input format class for my map-reduce job, I find its key is always null in my map method. This gives me no way to get the row number in my map method. By comparison, RCFileInputFormat (for RC files) returns the row number as the key in the map method, so I know which row I am processing. Is there any workaround for me to get the row number in my map method? Of course, I could count the rows myself. But that has two problems: #1 I have to assume the rows arrive in order; #2 I will get duplicated (and wrong) row numbers if a big input file causes multiple file splits (which will trigger my map method multiple times on different data nodes). At this point, I am really seeking a better way to get the row number for each processed row in the map method. Here is what I have in my map logs: [2014-08-06 09:39:25 DEBUG com..hadoop.orcfile.OrcFileMap]: Mapper Input Key: (null) [2014-08-06 09:39:25 DEBUG com..hadoop.orcfile.OrcFileMap]: Mapper Input Value: {Q8151, T9976, 69976, 8156756, 966798161, 97898989898, Laura, laura...@gmail.com} My map method is: protected void map(Object key, Writable value, Context context) throws IOException, InterruptedException { logger.debug("Mapper Input Key: " + key); logger.debug("Mapper Input Value: " + value.toString()); ... } Thanks John
[jira] [Updated] (HIVE-7662) CBO: changes to Cost Model
[ https://issues.apache.org/jira/browse/HIVE-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7662: Summary: CBO: changes to Cost Model (was: CBO: changes to COst Model) CBO: changes to Cost Model -- Key: HIVE-7662 URL: https://issues.apache.org/jira/browse/HIVE-7662 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Harish Butani - Model Join cost as Sum of Input sizes - Fix bug with NDV calculations. For now use Optiq's default formulas. - Model Cumulative cost to favor broad Plans over Deep Plans. -- This message was sent by Atlassian JIRA (v6.2#6252)
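The cost changes listed above can be sketched with a toy plan tree. This is an illustration of the two bullets (join cost as the sum of input sizes, cumulative cost summed over the tree), not Optiq's or Hive's actual cost model; all sizes are made up.

```java
// Toy plan-cost model: a join's cost is the sum of its inputs' sizes,
// and cumulative cost sums join costs over the whole plan tree, so
// plans with cheaper subtrees (broader/bushier shapes) score lower.
class PlanNode {
    final long size;              // estimated output rows
    final PlanNode left, right;   // null children => base table scan

    PlanNode(long size) { this(size, null, null); }
    PlanNode(long size, PlanNode left, PlanNode right) {
        this.size = size; this.left = left; this.right = right;
    }

    long joinCost() { return left == null ? 0 : left.size + right.size; }

    long cumulativeCost() {
        return joinCost()
                + (left == null ? 0 : left.cumulativeCost())
                + (right == null ? 0 : right.cumulativeCost());
    }
}
```

Comparing two plans over the same inputs then reduces to comparing their cumulative costs.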
[jira] [Created] (HIVE-7662) CBO: changes to COst Model
Harish Butani created HIVE-7662: --- Summary: CBO: changes to COst Model Key: HIVE-7662 URL: https://issues.apache.org/jira/browse/HIVE-7662 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Harish Butani - Model Join cost as Sum of Input sizes - Fix bug with NDV calculations. For now use Optiq's default formulas. - Model Cumulative cost to favor broad Plans over Deep Plans. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7655) Reading of partitioned table stats slows down explain
[ https://issues.apache.org/jira/browse/HIVE-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7655: Issue Type: Sub-task (was: Bug) Parent: HIVE-5775 Reading of partitioned table stats slows down explain - Key: HIVE-7655 URL: https://issues.apache.org/jira/browse/HIVE-7655 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Labels: hive Fix For: 0.14.0 This defect is due to a regression introduced in https://issues.apache.org/jira/browse/HIVE-7625: explain for queries that touch partitioned tables is 10x slower. RelOptHiveTable.getRowCount calls listPartitionsWithAuthInfo, which returns the data from all partitions; listPartitionsByExpr should be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
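The cost difference behind this regression can be pictured as fetch-everything versus filter-then-fetch. The toy model below is only an illustration — the method names mimic the metastore calls mentioned in the report and are not the real IMetaStoreClient API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of why a listPartitionsWithAuthInfo-style "fetch everything" call is
// costly for explain: metadata for every partition is materialized, while a
// listPartitionsByExpr-style call only touches partitions matching the
// query's partition predicate. Names are illustrative, not Hive's API.
class PartitionFetchDemo {
    static int fetchAll(List<String> partitions) {
        return partitions.size();              // one metadata fetch per partition
    }

    static int fetchMatching(List<String> partitions, Predicate<String> expr) {
        int fetched = 0;
        for (String p : partitions) {
            if (expr.test(p)) fetched++;       // only matching partitions fetched
        }
        return fetched;
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        for (int d = 1; d <= 365; d++) parts.add("ds=day" + d);
        System.out.println(fetchAll(parts));                                 // 365
        System.out.println(fetchMatching(parts, p -> p.equals("ds=day100"))); // 1
    }
}
```

For a table with hundreds of daily partitions and a query touching one day, the difference is exactly the 10x-style slowdown described above.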
[jira] [Updated] (HIVE-7662) CBO: changes to Cost Model
[ https://issues.apache.org/jira/browse/HIVE-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7662: Attachment: HIVE-7662.1.patch CBO: changes to Cost Model -- Key: HIVE-7662 URL: https://issues.apache.org/jira/browse/HIVE-7662 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7662.1.patch - Model Join cost as Sum of Input sizes - Fix bug with NDV calculations. For now use Optiq's default formulas. - Model Cumulative cost to favor broad Plans over Deep Plans. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7649: - Attachment: HIVE-7649.2.patch rebasing patch with trunk Support column stats with temporary tables -- Key: HIVE-7649 URL: https://issues.apache.org/jira/browse/HIVE-7649 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch Column stats currently not supported with temp tables, see if they can be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24289: MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24289/#review50047 --- Please add .q tests for these. Test a partitioned table with more than one partition column, on a variety of column types and a variety of stats types. ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java https://reviews.apache.org/r/24289/#comment87572 Include an example SQL statement for which this task is meant. ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java https://reviews.apache.org/r/24289/#comment87575 Add a comment saying the grammar prohibits more than 1 column, so we are guaranteed to have only 1 element in these lists. ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java https://reviews.apache.org/r/24289/#comment87576 Is clear() needed here? ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java https://reviews.apache.org/r/24289/#comment87579 Add else { throw SemanticException (Unknown stat); } Add to all subsequent blocks. You may also want to refactor some of this repetition into a private method. ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java https://reviews.apache.org/r/24289/#comment87580 Add else { throw Exception (Unsupported type); } ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java https://reviews.apache.org/r/24289/#comment87574 Copy-paste comments? ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java https://reviews.apache.org/r/24289/#comment87573 Comments seem out of place. Copy-paste?
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java https://reviews.apache.org/r/24289/#comment87563 throw new SemanticException (table + tbl + not found); ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java https://reviews.apache.org/r/24289/#comment87564 if (colType == null) throw new SemanticException (col not found); ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java https://reviews.apache.org/r/24289/#comment87565 There can be multiple partitioning columns, in which case this assert will fail. Don't think you want that. ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java https://reviews.apache.org/r/24289/#comment87566 Instead of this for loop, you want to use Warehouse.makePartName(partSpec, false); ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java https://reviews.apache.org/r/24289/#comment87567 throw SemanticEx ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java https://reviews.apache.org/r/24289/#comment87568 check colType != null ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java https://reviews.apache.org/r/24289/#comment87562 I don't think this if block is required. Further, you need to add a HiveOperation corresponding to this new token. ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java https://reviews.apache.org/r/24289/#comment87571 Add a comment like: work corresponding to the statement: alter table t1 partition (p1=c1,p2=c2), update... ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java https://reviews.apache.org/r/24289/#comment87569 This field doesn't seem to be used. Can be removed. ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java https://reviews.apache.org/r/24289/#comment87570 Good to implement this. Useful for debugging. - Ashutosh Chauhan On Aug. 5, 2014, 6:40 p.m., pengcheng xiong wrote: --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/24289/ --- (Updated Aug. 5, 2014, 6:40 p.m.) Review request for hive. Repository: hive-git Description --- This patch provides the ability to update certain stats without scanning any data or hacking the backend db. It helps (esp. for CBO work) to set up unit tests quickly and verify both CBO and the stats subsystem. It also helps when experimenting with the system if you're just trying out hive/hadoop on a small cluster. Finally, it gives you a quick and clean way to fix things when something has gone wrong with stats in your environment. Usage: ALTER TABLE table_name PARTITION partition_spec UPDATE STATISTICS FOR COLUMN col_name SET col_statistics For example, ALTER TABLE src_x_int UPDATE STATISTICS FOR COLUMN key SET ('numDVs'='101','highValue'='10001.0'); ALTER TABLE src_p PARTITION(partitionId=1) UPDATE STATISTICS FOR COLUMN key SET ('numDVs'='100','avgColLen'='1.0001'); Diffs -
[jira] [Updated] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
[ https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7506: --- Status: Open (was: Patch Available) Left comments on RB. Two major items: * Add .q tests. * Add new HiveOperation enum corresponding to new token type. MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table) -- Key: HIVE-7506 URL: https://issues.apache.org/jira/browse/HIVE-7506 Project: Hive Issue Type: New Feature Components: Statistics Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7506.1.patch, HIVE-7506.3.patch, HIVE-7506.4.patch, HIVE-7506.patch Original Estimate: 252h Remaining Estimate: 252h Two motivations: (1) The Cost-based Optimizer (CBO) depends heavily on the statistics of a column in a table (or a partition of a table). If we would like to test whether CBO chooses the best plan under different statistics, it would be time consuming to load the whole table and create the statistics from the ground up. (2) As the database runs, the statistics of a column in a table (or a partition of a table) may change. We need a mechanism to synchronize them. We propose the following command to achieve that: ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE STATISTICS col_statistics [COMMENT col_comment] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
[ https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7506: --- Component/s: (was: Database/Schema) Statistics MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table) -- Key: HIVE-7506 URL: https://issues.apache.org/jira/browse/HIVE-7506 Project: Hive Issue Type: New Feature Components: Statistics Reporter: pengcheng xiong Assignee: pengcheng xiong Priority: Minor Attachments: HIVE-7506.1.patch, HIVE-7506.3.patch, HIVE-7506.4.patch, HIVE-7506.patch Original Estimate: 252h Remaining Estimate: 252h Two motivations: (1) The Cost-based Optimizer (CBO) depends heavily on the statistics of a column in a table (or a partition of a table). If we would like to test whether CBO chooses the best plan under different statistics, it would be time consuming to load the whole table and create the statistics from the ground up. (2) As the database runs, the statistics of a column in a table (or a partition of a table) may change. We need a mechanism to synchronize them. We propose the following command to achieve that: ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE STATISTICS col_statistics [COMMENT col_comment] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7663) OrcRecordUpdater needs to implement getStats
Alan Gates created HIVE-7663: Summary: OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7618) TestDDLWithRemoteMetastoreSecondNamenode unit test failure
[ https://issues.apache.org/jira/browse/HIVE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091126#comment-14091126 ] Sushanth Sowmyan commented on HIVE-7618: +1 on the patch as it currently stands. While I am, in theory, in favour of adding this to the SH interface, I think we should hold off on that for now. I would rather open a discussion with the hive group at large on re-architecting StorageHandlers in general, trying to do the following: a) Deprecation/removal of HiveOutputFormat/HiveRecordWriter in general, in favour of using M/R definitions of the same, and having Committer semantics included. b) Re-architecting/refactoring native Hive storage in a way that makes everything go through the SH interface, rather than having special-casing for SH and native. c) Support for the notion of an SH per partition, rather than an SH per table. d) The notion of possible plan modifications by the SH for any add-on tasks that are required. And if we're making that many changes, it's likely that we will break SHs significantly at that time, and I'd rather do it once than have a constant stream of breakage. I'd like to see us pursue that as a major initiative in the 0.15 timeframe, if possible. I'll shoot out a mail to the list in that regard. TestDDLWithRemoteMetastoreSecondNamenode unit test failure -- Key: HIVE-7618 URL: https://issues.apache.org/jira/browse/HIVE-7618 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7618.1.patch, HIVE-7618.2.patch Looks like TestDDLWithRemoteMetastoreSecondNamenode started failing after HIVE-6584 was committed. {noformat} TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode:272-createTableAndCheck:201-createTableAndCheck:219 Table should be located in the second filesystem expected:[hdfs] but was:[pfile] {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091127#comment-14091127 ] Hive QA commented on HIVE-7616: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660507/HIVE-7616.04.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5886 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/231/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-231/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660507 pre-size mapjoin hashtable based on statistics -- Key: HIVE-7616 URL: https://issues.apache.org/jira/browse/HIVE-7616 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7655) Reading of partitioned table stats slows down explain
[ https://issues.apache.org/jira/browse/HIVE-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-7655: Attachment: HIVE-7655.1.patch Reading of partitioned table stats slows down explain - Key: HIVE-7655 URL: https://issues.apache.org/jira/browse/HIVE-7655 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Labels: hive Fix For: 0.14.0 Attachments: HIVE-7655.1.patch This defect is due to a regression introduced in https://issues.apache.org/jira/browse/HIVE-7625: explain for queries that touch partitioned tables is 10x slower. RelOptHiveTable.getRowCount calls listPartitionsWithAuthInfo, which returns the data from all partitions; listPartitionsByExpr should be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7663) OrcRecordUpdater needs to implement getStats
[ https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7663: - Attachment: HIVE-7663.patch This patch implements getRowCount. It does not implement getRawSize, as that is very hard to calculate for update and delete. But in those cases the raw size isn't so important, as we can use the raw size of the base. OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7663.patch OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
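As a rough illustration of what "track the stats and return a valid value" can look like, here is a counting sketch. StatsSketch and the method names are hypothetical stand-ins, not the actual OrcRecordUpdater/SerDeStats API, and the raw size is deliberately left unset, mirroring the patch's choice to not compute it for update and delete:

```java
// Sketch of tracking row-count stats in a record updater. All names here
// are illustrative; the real OrcRecordUpdater API differs.
class UpdaterStatsDemo {
    static class StatsSketch {
        long rowCount;
        Long rawDataSize;   // left null: hard to compute for update/delete
    }

    private long inserted, deleted;

    public void insert() { inserted++; }
    public void delete() { deleted++; }

    // Reports the net row count of the operations seen so far, instead of
    // returning null as the unpatched getStats did.
    public StatsSketch getStats() {
        StatsSketch s = new StatsSketch();
        s.rowCount = inserted - deleted;
        return s;
    }

    public static void main(String[] args) {
        UpdaterStatsDemo u = new UpdaterStatsDemo();
        u.insert(); u.insert(); u.insert(); u.delete();
        System.out.println(u.getStats().rowCount);   // 2
    }
}
```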
[jira] [Updated] (HIVE-7663) OrcRecordUpdater needs to implement getStats
[ https://issues.apache.org/jira/browse/HIVE-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7663: - Status: Patch Available (was: Open) OrcRecordUpdater needs to implement getStats Key: HIVE-7663 URL: https://issues.apache.org/jira/browse/HIVE-7663 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7663.patch OrcRecordUpdater.getStats currently returns null. It needs to track the stats and return a valid value. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24474: HIVE-6959 Enable Constant propagation optimizer for Hive Vectorization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24474/ --- (Updated Aug. 8, 2014, 7:16 p.m.) Review request for hive, Ashutosh Chauhan and Jitendra Pandey. Bugs: HIVE-6959 https://issues.apache.org/jira/browse/HIVE-6959 Repository: hive-git Description --- Enable Constant propagation optimizer for Hive Vectorization Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 535e4b3 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ConstantVectorExpression.java 9fd3853 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java eeb76d7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagate.java b12d3a8 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java e778ba4 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java 2329f52 ql/src/test/queries/clientpositive/vector_coalesce.q 052ab71 ql/src/test/results/clientpositive/tez/vector_cast_constant.q.out 9dac17b ql/src/test/results/clientpositive/tez/vectorization_14.q.out 04c99f1 ql/src/test/results/clientpositive/tez/vectorization_15.q.out 1381695 ql/src/test/results/clientpositive/tez/vectorization_9.q.out 3d2645a ql/src/test/results/clientpositive/tez/vectorization_short_regress.q.out 2fa1bae ql/src/test/results/clientpositive/vector_between_in.q.out 78e340b ql/src/test/results/clientpositive/vector_cast_constant.q.out cdb13cb ql/src/test/results/clientpositive/vector_coalesce.q.out 9561d47 ql/src/test/results/clientpositive/vector_decimal_mapjoin.q.out 71a3def ql/src/test/results/clientpositive/vector_decimal_math_funcs.q.out 717e81a ql/src/test/results/clientpositive/vector_elt.q.out ea0af62 ql/src/test/results/clientpositive/vectorization_14.q.out 3992bb1 ql/src/test/results/clientpositive/vectorization_15.q.out 1f48fea ql/src/test/results/clientpositive/vectorization_16.q.out 38596e6 
ql/src/test/results/clientpositive/vectorization_9.q.out c757b1f ql/src/test/results/clientpositive/vectorization_div0.q.out b2321b4 ql/src/test/results/clientpositive/vectorization_short_regress.q.out 5b23850 ql/src/test/results/clientpositive/vectorized_math_funcs.q.out 181ab51 ql/src/test/results/clientpositive/vectorized_parquet.q.out 2e459a8 Diff: https://reviews.apache.org/r/24474/diff/ Testing --- Thanks, Hari Sankar Sivarama Subramaniyan
[jira] [Updated] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization
[ https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6959: Status: Patch Available (was: Open) Enable Constant propagation optimizer for Hive Vectorization Key: HIVE-6959 URL: https://issues.apache.org/jira/browse/HIVE-6959 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, HIVE-6959.5.patch, HIVE-6959.6.patch HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 is committed, we should remove any vectorization-related code which duplicates this feature. For example, a function to be cleaned up is VectorizationContext::foldConstantsForUnaryExprs(). In addition to this change, constant propagation should kick in when vectorization is enabled, i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside ConstantPropagate::transform(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization
[ https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6959: Status: Open (was: Patch Available) Enable Constant propagation optimizer for Hive Vectorization Key: HIVE-6959 URL: https://issues.apache.org/jira/browse/HIVE-6959 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, HIVE-6959.5.patch, HIVE-6959.6.patch HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 is committed, we should remove any vectorization-related code which duplicates this feature. For example, a function to be cleaned up is VectorizationContext::foldConstantsForUnaryExprs(). In addition to this change, constant propagation should kick in when vectorization is enabled, i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside ConstantPropagate::transform(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6959) Enable Constant propagation optimizer for Hive Vectorization
[ https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6959: Attachment: HIVE-6959.6.patch Updated the MiniTezCliDriver test results as well. Enable Constant propagation optimizer for Hive Vectorization Key: HIVE-6959 URL: https://issues.apache.org/jira/browse/HIVE-6959 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.4.patch, HIVE-6959.5.patch, HIVE-6959.6.patch HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 is committed, we should remove any vectorization-related code which duplicates this feature. For example, a function to be cleaned up is VectorizationContext::foldConstantsForUnaryExprs(). In addition to this change, constant propagation should kick in when vectorization is enabled, i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside ConstantPropagate::transform(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091198#comment-14091198 ] Szehon Ho commented on HIVE-7624: - Hi Li Rui, I think the patch looks reasonable. Just had a comment and a question on the RB. Thanks Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in the wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24427/#review50071 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java https://reviews.apache.org/r/24427/#comment87593 I'm not sure about this one. But shouldn't the condition be: joinConf.isBucketMapJoin()? - Gunther Hagleitner On Aug. 7, 2014, 11:53 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24427/ --- (Updated Aug. 7, 2014, 11:53 p.m.) Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J. Repository: hive-git Description --- See jira Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java cf64aa0 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java cdb5dc5 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 5b3b770 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 629457c ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42e1f7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 29d895a ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 Diff: https://reviews.apache.org/r/24427/diff/ Testing --- Thanks, Sergey Shelukhin
Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics
On Aug. 7, 2014, 12:04 a.m., Gunther Hagleitner wrote: ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java, line 136 https://reviews.apache.org/r/24427/diff/2/?file=654266#file654266line136 curlies per coding standard Sergey Shelukhin wrote: added; next time I review your patch, I'll enforce C variable declarations (all variables at the beginning of the block), which are also part of the same Sun standard the Hive wiki page points to :P I do that anyways. - Gunther --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24427/#review49830 --- On Aug. 7, 2014, 11:53 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24427/ --- (Updated Aug. 7, 2014, 11:53 p.m.) Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J. Repository: hive-git Description --- See jira Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java cf64aa0 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java cdb5dc5 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 5b3b770 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 629457c ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java d42e1f7 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 29d895a ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 Diff: https://reviews.apache.org/r/24427/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091284#comment-14091284 ] Gunther Hagleitner commented on HIVE-7616: -- Removing the TODO in commit is fine by me. I've had one additional question about how to detect bucketed joins on the reviewboard. For testing: Can you add the expected key count to explain extended? That way you can verify correct behavior through the unit tests. pre-size mapjoin hashtable based on statistics -- Key: HIVE-7616 URL: https://issues.apache.org/jira/browse/HIVE-7616 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091287#comment-14091287 ] Gunther Hagleitner commented on HIVE-7616: -- Other than these 2 things I am +1. pre-size mapjoin hashtable based on statistics -- Key: HIVE-7616 URL: https://issues.apache.org/jira/browse/HIVE-7616 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7655) Reading of partitioned table stats slows down explain
[ https://issues.apache.org/jira/browse/HIVE-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-7655. -- Resolution: Fixed Committed to branch. Thank you [~rhbutani]! Reading of partitioned table stats slows down explain - Key: HIVE-7655 URL: https://issues.apache.org/jira/browse/HIVE-7655 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Labels: hive Fix For: 0.14.0 Attachments: HIVE-7655.1.patch This defect is due to a regression introduced in https://issues.apache.org/jira/browse/HIVE-7625: explain for queries that touch partitioned tables is 10x slower. RelOptHiveTable.getRowCount calls listPartitionsWithAuthInfo, which returns the data from all partitions; listPartitionsByExpr should be used instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7638) Disallow CREATE VIEW when created with a temporary table
[ https://issues.apache.org/jira/browse/HIVE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091316#comment-14091316 ] Hive QA commented on HIVE-7638: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660527/HIVE-7638.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5872 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/232/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/232/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-232/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660527 Disallow CREATE VIEW when created with a temporary table Key: HIVE-7638 URL: https://issues.apache.org/jira/browse/HIVE-7638 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7638.1.patch Followup item from HIVE-7090, don't allow view to be created if the view definition has a temp table. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU
Mostafa Mokhtar created HIVE-7664: - Summary: VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU Key: HIVE-7664 URL: https://issues.apache.org/jira/browse/HIVE-7664 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 In a group-by-heavy vectorized Reducer vertex, 25% of CPU is spent in VectorizedBatchUtil.addRowToBatchFrom(). Looking at the code, addRowToBatchFrom was not optimized for vectorized processing: it is called for every row, and for each row and every column in the batch getPrimitiveCategory is called to figure out the column's type. Column types are stored in a HashMap; for VectorGroupByOperator the column types won't change between batches, so they shouldn't be looked up for every row. I recommend storing the column type in StructObjectInspector so that other components can leverage this optimization. addRowToBatchFrom also has a case statement, evaluated for every row and every column, used for type casting; I recommend encapsulating the type logic in templatized methods.
{code}
Stack Trace                                              Sample Count  Percentage(%)
VectorizedBatchUtil.addRowToBatchFrom                    86            26.543
AbstractPrimitiveObjectInspector.getPrimitiveCategory()  34            10.494
LazyBinaryStructObjectInspector.getStructFieldData       25            7.716
StandardStructObjectInspector.getStructFieldData         4             1.235
{code}
The query used:
{code}
select ss_sold_date_sk
from store_sales
where ss_sold_date between '1998-01-01' and '1998-06-01'
group by ss_item_sk, ss_customer_sk, ss_sold_date_sk
having sum(ss_list_price) > 50;
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
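The optimization described above — resolving each column's type once per batch instead of once per (row, column) pair — can be sketched as follows. This is an illustrative stand-in, not Hive's actual VectorizedBatchUtil API; the class, enum, and interface names here are assumptions.

```java
import java.util.List;

public class BatchTypeCache {
    // Minimal stand-in for Hive's PrimitiveCategory.
    public enum Category { LONG, DOUBLE, STRING }

    // Stand-in for the per-column inspector whose getPrimitiveCategory()
    // call the report shows consuming ~10% of CPU when made per row.
    public interface ColumnInspector {
        Category getPrimitiveCategory();
    }

    // Resolve every column's category exactly once, before the row loop.
    public static Category[] resolveOnce(List<ColumnInspector> columns) {
        Category[] cached = new Category[columns.size()];
        for (int c = 0; c < columns.size(); c++) {
            cached[c] = columns.get(c).getPrimitiveCategory();
        }
        return cached;
    }

    // Per-row work can now switch on the cached array instead of calling
    // getPrimitiveCategory for every (row, column) pair.
    public static int countLongColumns(Category[] cached) {
        int n = 0;
        for (Category cat : cached) {
            if (cat == Category.LONG) n++;
        }
        return n;
    }
}
```

Since column types are fixed for the lifetime of a VectorGroupByOperator batch stream, the cached array is computed once and reused for every subsequent row and batch.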
[jira] [Resolved] (HIVE-7662) CBO: changes to Cost Model
[ https://issues.apache.org/jira/browse/HIVE-7662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-7662. -- Resolution: Fixed Committed to branch. Thanks [~rhbutani]! CBO: changes to Cost Model -- Key: HIVE-7662 URL: https://issues.apache.org/jira/browse/HIVE-7662 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7662.1.patch - Model Join cost as Sum of Input sizes - Fix bug with NDV calculations. For now use Optiq's default formulas. - Model Cumulative cost to favor broad Plans over Deep Plans. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7656: -- Attachment: HIVE-7656.1-tez.patch Bring tez-branch up-to the API changes made by TEZ-1372 --- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7656: -- Fix Version/s: tez-branch Status: Patch Available (was: Open) Bring tez-branch up-to the API changes made by TEZ-1372 --- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372, TEZ-1386
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7656: -- Summary: Bring tez-branch up-to the API changes made by TEZ-1372, TEZ-1386 (was: Bring tez-branch up-to the API changes made by TEZ-1372) Bring tez-branch up-to the API changes made by TEZ-1372, TEZ-1386 -- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7665) Create TestSparkCliDriver to run test in spark local mode
Szehon Ho created HIVE-7665: --- Summary: Create TestSparkCliDriver to run test in spark local mode Key: HIVE-7665 URL: https://issues.apache.org/jira/browse/HIVE-7665 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7656: -- Status: Open (was: Patch Available) Bring tez-branch up-to the API changes made by TEZ-1372 --- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7665) Create TestSparkCliDriver to run test in spark local mode
[ https://issues.apache.org/jira/browse/HIVE-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7665: Component/s: Spark Create TestSparkCliDriver to run test in spark local mode - Key: HIVE-7665 URL: https://issues.apache.org/jira/browse/HIVE-7665 Project: Hive Issue Type: Sub-task Components: Spark, Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7656: -- Status: Patch Available (was: Open) Bring tez-branch up-to the API changes made by TEZ-1372 --- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7656: -- Summary: Bring tez-branch up-to the API changes made by TEZ-1372 (was: Bring tez-branch up-to the API changes made by TEZ-1372, TEZ-1386) Bring tez-branch up-to the API changes made by TEZ-1372 --- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7432) Remove deprecated Avro's Schema.parse usages
[ https://issues.apache.org/jira/browse/HIVE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091386#comment-14091386 ] Hive QA commented on HIVE-7432: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660529/HIVE-7432.2.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5886 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/233/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/233/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-233/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660529 Remove deprecated Avro's Schema.parse usages Key: HIVE-7432 URL: https://issues.apache.org/jira/browse/HIVE-7432 Project: Hive Issue Type: Improvement Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7432.1.patch, HIVE-7432.2.patch, HIVE-7432.patch Schema.parse has been deprecated by Avro, however it is being used at multiple places in Hive. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7432) Remove deprecated Avro's Schema.parse usages
[ https://issues.apache.org/jira/browse/HIVE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091389#comment-14091389 ] Ashish Kumar Singh commented on HIVE-7432: -- Above test errors are not related to this patch. Remove deprecated Avro's Schema.parse usages Key: HIVE-7432 URL: https://issues.apache.org/jira/browse/HIVE-7432 Project: Hive Issue Type: Improvement Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7432.1.patch, HIVE-7432.2.patch, HIVE-7432.patch Schema.parse has been deprecated by Avro, however it is being used at multiple places in Hive. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7656: - Attachment: HIVE-7656.2.patch Needed some more changes to work with TEZ-1386. Bring tez-branch up-to the API changes made by TEZ-1372 --- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch, HIVE-7656.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7666) Join selectivity calculation should use exponential back-off for conjunction predicates
Mostafa Mokhtar created HIVE-7666: - Summary: Join selectivity calculation should use exponential back-off for conjunction predicates Key: HIVE-7666 URL: https://issues.apache.org/jira/browse/HIVE-7666 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Assuming dependency between join predicates (the number of column join filters) will almost always hurt us, as implied correlations do actually exist. Currently HiveRelMdSelectivity.computeInnerJoinSelectivity uses log to smooth the selectivity of conjunction predicates, which results in suboptimal plans. The problem with log is that it still assumes dependency; for instance, in TPC-DS Q17 store_sales has 6 join predicates, which explains why store_sales ends up in the wrong place in the plan. Change the algorithm to use exponential back-off: ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ndv(pe3)^(1/8) as opposed to: ndv(pe0) * log(ndv(pe1)) * log(ndv(pe2)). For TPC-DS Q17, store_sales has 6 inner join predicates; if we assume a selectivity of 0.7 for each join, the combined join selectivity can end up being 6.24285E-05, which is too low and eventually results in a suboptimal plan. -- This message was sent by Atlassian JIRA (v6.2#6252)
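The proposed exponential back-off can be sketched as below. The method name and the choice to give the largest NDV full weight (sorting descending before applying the halving exponents) are assumptions for illustration, not the actual HiveRelMdSelectivity code.

```java
import java.util.Arrays;

public class JoinSelectivityBackoff {
    // Combine per-predicate NDVs with exponential back-off:
    // ndv(pe0) * ndv(pe1)^(1/2) * ndv(pe2)^(1/4) * ndv(pe3)^(1/8) ...
    // Each additional conjunct contributes with a halved exponent, so
    // correlated predicates no longer multiply selectivity down unboundedly.
    public static double combinedNdv(double[] ndvs) {
        double[] sorted = ndvs.clone();
        Arrays.sort(sorted); // ascending
        double result = 1.0;
        double exponent = 1.0;
        // Walk from largest NDV to smallest, halving the exponent each step.
        for (int i = sorted.length - 1; i >= 0; i--) {
            result *= Math.pow(sorted[i], exponent);
            exponent /= 2.0;
        }
        return result;
    }
}
```

For example, four predicates with NDV 10000 each yield 10000 * 100 * 10 * ~3.16 instead of 10000^4, so the estimated selectivity (1 / combinedNdv) stays in a plausible range even when the predicates are correlated.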
[jira] [Commented] (HIVE-7656) Bring tez-branch up-to the API changes made by TEZ-1372
[ https://issues.apache.org/jira/browse/HIVE-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091440#comment-14091440 ] Vikram Dixit K commented on HIVE-7656: -- Committed to tez branch. Thanks Gopal! Bring tez-branch up-to the API changes made by TEZ-1372 --- Key: HIVE-7656 URL: https://issues.apache.org/jira/browse/HIVE-7656 Project: Hive Issue Type: Sub-task Affects Versions: tez-branch Reporter: Gopal V Assignee: Gopal V Fix For: tez-branch Attachments: HIVE-7656.1-tez.patch, HIVE-7656.2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7616: --- Attachment: HIVE-7616.05.patch Added to explain plan, fixed column stats names. I ran some tests to update the out files; let's see what else fails and update those too. pre-size mapjoin hashtable based on statistics -- Key: HIVE-7616 URL: https://issues.apache.org/jira/browse/HIVE-7616 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, HIVE-7616.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24427/ --- (Updated Aug. 8, 2014, 11:36 p.m.) Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J. Repository: hive-git Description --- See jira Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java cf64aa0 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java cdb5dc5 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 5b3b770 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 629457c ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 29d895a ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java 4173ea4 ql/src/test/queries/clientpositive/mapjoin_mapjoin.q 3f36851 ql/src/test/results/clientpositive/bucket_map_join_1.q.out 63fb0d1 ql/src/test/results/clientpositive/bucket_map_join_2.q.out 21f2d5a ql/src/test/results/clientpositive/bucketcontext_1.q.out 5212de3 ql/src/test/results/clientpositive/bucketcontext_2.q.out d86c430 ql/src/test/results/clientpositive/bucketcontext_3.q.out a536e8b ql/src/test/results/clientpositive/bucketcontext_4.q.out 26c8720 ql/src/test/results/clientpositive/bucketcontext_5.q.out 2619cfb ql/src/test/results/clientpositive/bucketcontext_6.q.out 4c42ca7 ql/src/test/results/clientpositive/bucketcontext_7.q.out 7e5afb5 ql/src/test/results/clientpositive/bucketcontext_8.q.out 243b67a ql/src/test/results/clientpositive/bucketmapjoin1.q.out 10f1af4 ql/src/test/results/clientpositive/bucketmapjoin10.q.out f852cde ql/src/test/results/clientpositive/bucketmapjoin11.q.out 97e80fb ql/src/test/results/clientpositive/bucketmapjoin12.q.out e486ca5 
ql/src/test/results/clientpositive/bucketmapjoin2.q.out 297412f ql/src/test/results/clientpositive/bucketmapjoin3.q.out 7f307a0 ql/src/test/results/clientpositive/bucketmapjoin4.q.out f0f9aee ql/src/test/results/clientpositive/bucketmapjoin5.q.out 79e1c3d ql/src/test/results/clientpositive/bucketmapjoin8.q.out e504c9d ql/src/test/results/clientpositive/bucketmapjoin9.q.out 18f350a ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out 751e32f ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out 3eb70d1 ql/src/test/results/clientpositive/bucketmapjoin_negative3.q.out 34abe4f ql/src/test/results/clientpositive/join26.q.out bf8cf57 ql/src/test/results/clientpositive/join32.q.out ff0d7cc ql/src/test/results/clientpositive/join33.q.out ff0d7cc ql/src/test/results/clientpositive/join34.q.out b52777a ql/src/test/results/clientpositive/join35.q.out 20c69ea ql/src/test/results/clientpositive/join_map_ppr.q.out 51fb6c6 ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 567b0ca ql/src/test/results/clientpositive/sample8.q.out e0c0f9e ql/src/test/results/clientpositive/smb_mapjoin_11.q.out d59b801 ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out ba8928b ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out d51a54e ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out fcb6367 ql/src/test/results/clientpositive/stats11.q.out c5531c5 ql/src/test/results/clientpositive/tez/mapjoin_mapjoin.q.out 9e90ec2 ql/src/test/results/clientpositive/transform_ppr1.q.out 6f908fa ql/src/test/results/clientpositive/transform_ppr2.q.out 9285151 ql/src/test/results/clientpositive/union22.q.out 884c106 ql/src/test/results/clientpositive/union_ppr.q.out ee209c7 Diff: https://reviews.apache.org/r/24427/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7616: --- Attachment: HIVE-7616.06.patch restrict explain change to tez pre-size mapjoin hashtable based on statistics -- Key: HIVE-7616 URL: https://issues.apache.org/jira/browse/HIVE-7616 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, HIVE-7616.06.patch, HIVE-7616.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24427: HIVE-7616 pre-size mapjoin hashtable based on statistics
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24427/ --- (Updated Aug. 8, 2014, 11:42 p.m.) Review request for hive, Gunther Hagleitner, Mostafa Mokhtar, and Prasanth_J. Repository: hive-git Description --- See jira Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java cf64aa0 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java cdb5dc5 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HashMapWrapper.java 5b3b770 ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 629457c ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 6d292d0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 29d895a ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java 44cb9c0 ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java 4173ea4 ql/src/test/queries/clientpositive/mapjoin_mapjoin.q 3f36851 ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 567b0ca ql/src/test/results/clientpositive/tez/mapjoin_mapjoin.q.out 9e90ec2 Diff: https://reviews.apache.org/r/24427/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-7617) optimize bytes mapjoin hash table read path wrt serialization, at least for common cases
[ https://issues.apache.org/jira/browse/HIVE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7617: --- Attachment: HIVE-7617.prelim.patch Preliminary patch after some experiments. Need to run tests and then perf tests too. optimize bytes mapjoin hash table read path wrt serialization, at least for common cases Key: HIVE-7617 URL: https://issues.apache.org/jira/browse/HIVE-7617 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7617.prelim.patch The BytesBytes hash table stores keys in the byte array for compact representation; however, that means the straightforward implementation of lookups serializes lookup keys to byte arrays, which is relatively expensive. We can either shortcut hashcode and compare for common types on the read path (integral types, which would cover most real-world keys), or specialize the hashtable and from BytesBytes... create LongBytes, StringBytes, or whatever. The first option seems simpler for now. -- This message was sent by Atlassian JIRA (v6.2#6252)
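The "shortcut" idea above can be illustrated as follows: compute the hash of a long key directly, bit-compatibly with hashing its serialized byte form, so lookups skip the intermediate byte[] entirely. This is an illustrative sketch with a toy hash function, not Hive's actual BytesBytesMultiHashMap code.

```java
import java.nio.ByteBuffer;

public class LongKeyShortcut {
    // Slow path: what the straightforward implementation does for every
    // lookup — serialize the key, then hash the bytes.
    public static int hashSerialized(long key) {
        byte[] bytes = ByteBuffer.allocate(8).putLong(key).array(); // big-endian
        int h = 1;
        for (byte b : bytes) h = 31 * h + b;
        return h;
    }

    // Fast path: the same hash, computed straight from the long by walking
    // its bytes most-significant-first, allocating nothing. Correctness
    // requires this to agree exactly with hashSerialized for every key,
    // since both paths must land in the same bucket.
    public static int hashDirect(long key) {
        int h = 1;
        for (int shift = 56; shift >= 0; shift -= 8) {
            h = 31 * h + (byte) (key >>> shift);
        }
        return h;
    }
}
```

The comparison step benefits the same way: a stored 8-byte key slot can be compared against the probe long with one read instead of a byte-array equality check.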
[jira] [Updated] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-7366: --- Status: Patch Available (was: Open) getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-7366: --- Attachment: HIVE-7366.patch Attaching patch. getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7657) Nullable union of 3 or more types is not recognized nullable
[ https://issues.apache.org/jira/browse/HIVE-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh reassigned HIVE-7657: Assignee: Ashish Kumar Singh Nullable union of 3 or more types is not recognized nullable Key: HIVE-7657 URL: https://issues.apache.org/jira/browse/HIVE-7657 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Arkadiusz Gasior Assignee: Ashish Kumar Singh Labels: avro Handling a nullable union of 3 or more types causes serialization issues, as [null,long,string] is not recognized as nullable. The code causing the issue is likely in AvroSerdeUtils.java:
{code}
public static boolean isNullableType(Schema schema) {
  return schema.getType().equals(Schema.Type.UNION) &&
      schema.getTypes().size() == 2 &&
      (schema.getTypes().get(0).getType().equals(Schema.Type.NULL) ||
       schema.getTypes().get(1).getType().equals(Schema.Type.NULL)); // [null, null] not allowed, so this check is ok.
}
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
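A generalized check along the lines the report implies would treat a union as nullable if any branch is NULL, regardless of arity. The sketch below uses minimal stand-in types rather than Avro's real Schema class, and is one possible fix, not the patch actually committed.

```java
import java.util.List;

public class NullableUnionCheck {
    // Minimal stand-in for Avro's Schema.Type.
    public enum Type { NULL, LONG, STRING, UNION }

    // Nullable iff the schema is a union and any branch is NULL.
    // Unlike the size == 2 check quoted above, this accepts
    // [null, long, string] and other unions of 3+ types.
    public static boolean isNullableUnion(Type schemaType, List<Type> branches) {
        if (schemaType != Type.UNION) {
            return false;
        }
        for (Type branch : branches) {
            if (branch == Type.NULL) {
                return true;
            }
        }
        return false;
    }
}
```

The early-exit loop also preserves the old behavior for two-branch unions, so existing [null, X] schemas are still recognized.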