[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5690: Attachment: HIVE-5690.11.patch.txt Support subquery for single sourced multi query --- Key: HIVE-5690 URL: https://issues.apache.org/jira/browse/HIVE-5690 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, HIVE-5690.11.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt Single sourced multi (insert) query is very useful for various ETL processes but it does not allow subqueries included. For example, {noformat} explain from src insert overwrite table x1 select * from (select distinct key,value) b order by key insert overwrite table x2 select * from (select distinct key,value) c order by value; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7849) Support more generic predicate pushdown for hbase handler
[ https://issues.apache.org/jira/browse/HIVE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108795#comment-14108795 ] Navis commented on HIVE-7849: - vector_between_in needs an update. The others are not related to this. Support more generic predicate pushdown for hbase handler - Key: HIVE-7849 URL: https://issues.apache.org/jira/browse/HIVE-7849 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-7849.1.patch.txt, HIVE-7849.2.patch.txt Currently, the hbase handler supports only AND-conjugated filters. This is a first attempt to support OR, NOT, IN, and BETWEEN predicates for hbase. Mostly based on the work done by [~teddy.choi]. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24986/ --- (Updated Aug. 25, 2014, 6:45 a.m.) Review request for hive. Changes --- (1) clean code (2) change property description Bugs: HIVE-7553 https://issues.apache.org/jira/browse/HIVE-7553 Repository: hive-git Description --- HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9d64aff18329e7850342855aade42e21f5 hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 93a03adeab7ba3c3c91344955d303e4252005239 hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java f25039dcf55b3b24bbf8dcba05855665a1c7f3b0 ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 5924bcf1f55dc4c2dd06f312f929047b7df9de55 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 0c6a3d44ef1f796778768421dc02f8bf3ede6a8c ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java bd45df1a401d1adb009e953d08205c7d5c2d5de2 ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java dcc19f70644c561e17df8c8660ca62805465f1d6 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 76fee612a583cdc2c632d27932623521b735e768 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 3f474f846c7af5f1f65f1c14f3ce51308f1279d4 ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 0962cadce0d515e046371d0a816f4efd70b8eef7 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java 9051ba6d80e619ddbb6c27bb161e1e7a5cdb08a5 ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java edec1b734fb2f015902fd5e1c8afd5acdf4cb3bf ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 2f13ac2e30195a25844a25e9ec8a7c42ed99b75c ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java b15aedc15d8cd0979aced6ff4c9e87606576f0a3 ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java d86df453cd7686627940ade62c0fd72f1636dd0b ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 0a1c660b4bbd46d8410e646270b23c99a4de8b7e ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b48ec014e4dc8026bb5f6615f62da0e2210 ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java 17eeae1a3435fceb4b57325675c58b599e0973ea ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 930acbc98e81f8d421cee1170659d8b7a427fe7d ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 39f1793aaa5bed8a494883cac516ad314be951f4 ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 0d237f01a248a65b4092eb7202fe30eebf27be82 ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java f5bc427a5834860441f21bfc72e175c6a1cf877f ql/src/java/org/apache/hadoop/hive/ql/processors/RefreshProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9798cf3f537a27d1f828f8139790c62c5945c366 ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java e247184b7d95c85fd3e12432e7eb75eb1e2a0b68 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java 959007a54b335bb0bdef0256f60e6cbc65798dc7 ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java ef0052f5763922d50986f127c416af5eaa6ae30d ql/src/test/resources/SessionStateTest-V1.jar PRE-CREATION ql/src/test/resources/SessionStateTest-V2.jar PRE-CREATION 
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c1df7f9fdb848d5f078e94a663a579e571 Diff: https://reviews.apache.org/r/24986/diff/ Testing --- Thanks, cheng xu
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Attachment: HIVE-7850.1.patch New patch file submitted by correcting indentations. Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108843#comment-14108843 ] Chengxiang Li commented on HIVE-7799: - It depends on the implementation of {{ResultIterator.hasNext()}}: it is designed as a lazy iterator, since it only tries to call {{processNextRecord()}} while the RowContainer is empty, but RowContainer does not support adding more rows after it has already been read, as mentioned in previous comments. Here is what happens when different kinds of queries are executed:
# For a map-only job, the map output is written to a file directly; no Collector is needed in this case.
# For a MapReduce job with a GroupByOperator, {{HiveBaseFunctionResultList.collect()}} is triggered by {{closeRecordProcessor()}}, which is outside the lazy-computing logic, so the ResultIterator does not do lazy computing in this case.
# For a MapReduce job without a GroupByOperator (like cluster-by queries), the ResultIterator does lazy computing, and it clears the RowContainer each time before calling {{processNextRecord()}}. When HiveBaseFunctionResultList is read and written in the same thread, the RowContainer access pattern is clear()-addRow()-first()-clear()-addRow()-first()..., so it does not violate RowContainer's access rule. But with multiple threads reading and writing HiveBaseFunctionResultList, as the ScriptOperator does (which Venki mentioned above), it would definitely hit this JIRA issue.
In my opinion, there are two solutions:
# Remove the ResultIterator lazy-computing feature, as patch 1 does.
# Implement a RowContainer-like class that supports the current RowContainer features; it would also need to be thread-safe and support adding rows after {{first()}} has already been called.
The second solution is quite complex, and it may introduce a performance degradation from supporting thread-safe access and write-after-read; compared with the performance gain of lazy computing, it is hard to say whether it is worth it now. So I suggest we take the first solution to fix this issue and leave the possible optimization to milestone 4.
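A minimal, hypothetical sketch of what the RowContainer-like buffer in the second solution might look like, built on a bounded {{java.util.concurrent}} queue. The class and method names are made up for illustration and this is not Hive code; unlike the real RowContainer it keeps everything in memory and never spills to disk.
{code}
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch only: a bounded, thread-safe row buffer that allows a
 * producer to keep adding rows after a consumer has started reading
 * (write-after-read), which plain RowContainer does not allow.
 */
public class ConcurrentRowBuffer<ROW> {
  private final LinkedBlockingQueue<ROW> rows;

  public ConcurrentRowBuffer(int capacity) {
    this.rows = new LinkedBlockingQueue<ROW>(capacity);
  }

  /** Producer side: blocks when the buffer is full (back-pressure on collect()). */
  public void addRow(ROW row) throws InterruptedException {
    rows.put(row);
  }

  /** Consumer side: blocks until a row is available. */
  public ROW nextRow() throws InterruptedException {
    return rows.take();
  }

  public boolean isEmpty() {
    return rows.isEmpty();
  }
}
{code}
A shutdown or poison-pill row would still be needed so the consumer knows when the producer is done, which is part of why this option is more complex than simply removing the lazy-computing feature.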
TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Attachment: HIVE-7799.3-spark.patch reattach the first patch. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108860#comment-14108860 ] Hive QA commented on HIVE-5690: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664102/HIVE-5690.11.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6119 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/484/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/484/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-484/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664102 Support subquery for single sourced multi query --- Key: HIVE-5690 URL: https://issues.apache.org/jira/browse/HIVE-5690 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, HIVE-5690.11.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt Single sourced multi (insert) query is very useful for various ETL processes but it does not allow subqueries included. For example, {noformat} explain from src insert overwrite table x1 select * from (select distinct key,value) b order by key insert overwrite table x2 select * from (select distinct key,value) c order by value; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7826: - Attachment: HIVE-7826.3.patch .3 has various fixes. Should be good to go now. Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also be useful to compute the partitions one would like to scan via a subquery (where p in select ... from ...). The resulting joins in hive require a full table scan of the large table though, because partition pruning takes place before the corresponding values are known. On Tez it's relatively straightforward to send the values needed to prune to the application master - where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running. The approach is straightforward:
- Insert a synthetic condition for each join representing x in (keys of other side of the join)
- These conditions will be pushed down as far as possible
- If the condition hits a table scan and the column involved is a partition column: set up an operator to send key events to the AM
- else: remove the synthetic predicate
-- This message was sent by Atlassian JIRA (v6.2#6252)
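A toy illustration of the application-master-side step described above: once the join-key values arrive as events, partitions whose value cannot match are stripped before splits are generated. This is not Hive or Tez code; all names and the data are made up.
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DynamicPartitionPruningSketch {
  /**
   * partitions maps a partition-column value (e.g. a date) to its location;
   * joinKeyValues are the values received as key events from the other side
   * of the join. Only matching partitions get splits generated for them.
   */
  static Map<String, String> prune(Map<String, String> partitions, Set<String> joinKeyValues) {
    Map<String, String> kept = new LinkedHashMap<String, String>();
    for (Map.Entry<String, String> p : partitions.entrySet()) {
      if (joinKeyValues.contains(p.getKey())) {
        kept.put(p.getKey(), p.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, String> parts = new LinkedHashMap<String, String>();
    parts.put("2014-08-01", "/warehouse/sales/ds=2014-08-01");
    parts.put("2014-08-02", "/warehouse/sales/ds=2014-08-02");
    // values received as key events from the dimension side of the join
    Set<String> keys = new HashSet<String>(Collections.singleton("2014-08-02"));
    System.out.println(prune(parts, keys).keySet()); // prints [2014-08-02]
  }
}
{code}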
[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108872#comment-14108872 ] Gunther Hagleitner commented on HIVE-7826: -- Review board link: https://reviews.apache.org/r/25019/ Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also useful to be to compute the partitions one would like to scan via a subquery (where p in select ... from ...). The resulting joins in hive require a full table scan of the large table though, because partition pruning takes place before the corresponding values are known. On Tez it's relatively straight forward to send the values needed to prune to the application master - where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running. The approach is straight forward: - Insert synthetic conditions for each join representing x in (keys of other side in join) - This conditions will be pushed as far down as possible - If the condition hits a table scan and the column involved is a partition column: - Setup Operator to send key events to AM - else: - Remove synthetic predicate -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108871#comment-14108871 ] Navis commented on HIVE-7733: - Just a blind shot. I'll look into this. Ambiguous column reference error on query - Key: HIVE-7733 URL: https://issues.apache.org/jira/browse/HIVE-7733 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Attachments: HIVE-7733.1.patch.txt {noformat} CREATE TABLE agg1 ( col0 INT, col1 STRING, col2 DOUBLE ); explain SELECT single_use_subq11.a1 AS a1, single_use_subq11.a2 AS a2 FROM (SELECT Sum(agg1.col2) AS a1 FROM agg1 GROUP BY agg1.col0) single_use_subq12 JOIN (SELECT alias.a2 AS a0, alias.a1 AS a1, alias.a1 AS a2 FROM (SELECT agg1.col1 AS a0, '42' AS a1, agg1.col0 AS a2 FROM agg1 UNION ALL SELECT agg1.col1 AS a0, '41' AS a1, agg1.col0 AS a2 FROM agg1) alias GROUP BY alias.a2, alias.a1) single_use_subq11 ON ( single_use_subq11.a0 = single_use_subq11.a0 ); {noformat} Gets the following error: FAILED: SemanticException [Error 10007]: Ambiguous column reference a2 Looks like this query had been working in 0.12 but starting failing with this error in 0.13 -- This message was sent by Atlassian JIRA (v6.2#6252)
Hive unwanted directories creation issue
We are creating an external table in Hive, and if the location path is not present in HDFS, say /testdata (as shown below), Hive creates the '/testdata' dummy folder. Is there any option in Hive, or any other way, to stop creating dummy directories when the location folder does not exist? We end up creating many unwanted dummy directories when the data is not present on HDFS for the many partitions we add after creating the table.
CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{ schema json literal') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/testdata/';
Regards Sathish Valluri
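Not from this thread and not a Hive setting, but one client-side guard, assuming the DDL is issued programmatically: check that the location already exists on HDFS before running CREATE EXTERNAL TABLE or ADD PARTITION, so Hive never gets a chance to create the empty directory.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocationCheck {
  /** Returns true only if the HDFS path already exists. */
  public static boolean locationExists(String location) throws IOException {
    Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    return fs.exists(new Path(location));
  }

  public static void main(String[] args) throws IOException {
    if (!locationExists("/testdata")) {
      System.out.println("Skipping DDL: /testdata does not exist yet");
    }
  }
}
{code}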
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108910#comment-14108910 ] Hive QA commented on HIVE-5799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663284/HIVE-5799.12.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6119 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/486/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/486/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-486/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663284 session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from instable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108912#comment-14108912 ] Hive QA commented on HIVE-7850: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664109/HIVE-7850.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-487/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-487/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java' Reverted 'common/src/java/org/apache/hadoop/hive/conf/Validator.java' Reverted 'service/src/java/org/apache/hive/service/cli/OperationState.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSession.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/SessionManager.java' Reverted 'service/src/java/org/apache/hive/service/cli/operation/Operation.java' Reverted 'service/src/java/org/apache/hive/service/cli/operation/OperationManager.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1620279. At revision 1620279. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12664109 Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive
This looks like a Hive issue to me. Can anyone suggest other ways to overcome this?
We are creating external table in Hive and if the location path is not present in the HDFS say /testdata(as shown below), Hive is creating the ‘/testdata’ dummy folder. Is there any option in hive or any way to stop creating dummy directories if the location folder not exists. So we end up creating many unwanted dummy directories if the data not present on the HDFS for many partitions we add after creating table. CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{ schema json literal') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/testdata/'; Regards Sathish Valluri
Re: Hive on Tez Counters
Hi Siddharth/Gunther,
Thanks for replying to my queries. I was particularly interested in the CPU counter since I was doing some benchmarking on queries. Can you please clarify: if I just blindly take a mod(CPU counter) for all tasks and add them up, would the numbers be fine, or should I take a patch from the fix and apply it on Tez 0.4 to get it working until 0.5 is released?
Thanks
Suma

On Fri, Aug 22, 2014 at 2:55 AM, Gunther Hagleitner ghagleit...@hortonworks.com wrote:
Hive logs the same counters regardless of whether you run with Tez or MR. We've removed some counters in hive 0.13 (HIVE-4518) - the specific one you're looking for might be in that list. Thanks, Gunther.

On Thu, Aug 21, 2014 at 11:13 AM, Siddharth Seth ss...@apache.org wrote:
I'll let Hive folks answer the questions about the Hive counters. In terms of the CPU counter - that was a bug in Tez 0.4.0, which has been fixed in 0.5.0. COMMITTED_HEAP_BYTES just represents the memory available to the JVM (Runtime.getRuntime().totalMemory()). This will only vary if the VM is started with different Xms and Xmx options. In terms of Tez, the application logs are currently the best place. Hive may expose these in a more accessible manner though.

On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote:
Hi, I need info on where I can get detailed job counters for Hive on Tez. I am running this on an HDP cluster with Hive 0.13 and see only the following job counters for Hive on Tez in the YARN application logs, which I got through (yarn logs -applicationId ...).
a. Cannot see any ReduceOperator counters, and DESERIALIZE_ERRORS is the only counter present in MapOperator.
b. The CPU_MILLISECONDS is negative in some cases. Is CPU_MILLISECONDS accurate?
c. What does COMMITTED_HEAP_BYTES indicate?
d. Is there any other place I should be checking the counters?
[[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0] [org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0] [*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]]
Thanks
Suma
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109246#comment-14109246 ] Venki Korukanti commented on HIVE-7799: --- [~chengxiang li] Your plan sounds good. Lets log a JIRA to enable lazy computing and we will revisit in milestone 4. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7870) Insert overwrite table query does not generate correct task plan
Na Yang created HIVE-7870: - Summary: Insert overwrite table query does not generate correct task plan Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Task Components: Spark Reporter: Na Yang Insert overwrite table query does not generate a correct task plan when the hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. {noformat}
set hive.optimize.union.remove=true
set hive.merge.sparkfiles=true

insert overwrite table outputTbl1
SELECT * FROM (
  select key, 1 as values from inputTbl1
  union all
  select * FROM (
    SELECT key, count(1) as values from inputTbl1 group by key
    UNION ALL
    SELECT key, 2 as values from inputTbl1
  ) a
) b;

select * from outputTbl1 order by key, values;
{noformat} Query result: {noformat}
1 1
1 2
2 1
2 2
3 1
3 2
7 1
7 2
8 2
8 2
8 2
{noformat} Expected result: {noformat}
1 1
1 1
1 2
2 1
2 1
2 2
3 1
3 1
3 2
7 1
7 1
7 2
8 1
8 1
8 2
8 2
8 2
{noformat} The move work is not working properly, and some data is missing during the move. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7799: --- Status: Patch Available (was: Open) TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7869) Long running tests (1) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Satish reassigned HIVE-7869: -- Assignee: Suhas Satish Long running tests (1) [Spark Branch] - Key: HIVE-7869 URL: https://issues.apache.org/jira/browse/HIVE-7869 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish I have noticed when running the full test suite locally that the test JVM eventually crashes. We should do some testing (not part of the unit tests) which starts up a HS2 and runs queries on it continuously for 24 hours or so. In this JIRA let's create a stand alone java program which connects to a HS2 over JDBC, creates a bunch of tables (say 100) and then runs queries until the JDBC client is killed. This will allow us to run long running tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7810: --- Issue Type: Sub-task (was: Task) Parent: HIVE-7292 Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch] - Key: HIVE-7810 URL: https://issues.apache.org/jira/browse/HIVE-7810 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7810.1-spark.patch Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true set hive.mapred.supports.subdirectories=true; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true; We expect the following two sets of queries return the same set of data result, but they do not. 1) {noformat} insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} 2) {noformat} SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Some data is missing in the first set of query result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109315#comment-14109315 ] Ryan Blue commented on HIVE-7850: - Looking at just the changes to the schema conversion, I'm not sure why the change to the list structure was done. Previously, lists were converted to: {code}
// array<string> name
optional group name (LIST) {
  repeated group bag {
    optional string array_element;
  }
}
{code} This allowed the list itself to be null and allowed null elements. This patch changes the conversion to: {code}
// array<string> name
optional group name (LIST) {
  repeated string array_element;
}
{code} This requires that the elements are non-null. Was this on purpose? The first one looks more correct to me, but the second would be correct if nulls aren't allowed in Hive lists. In addition, the HiveSchemaConverter#listWrapper method and the ParquetHiveSerDe.ARRAY static field are no longer used but have not been removed. The other change to schema conversion tests the Repetition and calls {{Types.required}} or {{Types.optional}}. This should instead call {{Types.primitive(type, repetition)}} to pass the repetition to the {{Types}} API. That way, {{Repetition.REPEATED}} is supported as well; not handling it is a bug in the current patch. Hive Query failed if the data type is array<string> with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.patch * Created a parquet file from an Avro file which has one array data type; the rest are primitive types. Avro schema of the array data type, e.g.: {code}
{ name : action, type : [ { type : array, items : string }, null ] }
{code} * Created an external Hive table with the array type as below: {code}
create external table paraArray (action Array<String>) partitioned by (partitionid int)
row format serde 'parquet.hive.serde.ParquetHiveSerDe'
stored as
  inputformat 'parquet.hive.MapredParquetInputFormat'
  outputformat 'parquet.hive.MapredParquetOutputFormat'
location '/testPara';
alter table paraArray add partition(partitionid=1) location '/testPara';
{code} * Run the following query (select action from paraArray limit 10) and the MapReduce jobs fail with the following exception.
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
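A minimal sketch of the two points in Ryan's comment above, using the parquet-mr {{Types}} builder. The package names assume the pre-1.7 {{parquet.*}} namespace, and the snippet is illustrative only, not the actual HiveSchemaConverter code.
{code}
import parquet.schema.GroupType;
import parquet.schema.OriginalType;
import parquet.schema.PrimitiveType.PrimitiveTypeName;
import parquet.schema.Type;
import parquet.schema.Types;

public class SchemaSketch {
  /**
   * The pre-patch shape: an optional LIST group wrapping a repeated "bag" group,
   * which lets both the list itself and its elements be null.
   */
  static GroupType nullableStringList(String name) {
    return Types.optionalGroup().as(OriginalType.LIST)
        .repeatedGroup()
            .optional(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("array_element")
            .named("bag")
        .named(name);
  }

  /**
   * Preserve the incoming repetition (REQUIRED, OPTIONAL, or REPEATED) instead
   * of branching on only required vs. optional.
   */
  static Type convertPrimitive(PrimitiveTypeName primitive, Type.Repetition repetition, String name) {
    return Types.primitive(primitive, repetition).named(name);
  }
}
{code}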
[jira] [Resolved] (HIVE-7724) CBO: support Subquery predicates
[ https://issues.apache.org/jira/browse/HIVE-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-7724. Resolution: Fixed Committed to cbo branch. Thanks, Harish! CBO: support Subquery predicates Key: HIVE-7724 URL: https://issues.apache.org/jira/browse/HIVE-7724 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7724.1.patch, HIVE-7724.rewriteInHive.prelim.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109323#comment-14109323 ] Brock Noland commented on HIVE-7810: [~csun] can you review this patch since it appears you have some knowledge here? Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch] - Key: HIVE-7810 URL: https://issues.apache.org/jira/browse/HIVE-7810 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7810.1-spark.patch Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true set hive.mapred.supports.subdirectories=true; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true; We expect the following two sets of queries return the same set of data result, but they do not. 1) {noformat} insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} 2) {noformat} SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Some data is missing in the first set of query result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7724) CBO: support Subquery predicates
[ https://issues.apache.org/jira/browse/HIVE-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7724: --- Component/s: CBO CBO: support Subquery predicates Key: HIVE-7724 URL: https://issues.apache.org/jira/browse/HIVE-7724 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7724.1.patch, HIVE-7724.rewriteInHive.prelim.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109347#comment-14109347 ] Ryan Blue commented on HIVE-7850: - It looks like {{ArrayWritableGroupConverter}} is only used for maps and arrays, but the array handling was added mostly in this patch. Given that most of the methods check {{isMap}} and have completely different implementations for map and array, it makes more sense to separate this into two classes, {{ArrayGroupConverter}} and {{MapGroupConverter}}. Then {{HiveSchemaConverter}} should choose the correct one based on the {{OriginalType}} annotation. If there is no original type annotation, but the type is repeated, it should use an {{ArrayGroupConverter}}. Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 
8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
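To make the suggested split concrete, here is a hypothetical sketch of the selection logic described in the comment above. MapGroupConverter and ArrayGroupConverter do not exist in the current code, so an enum stands in for constructing them; only the parquet schema classes are real API.
{code}
import parquet.schema.GroupType;
import parquet.schema.OriginalType;
import parquet.schema.Type;

public class ConverterChoiceSketch {
  /** Stand-ins for the proposed MapGroupConverter / ArrayGroupConverter classes. */
  enum ConverterKind { MAP, ARRAY, STRUCT }

  static ConverterKind choose(GroupType type) {
    OriginalType original = type.getOriginalType();
    if (original == OriginalType.MAP) {
      return ConverterKind.MAP;        // -> MapGroupConverter
    }
    if (original == OriginalType.LIST) {
      return ConverterKind.ARRAY;      // -> ArrayGroupConverter
    }
    // No original type annotation, but a repeated group still holds array elements.
    if (type.isRepetition(Type.Repetition.REPEATED)) {
      return ConverterKind.ARRAY;
    }
    return ConverterKind.STRUCT;
  }
}
{code}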
[jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109360#comment-14109360 ] Chao commented on HIVE-7810: [~brocknoland] OK, I'll take a look. Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch] - Key: HIVE-7810 URL: https://issues.apache.org/jira/browse/HIVE-7810 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7810.1-spark.patch Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true set hive.mapred.supports.subdirectories=true; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true; We expect the following two sets of queries return the same set of data result, but they do not. 1) {noformat} insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} 2) {noformat} SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Some data is missing in the first set of query result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Status: Open (was: Patch Available) Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Attachment: HIVE-6847.5.patch Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, HIVE-6847.5.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
Hari Sankar Sivarama Subramaniyan created HIVE-7871: --- Summary: WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
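One plausible direction for such a fix (not necessarily what the attached patch does) is to make sure the templeton endpoint decodes the user-supplied define=key=value parameters as UTF-8 before they are handed to the metastore JDBC settings, so Unicode user, password, and database names survive the curl round trip. A minimal, hypothetical sketch:
{code}
import java.net.URLDecoder;

public class DefineParamSketch {
  // Decode a raw "key=value" define parameter as UTF-8 and split it once.
  public static String[] splitDefine(String rawDefine) throws Exception {
    String decoded = URLDecoder.decode(rawDefine, "UTF-8");
    int eq = decoded.indexOf('=');
    if (eq < 0) {
      return new String[] { decoded, "" };
    }
    return new String[] { decoded.substring(0, eq), decoded.substring(eq + 1) };
  }
}
{code}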
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Status: Patch Available (was: Open) Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, HIVE-6847.5.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109398#comment-14109398 ] Hive QA commented on HIVE-7799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664112/HIVE-7799.3-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6253 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-92/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664112 TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
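The suspected root cause above, a RowContainer being written to after a reader has already started consuming it, can be illustrated with a small guard on the write path. This is only a sketch of the invariant, with made-up names, not the HiveKVResultCache code:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of a one-pass result buffer that forbids writes once reading begins.
public class WriteOnceThenReadBuffer<T> {
  private final List<T> rows = new ArrayList<T>();
  private int cursor = -1; // -1 means no reader has started yet

  public void add(T row) {
    if (cursor >= 0) {
      throw new IllegalStateException("cannot write after a reader has started");
    }
    rows.add(row);
  }

  public boolean hasNext() {
    if (cursor < 0) {
      cursor = 0;
    }
    return cursor < rows.size();
  }

  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return rows.get(cursor++);
  }
}
{code}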
[jira] [Assigned] (HIVE-7721) CBO: support case statement translation to optiq
[ https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-7721: Assignee: Laljo John Pullokkaran CBO: support case statement translation to optiq Key: HIVE-7721 URL: https://issues.apache.org/jira/browse/HIVE-7721 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select case when key '104' then null else key end as key from src {code} fails with: {quote} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808) aused by: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91) at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7720) CBO: rank translation to Optiq RelNode tree failing
[ https://issues.apache.org/jira/browse/HIVE-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7720: - Assignee: Laljo John Pullokkaran CBO: rank translation to Optiq RelNode tree failing --- Key: HIVE-7720 URL: https://issues.apache.org/jira/browse/HIVE-7720 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select p_name from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by p_size) as r from part) a where r = 2; {code} fails with {quote} org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected. at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.getEvaluator(GenericUDAFRank.java:61) at org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:1110) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:3506) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getHiveAggInfo(SemanticAnalyzer.java:12496) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.genWindowingProj(SemanticAnalyzer.java:12858) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7721) CBO: support case statement translation to optiq
[ https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109405#comment-14109405 ] Laljo John Pullokkaran commented on HIVE-7721: -- Fixed by HIVE-7841 CBO: support case statement translation to optiq Key: HIVE-7721 URL: https://issues.apache.org/jira/browse/HIVE-7721 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select case when key '104' then null else key end as key from src {code} fails with: {quote} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808) aused by: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91) at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7721) CBO: support case statement translation to optiq
[ https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran resolved HIVE-7721. -- Resolution: Fixed CBO: support case statement translation to optiq Key: HIVE-7721 URL: https://issues.apache.org/jira/browse/HIVE-7721 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select case when key '104' then null else key end as key from src {code} fails with: {quote} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808) aused by: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91) at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7860) [CBO] Query on partitioned table which filter out all partitions fails
[ https://issues.apache.org/jira/browse/HIVE-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7860: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch [CBO] Query on partitioned table which filter out all partitions fails -- Key: HIVE-7860 URL: https://issues.apache.org/jira/browse/HIVE-7860 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: h-7860.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7720) CBO: rank translation to Optiq RelNode tree failing
[ https://issues.apache.org/jira/browse/HIVE-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109411#comment-14109411 ] Laljo John Pullokkaran commented on HIVE-7720: -- Support all Windowing UDAF. row_number, rank, dense_rank, percent_rank, cume_dist, first_value, last_value, lead, lag. CBO: rank translation to Optiq RelNode tree failing --- Key: HIVE-7720 URL: https://issues.apache.org/jira/browse/HIVE-7720 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select p_name from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by p_size) as r from part) a where r = 2; {code} fails with {quote} org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected. at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.getEvaluator(GenericUDAFRank.java:61) at org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:1110) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:3506) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getHiveAggInfo(SemanticAnalyzer.java:12496) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.genWindowingProj(SemanticAnalyzer.java:12858) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
[ https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7871: Attachment: HIVE-7871.1.patch Similar changes need to be made for the remaining end points in Server.java WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command --- Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7871.1.patch Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
[ https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7871: Status: Patch Available (was: Open) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command --- Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7871.1.patch Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51415 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89723 Better to throw an AssertionException here isn't it? Otherwise you will blindly delete it? ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java https://reviews.apache.org/r/24830/#comment89724 ? - Gunther Hagleitner On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Commented] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109433#comment-14109433 ] Gunther Hagleitner commented on HIVE-7548: -- Comments on rb. Otherwise +1. [~gopalv]/[~ashutoshc] could you take a look at this (esp the regex)? Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Hive on Tez Counters
0.5 is almost out, the vote should be closed within a day. You'' have to use a build of Hive from the Tez branch though. TEZ-1118 is the patch you'll need to pick up to fix the CPU counters. This should apply directly on the 0.4 branch, if that's the approach you want to take. On Mon, Aug 25, 2014 at 5:38 AM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Hi Siddharth/Gunther, Thanks for replying to my queries. I was particularly interested in the CPU counter since I was doing some benchmarking on queries. Can you please clarify if I just blindly take a mod(CPU counter) for all tasks and add them up..would they be fine..or should I take a patch from the fix and apply it on Tez 0.4 to get it working until 0.5 is released? Thanks Suma On Fri, Aug 22, 2014 at 2:55 AM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Hive logs the same counters regardless of whether you run with Tez or MR. We've removed some counters in hive 0.13 (HIVE-4518) - the specific one you're looking for might be in that list. Thanks, Gunther. On Thu, Aug 21, 2014 at 11:13 AM, Siddharth Seth ss...@apache.org wrote: I'll let Hive folks answer the questions about the Hive counters. In terms of the CPU counter - that was a bug in Tez-0.4.0, which has been fixed in 0.5.0. COMMITTED_HEAP_BYTES just represents the memory available to the JVM (Runtime.getRuntime().totalMemory()). This will only vary if the VM is started with a different Xms and Xmx option. In terms of Tez, the application logs are currently the best place. Hive may expose these in a more accessible manner though. On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Hi, Needed info on where I can get detailed job counters for Hive on Tez. Am running this on a HDP cluster with Hive 0.13 and see only the following job counters through Hive Tez in Yarn application logs which I got through( yarn logs -applicationId ...) . a. Cannot see any ReduceOperator counters and also only DESERIALIZE_ERRORS is the only counter present in MapOperator b. The CPU_MILLISECONDS in some cases in -ve. Is CPU_MILLISECONDS accurate c. What does COMMITTED_HEAP_BYTES indicate? d. Is there any other place I should be checking the counters? [[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0] [org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0] [*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]] Thanks Suma -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. 
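As noted in the thread, COMMITTED_HEAP_BYTES is simply the JVM's committed heap (Runtime.getRuntime().totalMemory()). A quick way to see how it relates to the -Xmx ceiling in any JVM:
{code}
public class HeapCounters {
  public static void main(String[] args) {
    long committed = Runtime.getRuntime().totalMemory(); // what COMMITTED_HEAP_BYTES reports
    long max = Runtime.getRuntime().maxMemory();         // the -Xmx ceiling
    System.out.println("committed=" + committed + " max=" + max);
  }
}
{code}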
[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109455#comment-14109455 ] Szehon Ho commented on HIVE-7353: - Hi [~vgumashta] I dont see the TestSessionGlobalInitFile failures in the other runs, can you take a look? HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch While using embedded metastore, while creating background threads to run async operations, HiveServer2 ends up creating new instances of JDOPersistanceManager which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager are never GCed because they are cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query
On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java, line 54 https://reviews.apache.org/r/24962/diff/1/?file=666753#file666753line54 Can we make this final, and not have a setter? The caller can just add to the list. It'll make the code a bit simpler. Also should it be set? Xiaomeng Huang wrote: Thanks, I think it better to be list. I get accessed columns from tableToColumnAccessMap, which is a MapString, ListString. Hive's native authorization is use this list too. I get the column list via a table name, then set it to readEntity directly, don't need to add every one with a loop. so it is necessary to have a setter. BTW, I can also to add a API addAccessedColumn(String column) to add one column to this column list. Szehon Ho wrote: OK its fine if you think it should be list. For the other part, I was thinking to have just one method getAccessedColumn() which returns list. Then caller (SemanticAnalyzer) can call: entity.getAccessedColumns().addAll(...). The benefit to me is 1) the list can be made final, and 2) make the calling code cleaner (no need to construct lists and set them). Also its more consistent with the other collections in this class. Hope that makes sense, thanks! Xiaomeng Huang wrote: Yes, it fines to me. Fixed it, thanks! Thanks Xiaomeng, can you please upload the latest to the JIRA for testing and commit? And I had just one minor comment for your consideration, as it's still not uploaded. With this, we dont need to construct a new linkedList for cols outside the switches (its kind of a waste), and we can just directly call entity.getAccessedColumns().addAll(tableToColumnAccessMap.get...), or you can make a local variable cols = tableToColumnAccessMap.get(...) and then entity.getAccessedColumns().addAll(cols). It's not a huge deal, but I was thinking we can fix and upload the patch together. - Szehon --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24962/#review51257 --- On Aug. 25, 2014, 3:17 a.m., Xiaomeng Huang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24962/ --- (Updated Aug. 25, 2014, 3:17 a.m.) Review request for hive, Prasad Mujumdar and Szehon Ho. Repository: hive-git Description --- External authorization model can not get accessed columns from query. Hive should store accessed columns to ReadEntity Diffs - ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4 Diff: https://reviews.apache.org/r/24962/diff/ Testing --- Thanks, Xiaomeng Huang
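The pattern being asked for in the review, roughly: keep the collection final with only a getter, and let SemanticAnalyzer add to it directly. A minimal sketch with illustrative names, not the actual ReadEntity code:
{code}
import java.util.ArrayList;
import java.util.List;

public class ReadEntitySketch {
  // final collection with only a getter; no setter needed
  private final List<String> accessedColumns = new ArrayList<String>();

  public List<String> getAccessedColumns() {
    return accessedColumns;
  }
}

// caller side, e.g. while walking tableToColumnAccessMap:
//   entity.getAccessedColumns().addAll(tableToColumnAccessMap.get(tableName));
{code}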
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51419 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89734 There are other utility functions that extracts taskID/attemptID from file names. None of these methods throw exception if it could not find matches for the regex pattern. Example: getIdFromFilename() returns filename as Id if it cannot match pattern. I was also following the same convention. In this case, if there are no matches for copy file pattern it will return false and will fallback to old code path. The regex will still work if files are loaded using LOAD DATA LOCAL INPATH statement. With this statement, the file names will be like 1) filename.txt 2) filename_copy_1.txt 3) filename_copy_2.txt For this file pattern, there will be no match for taskId/attemptId extraction. Hence no files will be marked duplicate. We really don't have to worry about copy file names in this case as there will not be any duplicate elimination. ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java https://reviews.apache.org/r/24830/#comment89735 Fixed it. - Prasanth_J On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Updated] (HIVE-7681) qualified tablenames usage does not work with several alter-table commands
[ https://issues.apache.org/jira/browse/HIVE-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7681: Attachment: HIVE-7681.4.patch.txt HIVE-7681.4.patch.txt - Uploading the patch to kick off tests and make sure that no new tests need update. qualified tablenames usage does not work with several alter-table commands -- Key: HIVE-7681 URL: https://issues.apache.org/jira/browse/HIVE-7681 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Navis Attachments: HIVE-7681.1.patch.txt, HIVE-7681.2.patch.txt, HIVE-7681.3.patch.txt, HIVE-7681.4.patch.txt, HIVE-7681.4.patch.txt Changes were made in HIVE-4064 for use of qualified table names in more types of queries. But several alter table commands don't work with qualified - alter table default.tmpfoo set tblproperties (bar = bar value) - ALTER TABLE default.kv_rename_test CHANGE a a STRING - add,drop partition - alter index rebuild -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24627: HIVE-7704: Create tez task for fast file merging
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/#review51416 --- ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java https://reviews.apache.org/r/24627/#comment89725 why do you need a map operator at all then? Can't you just write a net new processor that doesn't init map op? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89727 conf setup should happen in initVertexConf. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89728 if i read this right the only diff is the process or. can you use a var for this and keep a single call to new Vertex? String procClassName; if ... { procClassName = ... } ... new Vertext(...procClassName) if you move all the conf setup into the initVertexConf method this should be more clear. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89726 indentation seems broken ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89729 that means you're setting a path as the alias? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileMapRecordProcessor.java https://reviews.apache.org/r/24627/#comment89730 I'm assuming that Merge* and ORCMerge* contain a lot of copied code? (from the MR path). If that's the case can you factor that out? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java https://reviews.apache.org/r/24627/#comment89731 this should be a different class. not every processor will need these things. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java https://reviews.apache.org/r/24627/#comment89733 don't call it jobClose if it only applies to merge work. ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java https://reviews.apache.org/r/24627/#comment89736 you seem to be fighting that merge work is only partly a map work. why not create a dummy op? that way everything is the same. you could even create a real op and move your merge logic into it. - Gunther Hagleitner On Aug. 15, 2014, 5:27 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/ --- (Updated Aug. 15, 2014, 5:27 a.m.) Review request for hive and Gunther Hagleitner. Bugs: HIVE-7704 https://issues.apache.org/jira/browse/HIVE-7704 Repository: hive-git Description --- Currently tez falls back to MR task for merge file task. It will beneficial to convert the merge file tasks to tez task to make use of the performance gains from tez. 
Diffs - itests/src/test/resources/testconfiguration.properties b801678 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java d5de58e ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java a2975cb ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 3d74459 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java 4e0fd79 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java e116426 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapRecordProcessor.java 8513e33 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapTezProcessor.java 31f3bcd ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileMapRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/OrcMergeFileMapRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/OrcMergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RCFileMergeFileMapRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 1577827 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java c2ba782 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 951e918 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/RCFileMergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java bf44548 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileMapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java PRE-CREATION
[jira] [Commented] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109480#comment-14109480 ] Gunther Hagleitner commented on HIVE-7704: -- comments on rb Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch Currently tez falls back to MR task for merge file task. It will beneficial to convert the merge file tasks to tez task to make use of the performance gains from tez. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7548: - Attachment: HIVE-7548.2.patch Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109482#comment-14109482 ] Prasanth J commented on HIVE-7548: -- Addressed Gunther's review comments. Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51424 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89740 Use named capture in java as much as possible. (?taskId[0-9]+) etc. ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89741 What about LOAD DATA INPATH? - Gopal V On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109489#comment-14109489 ] Brock Noland commented on HIVE-5799: # It appears the constant SessionManager. SESSION_CHECK_INTERVAL is unused. # HiveSessionImpl. opHandleSet should be converted to a synchronized set or a concurrent set since it's now modified by the client thread and the background thread # I think we should call OperationManager._getOperation getOperationInternal as that seems to be more standard for this use case in the Hive code base. # Perhaps we should change OperationManager.closeExpiredOperations to not use closeOperation but use similar code since it appears there is a harmless race condition there which will log exceptions when the operation is closed during closeExpiredOperations Does anyone else have any feedback? Otherwise once these are fixed I think we can commit. session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from instable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
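Brock's second point, making opHandleSet safe for access from both the client thread and the background expiry thread, comes down to a synchronized or concurrent set. A sketch of the two options; the field name is taken from the comment, everything else is illustrative:
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class OpHandleSetSketch {
  static final class OperationHandle { } // stand-in for the real handle type

  // option 1: synchronized wrapper around the existing HashSet
  private final Set<OperationHandle> opHandleSet =
      Collections.synchronizedSet(new HashSet<OperationHandle>());

  // option 2: a concurrent set backed by ConcurrentHashMap
  private final Set<OperationHandle> opHandleSetAlt =
      Collections.newSetFromMap(new ConcurrentHashMap<OperationHandle, Boolean>());
}
{code}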
[jira] [Commented] (HIVE-7865) Extend TestFileDump test case to printout ORC row index information
[ https://issues.apache.org/jira/browse/HIVE-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109492#comment-14109492 ] Sergey Shelukhin commented on HIVE-7865: +1 Extend TestFileDump test case to printout ORC row index information --- Key: HIVE-7865 URL: https://issues.apache.org/jira/browse/HIVE-7865 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7865.1.patch It will be good to have test case to printout ORC row index entries. Some changes to ORC format like HIVE-7832 uses different codepath to write row index entries. To make sure it is not doing anything wrong a test case that prints row index entries will be helpful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109500#comment-14109500 ] Lars Francke commented on HIVE-5799: The patch looks mostly good, I have a couple of minor comments regarding style/checkstyle. If you're interested in them could you please update RB? session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from instable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/#review51426 --- Looks like an important bug to fix, but I dont know too much about this code, can you explain what is the bug in the getPartitionKey algorithm, and what is the fix? Like why we need to alter the stepSize as we iterate. Is there a test we can add for this as well to illustrate and validate the fix? Also my confusion is if the other fixes on the patch are related? 1. Adding setConf on the HiveTotalOrderPartitioner is related to the bug? 2. What is the use of the new HiveConf ..min.reducer? My guess is you found the algorithm not generating enough partitionKey sometimes, can you explain? common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/24688/#comment89744 If this needs to be exposed, should be worded better. Something like: name = hive.optimize.sampling.orderby.min.reducer.ratio If sampling is enabled, this is the minimum ratio allowed of reducers calculated by sampling to expected number of reducers. Its might be confusing to user in my opinion, as the user has little control of what the expected reducer is, right? ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java https://reviews.apache.org/r/24688/#comment89742 Please add some more context to this debug statement. ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java https://reviews.apache.org/r/24688/#comment89743 If needs to be exposed, message can be Sampling generated x number of reducers, but it was expected to be y - Szehon Ho On Aug. 14, 2014, 2:29 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/ --- (Updated Aug. 14, 2014, 2:29 a.m.) Review request for hive. Bugs: HIVE-7669 https://issues.apache.org/jira/browse/HIVE-7669 Repository: hive-git Description --- The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. 
{code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are
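The "Split points are out of order" failure happens because TotalOrderPartitioner requires strictly ascending partition keys, and sampling a column with only 4 distinct values easily produces duplicates. Whatever the actual patch does, the invariant the sampler has to restore looks roughly like this (illustrative code, not the PartitionKeySampler implementation):
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitPointSketch {
  // From sorted samples, pick at most (numReducers - 1) strictly increasing split keys.
  public static <K> List<K> pickSplitPoints(List<K> sortedSamples, int numReducers,
      Comparator<K> cmp) {
    List<K> splits = new ArrayList<K>();
    if (sortedSamples.isEmpty() || numReducers <= 1) {
      return splits;
    }
    double step = (double) sortedSamples.size() / numReducers;
    for (int i = 1; i < numReducers; i++) {
      int idx = Math.min(sortedSamples.size() - 1,
          Math.max(0, (int) Math.round(step * i) - 1));
      K candidate = sortedSamples.get(idx);
      // skip duplicates: TotalOrderPartitioner rejects non-ascending split points
      if (splits.isEmpty() || cmp.compare(splits.get(splits.size() - 1), candidate) < 0) {
        splits.add(candidate);
      }
    }
    return splits;
  }
}
{code}
With only 4 distinct values this yields at most 3 split points, i.e. at most 4 reducers regardless of what was requested, which is presumably why the patch also introduces a configuration for the minimum acceptable ratio of sampled reducers to expected reducers.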
Re: Review Request 23320: HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23320/ --- (Updated Aug. 25, 2014, 7:15 p.m.) Review request for hive, Navis Ryu, Sushanth Sowmyan, Szehon Ho, and Thejas Nair. Bugs: HIVE-7353 https://issues.apache.org/jira/browse/HIVE-7353 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-7353 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 06d7595 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 0693039 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e387b8f service/src/java/org/apache/hive/service/cli/CLIService.java d2cdfc1 service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java de54ca1 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java eee1cc6 service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c service/src/java/org/apache/hive/service/cli/session/SessionManager.java d573592 service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 37b05fc service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java be2eb01 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java c380b69 service/src/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java PRE-CREATION service/src/java/org/apache/hive/service/server/ThreadWithGarbageCleanup.java PRE-CREATION Diff: https://reviews.apache.org/r/23320/diff/ Testing --- Manual testing using Yourkit. Thanks, Vaibhav Gumashta
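The new ThreadFactoryWithGarbageCleanup / ThreadWithGarbageCleanup files in the diff suggest the general shape of the fix: background threads that run a cleanup hook when their task finishes, so thread-bound JDO state can be released instead of lingering in the factory cache. A rough, hypothetical sketch of that idea, not the actual classes:
{code}
import java.util.concurrent.ThreadFactory;

public class CleanupThreadFactorySketch implements ThreadFactory {
  private final Runnable cleanup; // e.g. close the thread-local RawStore / JDO manager

  public CleanupThreadFactorySketch(Runnable cleanup) {
    this.cleanup = cleanup;
  }

  @Override
  public Thread newThread(final Runnable task) {
    return new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          task.run();
        } finally {
          cleanup.run(); // release per-thread resources before the thread dies
        }
      }
    });
  }
}
{code}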
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Attachment: HIVE-7353.6.patch [~szehon] This should fix it. HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch While using embedded metastore, while creating background threads to run async operations, HiveServer2 ends up creating new instances of JDOPersistanceManager which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager are never GCed because they are cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Status: Open (was: Patch Available) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch While using embedded metastore, while creating background threads to run async operations, HiveServer2 ends up creating new instances of JDOPersistanceManager which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager are never GCed because they are cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7872) StorageBasedAuthorizationProvider should check access perms of parent directory for DROP actions
Jason Dere created HIVE-7872: Summary: StorageBasedAuthorizationProvider should check access perms of parent directory for DROP actions Key: HIVE-7872 URL: https://issues.apache.org/jira/browse/HIVE-7872 Project: Hive Issue Type: Bug Components: Authorization Reporter: Jason Dere When dropping a table partition, StorageBasedAuthorizationProvider is checking for write permission on the partition directory itself to check if the user is allowed to drop the partition. However to delete the partition directory, you really need write perms on the parent directory of the file you are going to delete. So SBA will authorize the user to drop the partition but actually deleting the partition directory will fail if the user does not have the correct access on the table (parent) directory. SBA should also check the parent directory for DROP actions during its auth check. -- This message was sent by Atlassian JIRA (v6.2#6252)
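The check being asked for amounts to: when authorizing a DROP, derive the parent of the path being deleted and require write access on it as well. A sketch of that derivation with the Hadoop FileSystem API; the permission check here is deliberately simplified and is not the StorageBasedAuthorizationProvider method:
{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class DropAuthSketch {
  // For DROP, deleting `target` really needs WRITE on its parent directory too.
  public static void authorizeDrop(FileSystem fs, Path target) throws Exception {
    Path parent = target.getParent();
    if (parent != null && fs.exists(parent)) {
      requireAccess(fs, parent, FsAction.WRITE);
    }
    requireAccess(fs, target, FsAction.WRITE);
  }

  private static void requireAccess(FileSystem fs, Path p, FsAction action) throws Exception {
    // deliberately simplified: only looks at the "other" bits; a real check also
    // considers the owner, group membership, and superuser status
    FsAction other = fs.getFileStatus(p).getPermission().getOtherAction();
    if (!other.implies(action)) {
      throw new SecurityException(action + " denied on " + p);
    }
  }
}
{code}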
[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109598#comment-14109598 ] Hive QA commented on HIVE-6847: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664179/HIVE-6847.5.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6114 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_noskew org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/488/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/488/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-488/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664179 Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, HIVE-6847.5.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
On Aug. 25, 2014, 6:48 p.m., Gopal V wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 1549 https://reviews.apache.org/r/24830/diff/1/?file=663983#file663983line1549 Use named capture in Java as much as possible, e.g. (?<taskId>[0-9]+). Named capture is supported only in JDK7 and above. Using comments in the next patch to maintain compat. On Aug. 25, 2014, 6:48 p.m., Gopal V wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 1877 https://reviews.apache.org/r/24830/diff/1/?file=663983#file663983line1877 What about LOAD DATA INPATH? By LOAD .. INTO in the comment I meant LOAD DATA INPATH .. INTO TABLE.. Please look at my previous comment on Gunther's question in the review board for a LOAD DATA example. - Prasanth_J --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51424 --- On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task if any of the precondition checks fail. Failing on a precondition check is fine for ALTER TABLE .. CONCATENATE, but not when the merge task is triggered automatically by the conditional resolver. If a partition has ORC files that are not compatible for merging, the merge task should ignore them instead of failing. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
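To make the JDK compatibility point above concrete, here is a small, self-contained sketch contrasting a named capture group (JDK 7+) with comment-documented numbered groups that remain JDK 6 compatible. The task_... pattern and group names are invented for the example and are not the actual pattern in Utilities.java.
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedCaptureSketch {
  // JDK 7 and later: the group name documents itself in the pattern.
  private static final Pattern NAMED =
      Pattern.compile("task_(?<taskId>[0-9]+)_(?<attemptId>[0-9]+)");

  // JDK 6 compatible alternative: numbered groups, documented in a comment.
  // group(1) = task id, group(2) = attempt id
  private static final Pattern NUMBERED =
      Pattern.compile("task_([0-9]+)_([0-9]+)");

  public static void main(String[] args) {
    Matcher named = NAMED.matcher("task_000123_0");
    if (named.matches()) {
      System.out.println(named.group("taskId"));   // lookup by name, JDK 7+ only
    }
    Matcher numbered = NUMBERED.matcher("task_000123_0");
    if (numbered.matches()) {
      System.out.println(numbered.group(1));       // positional lookup, works on JDK 6
    }
  }
}
{code}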
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 25, 2014, 7:54 p.m.) Review request for hive and Gunther Hagleitner. Bugs: HIVE-7548 https://issues.apache.org/jira/browse/HIVE-7548 Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task if any of the precondition checks fail. Failing on a precondition check is fine for ALTER TABLE .. CONCATENATE, but not when the merge task is triggered automatically by the conditional resolver. If a partition has ORC files that are not compatible for merging, the merge task should ignore them instead of failing. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Updated] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7548: - Attachment: HIVE-7548.3.patch Addressed Gopal's review comments Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch, HIVE-7548.3.patch ORC fast merge (HIVE-7509) will fail the merge task if any of the precondition checks fail. Failing on a precondition check is fine for ALTER TABLE .. CONCATENATE, but not when the merge task is triggered automatically by the conditional resolver. If a partition has ORC files that are not compatible for merging, the merge task should ignore them instead of failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
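The behavior change discussed in this issue boils down to one decision point: when a precondition check on a file fails, an explicit ALTER TABLE .. CONCATENATE should still error out, while an automatically triggered merge should just skip that file. The sketch below captures that contract in isolation; it is not the actual OrcFileMergeMapper code, and the field and method names are invented.
{code}
import java.io.IOException;

public class MergePreconditionSketch {
  private final boolean autoTriggered; // true when the conditional resolver scheduled the merge

  MergePreconditionSketch(boolean autoTriggered) {
    this.autoTriggered = autoTriggered;
  }

  // Returns true if the file should be merged. An incompatible file fails an
  // explicit concatenate, but is silently skipped for an automatic merge.
  boolean shouldMerge(String fileName, boolean compatible) throws IOException {
    if (compatible) {
      return true;
    }
    if (autoTriggered) {
      System.err.println("Skipping incompatible file during automatic merge: " + fileName);
      return false;
    }
    throw new IOException("Incompatible ORC file found: " + fileName);
  }
}
{code}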
[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109642#comment-14109642 ] Szehon Ho commented on HIVE-7353: - Thanks, do you mind updating the RB as well? HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch When using the embedded metastore, HiveServer2 ends up creating a new instance of JDOPersistanceManager for each background thread it spawns to run async operations, and these instances are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager instances are never GCed because they are still cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7865) Extend TestFileDump test case to printout ORC row index information
[ https://issues.apache.org/jira/browse/HIVE-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7865: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Extend TestFileDump test case to printout ORC row index information --- Key: HIVE-7865 URL: https://issues.apache.org/jira/browse/HIVE-7865 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7865.1.patch It will be good to have a test case that prints out ORC row index entries. Some changes to the ORC format, like HIVE-7832, use a different codepath to write row index entries. To make sure such changes are not doing anything wrong, a test case that prints row index entries will be helpful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7613) Research optimization of auto convert join to map join [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7613: Assignee: Szehon Ho I'll take a look at this this week, but I might not be able to finish (I'll be out next week). I can hand it off to somebody else at that point. Research optimization of auto convert join to map join [Spark branch] - Key: HIVE-7613 URL: https://issues.apache.org/jira/browse/HIVE-7613 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Szehon Ho Priority: Minor ConvertJoinMapJoin is an optimization that replaces a common join (aka shuffle join) with a map join (aka broadcast or fragment replicate join) when possible. We need to research how to make it workable with Hive on Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24986/#review51436 --- Hi, Thank you very much for your work!! This looks great! I have a few comments below. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/24986/#comment89754 I've been trying to think of a good name. I think we should call this Reloadable jars since we are re-loading them. Thoughts? ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java https://reviews.apache.org/r/24986/#comment89764 Do we need to do this? getSessionSpecifiedClassLoader won't return null ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24986/#comment89756 Let's put some trace logging in here as to which classloader we are returning. I don't see the classloader on Conf actually getting set anywhere. Is it set by someone for us? ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java https://reviews.apache.org/r/24986/#comment89757 I don't think we want HiveAuthorization and HiveAuthentication providers to be reloadable? ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java https://reviews.apache.org/r/24986/#comment89758 Same as above? ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java https://reviews.apache.org/r/24986/#comment89760 This should be moved to the top of the class and be made final ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java https://reviews.apache.org/r/24986/#comment89759 Let's use camelCaps not under_scores for variable names since that is more standard. ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java https://reviews.apache.org/r/24986/#comment89761 I think that HIVEREFRESHJARS should be a list of directories like HIVEAUXJARS ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89762 Can you put this in java.io.tmpdir? ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89768 Let's not print to std error. Let's print to log. I think we should also call Assert.fail(msg) with a good message. ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89767 We should fail if an exception is thrown ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89765 We should fail if an exception is thrown ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89766 We should fail if an exception is thrown service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java https://reviews.apache.org/r/24986/#comment89763 We should do something with this exception Hi, - Brock Noland On Aug. 25, 2014, 6:45 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24986/ --- (Updated Aug. 25, 2014, 6:45 a.m.) Review request for hive. 
Bugs: HIVE-7553 https://issues.apache.org/jira/browse/HIVE-7553 Repository: hive-git Description --- HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9d64aff18329e7850342855aade42e21f5 hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 93a03adeab7ba3c3c91344955d303e4252005239 hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java f25039dcf55b3b24bbf8dcba05855665a1c7f3b0 ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 5924bcf1f55dc4c2dd06f312f929047b7df9de55 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 0c6a3d44ef1f796778768421dc02f8bf3ede6a8c ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java bd45df1a401d1adb009e953d08205c7d5c2d5de2 ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java dcc19f70644c561e17df8c8660ca62805465f1d6 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 76fee612a583cdc2c632d27932623521b735e768 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 3f474f846c7af5f1f65f1c14f3ce51308f1279d4 ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 0962cadce0d515e046371d0a816f4efd70b8eef7 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
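The review discussion above revolves around building a classloader over a configurable set of jar directories and installing it for the session, so that auxiliary jars can be picked up without restarting HiveServer2. Purely as an illustration of that mechanism (and not the patch's actual code — the method and parameter names here are invented), a reload might look like this:
{code}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ReloadableJarsSketch {
  // Build a fresh classloader over every jar found in the configured directories
  // and make it the current thread's context classloader.
  static ClassLoader reloadJars(List<String> jarDirs, ClassLoader parent) throws Exception {
    List<URL> urls = new ArrayList<URL>();
    for (String dir : jarDirs) {
      File[] entries = new File(dir).listFiles();
      if (entries == null) {
        continue;                                   // not a directory or unreadable
      }
      for (File entry : entries) {
        if (entry.getName().endsWith(".jar")) {
          urls.add(entry.toURI().toURL());
        }
      }
    }
    ClassLoader loader =
        URLClassLoader.newInstance(urls.toArray(new URL[urls.size()]), parent);
    Thread.currentThread().setContextClassLoader(loader);
    return loader;
  }
}
{code}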
[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109662#comment-14109662 ] Vaibhav Gumashta commented on HIVE-7353: [~szehon] Done that already. Actually hold off on reviewing this one for a few mins - I see one issue with the current patch - I'll put up an updated one in a few mins. HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch When using the embedded metastore, HiveServer2 ends up creating a new instance of JDOPersistanceManager for each background thread it spawns to run async operations, and these instances are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager instances are never GCed because they are still cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7873: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 Re-enable lazy HiveBaseFunctionResultList - Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Labels: spark We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
Brock Noland created HIVE-7873: -- Summary: Re-enable lazy HiveBaseFunctionResultList Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Bug Reporter: Brock Noland We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names
[ https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7846: Summary: authorization api should support group, not assume case insensitive role names (was: authorization api should not assume case insensitive role names) authorization api should support group, not assume case insensitive role names -- Key: HIVE-7846 URL: https://issues.apache.org/jira/browse/HIVE-7846 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair The case insensitive behavior of roles should be specific to sql standard authorization. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names
[ https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7846: Description: The case insensitive behavior of roles should be specific to sql standard authorization. Group type for principal also should be disabled at the sql std authorization layer, instead of disallowing it at the API level. was: The case insensitive behavior of roles should be specific to sql standard authorization. authorization api should support group, not assume case insensitive role names -- Key: HIVE-7846 URL: https://issues.apache.org/jira/browse/HIVE-7846 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair The case insensitive behavior of roles should be specific to sql standard authorization. Group type for principal also should be disabled at the sql std authorization layer, instead of disallowing it at the API level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109679#comment-14109679 ] Brock Noland commented on HIVE-7799: Hey guys, I created HIVE-7873 to track the improvement in M4. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7799: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you guys! I have committed this to spark! TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
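The root cause described in this issue is a usage contract rather than a Spark-specific bug: the row cache cannot accept writes once a reader has started consuming it. As a generic, stand-alone sketch of that contract (this is not the Hive RowContainer or HiveKVResultCache implementation; the class and method names are invented), such a cache might guard itself like this:
{code}
import java.util.ArrayList;
import java.util.List;

public class SingleCycleCacheSketch<T> {
  private final List<T> rows = new ArrayList<T>();
  private int readPos = 0;
  private boolean reading = false;

  // Writes are only legal before the first read of the current cycle.
  public void add(T row) {
    if (reading) {
      throw new IllegalStateException("write after read is not supported; call clear() first");
    }
    rows.add(row);
  }

  // Returns the next cached row, or null when the cache is exhausted.
  public T next() {
    reading = true;
    return readPos < rows.size() ? rows.get(readPos++) : null;
  }

  // Resets the cache so a new write/read cycle can begin.
  public void clear() {
    rows.clear();
    readPos = 0;
    reading = false;
  }
}
{code}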
[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7873: --- Labels: Spark-M5 spark (was: spark) Re-enable lazy HiveBaseFunctionResultList - Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Labels: Spark-M4, spark We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7873: --- Labels: Spark-M4 spark (was: Spark-M5 spark) Re-enable lazy HiveBaseFunctionResultList - Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Labels: Spark-M4, spark We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
[ https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109684#comment-14109684 ] Hive QA commented on HIVE-7871: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664180/HIVE-7871.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6118 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/489/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/489/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-489/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664180 WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command --- Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7871.1.patch Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
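One common source of the kind of login failure shown above is that non-ASCII characters in the posted form fields are not percent-encoded as UTF-8 before the request is built. The snippet below only illustrates that encoding step with java.net.URLEncoder; the property and the unicode value are made up, and it is not presented as the actual fix for this issue.
{code}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TempletonParamEncodingSketch {
  public static void main(String[] args) throws UnsupportedEncodingException {
    String key = "javax.jdo.option.ConnectionUserName";
    String value = "db\u00FCser@SQLserver";          // illustrative unicode user name
    // Percent-encode the value as UTF-8 so it survives the HTTP form post.
    String field = "define=" + key + "=" + URLEncoder.encode(value, "UTF-8");
    System.out.println(field);                        // ...=db%C3%BCser%40SQLserver
  }
}
{code}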
[jira] [Commented] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109705#comment-14109705 ] Brock Noland commented on HIVE-7842: IIRC assertions are not enabled for MR when tasks are run so this might fail with MR as well. load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7842: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109704#comment-14109704 ] Brock Noland commented on HIVE-7842: Linking to HIVE-7580. load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7842: --- Labels: Spark-M1 (was: ) load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7843: --- Labels: Spark-M1 (was: ) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch] Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch {code} java.lang.AssertionError: data length is different from num of DP columns org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809) org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730) org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829) org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502) org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7844) optimize_nullscan.q fails due to differences in explain plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7844: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 optimize_nullscan.q fails due to differences in explain plan [Spark Branch] --- Key: HIVE-7844 URL: https://issues.apache.org/jira/browse/HIVE-7844 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch Looks like on spark branch, we are not optimizing query plans for limit 0 cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7843: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch] Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch {code} java.lang.AssertionError: data length is different from num of DP columns org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809) org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730) org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829) org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502) org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7438: --- Labels: Spark-M2 (was: ) Counters, statistics, and metrics [Spark Branch] Key: HIVE-7438 URL: https://issues.apache.org/jira/browse/HIVE-7438 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M2 Attachments: hive on spark job statistic design.docx Hive makes use of MapReduce counters for statistics and possibly for other purposes. For Hive on Spark, we should achieve the same functionality using Spark's accumulators. Hive also collects metrics from MapReduce jobs traditionally. Spark job very likely publishes a different set of metrics, which, if made available, would help user to get insights into their spark jobs. Thus, we should obtain the metrics and make them available as we do for MapReduce. This task therefore includes: # identify Hive's existing functionality w.r.t. counters, statistics, and metrics; # design and implement the same functionality in Spark. Please refer to the design document for more information. https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics -- This message was sent by Atlassian JIRA (v6.2#6252)
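For readers new to the accumulator approach mentioned in this issue, here is a minimal, self-contained sketch of using a Spark accumulator as a counter from the Java API; the toy RDD and counter usage are illustrative only and are not the design proposed in the attached document.
{code}
import java.util.Arrays;
import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class AccumulatorCounterSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("counter-sketch").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // The accumulator plays the role of a MapReduce counter: tasks add to it,
    // and the driver reads the aggregated value after the job finishes.
    final Accumulator<Integer> recordsProcessed = sc.accumulator(0);

    JavaRDD<String> rows = sc.parallelize(Arrays.asList("a", "b", "c"));
    rows.foreach(new VoidFunction<String>() {
      @Override
      public void call(String row) throws Exception {
        recordsProcessed.add(1);
      }
    });

    System.out.println("records processed: " + recordsProcessed.value());
    sc.stop();
  }
}
{code}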
[jira] [Updated] (HIVE-7844) optimize_nullscan.q fails due to differences in explain plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7844: --- Labels: Spark-M1 (was: ) optimize_nullscan.q fails due to differences in explain plan [Spark Branch] --- Key: HIVE-7844 URL: https://issues.apache.org/jira/browse/HIVE-7844 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch Looks like on spark branch, we are not optimizing query plans for limit 0 cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7438: --- Labels: Spark-M3 (was: Spark-M2) Counters, statistics, and metrics [Spark Branch] Key: HIVE-7438 URL: https://issues.apache.org/jira/browse/HIVE-7438 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M3 Attachments: hive on spark job statistic design.docx Hive makes use of MapReduce counters for statistics and possibly for other purposes. For Hive on Spark, we should achieve the same functionality using Spark's accumulators. Hive also collects metrics from MapReduce jobs traditionally. Spark job very likely publishes a different set of metrics, which, if made available, would help user to get insights into their spark jobs. Thus, we should obtain the metrics and make them available as we do for MapReduce. This task therefore includes: # identify Hive's existing functionality w.r.t. counters, statistics, and metrics; # design and implement the same functionality in Spark. Please refer to the design document for more information. https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7439: --- Labels: Spark-M3 (was: ) Spark job monitoring and error reporting [Spark Branch] --- Key: HIVE-7439 URL: https://issues.apache.org/jira/browse/HIVE-7439 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M3 After Hive submits a job to the Spark cluster, we need to report the job progress, such as the percentage done, to the user. This is especially important for long running queries. Moreover, if there is an error during job submission or execution, it's also crucial for Hive to fetch the error log and/or stacktrace and feed it back to the user. Please refer to the design doc on the wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109721#comment-14109721 ] Brock Noland commented on HIVE-7439: I think that we'll need the API from HIVE-7874 to do this work. Spark job monitoring and error reporting [Spark Branch] --- Key: HIVE-7439 URL: https://issues.apache.org/jira/browse/HIVE-7439 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M3 After Hive submits a job to the Spark cluster, we need to report the job progress, such as the percentage done, to the user. This is especially important for long running queries. Moreover, if there is an error during job submission or execution, it's also crucial for Hive to fetch the error log and/or stacktrace and feed it back to the user. Please refer to the design doc on the wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7874) Support multiple concurrent users
Brock Noland created HIVE-7874: -- Summary: Support multiple concurrent users Key: HIVE-7874 URL: https://issues.apache.org/jira/browse/HIVE-7874 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Priority: Blocker This JIRA is to track on the Hive side the solution to handling multiple user sessions. At first we thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7874) Support multiple concurrent users
[ https://issues.apache.org/jira/browse/HIVE-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7874: --- Description: This JIRA is to track on the Hive side the solution to handling multiple user sessions. We thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different. (was: This JIRA is to track on the Hive side the solution to handling multiple user sessions. At first we thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different.) Support multiple concurrent users - Key: HIVE-7874 URL: https://issues.apache.org/jira/browse/HIVE-7874 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Priority: Blocker This JIRA is to track on the Hive side the solution to handling multiple user sessions. We thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names
[ https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7846: Attachment: HIVE-7846.1.patch authorization api should support group, not assume case insensitive role names -- Key: HIVE-7846 URL: https://issues.apache.org/jira/browse/HIVE-7846 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7846.1.patch The case insensitive behavior of roles should be specific to sql standard authorization. Group type for principal also should be disabled at the sql std authorization layer, instead of disallowing it at the API level. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 25037: HIVE-7846 - authorization api should support group, not assume case insensitive role names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25037/ --- Review request for hive and Jason Dere. Bugs: HIVE-7846 https://issues.apache.org/jira/browse/HIVE-7846 Repository: hive-git Description --- See https://issues.apache.org/jira/browse/HIVE-7846 Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerForTest.java 89429b6 itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidatorForTest.java 1d039ad itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactoryForTest.java 0f41a8f ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java f92ecf2 ql/src/java/org/apache/hadoop/hive/ql/plan/RoleDDLDesc.java 8413fb7 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 2113f45 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrincipal.java 30a4496 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLAuthorizationUtils.java a6b008a ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerWrapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidator.java 9ceac0c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactory.java 9db3d74 ql/src/test/queries/clientnegative/authorization_grant_group.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_public_create.q 002389f ql/src/test/queries/clientnegative/authorization_public_drop.q 69c5a8d ql/src/test/queries/clientnegative/authorization_role_case.q PRE-CREATION ql/src/test/queries/clientnegative/authorize_grant_public.q bfd3165 ql/src/test/queries/clientnegative/authorize_revoke_public.q 2b29822 ql/src/test/queries/clientpositive/authorization_1.q 25c9918 ql/src/test/queries/clientpositive/authorization_5.q 8869edc ql/src/test/queries/clientpositive/authorization_grant_public_role.q fe177ac ql/src/test/queries/clientpositive/authorization_role_grant2.q 95fa4e6 ql/src/test/results/clientnegative/authorization_grant_group.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_public_create.q.out 4c9a2ad ql/src/test/results/clientnegative/authorization_public_drop.q.out 520b56e ql/src/test/results/clientnegative/authorization_role_case.q.out PRE-CREATION ql/src/test/results/clientnegative/authorize_grant_public.q.out ef4a1b1 ql/src/test/results/clientnegative/authorize_revoke_public.q.out 618fedd ql/src/test/results/clientpositive/authorization_1.q.out dac0820 ql/src/test/results/clientpositive/authorization_5.q.out 6e5187e ql/src/test/results/clientpositive/authorization_grant_public_role.q.out 17b6c8a ql/src/test/results/clientpositive/authorization_role_grant2.q.out 56e7667 Diff: https://reviews.apache.org/r/25037/diff/ Testing --- Tests of old default authorization mode modified to verify that roles with mixed case now work. Added -ve test for group in grant with sql std auth. Thanks, Thejas Nair
[jira] [Updated] (HIVE-7392) Support Columns Stats for Partition Columns
[ https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7392: --- Attachment: h-7392.patch Support Columns Stats for Partition Columns --- Key: HIVE-7392 URL: https://issues.apache.org/jira/browse/HIVE-7392 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Ashutosh Chauhan Attachments: h-7392.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7392) Support Columns Stats for Partition Columns
[ https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7392: --- Status: Patch Available (was: Open) Support Columns Stats for Partition Columns --- Key: HIVE-7392 URL: https://issues.apache.org/jira/browse/HIVE-7392 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Ashutosh Chauhan Attachments: h-7392.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 25038: implement NDV for partition column
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25038/ --- Review request for hive and John Pullokkaran. Bugs: HIVE-7392 https://issues.apache.org/jira/browse/HIVE-7392 Repository: hive Description --- implement NDV for partition column Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/RelOptHiveTable.java 1620394 Diff: https://reviews.apache.org/r/25038/diff/ Testing --- Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7392) Support Columns Stats for Partition Columns
[ https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109786#comment-14109786 ] Sergey Shelukhin commented on HIVE-7392: Is handling column values as strings correct for databases created with all versions of Hive? In olden days it was possible to create partitions like a=2 and a=02 for integer a. There may still be similar cases now, though more restricted. What will a query return in such cases in Hive? It may be a different value than the one these stats give. Support Columns Stats for Partition Columns --- Key: HIVE-7392 URL: https://issues.apache.org/jira/browse/HIVE-7392 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Ashutosh Chauhan Attachments: h-7392.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
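The concern raised here is easiest to see with a tiny worked example: partition values such as 2 and 02 are distinct as the strings stored in the metastore, but collapse to one value under integer semantics, so an NDV computed over strings can disagree with what an integer-typed query would see. The snippet below only demonstrates that arithmetic; it is not Hive code.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionNdvSketch {
  public static void main(String[] args) {
    // Partition values as the metastore stores them: strings.
    List<String> partitionValues = Arrays.asList("2", "02");

    Set<String> distinctAsStrings = new HashSet<String>(partitionValues);
    Set<Integer> distinctAsInts = new HashSet<Integer>();
    for (String v : partitionValues) {
      distinctAsInts.add(Integer.parseInt(v));
    }

    System.out.println(distinctAsStrings.size()); // 2 distinct values as strings
    System.out.println(distinctAsInts.size());    // 1 distinct value as integers
  }
}
{code}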
[jira] [Created] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key
Xiaobing Zhou created HIVE-7875: --- Summary: Hive cannot load data into partitioned table with Unicode key Key: HIVE-7875 URL: https://issues.apache.org/jira/browse/HIVE-7875 Project: Hive Issue Type: Bug Environment: Windows Server 2008 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Steps to reproduce: 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql files to your local directory. 2) Open the Hive CLI. 3) Run: hive> source <path to CreatePartitionedTable.hql>; 4) Run: hive> source <path to LoadIntoPartitionedTable.hql>; The following error will be shown: hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql; Loading data to table default.mypartitioned partition (tag=䶵) Failed with exception null FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key
[ https://issues.apache.org/jira/browse/HIVE-7875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-7875: Fix Version/s: 0.14.0 Hive cannot load data into partitioned table with Unicode key - Key: HIVE-7875 URL: https://issues.apache.org/jira/browse/HIVE-7875 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Environment: Windows Server 2008 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: CreatePartitionedTable.hql, LoadIntoPartitionedTable.hql, partitioned.txt Steps to reproduce: 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql files to your local directory. 2) Open the Hive CLI. 3) Run: hive> source <path to CreatePartitionedTable.hql>; 4) Run: hive> source <path to LoadIntoPartitionedTable.hql>; The following error will be shown: hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql; Loading data to table default.mypartitioned partition (tag=䶵) Failed with exception null FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key
[ https://issues.apache.org/jira/browse/HIVE-7875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-7875: Affects Version/s: 0.13.0 Hive cannot load data into partitioned table with Unicode key - Key: HIVE-7875 URL: https://issues.apache.org/jira/browse/HIVE-7875 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Environment: Windows Server 2008 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: CreatePartitionedTable.hql, LoadIntoPartitionedTable.hql, partitioned.txt Steps to reproduce: 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql files to your local directory. 2) Open the Hive CLI. 3) Run: hive> source <path to CreatePartitionedTable.hql>; 4) Run: hive> source <path to LoadIntoPartitionedTable.hql>; The following error will be shown: hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql; Loading data to table default.mypartitioned partition (tag=䶵) Failed with exception null FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask -- This message was sent by Atlassian JIRA (v6.2#6252)