[jira] [Created] (HIVE-10005) remove some unnecessary branches from the inner loop
Gunther Hagleitner created HIVE-10005: - Summary: remove some unnecessary branches from the inner loop Key: HIVE-10005 URL: https://issues.apache.org/jira/browse/HIVE-10005 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-10005.1.patch Operator.forward is doing too much. There's no reason to check the done flag per row and update it inline; it's much more efficient to do that only when the event that completes an operator happens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
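The optimization being described can be sketched in plain Java. All names here are hypothetical stand-ins, not the actual Operator.forward code: the point is only that the completion event flips the flag once, so the hot loop carries no per-row branch or inline update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: hoist a per-row "done" check out of the inner loop.
// The flag is updated only by the completion event; the forward path checks
// it once per batch instead of once per row.
public class ForwardSketch {
    private boolean done = false;           // flipped only by the event
    private final List<Object> forwarded = new ArrayList<>();

    public void onCompletionEvent() {
        done = true;                        // the one place "done" changes
    }

    public int forwardAll(List<Object> rows) {
        if (done) {
            return 0;                       // single check, not per row
        }
        for (Object row : rows) {
            forwarded.add(row);             // hot inner loop, branch-free
        }
        return rows.size();
    }
}
```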
[jira] [Created] (HIVE-10006) RSC has memory leak while execute multi queries.[Spark Branch]
Chengxiang Li created HIVE-10006: Summary: RSC has memory leak while execute multi queries.[Spark Branch] Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical While executing queries with RSC, the MapWork/ReduceWork count increases over time, eventually leading to an OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
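The leak pattern being reported (per-query work objects retained by a long-lived client across queries) can be sketched in plain Java. The class and field names are hypothetical, not the actual RSC classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a long-lived client holds per-query work objects.
// Without an explicit release on query completion, the map grows without
// bound across queries and eventually exhausts the heap.
public class WorkCacheSketch {
    private final Map<String, Object> workByQuery = new ConcurrentHashMap<>();

    public void register(String queryId, Object work) {
        workByQuery.put(queryId, work);
    }

    // The fix pattern: drop the entry when the query finishes.
    public void onQueryComplete(String queryId) {
        workByQuery.remove(queryId);
    }

    public int cachedCount() {
        return workByQuery.size();
    }
}
```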
Re: Review Request 31386: HIVE-9555 assorted ORC refactorings for LLAP on trunk
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31386/#review76879 --- common/src/java/org/apache/hadoop/hive/common/DiskRange.java https://reviews.apache.org/r/31386/#comment124580 Restore the finals, since in-place mutations of these ranges within a DiskRangeList produces hard-to-debug scenarios. common/src/java/org/apache/hadoop/hive/common/DiskRange.java https://reviews.apache.org/r/31386/#comment124579 Bad behaviour - the original DiskRange was written with final variables for easier debugging. ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java https://reviews.apache.org/r/31386/#comment124582 for loop? - Gopal V On March 11, 2015, 12:50 a.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31386/ --- (Updated March 11, 2015, 12:50 a.m.) Review request for hive and Prasanth_J. Repository: hive-git Description --- see jira Diffs - common/src/java/org/apache/hadoop/hive/common/DiskRange.java PRE-CREATION common/src/java/org/apache/hadoop/hive/common/DiskRangeList.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/orc/CompressionCodec.java 5e2d880 ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 9788c16 ql/src/java/org/apache/hadoop/hive/ql/io/orc/InStream.java 62c6f8d ql/src/java/org/apache/hadoop/hive/ql/io/orc/MetadataReader.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java 25bb15a ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 498ee14 ql/src/java/org/apache/hadoop/hive/ql/io/orc/PositionProvider.java 3daa9ba ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java f85c21b ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java 03f8085 ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 458ad21 ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java 
4057036 ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcInputFormat.java 23e5f27 ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 79dc5a1 ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java f4a2e65 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java 0ea4a7b ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 2cc3d7a ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestIntegerCompressionReader.java 591ec3f ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java cd1d645 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java 326dde4 Diff: https://reviews.apache.org/r/31386/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Created] (HIVE-10007) Support qualified table name in analyze table compute statistics for columns
Chaoyu Tang created HIVE-10007: -- Summary: Support qualified table name in analyze table compute statistics for columns Key: HIVE-10007 URL: https://issues.apache.org/jira/browse/HIVE-10007 Project: Hive Issue Type: Improvement Components: Query Processor, Statistics Affects Versions: 1.0.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Currently the analyze table compute statistics for columns command cannot compute column stats for a table in a different database, since it does not support qualified table names. You need to switch to that table's database in order to compute its column stats. For example, to compute column stats for the table src under psqljira, you have to run use psqljira first and then analyze table src compute statistics for columns. This JIRA will add support for qualified table names in the analyze column stats command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]
Jimmy Xiang created HIVE-10009: -- Summary: LazyObjectInspectorFactory is not thread safe [Spark Branch] Key: HIVE-10009 URL: https://issues.apache.org/jira/browse/HIVE-10009 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang LazyObjectInspectorFactory is not thread safe, which causes random failures in multi-threaded environments such as Hive on Spark. We see exceptions like the one below {noformat} java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92) ... 16 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
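The usual fix for a factory that memoizes instances in a plain HashMap without synchronization is to move the cache to a ConcurrentMap and create entries atomically, so concurrent callers never observe a half-built or wrong-typed instance. This is a minimal sketch of that pattern with hypothetical names, not the actual LazyObjectInspectorFactory code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: thread-safe memoizing factory. computeIfAbsent
// performs an atomic check-then-create, so every caller gets the same,
// fully constructed instance for a given key.
public class SafeFactorySketch {
    private static final ConcurrentMap<String, Object> CACHE =
            new ConcurrentHashMap<>();

    public static Object getInspector(String key) {
        return CACHE.computeIfAbsent(key, k -> new Object());
    }
}
```

A plain HashMap with an unsynchronized get/put pair, by contrast, can publish a partially initialized entry or lose updates under concurrent access, which matches the kind of intermittent failure reported above.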
Re: Reading 2 table data in MapReduce for Performing Join
Hi All, https://issues.apache.org/jira/browse/HIVE-4997 patch helped! On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak snay...@gmail.com wrote: Hi, I tried reading data via HCatalog for 1 Hive table in MapReduce using something similar to https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog. I was able to read successfully. Now I am trying to read 2 tables, as the requirement is to join them. I did not find an API similar to *FileInputFormat.addInputPaths* in *HCatInputFormat*. What is the equivalent in HCat? I had previously performed a join using FileInputFormat on HDFS (by getting split information in the mapper); this article helped me code the join: http://www.codingjunkie.com/mapreduce-reduce-joins/ Can someone suggest how I can perform the join operation using HCatalog? Briefly, the aim is to: - Read 2 tables (almost similar schema) - If a key exists in both tables, send it to the same reducer. - Do some processing on the records in the reducer. - Save the output into a file/Hive table. *P.S.: The reason for using MapReduce to perform the join is a complex requirement which can't be solved via Hive/Pig directly.* Any help will be greatly appreciated :) -- Thanks Suraj Nayak M
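The reduce-side join pattern the thread describes can be sketched in plain Java, independent of Hadoop: records from the two tables are tagged with their source, grouped by join key (the work the shuffle does), and combined per key in a reducer-like step. Class and method names are made up for illustration; the HCatalog side (reading the two tables) is out of scope here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a reduce-side join: tag each record with its
// source table ("L:" or "R:"), group by join key, then pair up tagged
// values per key, emitting output only for keys present in both tables.
public class ReduceJoinSketch {
    // Simulates the shuffle: records are {key, value} pairs.
    public static Map<String, List<String>> shuffle(
            List<String[]> left, List<String[]> right) {
        Map<String, List<String>> byKey = new HashMap<>();
        for (String[] r : left) {
            byKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add("L:" + r[1]);
        }
        for (String[] r : right) {
            byKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add("R:" + r[1]);
        }
        return byKey;
    }

    // Simulates the reducer: inner join per key.
    public static List<String> join(Map<String, List<String>> byKey) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byKey.entrySet()) {
            List<String> ls = new ArrayList<>();
            List<String> rs = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v.startsWith("L:")) ls.add(v.substring(2));
                else rs.add(v.substring(2));
            }
            for (String l : ls) {
                for (String r : rs) {
                    out.add(e.getKey() + "," + l + "," + r);
                }
            }
        }
        return out;
    }
}
```

In a real MapReduce job the tag would live in a composite writable value, and each mapper would learn which table its split came from via the input split or (per HIVE-4997) the per-table HCatInputFormat configuration.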
[jira] [Created] (HIVE-10002) fix yarn service registry not found in ut problem
Gunther Hagleitner created HIVE-10002: - Summary: fix yarn service registry not found in ut problem Key: HIVE-10002 URL: https://issues.apache.org/jira/browse/HIVE-10002 Project: Hive Issue Type: Sub-task Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10003) MiniTez ut fail with missing configs
Gunther Hagleitner created HIVE-10003: - Summary: MiniTez ut fail with missing configs Key: HIVE-10003 URL: https://issues.apache.org/jira/browse/HIVE-10003 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gopal V -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10004) yarn service registry should be shim'd
Gunther Hagleitner created HIVE-10004: - Summary: yarn service registry should be shim'd Key: HIVE-10004 URL: https://issues.apache.org/jira/browse/HIVE-10004 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gopal V -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10014) LLAP : investigate showing LLAP IO usage in explain
Sergey Shelukhin created HIVE-10014: --- Summary: LLAP : investigate showing LLAP IO usage in explain Key: HIVE-10014 URL: https://issues.apache.org/jira/browse/HIVE-10014 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Not sure whether input formats are created during explain, or whether it is possible to go deep enough to create them; we should show whether the LLAP input format will be used for the query, based on whether it's vectorized, the original format is supported, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10012) LLAP: Hive sessions run before Slider registers to YARN registry fail to launch
Gopal V created HIVE-10012: -- Summary: LLAP: Hive sessions run before Slider registers to YARN registry fail to launch Key: HIVE-10012 URL: https://issues.apache.org/jira/browse/HIVE-10012 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gopal V Fix For: llap The LLAP YARN registry only registers entries after at least one daemon is up. Any Tez session starting before that will end up with an error listing zookeeper directories. {code} 2015-03-18 16:54:21,392 FATAL [main] app.DAGAppMaster: Error starting DAGAppMaster org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.fs.PathNotFoundException: `/users/sershe/services/org-apache-hive/llap0/components/workers': {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10011) LLAP: NegativeArraySize exception on some vector string reader
Gopal V created HIVE-10011: -- Summary: LLAP: NegativeArraySize exception on some vector string reader Key: HIVE-10011 URL: https://issues.apache.org/jira/browse/HIVE-10011 Project: Hive Issue Type: Sub-task Reporter: Gopal V With some logging, I confirmed that the String length vectors contained junk data; the length field is overflowing. {code} Caused by: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1550) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:272) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
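The failure mode described, a length sum wrapping negative in an int and then being passed to an array allocation, can be reproduced and guarded in isolation. This is a hypothetical sketch, not the actual RecordReaderImpl code:

```java
// Hypothetical sketch: summing per-row byte lengths in an int can wrap
// negative on corrupt or overflowing length data, and new byte[negative]
// throws NegativeArraySizeException. Accumulating in a long and checking
// the bound before allocating turns the crash into a diagnosable error.
public class LengthGuardSketch {
    public static byte[] allocate(int[] lengths) {
        long total = 0;
        for (int len : lengths) {
            total += len;                   // long math: no int wrap-around
        }
        if (total < 0 || total > Integer.MAX_VALUE) {
            throw new IllegalStateException("corrupt length data: " + total);
        }
        return new byte[(int) total];
    }
}
```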
[jira] [Created] (HIVE-10015) LLAP : add LLAP IO read debug tool
Sergey Shelukhin created HIVE-10015: --- Summary: LLAP : add LLAP IO read debug tool Key: HIVE-10015 URL: https://issues.apache.org/jira/browse/HIVE-10015 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin We need a tool that will run the read pipeline, for local debugging, on a file downloaded from the cluster (one that causes some exception). It needs to reproduce whatever environment is necessary (in the IO pipeline itself) from the cluster (e.g. allocation sizes in the allocator, etc.). Cache contents can be derived from the same file (optional, phase 2); incorrect pre-existing cache contents would be out of scope. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10013) NPE in LLAP logs in heartbeat
Sergey Shelukhin created HIVE-10013: --- Summary: NPE in LLAP logs in heartbeat Key: HIVE-10013 URL: https://issues.apache.org/jira/browse/HIVE-10013 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Siddharth Seth {noformat} 2015-03-18 17:28:37,559 [TezTaskRunner_attempt_1424502260528_1294_1_00_25_0(container_1_1294_01_26_sershe_20150318172752_5ce4647e-177c-4b1e-8dfa-462230735854:1_Map 1_25_0)] INFO org.apache.tez.runtime.task.TezTaskRunner: Encounted an error while executing task: attempt_1424502260528_1294_1_00_25_0 java.lang.NullPointerException at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$400(TaskReporter.java:120) at org.apache.tez.runtime.task.TaskReporter.addEvents(TaskReporter.java:386) at org.apache.tez.runtime.task.TezTaskRunner.addEvents(TezTaskRunner.java:278) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.sendTaskGeneratedEvents(LogicalIOProcessorRuntimeTask.java:596) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:355) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2015-03-18 
17:28:37,559 [TezTaskRunner_attempt_1424502260528_1294_1_00_25_0(container_1_1294_01_26_sershe_20150318172752_5ce4647e-177c-4b1e-8dfa-462230735854:1_Map 1_25_0)] INFO org.apache.tez.runtime.task.TezTaskRunner: Ignoring the following exception since a previous exception is already registered java.lang.NullPointerException at org.apache.tez.runtime.task.TaskReporter$HeartbeatCallable.access$300(TaskReporter.java:120) at org.apache.tez.runtime.task.TaskReporter.taskFailed(TaskReporter.java:382) at org.apache.tez.runtime.task.TezTaskRunner.sendFailure(TezTaskRunner.java:260) at org.apache.tez.runtime.task.TezTaskRunner.access$600(TezTaskRunner.java:52) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:227) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10008) Need to refactor itests for hbase metastore [hbase-metastore branch]
Alan Gates created HIVE-10008: - Summary: Need to refactor itests for hbase metastore [hbase-metastore branch] Key: HIVE-10008 URL: https://issues.apache.org/jira/browse/HIVE-10008 Project: Hive Issue Type: Task Components: Tests Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Much of the infrastructure for the itest/hive-unit/.../metastore/hbase tests is repeated in each test. This needs to be factored out into a base class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
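The refactoring proposed above is the standard shared-fixture pattern: common setup and teardown move into an abstract base class that each test extends. A minimal sketch with hypothetical names (the real base would start an HBase mini cluster and a metastore client; plain fields stand in for those here):

```java
// Hypothetical sketch: shared test infrastructure factored into a base
// class so individual itests no longer repeat the boilerplate.
public abstract class HBaseMetastoreTestBase {
    protected Object conf;      // stands in for HiveConf
    protected Object store;     // stands in for the HBase-backed store

    protected void setUp() {
        conf = new Object();    // shared one-time initialization
        store = new Object();
    }

    protected void tearDown() {
        store = null;           // shared cleanup
        conf = null;
    }

    public Object getStore() {
        return store;
    }
}
```

Concrete tests would then extend the base and call setUp/tearDown (or have JUnit annotations on the base methods do it), keeping only test-specific logic in each class.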
[jira] [Created] (HIVE-10010) Alter table results in NPE [hbase-metastore branch]
Alan Gates created HIVE-10010: - Summary: Alter table results in NPE [hbase-metastore branch] Key: HIVE-10010 URL: https://issues.apache.org/jira/browse/HIVE-10010 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: hbase-metastore-branch Reporter: Alan Gates Assignee: Alan Gates Doing an alter table results in: {code} 2015-03-18 10:45:54,189 ERROR [main]: exec.DDLTask (DDLTask.java:failed(512)) - java.lang.NullPointerException at org.apache.hadoop.hive.metastore.api.StorageDescriptor.init(StorageDescriptor.java:239) at org.apache.hadoop.hive.metastore.api.Table.init(Table.java:270) at org.apache.hadoop.hive.metastore.api.Table.deepCopy(Table.java:310) at org.apache.hadoop.hive.ql.metadata.Table.copy(Table.java:856) at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3329) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:329) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1644) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1403) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1189) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) at org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.table(TestHBaseMetastoreSql.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)