[jira] [Created] (HIVE-11073) ORC FileDump utility ignores errors when writing output
Elliot West created HIVE-11073: -- Summary: ORC FileDump utility ignores errors when writing output Key: HIVE-11073 URL: https://issues.apache.org/jira/browse/HIVE-11073 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Elliot West Assignee: Elliot West Priority: Minor The Hive command line provides the {{--orcfiledump}} utility for dumping data contained within ORC files, specifically when using the {{-d}} option. Generally, it is useful to be able to pipe the extracted data into other commands and utilities to transform and control the data so that it is more manageable for the CLI user. A classic example is {{less}}. When such command pipelines are currently constructed, the underlying implementation in {{org.apache.hadoop.hive.ql.io.orc.FileDump#printJsonData}} is oblivious to errors occurring when writing to its output stream. Such errors are commonplace when a user issues {{Ctrl+C}} to kill the leaf process. In this event the leaf process terminates immediately, but the Hive CLI process continues to execute until the full contents of the ORC file have been read. By making {{FileDump}} aware of output stream errors, the process will terminate as soon as the destination process exits (i.e. when the user kills {{less}}) and control will be returned to the user as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
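A minimal sketch of the idea behind the fix (hypothetical code, not the actual HIVE-11073 patch): `PrintStream` swallows `IOException`s internally, so a dump loop can poll `checkError()` after each row and stop reading once the downstream pipe has gone away.

```java
import java.io.PrintStream;

public class JsonDumpSketch {
    // Hypothetical sketch of a printJsonData-style loop: stop emitting rows
    // as soon as the output stream reports a failure (e.g. the user killed
    // `less`, closing the pipe).
    public static void printRows(Iterable<String> jsonRows, PrintStream out) {
        for (String row : jsonRows) {
            out.println(row);
            // PrintStream swallows IOExceptions; checkError() surfaces them.
            if (out.checkError()) {
                break; // downstream closed the pipe; stop reading the ORC file
            }
        }
    }
}
```

With this check in place, control returns to the user as soon as the destination process exits instead of after the whole file has been dumped.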
[jira] [Created] (HIVE-11072) Add data validation between Hive metastore upgrades tests
Sergio Peña created HIVE-11072: -- Summary: Add data validation between Hive metastore upgrades tests Key: HIVE-11072 URL: https://issues.apache.org/jira/browse/HIVE-11072 Project: Hive Issue Type: Task Components: Tests Reporter: Sergio Peña Assignee: Sergio Peña An existing Hive metastore upgrade test is running on the Hive Jenkins. However, these scripts test only the database schema upgrade, not the data between upgrades. We should validate data between metastore version upgrades. With data validation, we can ensure that data won't be damaged or corrupted when upgrading the Hive metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11075) HiveOnTez: show vertex dependency in topological order in explain
Pengcheng Xiong created HIVE-11075: -- Summary: HiveOnTez: show vertex dependency in topological order in explain Key: HIVE-11075 URL: https://issues.apache.org/jira/browse/HIVE-11075 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Right now the vertices are shown in the order of their string names. We would like to improve explain to show vertex dependencies in topological order. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
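To illustrate the intended ordering (a sketch under simplified assumptions, not Hive's actual explain code): Kahn's algorithm emits each vertex only after every vertex it depends on, with a priority queue breaking ties by name. Plain string ordering is not enough, since it would put a vertex like "Reducer 10" before "Reducer 2".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Queue;

public class VertexTopoOrder {
    // Hypothetical sketch (not Hive's code): Kahn's algorithm over a map of
    // vertex -> vertices it depends on; ties broken alphabetically.
    public static List<String> topoOrder(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            inDegree.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
                inDegree.putIfAbsent(d, 0);
            }
        }
        Queue<String> ready = new PriorityQueue<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String w : dependents.getOrDefault(v, Collections.emptyList())) {
                // w becomes ready once all its dependencies have been emitted
                if (inDegree.merge(w, -1, Integer::sum) == 0) {
                    ready.add(w);
                }
            }
        }
        return order;
    }
}
```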
[jira] [Created] (HIVE-11074) Update tests for HIVE-9302 after removing binaries
Jesus Camacho Rodriguez created HIVE-11074: -- Summary: Update tests for HIVE-9302 after removing binaries Key: HIVE-11074 URL: https://issues.apache.org/jira/browse/HIVE-11074 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
Pengcheng Xiong created HIVE-11076: -- Summary: Explicitly set hive.cbo.enable=true for some tests Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11069) ColumnStatsTask doesn't work with hive.exec.parallel
Rajat Jain created HIVE-11069: - Summary: ColumnStatsTask doesn't work with hive.exec.parallel Key: HIVE-11069 URL: https://issues.apache.org/jira/browse/HIVE-11069 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Rajat Jain Try a simple query:
{code}
hive> set hive.exec.parallel=true;
hive> analyze table src compute statistics for columns;
{code}
It fails with errors similar to:
{code}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException
	at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
	at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
	at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
Caused by: java.io.IOException: java.lang.InterruptedException
	at org.apache.hadoop.ipc.Client.call(Client.java:1450)
	at org.apache.hadoop.ipc.Client.call(Client.java:1402)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
	at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2758)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2729)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1954)
	at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:765)
	at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:691)
	... 7 more
Caused by: java.lang.InterruptedException
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
	at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1049)
	at org.apache.hadoop.ipc.Client.call(Client.java:1444)
	... 28 more
Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)'
{code}
The problem is that the ColumnStatsTask doesn't depend on the root task, which causes these errors.
Here's the explain output:
{code}
hive> explain analyze table src compute statistics for columns;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage
  Stage-1 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: src
            Select Operator
              expressions: key (type: string), value (type: string)
              outputColumnNames: key, value
              Group By Operator
                aggregations: compute_stats(key, 16), compute_stats(value, 16)
                mode: hash
                outputColumnNames: _col0, _col1
                Reduce Output Operator
                  sort order:
                  value expressions: _col0 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>), _col1 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>)
      Reduce Operator Tree:
        Group
Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34059/ ---

(Updated June 22, 2015, 7:30 p.m.)

Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.

Changes
---
Rebasing patch with trunk again.

Bugs: HIVE-10673
    https://issues.apache.org/jira/browse/HIVE-10673

Repository: hive-git

Description
---
Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are unsorted.

Diffs (updated)
---
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df
  itests/src/test/resources/testconfiguration.properties 7b7559a
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 545d7c6
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java cdabe3a
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java e9bd44a
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java 4c8c4b1
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 5a87bd6
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java bca91dd
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100
  ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION
  ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION
  ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q PRE-CREATION
  ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out PRE-CREATION
  ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out PRE-CREATION
  ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/34059/diff/

Testing
---
q-file tests added

Thanks,
Jason Dere
Added jars are not available after running shell command in hive that uses hive
Hi, We encountered strange behavior on Hive 0.13.1 on Amazon EMR:

add jar my.jar    // loading a jar that contains UDFs
create temporary function my_func as 'some.package.Func';
!hive -e ;
select my_func(column1) from mytable;

This results in: The UDF implementation class 'some.package.Func' is not present in the class path. We had no such problem when using Hive 0.11.
[jira] [Created] (HIVE-11078) Enhance DbLockManger to support multi-statement txns
Eugene Koifman created HIVE-11078: - Summary: Enhance DbLockManger to support multi-statement txns Key: HIVE-11078 URL: https://issues.apache.org/jira/browse/HIVE-11078 Project: Hive Issue Type: Sub-task Components: Locking, Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Need to build deadlock detection, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
Jason Dere created HIVE-11079: - Summary: Fix qfile tests that fail on Windows due to CR/character escape differences Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere A few qfile tests are failing on Windows due to a couple of Windows-specific issues:
- The table comment for the test includes a CR character, which differs between Windows and Unix.
- The partition path in the test includes a space character. Unlike on Unix, space characters in Hive paths are escaped on Windows.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
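As an illustration of these differences (a hypothetical helper, not the actual patch; the URI-style `%20` escaping of spaces is an assumption here), qfile output could be compared after normalizing CRLF line endings and unescaping spaces in paths:

```java
public class QFileNormalize {
    // Hypothetical sketch: normalize platform differences before diffing
    // expected vs. actual qfile output.
    public static String normalize(String s) {
        return s
            .replace("\r\n", "\n")  // Windows CRLF -> Unix LF
            .replace("%20", " ");   // assumed URI-style escaping of spaces in paths
    }
}
```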
[jira] [Created] (HIVE-11080) Modify VectorizedRowBatch.toString() to not depend on VectorExpressionWriter
Owen O'Malley created HIVE-11080: Summary: Modify VectorizedRowBatch.toString() to not depend on VectorExpressionWriter Key: HIVE-11080 URL: https://issues.apache.org/jira/browse/HIVE-11080 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the VectorizedRowBatch.toString method uses the VectorExpressionWriter to convert the row batch to a string. Since the string is only used for printing error messages, I'd propose making toString use the types of the vector batch instead of the object inspector. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
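A sketch of the idea (the column classes below are simplified stand-ins for Hive's `LongColumnVector`/`DoubleColumnVector`, not the real `VectorizedRowBatch` API): when each typed column knows how to render its own values, the debug string can be built without any ObjectInspector.

```java
public class BatchToStringSketch {
    // Simplified stand-ins for Hive's typed column vectors: each column
    // renders its own values, so no ObjectInspector is needed.
    public interface Column { String valueAt(int row); }

    public static class LongColumn implements Column {
        final long[] vector;
        public LongColumn(long... v) { vector = v; }
        public String valueAt(int row) { return Long.toString(vector[row]); }
    }

    public static class DoubleColumn implements Column {
        final double[] vector;
        public DoubleColumn(double... v) { vector = v; }
        public String valueAt(int row) { return Double.toString(vector[row]); }
    }

    // Render each row as "[v0, v1, ...]" using only the column types.
    public static String batchToString(int size, Column... cols) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < size; r++) {
            sb.append('[');
            for (int c = 0; c < cols.length; c++) {
                if (c > 0) sb.append(", ");
                sb.append(cols[c].valueAt(r));
            }
            sb.append("]\n");
        }
        return sb.toString();
    }
}
```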
[jira] [Created] (HIVE-11077) Add support in parser and wire up to txn manager
Eugene Koifman created HIVE-11077: - Summary: Add support in parser and wire up to txn manager Key: HIVE-11077 URL: https://issues.apache.org/jira/browse/HIVE-11077 Project: Hive Issue Type: Sub-task Components: SQL, Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11081) Running hive commands from an external command line ( using ! ) makes added UDF disappear
harel gliksman created HIVE-11081: - Summary: Running hive commands from an external command line ( using ! ) makes added UDF disappear Key: HIVE-11081 URL: https://issues.apache.org/jira/browse/HIVE-11081 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: Amazon EMR AMI 3.5.0 Reporter: harel gliksman Priority: Minor

add jar myjar.jar
create temporary function myfunc as '...'
create some table
!hive -e --some command line that uses hive (such as a customized script that adds partitions to a table)
use myfunc in a query

This results in: The UDF implementation class '...' is not present in the class path. Without the shell command it works. Even weirder, running "list jars" does show the added jar, yet the exception is still thrown. This did not happen on Hive 0.11. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11070) HCatOutputFormat doesn't clean up
Tilaye created HIVE-11070: - Summary: HCatOutputFormat doesn't clean up Key: HIVE-11070 URL: https://issues.apache.org/jira/browse/HIVE-11070 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Reporter: Tilaye Priority: Minor The following piece of code creates a resource which blocks the JVM from exiting. I'm not sure if it is related, but I see a process reaper thread being created when the setOutput call is made. The JVM exits if that statement is removed.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class ReproduceHCatError {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        Configuration config = job.getConfiguration();
        config.set("hive.metastore.uris", "thrift://bd:9083");
        OutputJobInfo outputJobInfo = OutputJobInfo.create(null, "test_table_name", null);
        HCatOutputFormat.setOutput(job, outputJobInfo);
    }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11071) Fix the output of beeline dbinfo command
Shinichi Yamashita created HIVE-11071: - Summary: Fix the output of beeline dbinfo command Key: HIVE-11071 URL: https://issues.apache.org/jira/browse/HIVE-11071 Project: Hive Issue Type: Bug Components: Beeline Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita When dbinfo is executed by beeline, it is displayed as follows:
{code}
0: jdbc:hive2://localhost:10001/> !dbinfo
Error: Method not supported (state=,code=0)
allTablesAreSelectable          true
Error: Method not supported (state=,code=0)
Error: Method not supported (state=,code=0)
Error: Method not supported (state=,code=0)
getCatalogSeparator             .
getCatalogTerm                  instance
getDatabaseProductName          Apache Hive
getDatabaseProductVersion       2.0.0-SNAPSHOT
getDefaultTransactionIsolation  0
getDriverMajorVersion           1
getDriverMinorVersion           1
getDriverName                   Hive JDBC
...
{code}
When an error occurs, the name of the unsupported method is not shown. I will fix this output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
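One way the fix could look (a hypothetical sketch, not the actual Beeline patch): invoke each zero-argument `DatabaseMetaData` getter reflectively and include the method name in the error line, so a bare "Method not supported" becomes attributable to a specific method.

```java
import java.lang.reflect.Method;

public class DbInfoSketch {
    // Hypothetical sketch: call each zero-argument metadata getter by name
    // and report which method failed, instead of a bare "Method not supported".
    public static String describe(Object metaData, String... methodNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : methodNames) {
            try {
                Method m = metaData.getClass().getMethod(name);
                sb.append(name).append(' ').append(m.invoke(metaData)).append('\n');
            } catch (Exception e) {
                sb.append("Error: ").append(name).append(" not supported\n");
            }
        }
        return sb.toString();
    }

    // Stand-in metadata object for demonstration only.
    public static class FakeMetaData {
        public String getDriverName() { return "Hive JDBC"; }
    }
}
```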