[jira] [Created] (HIVE-11073) ORC FileDump utility ignores errors when writing output

2015-06-22 Thread Elliot West (JIRA)
Elliot West created HIVE-11073:
--

 Summary: ORC FileDump utility ignores errors when writing output
 Key: HIVE-11073
 URL: https://issues.apache.org/jira/browse/HIVE-11073
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Elliot West
Assignee: Elliot West
Priority: Minor


The Hive command line provides the {{--orcfiledump}} utility for dumping the data 
contained within ORC files, specifically when using the {{-d}} option. It is often 
useful to pipe the extracted data into other commands and utilities that transform 
or paginate it so that it is more manageable for the CLI user. A classic example 
is {{less}}.

When such command pipelines are constructed today, the underlying 
implementation in {{org.apache.hadoop.hive.ql.io.orc.FileDump#printJsonData}} 
is oblivious to errors occurring when writing to its output stream. Such errors 
are commonplace when a user issues {{Ctrl+C}} to kill the leaf process. In 
this event the leaf process terminates immediately, but the Hive CLI process 
continues to execute until the full contents of the ORC file have been read.

By making {{FileDump}} aware of output stream errors, the process will 
terminate as soon as the destination process exits (i.e. when the user kills 
{{less}}) and control will be returned to the user as expected.
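
A minimal sketch of the idea (not the committed patch): {{java.io.PrintStream}} swallows IO errors, so the dump loop can poll {{checkError()}} after each write and stop once the downstream process has gone away. The {{jsonRows}} iterator below is only a stand-in for the ORC record reader.

{code}
import java.io.PrintStream;
import java.util.Iterator;

public class PipeAwareDump {

  // Illustrative shape of printJsonData: stop as soon as the output stream
  // reports an error (for example, a broken pipe after the user quits less).
  static void printJsonData(PrintStream out, Iterator<String> jsonRows) {
    while (jsonRows.hasNext()) {
      out.println(jsonRows.next());
      // PrintStream never throws; checkError() is the only way to see the failure.
      if (out.checkError()) {
        return; // destination is gone, stop reading the ORC file
      }
    }
  }
}
{code}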



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11072) Add data validation to Hive metastore upgrade tests

2015-06-22 Thread JIRA
Sergio Peña created HIVE-11072:
--

 Summary: Add data validation to Hive metastore upgrade tests
 Key: HIVE-11072
 URL: https://issues.apache.org/jira/browse/HIVE-11072
 Project: Hive
  Issue Type: Task
  Components: Tests
Reporter: Sergio Peña
Assignee: Sergio Peña


An existing Hive metastore upgrade test runs on the Hive jenkins. However, 
these scripts test only the database schema upgrade; they do not validate the 
data across upgrades.

We should validate data between metastore version upgrades. Using data 
validation, we can ensure that data is not damaged or corrupted when 
upgrading the Hive metastore. A minimal sketch of one possible check is below.
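
As an illustration only, a data-validation step could fingerprint a table before and after the upgrade and fail the run if the fingerprints differ. This is a minimal JDBC sketch, assuming a HiveServer2 connection; the URL, table name, and columns are placeholders, not part of the existing jenkins job.

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class UpgradeDataValidation {

  // Fingerprint a table as (row count, sum of row hashes); column names are illustrative.
  static long fingerprint(Connection conn, String table) throws Exception {
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(
             "SELECT count(*), sum(hash(key, value)) FROM " + table)) {
      rs.next();
      return 31 * rs.getLong(1) + rs.getLong(2);
    }
  }

  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default")) {
      long before = fingerprint(conn, "upgrade_test_table");
      // ... run the metastore upgrade scripts here ...
      long after = fingerprint(conn, "upgrade_test_table");
      if (before != after) {
        throw new AssertionError("Data changed across metastore upgrade");
      }
    }
  }
}
{code}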




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11075) HiveOnTez: show vertex dependency in topological order in explain

2015-06-22 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-11075:
--

 Summary: HiveOnTez: show vertex dependency in topological order in 
explain
 Key: HIVE-11075
 URL: https://issues.apache.org/jira/browse/HIVE-11075
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


Right now the vertices are shown in the order of their string names. We would like to 
improve the explain output to show vertex dependencies in topological order.
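
A minimal sketch of the intended ordering (hypothetical, not the actual patch): given a map from each vertex to the vertices it depends on, Kahn's algorithm yields an order in which every vertex appears after its dependencies, which is how the explain output would list them.

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VertexTopoOrder {

  // dependsOn maps a vertex name to the names of the vertices it depends on.
  static List<String> topologicalOrder(Map<String, Set<String>> dependsOn) {
    Map<String, Integer> remaining = new HashMap<>();     // unmet dependencies per vertex
    Map<String, List<String>> dependents = new HashMap<>();
    for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
      remaining.merge(e.getKey(), e.getValue().size(), Integer::sum);
      for (String dep : e.getValue()) {
        remaining.putIfAbsent(dep, 0);                     // roots with no dependencies
        dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
      }
    }
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> e : remaining.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String v = ready.remove();
      order.add(v);                                        // all of v's dependencies already listed
      for (String d : dependents.getOrDefault(v, Collections.<String>emptyList())) {
        if (remaining.merge(d, -1, Integer::sum) == 0) {
          ready.add(d);
        }
      }
    }
    return order;
  }
}
{code}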



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11074) Update tests for HIVE-9302 after removing binaries

2015-06-22 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-11074:
--

 Summary: Update tests for HIVE-9302 after removing binaries
 Key: HIVE-11074
 URL: https://issues.apache.org/jira/browse/HIVE-11074
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests

2015-06-22 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-11076:
--

 Summary: Explicitly set hive.cbo.enable=true for some tests
 Key: HIVE-11076
 URL: https://issues.apache.org/jira/browse/HIVE-11076
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11069) ColumnStatsTask doesn't work with hive.exec.parallel

2015-06-22 Thread Rajat Jain (JIRA)
Rajat Jain created HIVE-11069:
-

 Summary: ColumnStatsTask doesn't work with hive.exec.parallel
 Key: HIVE-11069
 URL: https://issues.apache.org/jira/browse/HIVE-11069
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Rajat Jain


Try a simple query:

{code}
hive> set hive.exec.parallel=true;
hive> analyze table src compute statistics for columns;
{code}

It fails with errors similar to:

{code}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
java.lang.RuntimeException: Error caching map.xml: java.io.IOException: java.lang.InterruptedException
    at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
    at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
    at org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
Caused by: java.io.IOException: java.lang.InterruptedException
    at org.apache.hadoop.ipc.Client.call(Client.java:1450)
    at org.apache.hadoop.ipc.Client.call(Client.java:1402)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2758)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2729)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1954)
    at org.apache.hadoop.hive.ql.exec.Utilities.setPlanPath(Utilities.java:765)
    at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:691)
    ... 7 more
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
    at java.util.concurrent.FutureTask.get(FutureTask.java:187)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1049)
    at org.apache.hadoop.ipc.Client.call(Client.java:1444)
    ... 28 more
Job Submission failed with exception 'java.lang.RuntimeException(Error caching map.xml: java.io.IOException: java.lang.InterruptedException)'
{code}

The problem is that the ColumnStatsTask doesn't depend on the root task, which 
causes these errors. Here's the explain output:

{code}
hive> explain analyze table src compute statistics for columns;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage
  Stage-1 is a root stage

STAGE PLANS:
  Stage: Stage-0
Map Reduce
  Map Operator Tree:
  TableScan
alias: src
Select Operator
  expressions: key (type: string), value (type: string)
  outputColumnNames: key, value
  Group By Operator
aggregations: compute_stats(key, 16), compute_stats(value, 16)
mode: hash
outputColumnNames: _col0, _col1
Reduce Output Operator
  sort order:
                  value expressions: _col0 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>), _col1 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>)
  Reduce Operator Tree:
Group 

Re: Review Request 34059: HIVE-10673 Dynamically partitioned hash join for Tez

2015-06-22 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34059/
---

(Updated June 22, 2015, 7:30 p.m.)


Review request for hive, Matt McCline and Vikram Dixit Kumaraswamy.


Changes
---

rebasing patch with trunk again


Bugs: HIVE-10673
https://issues.apache.org/jira/browse/HIVE-10673


Repository: hive-git


Description
---

Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the 
reducer are unsorted.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 27f68df 
  itests/src/test/resources/testconfiguration.properties 7b7559a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 15cafdd 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java d7f1b42 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesAdapter.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValue.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/KeyValuesFromKeyValues.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java 
545d7c6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java 
cdabe3a 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
e9bd44a 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinCommonOperator.java
 4c8c4b1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
5a87bd6 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 4d84f0f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkMapJoinProc.java 
bca91dd 
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezProcContext.java adc31ae 
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 11c1df6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezWork.java 6db8220 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java a342738 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java fb3c4a3 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java cee9100 
  ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/tez_dynpart_hashjoin_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/tez_vector_dynpart_hashjoin_1.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_1.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/tez/tez_dynpart_hashjoin_2.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/tez/tez_vector_dynpart_hashjoin_1.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/34059/diff/


Testing
---

q-file tests added


Thanks,

Jason Dere



Added jars are not available after running shell command in hive that uses hive

2015-06-22 Thread Harel Gliksman
Hi,

We encounter strange behavior on Hive 0.13.1 on Amazon EMR:

add jar my.jar // loading a jar that contains UDFs
create temporary function my_func as 'some.package.Func'
!hive -e ;
select my_func(column1)
from mytable

This results in:

The UDF implementation class 'some.package.Func' is not present in the
class path



We had no such problem when using Hive 0.11.


[jira] [Created] (HIVE-11078) Enhance DbLockManager to support multi-statement txns

2015-06-22 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11078:
-

 Summary: Enhance DbLockManager to support multi-statement txns
 Key: HIVE-11078
 URL: https://issues.apache.org/jira/browse/HIVE-11078
 Project: Hive
  Issue Type: Sub-task
  Components: Locking, Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Need to build deadlock detection, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences

2015-06-22 Thread Jason Dere (JIRA)
Jason Dere created HIVE-11079:
-

 Summary: Fix qfile tests that fail on Windows due to CR/character 
escape differences
 Key: HIVE-11079
 URL: https://issues.apache.org/jira/browse/HIVE-11079
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Jason Dere
Assignee: Jason Dere


A few qfile tests are failing on Windows due to a couple of Windows-specific 
issues:
- The table comment for the test includes a CR character, which is different on 
Windows compared to Unix.
- The partition path in the test includes a space character. Unlike Unix, on 
Windows space characters in Hive paths are escaped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11080) Modify VectorizedRowBatch.toString() to not depend on VectorExpressionWriter

2015-06-22 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-11080:


 Summary: Modify VectorizedRowBatch.toString() to not depend on 
VectorExpressionWriter
 Key: HIVE-11080
 URL: https://issues.apache.org/jira/browse/HIVE-11080
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently the VectorizedRowBatch.toString method uses the 
VectorExpressionWriter to convert the row batch to a string.

Since the string is only used for printing error messages, I'd propose making 
toString use the types of the vector batch instead of the object inspector.
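
A rough sketch of the proposed direction (assumptions: only the public fields of VectorizedRowBatch and the standard ColumnVector subclasses are used; this is not the committed change): render each cell from the concrete vector type, so no object inspector or VectorExpressionWriter is needed.

{code}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public class BatchToString {

  static String toDebugString(VectorizedRowBatch batch) {
    StringBuilder sb = new StringBuilder();
    for (int j = 0; j < batch.size; j++) {
      // Honor the selected[] indirection when it is in use.
      int row = batch.selectedInUse ? batch.selected[j] : j;
      for (int c = 0; c < batch.numCols; c++) {
        ColumnVector col = batch.cols[c];
        int idx = col.isRepeating ? 0 : row;
        if (!col.noNulls && col.isNull[idx]) {
          sb.append("NULL");
        } else if (col instanceof LongColumnVector) {
          sb.append(((LongColumnVector) col).vector[idx]);
        } else if (col instanceof DoubleColumnVector) {
          sb.append(((DoubleColumnVector) col).vector[idx]);
        } else if (col instanceof BytesColumnVector) {
          BytesColumnVector b = (BytesColumnVector) col;
          sb.append(new String(b.vector[idx], b.start[idx], b.length[idx]));
        } else {
          // Fall back to the vector's type name for cases not handled above.
          sb.append("<").append(col.getClass().getSimpleName()).append(">");
        }
        sb.append(c == batch.numCols - 1 ? '\n' : '\t');
      }
    }
    return sb.toString();
  }
}
{code}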



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11077) Add support in parser and wire up to txn manager

2015-06-22 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11077:
-

 Summary: Add support in parser and wire up to txn manager
 Key: HIVE-11077
 URL: https://issues.apache.org/jira/browse/HIVE-11077
 Project: Hive
  Issue Type: Sub-task
  Components: SQL, Transactions
Affects Versions: 1.2.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11081) Running hive commands from an external command line (using !) makes added UDFs disappear

2015-06-22 Thread harel gliksman (JIRA)
harel gliksman created HIVE-11081:
-

 Summary: Running hive commands from an external command line (using !) 
makes added UDFs disappear
 Key: HIVE-11081
 URL: https://issues.apache.org/jira/browse/HIVE-11081
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
 Environment: Amazon EMR AMI 3.5.0
Reporter: harel gliksman
Priority: Minor


add jar myjar.jar
create temporary function myfunc as '...'
create some table
!hive -e  -- some command line that uses hive (such as a customized script that 
adds partitions to a table)
use myfunc in a query

This results in: The UDF implementation class '...' is not present in the class 
path.

Without the shell command it works. Even weirder, running list jars does show 
the added jar, yet the exception is still thrown...

This did not happen on Hive 0.11.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11070) HCatOutputFormat doesn't clean up

2015-06-22 Thread Tilaye (JIRA)
Tilaye created HIVE-11070:
-

 Summary: HCatOutputFormat doesn't clean up
 Key: HIVE-11070
 URL: https://issues.apache.org/jira/browse/HIVE-11070
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Tilaye
Priority: Minor


The following piece of code creates a resource which blocks the JVM from 
exiting. I'm not sure if it is related, but I see a process reaper thread being 
created when the setOutput call is made. The JVM exits if that statement is removed.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class ReproduceHCatError {

    public static void main(String[] args) throws Exception {

        Job job = Job.getInstance();
        Configuration config = job.getConfiguration();
        config.set("hive.metastore.uris", "thrift://bd:9083");

        OutputJobInfo outputJobInfo = OutputJobInfo.create(null,
                "test_table_name", null);
        // After this call the JVM no longer exits (a process reaper thread appears).
        HCatOutputFormat.setOutput(job, outputJobInfo);
    }
}
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11071) Fix the output of beeline dbinfo command

2015-06-22 Thread Shinichi Yamashita (JIRA)
Shinichi Yamashita created HIVE-11071:
-

 Summary: Fix the output of beeline dbinfo command
 Key: HIVE-11071
 URL: https://issues.apache.org/jira/browse/HIVE-11071
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita


When dbinfo is executed in beeline, the following output is displayed: 

{code}
0: jdbc:hive2://localhost:10001/> !dbinfo
Error: Method not supported (state=,code=0)
allTablesAreSelectable         true
Error: Method not supported (state=,code=0)
Error: Method not supported (state=,code=0)
Error: Method not supported (state=,code=0)
getCatalogSeparator            .
getCatalogTerm                 instance
getDatabaseProductName         Apache Hive
getDatabaseProductVersion      2.0.0-SNAPSHOT
getDefaultTransactionIsolation 0
getDriverMajorVersion          1
getDriverMinorVersion          1
getDriverName                  Hive JDBC
...
{code}

From this output it is not possible to tell which method each Error line refers to. I will fix this output.
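
For illustration, one possible shape of the fix (hypothetical, not BeeLine's actual code): call each DatabaseMetaData getter by name via reflection and print the method name next to its result or its error, so a "Method not supported" line can be attributed to a specific call.

{code}
import java.lang.reflect.Method;
import java.sql.Connection;
import java.sql.DatabaseMetaData;

public class DbInfoSketch {

  // Print "methodName<TAB>value" or "methodName<TAB>Error: ..." for each getter.
  static void printDbInfo(Connection conn, String... methodNames) throws Exception {
    DatabaseMetaData meta = conn.getMetaData();
    for (String name : methodNames) {
      Method m = DatabaseMetaData.class.getMethod(name);
      try {
        System.out.println(name + "\t" + m.invoke(meta));
      } catch (Exception e) {
        // Keep the method name in the error line so the failing call is obvious.
        System.out.println(name + "\tError: " + e.getCause());
      }
    }
  }
}
{code}

With something like printDbInfo(conn, "allProceduresAreCallable", "getCatalogTerm"), an unsupported method would show up with its name attached instead of a bare Error line.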



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)