I'm getting an error in Hive when executing a query on a table in ORC format.
After several attempts, I managed to run the same query on the same table in
TEXTFILE format.
I've been able to reproduce the error with the simple SQL script below.
I create the same table in TEXTFILE and in ORC and run a SELECT ... GROUP BY on
both tables.
The first SELECT issued on the TEXTFILE table succeeds.
The second SELECT issued on the ORC table fails.
NB: There is a CONCAT in the query. If I remove the CONCAT, the query
runs fine on both tables.
Example script to reproduce the error:
USE pvr_temp;
DROP TABLE IF EXISTS students_text;
CREATE TABLE students_text (name VARCHAR(64), age INT, datetime TIMESTAMP,
  gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
INSERT INTO TABLE students_text VALUES
  ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28),
  ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_text
  GROUP BY CONCAT(TO_DATE(datetime), '-');
DROP TABLE IF EXISTS students_orc;
CREATE TABLE students_orc (name VARCHAR(64), age INT, datetime TIMESTAMP,
  gpa DECIMAL(3, 2)) STORED AS ORC;
INSERT INTO TABLE students_orc VALUES
  ('fred flintstone', 35, '2015-04-13 13:40:00', 1.28),
  ('barney rubble', 32, '2015-04-13 13:40:00', 2.32);
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_orc
  GROUP BY CONCAT(TO_DATE(datetime), '-');
Log showing the error:
[pvr@tpcalr01s ~]$ cat test.log
scan complete in 9ms
Connecting to jdbc:hive2://tpcrmm03s:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Apache Hive (version 0.14.0.2.2.0.0-2041)
Driver: Hive JDBC (version 0.14.0.2.2.0.0-2041)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://tpcrmm03s:10000> USE pvr_temp;
No rows affected (0.061 seconds)
0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_text;
No rows affected (0.12 seconds)
0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_text (name VARCHAR(64),
age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS TEXTFILE;
No rows affected (0.057 seconds)
0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_text VALUES ('fred
flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32,
'2015-04-13 13:40:00', 2.32);
INFO : Tez session hasn't been created yet. Opening session
INFO :
INFO : Status: Running (Executing on YARN cluster with App id
application_1428656093356_0047)
INFO : Map 1: -/-
INFO : Map 1: 0/1
No rows affected (14.134 seconds)
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table pvr_temp.students_text from
hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-08_445_2811483497310651606-20/-ext-10000
INFO : Table pvr_temp.students_text stats: [numFiles=1, numRows=2,
totalSize=86, rawDataSize=84]
0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'),
SUM(gpa) FROM students_text GROUP BY CONCAT(TO_DATE(datetime), '-');
INFO : Session is already open
INFO :
INFO : Status: Running (Executing on YARN cluster with App id
application_1428656093356_0047)
INFO : Map 1: -/- Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
INFO : Map 1: 1/1 Reducer 2: 0(+1)/1
INFO : Map 1: 1/1 Reducer 2: 1/1
+--------------+------+--+
| _c0 | _c1 |
+--------------+------+--+
| 2015-04-13- | 3.6 |
+--------------+------+--+
1 row selected (3.258 seconds)
0: jdbc:hive2://tpcrmm03s:10000> DROP TABLE IF EXISTS students_orc;
No rows affected (0.109 seconds)
0: jdbc:hive2://tpcrmm03s:10000> CREATE TABLE students_orc (name VARCHAR(64),
age INT, datetime TIMESTAMP, gpa DECIMAL(3, 2)) STORED AS ORC;
No rows affected (0.063 seconds)
0: jdbc:hive2://tpcrmm03s:10000> INSERT INTO TABLE students_orc VALUES ('fred
flintstone', 35, '2015-04-13 13:40:00', 1.28), ('barney rubble', 32,
'2015-04-13 13:40:00', 2.32);
No rows affected (2.125 seconds)
INFO : Session is already open
INFO :
INFO : Status: Running (Executing on YARN cluster with App id
application_1428656093356_0047)
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table pvr_temp.students_orc from
hdfs://tpcrmm01s.priv.atos.fr:8020/tmp/hive/hive/bf19c354-de67-45ae-a3e4-cd57d81acd71/hive_2015-04-13_14-15-26_056_1247475009666467472-20/-ext-10000
INFO : Table pvr_temp.students_orc stats: [numFiles=1, numRows=2,
totalSize=590, rawDataSize=508]
0: jdbc:hive2://tpcrmm03s:10000> SELECT CONCAT(TO_DATE(datetime), '-'),
SUM(gpa) FROM students_orc GROUP BY CONCAT(TO_DATE(datetime), '-');
INFO : Session is already open
INFO :
INFO : Status: Running (Executing on YARN cluster with App id
application_1428656093356_0047)
INFO : Map 1: -/- Reducer 2: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1
INFO : Map 1: 0(+1,-1)/1 Reducer 2: 0/1
INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1
INFO : Map 1: 0(+1,-2)/1 Reducer 2: 0/1
INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1
INFO : Map 1: 0(+1,-3)/1 Reducer 2: 0/1
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1,
vertexId=vertex_1428656093356_0047_4_00, diagnostics=[Task failed,
taskId=task_1428656093356_0047_4_00_000000, diagnostics=[TaskAttempt 0 failed,
info=[Error: Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException: Map operator initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector
output type: StringGroup
at
org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
at
org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
at
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
... 14 more
], TaskAttempt 1 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator
initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector
output type: StringGroup
at
org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
at
org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
at
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
... 14 more
], TaskAttempt 2 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator
initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector
output type: StringGroup
at
org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
at
org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
at
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
... 14 more
], TaskAttempt 3 failed, info=[Error: Failure while running
task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator
initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:232)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unsuported vector
output type: StringGroup
at
org.apache.hadoop.hive.ql.exec.vector.VectorColumnSetInfo.addKey(VectorColumnSetInfo.java:139)
at
org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.compileKeyWrapperBatch(VectorHashKeyWrapperBatch.java:521)
at
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.initializeOp(VectorGroupByOperator.java:786)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.initializeOp(VectorSelectOperator.java:105)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:201)
... 14 more
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex
vertex_1428656093356_0047_4_00 [Map 1] killed/failed due to:null]
ERROR : Vertex killed, vertexName=Reducer 2,
vertexId=vertex_1428656093356_0047_4_01, diagnostics=[Vertex received Kill
while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0,
Vertex vertex_1428656093356_0047_4_01 [Reducer 2] killed/failed due to:null]
ERROR : DAG failed due to vertex failure. failedVertices:1 killedVertices:1
Error: Error while processing statement: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=2)
Closing: 0: jdbc:hive2://tpcrmm03s:10000
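
The root cause in the log ("Unsuported vector output type: StringGroup") is
raised from VectorGroupByOperator, so one way to check whether the vectorized
execution path is involved would be to rerun the failing SELECT with
vectorization turned off for the session. This is only a diagnostic sketch, not
a confirmed fix: hive.vectorized.execution.enabled is the standard Hive
setting, but whether disabling it avoids the error on this build is an
assumption.

```sql
-- Diagnostic sketch. Assumption: the failure is specific to vectorized execution.
USE pvr_temp;

-- Turn off vectorized query execution for the current session only.
SET hive.vectorized.execution.enabled=false;

-- Rerun the query that fails against the ORC table.
SELECT CONCAT(TO_DATE(datetime), '-'), SUM(gpa) FROM students_orc
  GROUP BY CONCAT(TO_DATE(datetime), '-');

-- Re-enable vectorization afterwards so other queries are unaffected.
SET hive.vectorized.execution.enabled=true;
```

If the query succeeds with the setting disabled, that would point at the
vectorized GROUP BY on the CONCAT expression rather than at the ORC data itself.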