Re: Tez / Orc / S3

Mcbride, Neil Wed, 11 Nov 2015 02:50:18 -0800

Hi,

I tried the following:

CREATE EXTERNAL TABLE ocs_test (
 cols...
) STORED AS ORC
LOCATION 's3a://<bucketname>/orc/'
TBLPROPERTIES ('orc.compress'='SNAPPY')
;

select subscriber_id from ocs_test limit 100

java.io.IOException: java.lang.RuntimeException: serious problem

I also tried the .jar file supplied. Here's the full log output, which ends
in a divide-by-zero error:

INFO  : Session is already open
INFO  : Tez session was closed. Reopening...
INFO  : Session re-established.
INFO  :

INFO  : Status: Running (Executing on YARN cluster with App id application_1447235341036_0003)

INFO  : Map 1: -/-
INFO  : Map 1: -/-
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1447235341036_0003_1_00, diagnostics=[Vertex vertex_1447235341036_0003_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: comverse.ocs_test initializer failed, vertex=vertex_1447235341036_0003_1_00 [Map 1], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:949)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:974)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:298)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplitsInternal(HiveInputFormat.java:412)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:330)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:830)
    ... 3 more
]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
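
For anyone reading the trace: the root cause line is a plain Java integer division by zero raised during split generation. The sketch below is not Hive's code. It is a hypothetical helper (the class SplitMathSketch and the method blocksSpanned are made up for illustration) showing how this kind of exception can arise if a size used as a divisor, for example a block size reported by the filesystem, comes back as 0. Whether that is what actually happens at OrcInputFormat.java:830 is an assumption I have not verified.

```java
public class SplitMathSketch {
    // Hypothetical helper, NOT taken from OrcInputFormat: counts how many
    // blocks of size blockSize a file of fileLength bytes would span.
    public static long blocksSpanned(long fileLength, long blockSize) {
        // Integer (long) division by zero throws ArithmeticException in Java.
        return (fileLength + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        // Normal case: a non-zero block size divides cleanly.
        System.out.println(blocksSpanned(1024L, 256L));

        // Assumed failure case: the divisor is reported as 0.
        try {
            blocksSpanned(1024L, 0L);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```

If something along these lines is the cause, the interesting question is which value the S3A filesystem is reporting as zero for these files.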

On 10 November 2015 at 20:38, Gopal Vijayaraghavan <[email protected]>
wrote:

>
> > Tez is not officially supported on EMR yet but they provide bootstraps
> >for Tez 0.7. I've been using it for a month (via Hue 3.7.1) and it's
> >absolutely fine.
>
> Point me to it (off-list?), I didn't see one at emr-bootstrap-actions.
>
> > I've recreated my table as s3a and am currently inserting 670m records
> >into it. Will feedback but it sounds like I will need to wait for AWS to
> >move to Hive 2.0 to get the full benefits.
>
> Rajesh has got a backwards port of the seek improvements as a new S3A FS
> impl, for drop-in replacement for those who can't wait for the next
> release to try it out.
>
> https://github.com/rajeshbalamohan/hadoop-aws
>
>
> Cheers,
> Gopal
>
>
>

-- 
Regards
Neil
