Re: Support for ORC Table in Shark/Spark

Michael Armbrust Wed, 13 Aug 2014 11:10:41 -0700

I would expect this to work with Spark SQL (available in Spark 1.0+), but there
is an open JIRA to confirm that it works: SPARK-2883
<https://issues.apache.org/jira/browse/SPARK-2883>.
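
For anyone who wants to try this through Spark SQL, here is a minimal sketch
of how the two statements from the original post could be replayed. It is an
illustration under assumptions, not a confirmed fix: the helper names
(orc_statements, run_statements, run_with_spark) are hypothetical, the
application name is arbitrary, and it assumes a Spark 1.0.x build with Hive
support, where HiveContext exposes hql() for HiveQL statements. Whether the
ORC insert actually succeeds is what the JIRA above is meant to confirm.

```python
def orc_statements(target="orc_table", source="text_table"):
    """Return the two HiveQL statements from the original post, in order."""
    return [
        "CREATE TABLE {0} (x INT, y STRING) STORED AS ORC".format(target),
        "INSERT INTO TABLE {0} SELECT * FROM {1}".format(target, source),
    ]


def run_statements(statements, execute):
    """Pass each statement to `execute`, stopping at the first failure.

    Returns a list of (statement, error) pairs; error is None on success.
    """
    results = []
    for stmt in statements:
        try:
            execute(stmt)
            results.append((stmt, None))
        except Exception as exc:
            results.append((stmt, exc))
            break
    return results


def run_with_spark():
    """Run the statements through Spark SQL's Hive support.

    Assumption: Spark 1.0.x with Hive support is installed. pyspark is
    imported here so the helpers above can be used without it.
    """
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="orc-check")
    try:
        hive_context = HiveContext(sc)
        for stmt, err in run_statements(orc_statements(), hive_context.hql):
            print("%s -> %s" % (stmt, "ok" if err is None else err))
    finally:
        sc.stop()
```

Calling run_with_spark() on a machine with Spark configured would print one
line per statement. Passing the executor in as a callable keeps the statement
list separate from the engine, so the same statements can be pointed at a
different entry point if the API differs in your version.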



On Mon, Aug 11, 2014 at 10:23 PM, <[email protected]> wrote:

> Hi all,
>
> Is it possible to use a table in ORC format in Shark version 0.9.1 with
> Spark 0.9.2 and Hive version 0.12.0?
>
> I tried creating the ORC table in Shark using the query below:
>
> *create table orc_table (x int, y string) stored as orc*
>
> The create table statement works, but when I try to insert values from a
> text table containing 2 rows
>
> *insert into table orc_table select * from text_table;*
>
> I get the exception below:
>  org.apache.spark.SparkException: Job aborted: Task 3.0:1 failed 4 times
> (most recent failure: Exception failure:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> Failed to create file
> [/tmp/hive-windfarm/hive_2014-08-08_10-11-21_691_1945292644101251597/_task_tmp.-ext-10000/_tmp.000001_0]
> for [DFSClient_attempt_201408081011_0000_m_000001_0_-341065575_80] on
> client [<machine_ip>], because this file is already being created by
> [DFSClient_attempt_201408081011_0000_m_000001_0_82854889_71] on
> [192.168.22.40]
>          at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
>          at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2306)
>          at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2235)
>          at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2188)
>          at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:505)
>          at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
>          at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>          at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>          at java.security.AccessController.doPrivileged(Native Method)
>          at javax.security.auth.Subject.doAs(Subject.java:415)
>          at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>  )
>          at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>          at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
>          at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>          at
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>          at
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
>          at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>          at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
>          at scala.Option.foreach(Option.scala:236)
>          at
> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
>          at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
>          at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>          at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>          at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>          at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>          at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>          at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>          at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>          at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>          at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>  FAILED: Execution Error, return code -101 from shark.execution.SparkTask
>
>  Any idea how to overcome this?
>
>
>
>  Thanks and regards
>  Vinay Kashyap
>
