Mich,

As I understand it, you have a problem with Hive on Spark due to dual network interfaces. I agree that this is something that should be fixed in Hive. However, saying Hive on Spark doesn't work seems unfair. At Cloudera, we have many customers that have successfully deployed Hive on Spark on their clusters.
As discussed in another thread, we don't have all the bandwidth we would like to answer every user problem. Thus, it's crucial to provide as much information as possible when reporting a problem. This includes steps to reproduce as well as Hive, Spark, and/or YARN logs.

Thanks,
Xuefu

On Tue, Dec 1, 2015 at 1:32 AM, Mich Talebzadeh <[email protected]> wrote:

> Hi Link,
>
> I am afraid it seems that using Spark as the execution engine for Hive
> does not work yet. I am still trying to make it work.
>
> An alternative is to use Spark with the Hive data set, to be precise
> spark-sql: you set Spark to use the Hive metastore and then use Hive as the
> heavy DML engine. That will give you the ability to use Spark for queries
> that can be done in-memory.
>
> HTH
>
> Mich Talebzadeh
>
> *Sybase ASE 15 Gold Medal Award 2008*
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
> Author of the book *"A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7*.
> Co-author of *"Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4*
> *Publications due shortly:*
> *Complex Event Processing in Heterogeneous Environments*, ISBN: 978-0-9563693-3-8
> *Oracle and Sybase, Concepts and Contrasts*, ISBN: 978-0-9563693-1-4, volume one out shortly
>
> http://talebzadehmich.wordpress.com
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only; if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Technology
> Ltd, its subsidiaries or their employees, unless expressly so stated.
> It is the responsibility of the recipient to ensure that this email is virus
> free; therefore neither Peridale Ltd, its subsidiaries nor their employees
> accept any responsibility.
>
> *From:* Link Qian [mailto:[email protected]]
> *Sent:* 01 December 2015 00:57
> *To:* [email protected]
> *Subject:* RE: Problem with getting start of Hive on Spark
>
> Hi Mich,
> I set the Hive execution engine to Spark.
>
> Link Qian
> ------------------------------
>
> From: [email protected]
> To: [email protected]
> Subject: RE: Problem with getting start of Hive on Spark
> Date: Mon, 30 Nov 2015 16:15:31 +0000
>
> To clarify, are you running Hive and using Spark as the execution engine
> (as opposed to the default Hive execution engine, MapReduce)?
>
> Mich Talebzadeh
>
> http://talebzadehmich.wordpress.com
>
> *From:* Link Qian [mailto:[email protected]]
> *Sent:* 30 November 2015 13:21
> *To:* [email protected]
> *Subject:* Problem with getting start of Hive on Spark
>
> Hello,
>
> Following the Hive wiki page
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started,
> I got several failures executing HQL on the Spark engine with YARN. I
> have hadoop-2.6.2, yarn-2.6.2 and Spark-1.5.2. The failures occurred with
> both the spark-1.5.2-hadoop2.6 distribution and a spark-1.5.2-without-hive
> custom build compiled per the instructions on that wiki page.
>
> The Hive CLI submits the Spark job; the job runs for a short time and the
> RM web app shows that it succeeded, but the Hive CLI shows that the job
> failed.
>
> Here is a snippet of the Hive CLI debug log. Any suggestions?
>
> 15/11/30 07:31:36 [main]: INFO status.SparkJobMonitor: state = SENT
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO yarn.Client: Application report for application_1448886638370_0001 (state: RUNNING)
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO yarn.Client:
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      client token: N/A
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      diagnostics: N/A
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      ApplicationMaster host: 192.168.1.12
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      ApplicationMaster RPC port: 0
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      queue: default
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      start time: 1448886649489
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      final status: UNDEFINED
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      tracking URL: http://namenode.localdomain:8088/proxy/application_1448886638370_0001/
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl:      user: hadoop
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO cluster.YarnClientSchedulerBackend: Application application_1448886638370_0001 has started running.
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51326.
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO netty.NettyBlockTransferService: Server created on 51326
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.10:51326 with 66.8 MB RAM, BlockManagerId(driver, 192.168.1.10, 51326)
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO storage.BlockManagerMaster: Registered BlockManager
> state = SENT
> 15/11/30 07:31:37 [main]: INFO status.SparkJobMonitor: state = SENT
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type java.lang.Integer (2 bytes)
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Received RPC message: type=REPLY id=0 payload=java.lang.Integer
> *15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO spark.SparkContext: Added JAR file:/home/hadoop/apache-hive-1.2.1-bin/lib/hive-exec-1.2.1.jar at http://192.168.1.10:41276/jars/hive-exec-1.2.1.jar with timestamp 1448886697575*
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$NullMessage (2 bytes)
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Received RPC message: type=REPLY id=1 payload=org.apache.hive.spark.client.rpc.Rpc$NullMessage
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job).
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type java.lang.String (3720 bytes)
> 15/11/30 07:31:37 [RPC-Handler-3]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Received RPC message: type=ERROR id=2 payload=java.lang.String
> 15/11/30 07:31:37 [RPC-Handler-3]: WARN rpc.RpcDispatcher: Received error message:io.netty.handler.codec.DecoderException: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:358)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:230)
>         at io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>         at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:348)
>         at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
>         at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>         at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>         at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>         at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>         at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>         at org.apache.hive.spark.client.rpc.KryoMessageCodec.decode(KryoMessageCodec.java:96)
>         at io.netty.handler.codec.ByteToMessageCodec$1.decode(ByteToMessageCodec.java:42)
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:327)
>         ... 15 more
> Caused by: java.lang.ClassNotFoundException: org.apache.hive.spark.client.Job
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         ... 39 more
> .
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 WARN client.RemoteDriver: Shutting down driver because RPC channel was closed.
> 15/11/30 07:31:37 [stderr-redir-1]: INFO client.SparkClientImpl: 15/11/30 07:31:37 INFO client.RemoteDriver: Shutting down remote driver.
> 15/11/30 07:31:37 [RPC-Handler-3]: WARN client.SparkClientImpl: Client RPC channel closed unexpectedly.
>
> best regards,
> Link Qian
>
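For reference, the setup Link describes from the wiki page boils down to making the Spark runtime visible to Hive and then switching the execution engine in the CLI. A minimal sketch follows; the install paths and the exact assembly jar name are assumptions for illustration, not taken from this thread, so adjust them to the local layout.

```shell
# Sketch of the Hive-on-Spark wiring from the "Getting Started" wiki page.
# SPARK_HOME/HIVE_HOME values and the assembly jar name below are assumed.
export SPARK_HOME=/usr/local/spark
export HIVE_HOME=/usr/local/hive

# One of the options the wiki gives: link the Spark assembly jar into
# Hive's lib directory so Hive can launch the Spark client.
ln -s "$SPARK_HOME/lib/spark-assembly-1.5.2-hadoop2.6.0.jar" \
      "$HIVE_HOME/lib/spark-assembly.jar"

# Then, inside the Hive CLI, switch the engine and point it at YARN:
#   hive> set hive.execution.engine=spark;
#   hive> set spark.master=yarn-client;
```

These settings can also be made permanent in hive-site.xml instead of being set per session.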

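The decisive line in the log above is the DriverProtocol one: the remote driver running inside the YARN application could not load org.apache.hive.spark.client.Job, which ships inside hive-exec-*.jar, so the RPC channel was torn down even though the YARN application itself appears to finish cleanly. A hedged way to check whether the class is really in the jar the log shows being added (the jar path is taken from the log; the remedy lines are assumptions, and a Hive/Spark version mismatch is another common cause with these versions):

```shell
# Verify the class the remote driver failed to load is packaged in the
# hive-exec jar that the log shows being added via SparkContext.addJar.
unzip -l /home/hadoop/apache-hive-1.2.1-bin/lib/hive-exec-1.2.1.jar \
  | grep 'org/apache/hive/spark/client/Job'

# If it is there, the jar may simply be reaching the driver too late
# (addJar runs after the RPC channel is up). Putting it on the driver's
# classpath up front avoids that, e.g. in the Hive CLI:
#   hive> set spark.driver.extraClassPath=/home/hadoop/apache-hive-1.2.1-bin/lib/hive-exec-1.2.1.jar;
```

If the grep finds nothing, or the classpath change doesn't help, checking the Spark version against the one the Hive release was built for would be the next step.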