Re: Shuffle service fails to register driver - Spark - Mesos
Never mind, I just figured out my problem: I was running
*deploy.ExternalShuffleService* instead of
*deploy.mesos.MesosExternalShuffleService*.

- Jo Voordeckers

On Fri, Apr 15, 2016 at 2:29 PM, Jo Voordeckers <jo.voordeck...@gmail.com> wrote:

> Forgot to mention we're running Spark (Streaming) 1.5.1
>
> - Jo Voordeckers
>
> On Fri, Apr 15, 2016 at 12:21 PM, Jo Voordeckers <jo.voordeck...@gmail.com> wrote:
>
>> Hi all,
>>
>> I've got Mesos in coarse-grained mode with dynamic allocation, the
>> shuffle service enabled, and am running the shuffle service on every
>> Mesos slave.
>>
>> I'm assuming I misconfigured something on the shuffle service, any
>> ideas?
>>
>> [quoted driver warnings and shuffle-service stack traces trimmed; they
>> appear in full in the original post below]
>>
>> Thanks!
>>
>> - Jo Voordeckers
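The fix above amounts to launching the Mesos-specific service class on each slave instead of the standalone one. A minimal sketch of the two launch commands, for anyone hitting the same trace (the `/opt/spark` install path is an assumption; `spark-class` ships with the Spark distribution):

```shell
# Sketch: launch commands for the shuffle service on a Mesos slave.
# /opt/spark is an assumed install path; adjust SPARK_HOME for your layout.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# The class I was running by mistake (standalone-mode shuffle service):
WRONG_CLASS="org.apache.spark.deploy.ExternalShuffleService"
# The Mesos-aware variant that understands the RegisterDriver RPC:
RIGHT_CLASS="org.apache.spark.deploy.mesos.MesosExternalShuffleService"

echo "wrong: $SPARK_HOME/bin/spark-class $WRONG_CLASS"
echo "right: $SPARK_HOME/bin/spark-class $RIGHT_CLASS"
```

The standalone class starts and serves blocks fine, which is why the misconfiguration only shows up when the driver tries to register: the RegisterDriver message is Mesos-specific, hence the "Unexpected message" error above.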
Re: Shuffle service fails to register driver - Spark - Mesos
Forgot to mention we're running Spark (Streaming) 1.5.1.

- Jo Voordeckers

On Fri, Apr 15, 2016 at 12:21 PM, Jo Voordeckers <jo.voordeck...@gmail.com> wrote:

> Hi all,
>
> I've got Mesos in coarse-grained mode with dynamic allocation, the
> shuffle service enabled, and am running the shuffle service on every
> Mesos slave.
>
> I'm assuming I misconfigured something on the shuffle service, any ideas?
>
> [quoted driver warnings and shuffle-service stack traces trimmed; they
> appear in full in the original post below]
>
> Thanks!
>
> - Jo Voordeckers
Shuffle service fails to register driver - Spark - Mesos
Hi all,

I've got Mesos in coarse-grained mode with dynamic allocation, the shuffle
service enabled, and am running the shuffle service on every Mesos slave.

I'm assuming I misconfigured something on the shuffle service, any ideas?

On my driver I see a few of these, I guess one for every executor:

> 19:12:29 WARN [shuffle-client-1] MesosExternalShuffleClient - Unable to
> register app 78a944c9-3a89-4334-bca3-7108aadb1798- with external
> shuffle service. Please manually remove shuffle data after driver exit.
> Error: java.lang.RuntimeException: java.lang.UnsupportedOperationException:
> Unexpected message:
> org.apache.spark.network.shuffle.protocol.mesos.RegisterDriver@c399c59a
>         at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:92)
>         at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:68)
>         at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
>         at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)
>         at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
>         at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>         at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)

[...]

On the shuffle service I see logs like this on every box:

> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
> To adjust logging level use sc.setLogLevel("INFO")
> 16/04/14 19:12:29 ERROR TransportRequestHandler: Error while invoking
> RpcHandler#receive() on RPC id 7280403447531815366
> java.lang.UnsupportedOperationException: Unexpected message:
> org.apache.spark.network.shuffle.protocol.mesos.RegisterDriver@c399c59a
>         at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:92)
>         at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:68)
>         [identical stack trace to the one above]

[...]

Thanks!

- Jo Voordeckers
Re: Mesos cluster dispatcher doesn't respect most args from the submit req
On Tue, Nov 17, 2015 at 5:16 AM, Iulian Dragoș <iulian.dra...@typesafe.com> wrote:

> I think it actually tries to send all properties as part of
> `SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:
>
> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L375-L377

Aha, that's interesting. I overlooked that line; I'll debug some more today,
because I know for sure that those options don't make it onto the command
line when I was running it in my debugger.

> Can you please open a Jira ticket and also describe the symptoms? This
> might be related, or the same issue: SPARK-11280
> <https://issues.apache.org/jira/browse/SPARK-11280> and also SPARK-11327
> <https://issues.apache.org/jira/browse/SPARK-11327>

SPARK-11327 <https://issues.apache.org/jira/browse/SPARK-11327> is exactly
my problem, but I don't run Docker.

- Jo

> On Tue, Nov 17, 2015 at 2:46 AM, Jo Voordeckers <jo.voordeck...@gmail.com> wrote:
>
>> Hi all,
>>
>> I'm running the Mesos cluster dispatcher; however, when I submit jobs,
>> things like JVM args, classpath order, and the UI port aren't added to
>> the command line executed by the Mesos scheduler. In fact it only cares
>> about the class, the jar, and the number of cores / amount of memory.
>>
>> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424
>>
>> I've made an attempt at adding a few of the args that I believe are
>> useful to the MesosClusterScheduler class, which seems to solve my
>> problem.
>>
>> Please have a look: https://github.com/apache/spark/pull/9752
>>
>> Thanks
>>
>> - Jo Voordeckers
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
Re: Mesos cluster dispatcher doesn't respect most args from the submit req
Hi Tim,

I've done more forensics on this bug, see my comment here:
https://issues.apache.org/jira/browse/SPARK-11327?focusedCommentId=15009843&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15009843

- Jo Voordeckers

On Tue, Nov 17, 2015 at 12:01 PM, Timothy Chen <tnac...@gmail.com> wrote:

> Hi Jo,
>
> Thanks for the links. I would have expected the properties to be in the
> scheduler properties, but I need to double-check.
>
> I'll be looking into these problems this week.
>
> Tim
>
> [earlier quoted replies trimmed; they duplicate the previous message in
> this thread]
Re: Conf Settings in Mesos
I have run into a related issue, I think: args passed to spark-submit to my
cluster dispatcher get lost in translation when launching the driver from
Mesos. I'm suggesting this patch:
https://github.com/jayv/spark/commit/b2025ddc1d565d1cc3036200fc3b3046578f4b02

- Jo Voordeckers

On Thu, Nov 12, 2015 at 6:05 AM, John Omernik <j...@omernik.com> wrote:

> Hey all,
>
> I noticed today that if I take a tgz as my URI for Mesos, I have to
> repackage it with my conf settings from where I execute, say, pyspark
> for the executors to have the right configuration settings.
>
> That is...
>
> If I take a "stock" tgz from make-distribution.sh, unpack it, set the
> URI in spark-defaults to be the unmodified tgz, change other settings in
> both spark-defaults.conf and spark-env.sh, then run ./bin/pyspark from
> that unpacked directory, I would have thought that when the executor
> spun up, some sort of magic would happen where the conf directory or the
> conf settings would propagate out to the executors (thus making
> configuration changes easier to manage).
>
> For things to work, I had to unpack the tgz, change the conf settings,
> repackage the tgz with all my conf settings, and point the URI at that
> tgz. Then it seemed to work.
>
> I have a workaround, but from a usability point of view it would be nice
> to have a tgz that is just binaries and that takes the conf at run time.
> It would help with managing multiple configurations that use the same
> binaries (different models/apps etc.). Instead of having to repackage a
> tgz for each app, it would just propagate... am I looking at this wrong?
>
> John
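For reference, the repackage-with-conf workaround John describes can be scripted. A rough sketch, where every path, archive name, and the example setting are made up for illustration:

```shell
# Sketch of the repackage-with-conf workaround described above.
# All paths and archive names here are hypothetical.
set -e
work="$(mktemp -d)"

# Stand-in for a "stock" distribution tgz out of make-distribution.sh:
mkdir -p "$work/spark-dist/conf"
tar -C "$work" -czf "$work/spark-dist.tgz" spark-dist

# 1. Unpack the stock tgz
tar -C "$work" -xzf "$work/spark-dist.tgz"

# 2. Bake the desired settings into conf/ (the memory setting is just an example)
echo "spark.executor.memory 4g" >> "$work/spark-dist/conf/spark-defaults.conf"

# 3. Repackage; this is the archive to point the Mesos executor URI at
tar -C "$work" -czf "$work/spark-dist-with-conf.tgz" spark-dist
tar -tzf "$work/spark-dist-with-conf.tgz"
```

The design wart John is pointing at is that executors unpack the URI archive verbatim and read conf from inside it, so any conf change means cutting a new archive per app/config combination.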
Re: Spark-shell connecting to Mesos stuck at sched.cpp
I've seen this issue when the Mesos cluster couldn't figure out my IP
address correctly. Have you tried setting the env var with your IP address
when launching Spark or the Mesos cluster dispatcher, like:

LIBPROCESS_IP="172.16.0.180"

- Jo Voordeckers

On Sun, Nov 15, 2015 at 6:59 PM, Jong Wook Kim <jongw...@nyu.edu> wrote:

> I'm having a problem connecting my Spark app to a Mesos cluster; any
> help on the question below would be appreciated.
>
> http://stackoverflow.com/questions/33727154/spark-shell-connecting-to-mesos-stuck-at-sched-cpp
>
> Thanks,
> Jong Wook
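To spell out the suggestion above: the variable has to be exported in the environment of the process you launch, and the address should be the routable IP of the launching host. A sketch, with a made-up address:

```shell
# libprocess (Mesos's messaging layer) advertises this address to the master;
# when it guesses wrong (multiple NICs, VPN, container bridge networks), the
# framework registration hangs, which matches getting stuck at sched.cpp.
# 172.16.0.180 is an example address, not a recommendation.
export LIBPROCESS_IP="172.16.0.180"

# Then launch from the same shell, e.g.:
#   "$SPARK_HOME/bin/spark-shell" --master mesos://master.example.com:5050
echo "libprocess will advertise: $LIBPROCESS_IP"
```

The `master.example.com` URL in the comment is a placeholder; substitute your own master or ZooKeeper address.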
Mesos cluster dispatcher doesn't respect most args from the submit req
Hi all,

I'm running the Mesos cluster dispatcher; however, when I submit jobs,
things like JVM args, classpath order, and the UI port aren't added to the
command line executed by the Mesos scheduler. In fact it only cares about
the class, the jar, and the number of cores / amount of memory.

https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424

I've made an attempt at adding a few of the args that I believe are useful
to the MesosClusterScheduler class, which seems to solve my problem.

Please have a look: https://github.com/apache/spark/pull/9752

Thanks

- Jo Voordeckers
Spark 1.5.1 on Mesos NO Executor Java Options
Hi everyone,

I'm trying to set up Spark 1.5.1 with Mesos and the Cluster Dispatcher,
which I'm currently running on one of the slaves. We're migrating from a
1.3 standalone cluster, and we're hoping to benefit from dynamic resource
allocation with fine-grained Mesos for a better distribution of resources
between regular jobs and streaming jobs.

My driver tries to launch, but fails because none of the system properties
are present, yet the UI for the dispatcher lists the options. It looks
related to this: https://issues.apache.org/jira/browse/SPARK-2921

*Spark Submit*

> bin/spark-submit --deploy-mode cluster --master
> mesos://mesoss1.qa.xxx.com:7077 --conf
> "spark.executor.extraJavaOptions=-Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true"
> --class com.xxx.tools.cdi.Bootstrap
> /home/deployment/spork-1.0.0-SNAPSHOT-fat.jar /home/deployment/xyz.json

I also tried spark.driver.extraJavaOptions without success.

*Dispatcher UI* (Scheduler properties)

> spark.executor.extraJavaOptions  -Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true
> spark.jars                       file:/home/deployment/spork-1.0.0-SNAPSHOT-fat.jar
> spark.driver.supervise           false
> spark.app.name                   com.xxx.tools.cdi.Bootstrap
> spark.submit.deployMode          cluster
> spark.master                     mesos://mesoss1.qa.xxx.com:7077

*Logs in Sandbox*

> Starting task driver-20151104023425-0011
> Forked command at 6554
> sh -c '/opt/spark/bin/spark-submit --name com.xxx.tools.cdi.Bootstrap
> --class com.xxx.tools.cdi.Bootstrap --master
> mesos://zk://zk-seed-1:2181,zk-seed-2:2181,zk-seed-3:2181/mesos
> --driver-cores 1.0 --driver-memory 1024M spork-1.0.0-SNAPSHOT-fat.jar
> /home/deployment/xyz.json'

I hope someone has a clue. Thanks!!

- Jo
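Until the dispatcher forwards these properties (see SPARK-11327 in the other thread), one thing worth trying, which is an untested assumption on my part rather than a verified fix, is baking the options into `conf/spark-defaults.conf` of the Spark installation on each slave, since the dispatcher-forked `spark-submit` visible in the sandbox log would normally read that file:

```shell
# Untested workaround sketch: bake the missing property into the slave-side
# defaults file so the forked spark-submit picks it up. On a real slave this
# would be /opt/spark/conf (the path in the sandbox log above); here we
# default to a scratch dir so the sketch runs anywhere.
SPARK_CONF="${SPARK_CONF:-$(mktemp -d)}"

cat >> "$SPARK_CONF/spark-defaults.conf" <<'EOF'
spark.executor.extraJavaOptions -Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true
spark.driver.extraJavaOptions -Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true
EOF

grep extraJavaOptions "$SPARK_CONF/spark-defaults.conf"
```

The obvious downside is that the setting then applies to every job submitted through that slave's Spark install, not just this one application.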