Re: Shuffle service fails to register driver - Spark - Mesos

2016-04-15 Thread Jo Voordeckers
Never mind, I just figured out my problem:

I was running: *deploy.ExternalShuffleService* instead of
*deploy.mesos.MesosExternalShuffleService*
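
For anyone hitting the same trace: the Mesos-aware service is what must run on each slave. A sketch of the two launch commands (the helper script name is as shipped in the Spark distribution at the time; verify against your version):

```shell
# Wrong: the standalone shuffle service, which rejects the Mesos RegisterDriver RPC
./bin/spark-class org.apache.spark.deploy.ExternalShuffleService

# Right: the Mesos-aware shuffle service
./bin/spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService
# or, if your distribution ships the helper script:
./sbin/start-mesos-shuffle-service.sh
```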

- Jo Voordeckers


On Fri, Apr 15, 2016 at 2:29 PM, Jo Voordeckers <jo.voordeck...@gmail.com>
wrote:

> Forgot to mention we're running Spark (Streaming) 1.5.1
>
> - Jo Voordeckers
>
>
> On Fri, Apr 15, 2016 at 12:21 PM, Jo Voordeckers <jo.voordeck...@gmail.com
> > wrote:
>
>> [...]
>


Re: Shuffle service fails to register driver - Spark - Mesos

2016-04-15 Thread Jo Voordeckers
Forgot to mention we're running Spark (Streaming) 1.5.1

- Jo Voordeckers


On Fri, Apr 15, 2016 at 12:21 PM, Jo Voordeckers <jo.voordeck...@gmail.com>
wrote:

> [...]


Shuffle service fails to register driver - Spark - Mesos

2016-04-15 Thread Jo Voordeckers
Hi all,

I've got Mesos in coarse-grained mode with dynamic allocation and the shuffle
service enabled, and I'm running the shuffle service on every Mesos slave.
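
For reference, the setup described above corresponds to settings along these lines in spark-defaults.conf (the property names are the standard Spark 1.5 ones; the exact values here are assumed):

```
spark.mesos.coarse               true
spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true
```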

I'm assuming I misconfigured something on the shuffle service, any ideas?

On my driver I see a few of these; I guess there's one for every executor:

19:12:29 WARN [shuffle-client-1] MesosExternalShuffleClient - Unable to
> register app 78a944c9-3a89-4334-bca3-7108aadb1798- with external
> shuffle service. Please manually remove shuffle data after driver exit.
> Error: java.lang.RuntimeException: java.lang.UnsupportedOperationException:
> Unexpected message:
> org.apache.spark.network.shuffle.protocol.mesos.RegisterDriver@c399c59a
> at
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:92)
> at
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:68)
> at
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
> at
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)
> at
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
> at
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>
[...]

In the shuffle service I see logs like this on every box:

log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> more info.
> Using Spark's repl log4j profile:
> org/apache/spark/log4j-defaults-repl.properties
> To adjust logging level use sc.setLogLevel("INFO")
> 16/04/14 19:12:29 ERROR TransportRequestHandler: Error while invoking
> RpcHandler#receive() on RPC id 7280403447531815366
> java.lang.UnsupportedOperationException: Unexpected message:
> org.apache.spark.network.shuffle.protocol.mesos.RegisterDriver@c399c59a
> at
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:92)
> at
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:68)
> at
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
> at
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)
> at
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
> at
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
> at
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>
[...]

Thanks!

- Jo Voordeckers


Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Jo Voordeckers
On Tue, Nov 17, 2015 at 5:16 AM, Iulian Dragoș <iulian.dra...@typesafe.com>
wrote:

> I think it actually tries to send all properties as part of
> `SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:
>
>
> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L375-L377
>
>
Aha, that's interesting; I overlooked that line. I'll debug some more today,
because I know for sure that those options don't make it onto the command
line when I ran it under my debugger.


> Can you please open a Jira ticket and describe also the symptoms? This
> might be related, or the same issue: SPARK-11280
> <https://issues.apache.org/jira/browse/SPARK-11280> and also SPARK-11327
> <https://issues.apache.org/jira/browse/SPARK-11327>
>

SPARK-11327 <https://issues.apache.org/jira/browse/SPARK-11327> is exactly
my problem, but I don't run docker.

 - Jo

> On Tue, Nov 17, 2015 at 2:46 AM, Jo Voordeckers <jo.voordeck...@gmail.com>
> wrote:
>
>> [...]
>
>
> --
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
>
>


Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Jo Voordeckers
Hi Tim,

I've done more forensics on this bug, see my comment here:

https://issues.apache.org/jira/browse/SPARK-11327?focusedCommentId=15009843&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15009843


- Jo Voordeckers


On Tue, Nov 17, 2015 at 12:01 PM, Timothy Chen <tnac...@gmail.com> wrote:

> Hi Jo,
>
> Thanks for the links. I would have expected the properties to be in the
> scheduler properties, but I need to double-check.
>
> I'll be looking into these problems this week.
>
> Tim
>
> [...]


Re: Conf Settings in Mesos

2015-11-16 Thread Jo Voordeckers
I've run into a related issue, I think: args passed to spark-submit to my
cluster dispatcher get lost in translation when launching the driver from
Mesos. I'm suggesting this patch:

https://github.com/jayv/spark/commit/b2025ddc1d565d1cc3036200fc3b3046578f4b02

- Jo Voordeckers


On Thu, Nov 12, 2015 at 6:05 AM, John Omernik <j...@omernik.com> wrote:

> Hey all,
>
> I noticed today that if I take a tgz as my URI for Mesos, that I have to
> repackaged it with my conf settings from where I execute say pyspark for
> the executors to have the right configuration settings.
>
> That is...
>
> If I take a "stock" tgz from makedistribution.sh, unpack it, and then set
> the URI in spark-defaults to be the unmodified tgz as the URI. Change other
> settings in both spark-defaults.conf and spark-env.sh, then run
> ./bin/pyspark from that unpacked directory, I guess I would have thought
> that when the executor spun up, that some sort of magic was happening where
> the conf directory or the conf settings would propagate out to the
> executors (thus making configuration changes easier to manage)
>
> For things to work, I had to unpack the tgz, change conf settings, then
> repackage the tgz with all my conf settings for the tgz in the URI then run
> it. Then it seemed to work.
>
> I have a workaround, but from a usability point of view it would be nice
> to have a tgz that is just binaries and picks up the conf at run time. It
> would help with managing multiple configurations that share the same
> binaries (different models/apps etc.). Instead of having to repackage a
> tgz for each app, the conf would just propagate... am I looking at this
> wrong?
>
> John
>
>
>
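
The repackaging workaround John describes can be sketched as follows (the directory name, version, and the single overlaid setting are all assumed for illustration; in practice you would overlay your real conf directory):

```shell
# Self-contained sketch of the repackaging workaround: unpack the stock
# distribution, overlay local conf, and re-create the tgz that the
# spark.executor.uri points at.
set -e
mkdir -p spark-1.5.1-bin/conf
echo "spark.shuffle.service.enabled true" > spark-1.5.1-bin/conf/spark-defaults.conf
tar -czf spark-1.5.1-bin-custom.tgz spark-1.5.1-bin
# Confirm the conf file made it into the archive
tar -tzf spark-1.5.1-bin-custom.tgz | grep -q spark-defaults.conf && echo repacked
```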


Re: Spark-shell connecting to Mesos stuck at sched.cpp

2015-11-16 Thread Jo Voordeckers
I've seen this issue when the Mesos cluster couldn't figure out my IP
address correctly. Have you tried setting this env var to your IP address
when launching Spark or the Mesos cluster dispatcher:
 LIBPROCESS_IP="172.16.0.180"
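
In context that would look something like this (the address and master URL are placeholders, not from the thread):

```shell
# Export LIBPROCESS_IP so libprocess advertises the intended interface,
# then launch spark-shell or the dispatcher as usual.
export LIBPROCESS_IP="172.16.0.180"
# ./bin/spark-shell --master mesos://mesos-master:5050   # launch step, not run here
echo "$LIBPROCESS_IP"
```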


- Jo Voordeckers


On Sun, Nov 15, 2015 at 6:59 PM, Jong Wook Kim <jongw...@nyu.edu> wrote:

> I'm having problem connecting my spark app to a Mesos cluster; any help on
> the below question would be appreciated.
>
>
> http://stackoverflow.com/questions/33727154/spark-shell-connecting-to-mesos-stuck-at-sched-cpp
>
> Thanks,
> Jong Wook
>


Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-16 Thread Jo Voordeckers
Hi all,

I'm running the Mesos cluster dispatcher; however, when I submit jobs,
things like JVM args, classpath order, and UI port aren't added to the
command line executed by the Mesos scheduler. In fact, it only cares about
the class, the jar, and the number of cores / amount of memory.

https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424

I've made an attempt at adding a few of the args that I believe are useful
to the MesosClusterScheduler class, which seems to solve my problem.
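
A hypothetical sketch (not the actual PR) of the kind of change involved: whitelist some submit-time properties and append them as --conf flags to the driver command the scheduler builds. The property names are real Spark configuration keys; the helper function and its shape are purely illustrative.

```python
# Illustrative only: forward selected submit-time options onto the driver
# command line, instead of dropping everything but class/jar/cores/mem.
FORWARDED_CONFS = (
    "spark.driver.extraJavaOptions",
    "spark.driver.extraClassPath",
    "spark.ui.port",
)

def build_driver_command(base_cmd, props):
    """Append --conf flags for whitelisted properties to the driver command."""
    cmd = list(base_cmd)
    for key in FORWARDED_CONFS:
        if key in props:
            cmd += ["--conf", "%s=%s" % (key, props[key])]
    return cmd
```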

Please have a look:

https://github.com/apache/spark/pull/9752

Thanks

- Jo Voordeckers


Spark 1.5.1 on Mesos NO Executor Java Options

2015-11-03 Thread Jo Voordeckers
Hi everyone,

I'm trying to set up Spark 1.5.1 with Mesos and the Cluster Dispatcher,
which I'm currently running on one of the slaves. We're migrating from a
1.3 standalone cluster, and we're hoping to benefit from dynamic resource
allocation with fine-grained Mesos for a better distribution of resources
between regular jobs and streaming jobs.

My driver tries to launch but fails because none of the system properties
are present, even though the dispatcher UI lists the options.

It looks related to this: https://issues.apache.org/jira/browse/SPARK-2921

*Spark Submit*

bin/spark-submit --deploy-mode cluster --master mesos://
> mesoss1.qa.xxx.com:7077 --conf
> "spark.executor.extraJavaOptions=-Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true"
> --class com.xxx.tools.cdi.Bootstrap
> /home/deployment/spork-1.0.0-SNAPSHOT-fat.jar /home/deployment/xyz.json


I also tried spark.driver.extraJavaOptions, without success.
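
One workaround sometimes used for this class of dispatcher bug (hedged; it assumes the conf directory is readable by the driver/executors on every slave, and does not fix the dispatcher itself) is to put the option in the spark-defaults.conf of the Spark distribution the Mesos driver and executors run, rather than on the submit line:

```
# conf/spark-defaults.conf in the distribution the Mesos driver/executors use
spark.executor.extraJavaOptions -Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true
spark.driver.extraJavaOptions   -Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true
```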

*Dispatcher UI*

Scheduler property: Value
spark.executor.extraJavaOptions: -Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true
spark.jars: file:/home/deployment/spork-1.0.0-SNAPSHOT-fat.jar
spark.driver.supervise: false
spark.app.name: com.xxx.tools.cdi.Bootstrap
spark.submit.deployMode: cluster
spark.master: mesos://mesoss1.qa.xxx.com:7077
*Logs in Sandbox*

Starting task driver-20151104023425-0011
> Forked command at 6554
> sh -c '/opt/spark/bin/spark-submit --name com.xxx.tools.cdi.Bootstrap
> --class com.xxx.tools.cdi.Bootstrap --master
> mesos://zk://zk-seed-1:2181,zk-seed-2:2181,zk-seed-3:2181/mesos
> --driver-cores 1.0 --driver-memory 1024M spork-1.0.0-SNAPSHOT-fat.jar
> /home/deployment/xyz.json'



I hope someone has a clue.

Thanks!!

- Jo