Re: advice on maintaining a production spark cluster?

Mayur Rustagi Mon, 19 May 2014 19:37:15 -0700

You are better off using Mesos for a production cluster. Standalone mode will
not provide reliability & availability in production. That said, it depends
on what production means. Many of my analytics customers use standalone in
production.
Regards
Mayur


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Fri, May 16, 2014 at 10:23 PM, Josh Marcus <jmar...@meetup.com> wrote:

> Hey folks,
>
> I'm wondering what strategies other folks are using for maintaining and
> monitoring the stability of stand-alone spark clusters.
>
> Our master very regularly loses workers, and they (as expected) never
> rejoin the cluster.  This is the same behavior I've seen
> using akka cluster (if that's what spark is using in stand-alone mode) --
> are there configuration options we could be setting
> to make the cluster more robust?
>
> We have a custom script which monitors the number of workers (through the
> web interface) and restarts the cluster when
> necessary, as well as resolving other issues we face (like spark shells
> left open permanently claiming resources), and it
> works, but it's nowhere close to a great solution.
>
> What are other folks doing?  Is this something that other folks observe as
> well?  I suspect that the loss of workers is tied to
> jobs that run out of memory on the client side or our use of very large
> broadcast variables, but I don't have an isolated test case.
> I'm open to general answers here: for example, perhaps we should simply be
> using mesos or yarn instead of stand-alone mode.
>
> --j
>
>
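
For readers who want a starting point for the kind of worker-count monitor Josh describes, here is a minimal sketch in Python. It is not the script from the original message. It assumes the standalone master's web UI serves a JSON status page at /json containing a "workers" list whose entries have a "state" field; the host name, port, and expected worker count are hypothetical placeholders, and what to do when workers are missing (alerting, restarting) is left to the caller.

```python
import json
import urllib.request

# Assumption: the standalone master's web UI serves a JSON status page at /json
# with a "workers" list whose entries carry a "state" field.
MASTER_STATUS_URL = "http://spark-master.example.com:8080/json"  # hypothetical host
EXPECTED_WORKERS = 4  # hypothetical cluster size


def count_alive_workers(status):
    """Count workers the master reports as ALIVE in a parsed status document."""
    return sum(1 for w in status.get("workers", []) if w.get("state") == "ALIVE")


def needs_attention(status, expected=EXPECTED_WORKERS):
    """Return True when fewer workers are alive than the cluster should have."""
    return count_alive_workers(status) < expected


def fetch_status(url=MASTER_STATUS_URL, timeout=10):
    """Fetch and parse the master's JSON status page."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Run from cron or a similar scheduler, a wrapper could call `needs_attention(fetch_status())` and trigger whatever restart or alerting step the site already uses. Keeping the network call separate from the counting logic makes the check easy to test against a saved status document.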
