This is great information. Thanks for sharing, Steven!

On Tue, Jun 27, 2017 at 7:05 AM, Steven Schlansker <sschlans...@opentable.com> wrote:

>
> > On Jun 25, 2017, at 11:24 PM, Benjamin Mahler <bmah...@apache.org> wrote:
> >
> > As a data point, as far as I'm aware, most users are using a local work
> > directory, not an NFS mounted one. Would love to hear from anyone on the
> > list if they are doing this, and if there are any subtleties that should be
> > documented.
>
> We don't run NFS in particular but we did originally use a SAN -- two
> observations:
>
> NFS (historically, maybe it's better now, but doubtful...) has really bad
> failure modes. Network failures can cause serious hangs both in user-space
> and kernel-space. Such hangs can be impossible to clear without rebooting
> the machine, and in some edge cases can even make it difficult or
> impossible to reboot the machine via normal means.
>
> Network attached drives (our SAN) are less reliable, slower, and more
> complex (read: more failure modes) than local disk. It's also a really big
> single point of failure. So far our only true cluster outages have been due
> to failure of the SAN, since it took down all nodes at once -- once we
> removed the SAN, future failures had islands of availability and any
> properly written application could continue running (obviously without
> network resources) through the incident.
>
> Maybe this isn't a huge deal for your use case, which might differ from
> ours. For us, it was enough of a problem that we now purchase local SSD
> scratch space for every node just so that we have some storage we can
> depend on a bit more than network attached storage.
>
> >
> > On Thu, Jun 22, 2017 at 11:13 PM, <thomas.kurm...@artorg.unibe.ch> wrote:
> > Hi,
> >
> > We have a couple of server nodes mainly used for computational tasks in
> > our Mesos cluster. These servers have beefy CPUs, GPUs, etc., but only
> > limited SSD space. We also have a 40GbE network and a decently fast
> > file server.
> >
> > My question is simple but I didn't find an answer anywhere: What are the
> > best practices for the working directory on mesos-agent nodes? Should
> > we keep the working directory local or is it reasonable to use an
> > NFS-mounted folder? We implemented both and they seem to work fine, but I
> > would rather follow "best practices".
> >
> > Thanks and cheers
> >
> > Tom
> >
>
>
