Re: Agent Working Directory Best Practices

James Peach Mon, 26 Jun 2017 17:31:03 -0700

> On Jun 26, 2017, at 4:05 PM, Steven Schlansker <[email protected]> wrote:
> 
> 
>> On Jun 25, 2017, at 11:24 PM, Benjamin Mahler <[email protected]> wrote:
>> 
>> As a data point, as far as I'm aware, most users are using a local work 
>> directory, not an NFS mounted one. Would love to hear from anyone on the 
>> list if they are doing this, and if there are any subtleties that should be 
>> documented.
> 
> We don't run NFS in particular but we did originally use a SAN -- two 
> observations:
> 
> NFS (historically, maybe it's better now, but doubtful...) has really bad
> failure modes. Network failures can cause serious hangs both in user-space
> and kernel-space. Such hangs can be impossible to clear without rebooting
> the machine, and in some edge cases can even make it difficult or
> impossible to reboot the machine via normal means.


You need to make sure to mount with the "intr" option.

https://speakerdeck.com/gnb/130-lca2008-nfs-tuning-secrets-d7
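A rough sketch of how one might check which mount backs the agent work directory and whether "intr" is among its options. Everything named here is an assumption for illustration: the path /var/lib/mesos, the sample mount table, and the helper names are hypothetical, and the parser ignores the octal escapes (such as \040 for a space) that /proc/mounts uses in paths. Note also that nfs(5) describes "intr" as deprecated and ignored on Linux kernels after 2.6.25, where only SIGKILL can interrupt a pending NFS operation, so the check is mostly relevant on older kernels or other systems.

```python
import os

# Filesystem types treated as NFS for the purpose of this check.
NFS_TYPES = {"nfs", "nfs4"}


def find_mount(path, mounts_text):
    """Return (mount_point, fstype, options) for the mount containing path.

    mounts_text is text in /proc/mounts format:
    device mount_point fstype options dump pass
    Returns None if no entry covers the path.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        # The path is under this mount if it equals the mount point or
        # continues past it with a "/" separator.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype, options)
    return best


def nfs_without_intr(path, mounts_text):
    """True if path lives on an NFS mount whose options lack 'intr'."""
    mount = find_mount(path, mounts_text)
    if mount is None:
        return False
    _, fstype, options = mount
    return fstype in NFS_TYPES and "intr" not in options


# Hypothetical mount table used only for illustration.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
fileserver:/export /var/lib/mesos nfs rw,hard,intr,vers=3 0 0
fileserver:/scratch /mnt/scratch nfs4 rw,hard 0 0
"""

if __name__ == "__main__":
    for candidate in ("/var/lib/mesos", "/mnt/scratch/work"):
        print(candidate, nfs_without_intr(candidate, SAMPLE_MOUNTS))
```

On a live Linux host the same functions could be fed the contents of /proc/mounts instead of the sample table.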

> 
> Network attached drives (our SAN) are less reliable, slower, and more complex
> (read: more failure modes) than local disk.  It's also a really big single
> point of failure.  So far our only true cluster outages have been due to
> failure of the SAN, since it took down all nodes at once -- once we removed
> the SAN, future failures had islands of availability and any properly written
> application could continue running (obviously without network resources)
> through the incident.
> 
> Maybe this isn't a huge deal for your use case, which might differ from ours.
> For us, it was enough of a problem that we now purchase local SSD scratch
> space for every node just so that we have some storage we can depend on a bit
> more than network attached storage.
> 
>> 
>> On Thu, Jun 22, 2017 at 11:13 PM, <[email protected]> wrote:
>> Hi,
>> 
>> We have a couple of server nodes mainly used for computational tasks in
>> our Mesos cluster. These servers have beefy CPUs, GPUs etc. but only
>> limited SSD space. We also have a 40GbE network and a decently fast
>> file server.
>> 
>> My question is simple but I didn't find an answer anywhere: What are the
>> best practices for the working directory on mesos-agent nodes? Should
>> we keep the working directory local or is it reasonable to use an NFS
>> mounted folder? We implemented both and they seem to work fine, but I
>> would rather follow "best practices".
>> 
>> Thanks and cheers
>> 
>> Tom
>> 
>
