> On Jun 26, 2017, at 4:05 PM, Steven Schlansker <[email protected]> wrote:
>
>> On Jun 25, 2017, at 11:24 PM, Benjamin Mahler <[email protected]> wrote:
>>
>> As a data point, as far as I'm aware, most users are using a local work
>> directory, not an NFS mounted one. Would love to hear from anyone on the
>> list if they are doing this, and if there are any subtleties that should be
>> documented.
>
> We don't run NFS in particular but we did originally use a SAN -- two
> observations:
>
> NFS (historically, maybe it's better now, but doubtful...) has really bad
> failure modes.
> Network failures can cause serious hangs both in user-space and kernel-space.
> Such hangs can be impossible to clear without rebooting the machine, and in
> some edge cases can even make it difficult or impossible to reboot the
> machine via normal means.

You need to make sure to mount with the "intr" option.
https://speakerdeck.com/gnb/130-lca2008-nfs-tuning-secrets-d7

> Network attached drives (our SAN) are less reliable, slower, and more complex
> (read: more failure modes) than local disk. It's also a really big single
> point of failure. So far our only true cluster outages have been due to
> failure of the SAN, since it took down all nodes at once -- once we removed
> the SAN, future failures had islands of availability and any properly written
> application could continue running (obviously without network resources)
> through the incident.
>
> Maybe this isn't a huge deal for your use case, which might differ from ours.
> For us, it was enough of a problem that we now purchase local SSD scratch
> space for every node just so that we have some storage we can depend on a bit
> more than network attached storage.
>
>>
>> On Thu, Jun 22, 2017 at 11:13 PM, <[email protected]> wrote:
>> Hi,
>>
>> We have a couple of server nodes mainly used for computational tasks in
>> our mesos cluster. These servers have beefy cpus, gpus etc. but only
>> limited ssd space. We also have a 40GbE network and a decently fast
>> file server.
>>
>> My question is simple but I didn't find an answer anywhere: What are the
>> best practices for the working directory on mesos-agent nodes? Should
>> we keep the working directory local or is it reasonable to use an NFS
>> mounted folder? We implemented both and they seem to work fine, but I
>> would rather like to follow "best practices".
>>
>> Thanks and cheers
>>
>> Tom
>>
>
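For anyone who wants to check which kind of filesystem an agent work directory
actually sits on, here is a minimal Python sketch. It is not part of Mesos; the
function names and the set of "network" filesystem types are my own
assumptions, and it only parses text in the Linux /proc/mounts format (device,
mount point, type, options, ...). It picks the longest mount point that is a
path prefix of the directory.

```python
import posixpath

# Assumption: an illustrative, incomplete set of network filesystem types.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "glusterfs", "ceph"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs.

    Note: escaped characters in mount points (e.g. \\040 for a space)
    are not decoded in this sketch.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, entries):
    """Return the fs type of the longest mount point containing `path`."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in entries:
        mp = mount_point.rstrip("/") or "/"
        # Match on whole path components so /a/bc is not "under" /a/b.
        if mp == "/" or path == mp or path.startswith(mp + "/"):
            # >= lets a later mount on the same point override an earlier one.
            if len(mp) >= len(best_mount):
                best_mount, best_type = mp, fs_type
    return best_type


def is_network_fs(path, entries):
    """True if `path` is backed by one of the assumed network fs types."""
    return fs_type_for(path, entries) in NETWORK_FS_TYPES
```

On a Linux agent you could feed it the real table, for example
is_network_fs(work_dir, parse_mounts(open("/proc/mounts").read())), where
work_dir is whatever directory you pass to the agent.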

