> On Jun 25, 2017, at 11:24 PM, Benjamin Mahler <[email protected]> wrote:
>
> As a data point, as far as I'm aware, most users are using a local work
> directory, not an NFS mounted one. Would love to hear from anyone on the list
> if they are doing this, and if there are any subtleties that should be
> documented.

We don't run NFS in particular but we did originally use a SAN -- two
observations:

NFS (historically, maybe it's better now, but doubtful...) has really bad
failure modes. Network failures can cause serious hangs both in user-space
and kernel-space. Such hangs can be impossible to clear without rebooting
the machine, and in some edge cases can even make it difficult or impossible
to reboot the machine via normal means.

Network attached drives (our SAN) are less reliable, slower, and more complex
(read: more failure modes) than local disk. It's also a really big single
point of failure. So far our only true cluster outages have been due to
failure of the SAN, since it took down all nodes at once -- once we removed
the SAN, future failures had islands of availability and any properly written
application could continue running (obviously without network resources)
through the incident.

Maybe this isn't a huge deal for your use case, which might differ from ours.
For us, it was enough of a problem that we now purchase local SSD scratch
space for every node just so that we have some storage we can depend on a bit
more than network attached storage.

>
> On Thu, Jun 22, 2017 at 11:13 PM, <[email protected]> wrote:
> Hi,
>
> We have a couple of server nodes mainly used for computational tasks in
> our mesos cluster. These servers have beefy cpus, gpus etc. but only
> limited ssd space. We also have a 40GbE network and a decently fast
> file server.
>
> My question is simple but I didn't find an answer anywhere: What are the
> best practices for the working directory on mesos-agent nodes? Should
> we keep the working directory local or is it reasonable to use a nfs
> mounted folder? We implemented both and they seem to work fine, but I
> would rather like to follow "best practices".
>
> Thanks and cheers
>
> Tom
>

