Thanks Oliver, a lot of great suggestions. One of the reasons I was interested
in Mesos was the idea of it being more generalized. While this small HPC
cluster will serve one primary job, it will also be used for research purposes.
So being able to easily test out frameworks and not be 'locked in' to one way
of doing things is appealing. Most jobs are relatively CPU/RAM heavy (and,
unfortunately, heavy on small-file disk I/O), but I already have a good handle
on building individual compute servers that can handle that load, so they
would be suitable as slave nodes/compute clusters. HA would be nice for
ensuring turn-around times on workflows, but it likely isn't a major issue: if
a node is down for a few hours, no one will lose any sleep or die. As long as
the node can be brought back up in a reasonable amount of time, it should be
fine.