On Tue, Aug 12, 2014 at 1:17 PM, Thomas Petr <[email protected]> wrote:
> That solution would likely cause us more pain -- we'd still need to figure
> out an appropriate amount of resources to request for artifact downloads /
> extractions, our scheduler would need to be sophisticated enough to only
> accept offers from the same slave that the setup task ran on, and we'd need
> to manage some new shared artifact storage location outside of the
> containers. Is splitting workflows into multiple tasks like this a common
> pattern?

Agreed. It definitely complicates the scheduler logic a bit. I'm not sure I understand the "shared artifact location outside containers" dependency w.r.t. the split-the-workflow-into-tasks solution, though.

Regarding "Is this a common pattern?": I'm not sure. Aurora/Thermos have support for doing a setup/run split via processes within the same task, but AFAIK anything that Thermos does on behalf of the user (e.g., fetching/extracting) is charged against the container's resources. You could reduce the footprint of the "setup" resources by caching etc., which you already seem to be doing.

As Sharma suggested, using soft limits for memory (not yet supported) is one solution. But this of course means you have to live with soft limits throughout the lifetime of the container. Depending on the type of tasks you are running, this might or might not be suitable for you.

> I personally agree that tasks manually overriding cgroups limits is a
> little sketchy (and am curious how MESOS-1279 would affect this
> discussion), but I doubt that we'll be the last people to attempt something
> like this. In other words, we acknowledge we're going rogue by
> temporarily overriding the limits... are there other implications of
> exposing the container ID that you're worried about?

One implication of exposing the container ID is that you end up depending on (intuiting) the cgroup path based on the container ID (e.g., /sys/fs/cgroup/mem/mesos/<container-id>).
But what happens when we change the layout in the future (e.g., /sys/fs/cgroup/mem/mesos/*containers/*<container-id>)? Things will break for you unless you do a lockstep upgrade of your executors along with the slaves, which will be a pain.

> Do you have any thoughts about my other idea (overriding the fetcher
> executable for a task)?

I'm not sure I understand this idea. How *custom* executors fetch a task's artifacts is completely opaque to Mesos. The fetcher used by the Mesos slave is only used to fetch executor artifacts, not tasks'.

