That solution would likely cause us more pain -- we'd still need to figure
out an appropriate amount of resources to request for artifact downloads /
extractions, our scheduler would need to be sophisticated enough to only
accept offers from the same slave that the setup task ran on, and we'd need
to manage some new shared artifact storage location outside of the
containers. Is splitting workflows into multiple tasks like this a common
pattern?
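(For concreteness, the slave-pinning logic that split approach would force on our scheduler looks roughly like this -- a plain-Python sketch, where `Offer` and its field names are stand-ins for the Mesos Offer protobuf, not real API:)

```python
# Hedged sketch of the offer-pinning a split setup/run workflow would need:
# once the setup task has run on a slave, the run task may only accept
# offers from that same slave. Offer is a stand-in for the Mesos protobuf.
from collections import namedtuple

Offer = namedtuple("Offer", ["offer_id", "slave_id"])

def pick_offer_for_run_task(offers, setup_slave_id):
    """Return the first offer from the slave the setup task ran on, or None."""
    for offer in offers:
        if offer.slave_id == setup_slave_id:
            return offer
    return None  # decline everything else; keep waiting for the right slave
```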

I personally agree that tasks manually overriding cgroups limits is a
little sketchy (and am curious how MESOS-1279 would affect this
discussion), but I doubt that we'll be the last people to attempt something
like this. In other words, we acknowledge we're going rogue by temporarily
overriding the limits... are there other implications of exposing the
container ID that you're worried about?
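(For context, the temporary override amounts to writing the memory cgroup's limit file directly and restoring it afterwards -- a minimal sketch, assuming the cgroup v1 memory controller layout; the directory path and the 2 GB fetch limit are illustrative, not anything Mesos exposes:)

```python
# Minimal sketch of the temporary override: raise the container's cgroup
# memory limit for the duration of the artifact fetch, then restore the
# Mesos-assigned limit. The cgroup directory and 2 GB default are
# illustrative assumptions.
import os

def fetch_with_raised_limit(cgroup_dir, fetch_fn, fetch_limit=2 << 30):
    limit_file = os.path.join(cgroup_dir, "memory.limit_in_bytes")
    with open(limit_file) as f:
        original = int(f.read().strip())
    with open(limit_file, "w") as f:
        f.write(str(max(original, fetch_limit)))  # never lower the limit
    try:
        fetch_fn()  # download / extract artifacts under the raised limit
    finally:
        with open(limit_file, "w") as f:
            f.write(str(original))  # hand back the Mesos-assigned limit
```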

Do you have any thoughts about my other idea (overriding the fetcher
executable for a task)?

Thanks,
Tom

On Tue, Aug 12, 2014 at 2:05 PM, Vinod Kone <[email protected]> wrote:

> Thanks Thomas for the clarification.
>
> One solution you could consider would be separating the setup
> (fetch/extract) phase and the running phase into separate Mesos tasks. That way
> you can give the setup task the resources needed for fetching/extracting, and as
> soon as it is done, you can send a TASK_FINISHED so that the resources used
> by that task are reclaimed by Mesos. That would give you the dynamism you
> need. Would that work in your scenario?
>
> Having the executor change cgroup limits behind the scenes, opaquely to
> Mesos, seems like a recipe for problems in the future to me, since it could
> lead to temporary over-commit of resources and affect isolation across
> containers.
>
>
>
> On Tue, Aug 12, 2014 at 10:45 AM, Thomas Petr <[email protected]> wrote:
>
>> Hey Vinod,
>>
>> We're not using mesos-fetcher to download the executor -- we ensure our
>> executor exists on the slaves beforehand (during machine provisioning, to
>> be exact). The issue that Whitney is talking about is OOMing while fetching
>> artifacts necessary for task execution (like the JAR for a web service).
>>
>> Our own executor
>> <https://github.com/HubSpot/Singularity/tree/master/SingularityExecutor> has
>> some nice enhancements around S3 downloads and artifact caching that we
>> wouldn't necessarily want to lose if we switched back to using mesos-fetcher.
>>
>> Surfacing the container ID seems like a trivial change, but another
>> alternative could be to allow frameworks to specify an alternative fetcher
>> executable (perhaps in CommandInfo?).
>>
>> Thanks,
>> Tom
>>
>>
>> On Tue, Aug 12, 2014 at 1:09 PM, Vinod Kone <[email protected]> wrote:
>>
>>> Hi Whitney,
>>>
>>> While we could conceivably set the container id in the environment of
>>> the executor, I would like to understand the problem you are facing.
>>>
>>> The fetching and extracting of the executor is done by mesos-fetcher,
>>> a process forked by the slave and run under the slave's cgroup. AFAICT, this
>>> shouldn't cause an OOM in the executor. Does your executor do more
>>> fetches/extracts once it is launched (e.g., for users' tasks)?
>>>
>>
>>
>
