Hi,
I'm developing my own framework, which distributes more than 100
independent tasks across the cluster and simply runs them arbitrarily.
My problem is that each task's execution environment is a fairly large
tarball (2~6 GB, mostly application jar files). A task itself finishes
within 1~200 seconds, while extracting the tarball takes tens of
seconds every time. Extracting the same tarball again and again for
every task is wasteful overhead that cannot be ignored.

The fetcher cache is great, but in my case it isn't enough: I want to
preserve all files extracted from the tarball for as long as my
executor is alive. If Mesos could cache the extracted files, skipping
not only the download but also the extraction, I could save more time.
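
A minimal sketch of the extract-once behaviour described above, assuming
a long-lived executor that calls a helper before launching each task.
This is not a Mesos API: the function name ensure_extracted, the
cache_root directory, and the cache key (file name, size, and
modification time) are all hypothetical choices for illustration.

```python
import hashlib
import os
import shutil
import tarfile
import tempfile


def ensure_extracted(tarball_path, cache_root):
    """Extract tarball_path under cache_root once and return the directory.

    Hypothetical helper: later calls with the same, unchanged tarball
    return the existing directory without extracting again.
    """
    # Key the cache on cheap metadata instead of hashing several GB of data.
    st = os.stat(tarball_path)
    key = "{}:{}:{}".format(
        os.path.basename(tarball_path), st.st_size, st.st_mtime_ns
    )
    target = os.path.join(cache_root, hashlib.sha256(key.encode()).hexdigest())

    if os.path.isdir(target):
        return target  # already extracted for an earlier task

    os.makedirs(cache_root, exist_ok=True)
    # Extract into a temporary directory first and rename it afterwards,
    # so a half-extracted tree is never mistaken for a complete one.
    tmp = tempfile.mkdtemp(dir=cache_root)
    with tarfile.open(tarball_path) as tar:
        tar.extractall(tmp)
    try:
        os.rename(tmp, target)
    except OSError:
        # Another task finished the same extraction first; drop our copy.
        shutil.rmtree(tmp, ignore_errors=True)
    return target
```

Each task would then run against the returned directory, so only the
first task on a given executor pays the extraction cost.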

Neither "Fetcher Cache Internals" [1] nor the "Fetcher Cache" section
[2] of the official documentation mentions this issue or any related
future work. How do you solve this kind of extraction overhead when
you have rather large resources?

One option would be to set up an internal Docker registry and let the
slaves cache a Docker image that includes our jar files, which would
avoid the tarball extraction. However, I want to avoid adding moving
parts to our system as much as I can.

Another option might be to let the fetcher fetch all jar files
individually on the slaves. I think that is feasible, but I don't
think it would be easy to manage in production.

PS Mesos is great; it is helping us a lot. I appreciate all the
efforts by the community. Thank you so much!

[1] http://mesos.apache.org/documentation/latest/fetcher-cache-internals/
[2] http://mesos.apache.org/documentation/latest/fetcher/

Kota UENISHI
