Hi Tim,

Sure, here are some preliminary thoughts.

In a Mesos cluster that has only one framework, the following strategy in
the scheduler would suffice:

- when assigning a task that needs data locality, use an offer from a host
that has the data
- when assigning a task that does not need data locality, do not use an
offer from a host that has (or had) a task which produced data that other
tasks need for data locality
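A minimal sketch of those two rules, assuming offers and tasks are modelled as plain Python objects rather than Mesos protobufs. Offer, Task, pick_offer, and the two bookkeeping maps are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    host: str
    cpus: float


@dataclass
class Task:
    name: str
    cpus: float
    needs_locality: bool = False
    dataset: Optional[str] = None  # data the task should run next to


def pick_offer(task, offers, data_hosts, producer_hosts):
    """Return the first offer acceptable for task, or None.

    data_hosts: dataset name -> set of hosts holding that dataset
    producer_hosts: hosts that have (or had) tasks producing data
                    that other tasks need for locality
    """
    for offer in offers:
        if offer.cpus < task.cpus:
            continue  # offer too small for this task
        if task.needs_locality:
            # Rule 1: only use hosts that already hold the data.
            if offer.host in data_hosts.get(task.dataset, set()):
                return offer
        elif offer.host not in producer_hosts:
            # Rule 2: keep locality hosts free for tasks that need them.
            return offer
    return None
```

Applied over time, rule 2 is what keeps the two groups of hosts separate.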

This strategy would naturally partition hosts into two groups: one in which
hosts are used for data locality and another in which hosts run tasks that
don't need data locality. There could be more groups if different tasks
need different data.

Now, if there are multiple frameworks in the cluster, Mesos would need new
support to make the above strategy work. The Mesos allocator would need to
do the following:

- when giving out offers to framework A, prefer hosts that are running (or
have previously run) other tasks from framework A.
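As a rough illustration of that preference, here is a minimal sketch of the bookkeeping an allocator might keep. AffinityTracker and its methods are hypothetical; nothing like this exists in Mesos today, which is the point of the proposal.

```python
from collections import defaultdict


class AffinityTracker:
    """Hypothetical allocator-side state: which hosts have run tasks
    for which framework."""

    def __init__(self):
        self.history = defaultdict(set)  # framework id -> hosts used

    def record_launch(self, framework, host):
        self.history[framework].add(host)

    def order_hosts(self, framework, hosts):
        """Return hosts with previously used ones first.

        sorted() is stable, so hosts within each group keep their
        original relative order.
        """
        used = self.history[framework]
        return sorted(hosts, key=lambda h: h not in used)
```

Ordering rather than filtering keeps the preference soft: a framework can still be offered a fresh host when none of its previous hosts has free resources.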

As an example, say we have two frameworks, A and B, and 4 hosts, h1, h2,
h3, and h4, each with 4 cores. If A and B are allocated 1:1, that is 8
cores each. Say that currently 2 cores on each of the 4 hosts are offered
to framework A and the other 2 cores to framework B. A variety of reasons
could have resulted in such a split.

Now, say framework A launches a task that uses 2 cores, using its offer on
host h1. The remaining 2 cores on h1 are offered to B, so framework A has
no way to launch another task on h1 to achieve data locality. To keep the
allocation at 1:1 and still help data locality, it would be nice if Mesos
did the following:

- rescind 2-core offer on h1 from framework B
- rescind 2-core offer on h2 from framework A
- send 2-core offer on h1 to framework A
- send 2-core offer on h2 to framework B

This would need to be done only if framework A indicated, when launching
its task on h1, that this is a task that produces data for locality
purposes.
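The rebalancing above can be sketched as follows. This is a hypothetical illustration, not Mesos allocator code: outstanding offers are modelled as a dict of host to {framework: cores}, and plan_swap is an invented name. It would be called only when the launching framework flags its task as a data producer.

```python
def plan_swap(offers, producer, host):
    """Plan offer swaps so `producer` gets the rest of `host`.

    offers: host -> {framework: cores currently offered}
    Returns a list of (action, host, framework, cores) tuples and
    updates `offers` in place.
    """
    actions = []
    for other, cores in list(offers[host].items()):
        if other == producer or cores == 0:
            continue
        # Find another host where the producer holds a large enough offer.
        donor = next(
            (h for h in sorted(offers)
             if h != host and offers[h].get(producer, 0) >= cores),
            None,
        )
        if donor is None:
            continue  # no like-for-like swap available
        actions += [
            ("rescind", host, other, cores),
            ("rescind", donor, producer, cores),
            ("offer", host, producer, cores),
            ("offer", donor, other, cores),
        ]
        offers[host][other] -= cores
        offers[host][producer] = offers[host].get(producer, 0) + cores
        offers[donor][producer] -= cores
        offers[donor][other] = offers[donor].get(other, 0) + cores
    return actions
```

Because each swap trades equal-sized offers, every framework's total stays the same, which is what keeps the 1:1 share intact. For the h1..h4 example, the plan is exactly the four steps listed above.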

Other scenarios and other resource types could be handled similarly under
this new strategy.

On Fri, Jan 16, 2015 at 9:53 AM, Tim Chen <t...@mesosphere.io> wrote:

> Hi Sharma,
>
> You're correct, and that's how most schedulers handle this: they manage
> the locality information themselves.
>
> We've been considering primitives to help on this front, though, so if
> you have any input on how Mesos could help manage locality at its level,
> let us know.
>
> Tim
>
> On Fri, Jan 16, 2015 at 9:34 AM, Sharma Podila <spod...@netflix.com>
> wrote:
>
>> Using attributes would be the simplest way, if the slave supported
>> dynamic updates of its attributes. The JIRA that Tim references would be
>> nice! Otherwise, one would have to resort to something like a wrapper
>> script around the mesos-slave process that detects newly available data
>> and restarts mesos-slave with the new attributes on the command line.
>> Restarts may be acceptable if the slaves run with checkpointing enabled,
>> so that they recover their state upon restart.
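A minimal sketch of the core of such a wrapper. It assumes a hypothetical layout where each downloaded dataset is a sub-directory of a local data directory, and a hypothetical naming scheme that advertises each dataset as a "data_<name>:true" attribute. Only the mesos-slave --master and --attributes flags are real; the helper names are illustrative. The actual restart and polling loop are left out.

```python
import os


def local_datasets(data_dir):
    """Names of datasets present locally (hypothetical layout: one
    sub-directory of data_dir per downloaded dataset)."""
    return sorted(
        e for e in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, e))
    )


def slave_command(master, datasets):
    """argv a wrapper could (re)exec; advertises each dataset as a
    text attribute "data_<name>:true" (hypothetical naming)."""
    argv = ["mesos-slave", f"--master={master}"]
    if datasets:
        # --attributes takes "key:value" pairs separated by ';'.
        argv.append("--attributes=" + ";".join(f"data_{n}:true" for n in datasets))
    return argv


def needs_restart(running_argv, master, data_dir):
    """True if the locally available data no longer matches what the
    running slave was started with."""
    return slave_command(master, local_datasets(data_dir)) != running_argv
```

The wrapper would periodically call needs_restart and, when it returns True, stop the slave and start it again with the new argv.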
>>
>> Another possibility in the interim would be for the framework scheduler
>> to launch the tasks that download the file(s) onto the small subset of
>> nodes. The scheduler can then maintain this state information and assign
>> tasks based on it. This has the additional advantage of maintaining the
>> list of that subset of nodes more dynamically, if that is useful to you.
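A minimal sketch of that scheduler-side state, assuming the scheduler calls download_finished when one of its download tasks reports TASK_FINISHED and host_lost when a slave goes away. LocalityIndex and its methods are hypothetical names; hosts are plain strings rather than Mesos slave IDs.

```python
from collections import defaultdict


class LocalityIndex:
    """Hypothetical scheduler-side record of which hosts hold which files."""

    def __init__(self):
        self.hosts_by_file = defaultdict(set)

    def download_finished(self, filename, host):
        # Called when a download task for `filename` finished on `host`.
        self.hosts_by_file[filename].add(host)

    def host_lost(self, host):
        # Forget any data on a host that went away.
        for hosts in self.hosts_by_file.values():
            hosts.discard(host)

    def offers_for(self, filename, offer_hosts):
        """Filter the hosts of current offers down to those holding the file."""
        held = self.hosts_by_file.get(filename, set())
        return [h for h in offer_hosts if h in held]
```

Because the index is updated from status updates, the subset of data nodes can grow or shrink at runtime without touching slave configuration.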
>>
>> In general, I am a fan of achieving data locality via the scheduler's
>> state info. In a more generic scenario, the data would be created
>> dynamically by previously run tasks (rather than by an initial download),
>> and locality for such data is therefore easier to handle in the scheduler.
>>
>>
>>
>> On Fri, Jan 16, 2015 at 12:15 AM, Tim Chen <t...@mesosphere.io> wrote:
>>
>>> Hi Douglas,
>>>
>>> The simplest approach Mesos supports is to add attributes via
>>> command-line flags when you launch a mesos slave. When this slave's
>>> resources are offered, the offer also includes all the attributes
>>> you've tagged.
>>>
>>> This is currently static information set at launch, and I believe there
>>> are JIRA tickets to make it dynamic (updatable at runtime).
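To illustrate the attribute route, here is a minimal sketch of how a scheduler might filter offers on such attributes. It assumes a hypothetical naming scheme (one "data_<name>:true" attribute per dataset) and models offers as plain (hostname, attributes) pairs rather than the attributes field of the Mesos Offer message. parse_attributes and offers_with_data are illustrative names only, and the parser ignores Mesos's scalar, range, and set attribute types.

```python
def parse_attributes(flag_value):
    """Parse a "key:value;key:value" string (the form passed to the
    slave's --attributes flag) into a dict of text values."""
    attrs = {}
    for pair in filter(None, flag_value.split(";")):
        key, _, value = pair.partition(":")
        attrs[key.strip()] = value.strip()
    return attrs


def offers_with_data(offers, dataset):
    """Return hostnames of offers that advertise data_<dataset>.

    offers: list of (hostname, attributes-dict) pairs.
    """
    key = f"data_{dataset}"
    return [host for host, attrs in offers if key in attrs]
```

Since the attributes are fixed when the slave starts, this only reflects data that was present at launch.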
>>>
>>> Tim
>>>
>>> On Thu, Jan 15, 2015 at 7:23 PM, Douglas Voet <dv...@broadinstitute.org>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am evaluating mesos in the context of running analyses of many large
>>>> files. I only want to download a file to a small subset of my nodes and
>>>> route the related processing there. The mesos paper talks about using
>>>> resource offers as a mechanism to achieve data locality but I can't find
>>>> any reference to how one might do this in the documentation. How would a
>>>> mesos slave know what data is available keeping in mind that that might
>>>> change over time? How can I configure a slave to include this information
>>>> in resource offers?
>>>>
>>>> Thanks in advance for any pointers.
>>>>
>>>> -Doug
>>>>
>>>
>>>
>>
>
