Re: Mesos does not assign all available resources

Benjamin Mahler Thu, 02 Mar 2017 16:54:07 -0800

Also, what is the allocation that each framework has when you reach your
steady state?
Are there frameworks that don't have any more work to do but have a really
low share of the cluster?
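
One way to answer the allocation question is to take a snapshot of the master's state and compute each framework's share of the cluster's CPUs. The sketch below is a minimal example, not a definitive tool. It assumes the snapshot is a dict shaped like the master's state JSON, with a `slaves` list whose entries carry `resources` and a `frameworks` list whose entries carry `id`, `name` and `used_resources`; check those field names against your Mesos version. The helper name `cpu_shares` and the sample data are hypothetical.

```python
def cpu_shares(state):
    """Return (framework id, name, CPU share) tuples, lowest share first.

    `state` is assumed to look like the master's state JSON:
      {"slaves": [{"resources": {"cpus": ...}}, ...],
       "frameworks": [{"id": ..., "name": ...,
                       "used_resources": {"cpus": ...}}, ...]}
    """
    # Total CPUs registered across all agents in the snapshot.
    total = sum(float(s.get("resources", {}).get("cpus", 0.0))
                for s in state.get("slaves", []))
    rows = []
    for fw in state.get("frameworks", []):
        used = float(fw.get("used_resources", {}).get("cpus", 0.0))
        share = used / total if total else 0.0
        # Key by id as well as name: several frameworks may share a name.
        rows.append((fw.get("id", "?"), fw.get("name", "?"), share))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical snapshot, not taken from the cluster discussed in this thread.
sample_state = {
    "slaves": [{"resources": {"cpus": 4}}, {"resources": {"cpus": 4}}],
    "frameworks": [
        {"id": "fw-a", "name": "busy", "used_resources": {"cpus": 2}},
        {"id": "fw-b", "name": "idle", "used_resources": {"cpus": 0}},
    ],
}

shares = cpu_shares(sample_state)
for fw_id, name, share in shares:
    print("{} ({}): {:.1%} of cluster CPUs".format(name, fw_id, share))
```

Frameworks at the top of the list with work still pending are the ones worth looking at first.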


On Thu, Mar 2, 2017 at 4:29 PM, Benjamin Mahler <[email protected]> wrote:

> Can you upload the full logs somewhere and link to them here?
>
> How many frameworks are you running? Do they all run in the "*" role?
> Are the tasks short lived or long lived?
> Can you update your test to not use the --offer_timeout? The intention of
> that is to mitigate against frameworks that hold on to offers, but it
> sounds like your frameworks decline.
>
> On Thu, Mar 2, 2017 at 3:57 PM, Harold Molina-Bulla <[email protected]>
> wrote:
>
>> Hi,
>>
>> Thanks for your reply.
>>
>> Hi there, more clarification is needed:
>>
>>> I have close to 800 CPUs, but the system does not assign all the
>>> available resources to all our tasks.
>>>
>> What do you mean precisely here? Can you describe what you're seeing?
>> Also, you have more than 800GB of RAM, right?
>>
>>
>> Yes, we have at least 2 GB of RAM per CPU, and our resource table
>> typically looks like this:
>>
>> In this case 346 of 788 CPUs are available and not assigned to any task,
>> but we have more than 400 tasks waiting to run.
>>
>> According to the mesos-master log, it does not make offers to all running
>> frameworks all the time, only to a few of them:
>>
>> I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at
>> [email protected]:32899
>> I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at
>> [email protected]:44233
>> I0303 00:16:01.968003 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0084 (vatmoutput) at
>> [email protected]:43023
>> I0303 00:16:01.969828 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0081 (vatmoutput) at
>> [email protected]:43067
>> I0303 00:16:01.971613 31791 master.cpp:6517] Sending 6 offers to
>> framework c5299003-e29d-43cb-8ca7-887ab24c8513-0175 (:izanami) at
>> [email protected]:38706
>> I0303 00:16:01.973351 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0082 (vatmoutputg) at
>> [email protected]:33668
>> I0303 00:16:01.975126 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0062 (vatmvalidation) at
>> [email protected]:33148
>> I0303 00:16:01.976877 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0077 (:izanami) at
>> [email protected]:35345
>> I0303 00:16:01.978590 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0083 (vatmoutputg) at
>> [email protected]:39218
>>
>> We have close to twice as many frameworks running at the moment. One of
>> them (not included above) has more than 300 tasks waiting and just 100 CPUs
>> assigned (1 CPU per task).
>>
>> The problem, we think, is that the mesos-master does not offer resources
>> to all the tasks all the time, and the declined resources are not
>> re-offered to other tasks. Any idea how to change this behavior, or the
>> rate at which resources are offered to the tasks?
>>
>> FYI, we set --offer_timeout=1sec.
>>
>> Thanks in advance.
>>
>> Harold Molina-Bulla Ph.D.
>> On 02/03/2017 23:28, Benjamin Mahler wrote:
>>
>>
>> Ben
>>
>> On Thu, Mar 2, 2017 at 9:00 AM, Harold Molina-Bulla <[email protected]>
>> wrote:
>>
>>> Hi Everybody,
>>>
>>> We are trying to develop a scheduler in Python to distribute processes
>>> in a Mesos cluster.
>>>
>>> I have close to 800 CPUs, but the system does not assign all the
>>> available resources to all our tasks.
>>>
>>> To test, we define 1 CPU and 1 GB of RAM per process so that all the
>>> processes fit on our machines, and we launch several scripts
>>> simultaneously so that Nprocs > Ncpus (close to 900 tasks in total).
>>>
>>> Our script is based on the test_framework.py example included in the
>>> Mesos source distribution, with changes such as sending a decline message
>>> when the list of tasks to launch is empty.
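
The decline-on-empty behaviour described above can be sketched as follows. To keep it runnable without a Mesos installation, the sketch uses a hypothetical `RecordingDriver` stand-in and plain dicts for offers. In a real scheduler the calls would go to the driver's `launchTasks` and `declineOffer` methods, and the refusal period would be passed in a `Filters` message through its `refuse_seconds` field rather than as a bare number. The function name `handle_offers`, the offer dict layout and the 1-second value are assumptions for illustration.

```python
class RecordingDriver(object):
    """Hypothetical stand-in for a scheduler driver; it only records calls."""

    def __init__(self):
        self.launched = []  # (offer id, task) pairs
        self.declined = []  # (offer id, refuse_seconds) pairs

    def launchTasks(self, offer_id, tasks):
        self.launched.extend((offer_id, task) for task in tasks)

    def declineOffer(self, offer_id, refuse_seconds):
        # Simplification: the real call takes a Filters message instead.
        self.declined.append((offer_id, refuse_seconds))


def handle_offers(driver, offers, pending, refuse_seconds=1.0):
    """Fill each offer with pending tasks; decline offers left unused.

    Each task is assumed to need 1 CPU and 1024 MB of RAM, as in the test
    setup described in this thread.
    """
    for offer in offers:
        # How many 1 CPU / 1 GB tasks fit into this offer.
        slots = min(int(offer["cpus"]), int(offer["mem"] // 1024))
        batch = [pending.pop(0) for _ in range(min(slots, len(pending)))]
        if batch:
            driver.launchTasks(offer["id"], batch)
        else:
            # Nothing to launch: hand the resources back explicitly.
            driver.declineOffer(offer["id"], refuse_seconds)


# Hypothetical offers and task names, for illustration only.
driver = RecordingDriver()
pending = ["t1", "t2", "t3"]
offers = [
    {"id": "o1", "cpus": 2, "mem": 4096},
    {"id": "o2", "cpus": 4, "mem": 1024},
    {"id": "o3", "cpus": 8, "mem": 8192},
]
handle_offers(driver, offers, pending)
print(driver.launched)
print(driver.declined)
```

The refusal period matters because it is the time during which the declined resources are withheld from the declining framework, so a framework that still has queued work may prefer a short value.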
>>>
>>> We have deployed Mesos 1.1.0.
>>>
>>> Any ideas on how to improve the use of our resources?
>>>
>>> Thx in advance!
>>> Harold Molina-Bulla Ph.D.
>>> --
>>>
>>> *"In a time of universal deceit, telling the truth is a revolutionary
>>> act"*
>>> George Orwell (1984)
>>>
>>> Remember: PRISM is watching you!!! X)
>>> *Harold Molina-Bulla*
>>> GnuPG key: *189D5144*
>>>
>>
>>
>> --
>>
>> *"In a time of universal deceit, telling the truth is a revolutionary
>> act"*
>> George Orwell (1984)
>>
>> Remember: PRISM is watching you!!! X)
>> *Harold Molina-Bulla*
>> *[email protected] <[email protected]>*
>> GnuPG key: *189D5144*
>>
>
>
