Can you upload the full logs somewhere and link to them here?

How many frameworks are you running? Do they all run in the "*" role?
Are the tasks short lived or long lived?
Can you update your test to not use --offer_timeout? That flag is intended
to guard against frameworks that hold on to offers, but it sounds like your
frameworks decline them.

On Thu, Mar 2, 2017 at 3:57 PM, Harold Molina-Bulla <h.mol...@tsc.uc3m.es>
wrote:

> Hi,
>
> Thanks for your reply.
>
> Hi there, more clarification is needed:
>
>> I have close to 800 CPUs, but the system does not assign all the
>> available resources to all our tasks.
>>
> What do you mean precisely here? Can you describe what you're seeing?
> Also, you have more than 800GB of RAM, right?
>
>
> Yes, we have at least 2 GB per CPU. Our resource table typically
> looks like this:
>
> In this case 346 of 788 CPUs are available and not assigned to any task,
> but we have more than 400 tasks waiting to run.
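The figures above can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the message (788 CPUs in total, 346 idle, 1 CPU per task), with 400 taken as a lower bound for "more than 400" waiting tasks.

```python
# Figures quoted in the message; 400 is a lower bound ("more than 400").
TOTAL_CPUS = 788
IDLE_CPUS = 346
WAITING_TASKS = 400
CPUS_PER_TASK = 1

allocated_cpus = TOTAL_CPUS - IDLE_CPUS          # CPUs currently running tasks
allocation_ratio = allocated_cpus / TOTAL_CPUS   # fraction of the cluster in use
# Tasks that could start right now if every idle CPU were offered and accepted.
launchable_now = min(IDLE_CPUS // CPUS_PER_TASK, WAITING_TASKS)

print(f"allocated: {allocated_cpus}/{TOTAL_CPUS} ({allocation_ratio:.1%})")
print(f"could start immediately: {launchable_now} of {WAITING_TASKS}+ waiting")
```

So roughly 44% of the CPUs sit idle while enough work is queued to fill all of them. Memory should not be the limit, since at least 2 GB per CPU exceeds the 1 GB each task requests.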
>
> Checking the mesos-master log, we see that it does not make offers to all
> running frameworks all the time, only to a few of them:
>
> I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at
> scheduler-52a267e9-30d1-4cc8-847e-fa7acfddf855@192.168.151.147:32899
> I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at
> scheduler-ce746b8b-adac-4a0c-8310-5d312c9ed04f@192.168.151.186:44233
> I0303 00:16:01.968003 31791 master.cpp:6517] Sending 6 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0084 (vatmoutput) at
> scheduler-078b1978-840a-437e-a23e-5bca8c5e05c8@192.168.151.84:43023
> I0303 00:16:01.969828 31791 master.cpp:6517] Sending 6 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0081 (vatmoutput) at
> scheduler-d921e4bb-ee23-4e77-93d9-7742264839e5@192.168.151.84:43067
> I0303 00:16:01.971613 31791 master.cpp:6517] Sending 6 offers to framework
> c5299003-e29d-43cb-8ca7-887ab24c8513-0175 (:izanami) at
> scheduler-e10a1167-62d7-4ded-b932-792b5478ab61@192.168.151.186:38706
> I0303 00:16:01.973351 31791 master.cpp:6517] Sending 6 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0082 (vatmoutputg) at
> scheduler-c4db35be-41e1-45cb-8005-f0f7827a23d0@192.168.151.84:33668
> I0303 00:16:01.975126 31791 master.cpp:6517] Sending 6 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0062 (vatmvalidation) at
> scheduler-44ed1457-a752-4037-89b6-590221db3de5@192.168.151.84:33148
> I0303 00:16:01.976877 31791 master.cpp:6517] Sending 6 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0077 (:izanami) at
> scheduler-c648708f-32f3-44d5-9014-3fd0dbb461f7@192.168.151.186:35345
> I0303 00:16:01.978590 31791 master.cpp:6517] Sending 6 offers to framework
> 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0083 (vatmoutputg) at
> scheduler-fb965e89-5764-4a07-a94a-43de45babc7a@192.168.151.84:39218
>
> We have roughly twice as many frameworks running at the moment; one of
> them (not included above) has more than 300 tasks waiting and only 100
> CPUs assigned (1 CPU per task).
>
> We think the problem is that the mesos-master does not offer resources to
> all the frameworks all the time, and declined resources are not re-offered
> to other frameworks. Any idea how to change this behavior, or the rate at
> which resources are offered to the frameworks?
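In Mesos a decline does not discard the resources: the allocator records a filter so that the declining framework is not offered those resources again for `Filters.refuse_seconds` (5 seconds by default), while other frameworks remain eligible for them. The toy model below is a simplification for illustration, not the actual allocator code, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

DEFAULT_REFUSE_SECONDS = 5.0  # default of Filters.refuse_seconds in Mesos


@dataclass
class ToyOfferFilters:
    """Hypothetical toy model of per-framework decline filters (not Mesos code)."""

    # (framework_id, agent_id) -> time at which the filter expires
    expiry: dict = field(default_factory=dict)

    def decline(self, framework_id, agent_id, now,
                refuse_seconds=DEFAULT_REFUSE_SECONDS):
        """Record that framework_id declined resources on agent_id at time now."""
        self.expiry[(framework_id, agent_id)] = now + refuse_seconds

    def is_filtered(self, framework_id, agent_id, now):
        """True if agent_id's resources should not be offered to framework_id yet."""
        return now < self.expiry.get((framework_id, agent_id), float("-inf"))


filters = ToyOfferFilters()
filters.decline("fw-A", "agent-1", now=0.0)
print(filters.is_filtered("fw-A", "agent-1", now=1.0))  # still filtered for fw-A
print(filters.is_filtered("fw-B", "agent-1", now=1.0))  # fw-B can still get them
print(filters.is_filtered("fw-A", "agent-1", now=5.0))  # filter has expired
```

Under this model, a framework that declines with a short filter is quickly offered the same resources again, whereas a longer filter leaves them available to frameworks that actually have queued tasks.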
>
> FYI, we set --offer_timeout=1sec.
>
> Thanks in advance.
>
> Harold Molina-Bulla Ph.D.
> On 02/03/2017 23:28, Benjamin Mahler wrote:
>
>
> Ben
>
> On Thu, Mar 2, 2017 at 9:00 AM, Harold Molina-Bulla <h.mol...@tsc.uc3m.es>
> wrote:
>
>> Hi Everybody,
>>
>> We are trying to develop a scheduler in Python to distribute processes
>> across a Mesos cluster.
>>
>> I have close to 800 CPUs, but the system does not assign all the
>> available resources to all our tasks.
>>
>> For testing, we request 1 CPU and 1 GB of RAM per process so that all the
>> processes fit on our machines, and we launch several scripts
>> simultaneously so that Nprocs > Ncpus (close to 900 tasks in total).
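With 1 CPU and 1 GB per task, the number of tasks a single offer can hold is bounded by whichever resource runs out first. A minimal sketch, assuming offer resources are given as plain numbers (CPUs, and memory in MB as Mesos reports `mem`); the function name is hypothetical.

```python
def tasks_that_fit(offer_cpus, offer_mem_mb, task_cpus=1.0, task_mem_mb=1024.0):
    """Hypothetical helper: how many identical tasks fit into one offer."""
    if task_cpus <= 0 or task_mem_mb <= 0:
        raise ValueError("task requirements must be positive")
    by_cpu = int(offer_cpus // task_cpus)      # limit imposed by CPUs
    by_mem = int(offer_mem_mb // task_mem_mb)  # limit imposed by memory
    return max(0, min(by_cpu, by_mem))


print(tasks_that_fit(8, 16384))   # CPU-bound: 16 GB alone would allow 16
print(tasks_that_fit(4, 2048))    # memory-bound
print(tasks_that_fit(0.5, 4096))  # less than one CPU left
```

With close to 900 tasks at 1 CPU each and close to 800 CPUs, about 100 tasks are expected to wait even with perfect packing.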
>>
>> Our script is based on the test_framework.py example included in the
>> Mesos source distribution, with changes such as sending a decline message
>> when the list of tasks to launch is empty.
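A sketch of the offer handling described above, with two changes that may be worth trying: declining with an explicit, longer filter, and suppressing offers once the queue is empty, so that a framework with nothing to run stops receiving offers other frameworks could use. `launchTasks`, `declineOffer` and `suppressOffers` are methods of the Mesos Python `SchedulerDriver`, but here they are called on a hypothetical `RecordingDriver` stub with simplified arguments (plain dicts and a `refuse_seconds` number instead of protobufs), so the example runs without Mesos installed. The 60-second filter is an arbitrary example value.

```python
from collections import deque

TASK_CPUS = 1.0
TASK_MEM_MB = 1024.0
REFUSE_SECONDS = 60.0  # arbitrary example value; the Mesos default is 5 seconds


class RecordingDriver:
    """Hypothetical stand-in for a SchedulerDriver that only records calls."""

    def __init__(self):
        self.calls = []

    def launchTasks(self, offer_id, tasks):
        self.calls.append(("launch", offer_id, [t["name"] for t in tasks]))

    def declineOffer(self, offer_id, refuse_seconds):
        self.calls.append(("decline", offer_id, refuse_seconds))

    def suppressOffers(self):
        self.calls.append(("suppress",))


def resource_offers(driver, offers, pending):
    """Launch as many pending tasks as fit; decline empty offers with a filter."""
    for offer in offers:
        cpus, mem = offer["cpus"], offer["mem"]
        batch = []
        while pending and cpus >= TASK_CPUS and mem >= TASK_MEM_MB:
            batch.append(pending.popleft())
            cpus -= TASK_CPUS
            mem -= TASK_MEM_MB
        if batch:
            driver.launchTasks(offer["id"], batch)
        else:
            # Nothing to run here: a longer filter keeps this offer away from
            # us for a while, so the allocator can hand it to other frameworks.
            driver.declineOffer(offer["id"], REFUSE_SECONDS)
    if not pending:
        # Queue drained: stop receiving offers until new work arrives
        # (a real scheduler would then call reviveOffers()).
        driver.suppressOffers()


driver = RecordingDriver()
queue = deque({"name": f"task-{i}"} for i in range(3))
offers = [
    {"id": "o1", "cpus": 2.0, "mem": 4096.0},
    {"id": "o2", "cpus": 4.0, "mem": 8192.0},
    {"id": "o3", "cpus": 4.0, "mem": 8192.0},
]
resource_offers(driver, offers, queue)
for call in driver.calls:
    print(call)
```

If a framework suppresses offers, it must call `reviveOffers()` when new tasks are queued; otherwise it will not be offered resources again.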
>>
>> We have deployed Mesos 1.1.0.
>>
>> Any ideas on how to improve the use of our resources?
>>
>> Thx in advance!
>> Harold Molina-Bulla Ph.D.
>> --
>>
>> *"En una época de mentira universal, decir la verdad constituye un acto
>> revolucionario”*
>> George Orwell (1984)
>>
>> Recuerda: PRISM te está vigilando!!! X)
>> *Harold Molina-Bulla*
>> Clave GnuPG: *189D5144*
>>
>
>
