Hi,
Thanks for your reply.
> Hi there, more clarification is needed:
>
> I have close to 800 CPUs, but the system does not assign all the
> available resources to all our tasks.
>
> What do you mean precisely here? Can you describe what you're seeing?
> Also, you have more than 800GB of RAM, right?
>
Yes, we have at least 2 GB of RAM per CPU, and our resource table
typically looks like this:
In this case, 346 of 788 CPUs are available and not assigned to any
task, yet we have more than 400 tasks waiting to run.
Checking the mesos-master log, we see that the master does not make
offers to all running frameworks all the time, only to a few of them:
> I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS)
> at scheduler-52a267e9-30d1-4cc8-847e-fa7acfddf855@192.168.151.147:32899
> I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at
> scheduler-ce746b8b-adac-4a0c-8310-5d312c9ed04f@192.168.151.186:44233
> I0303 00:16:01.968003 31791 master.cpp:6517] Sending 6 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0084 (vatmoutput) at
> scheduler-078b1978-840a-437e-a23e-5bca8c5e05c8@192.168.151.84:43023
> I0303 00:16:01.969828 31791 master.cpp:6517] Sending 6 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0081 (vatmoutput) at
> scheduler-d921e4bb-ee23-4e77-93d9-7742264839e5@192.168.151.84:43067
> I0303 00:16:01.971613 31791 master.cpp:6517] Sending 6 offers to
> framework c5299003-e29d-43cb-8ca7-887ab24c8513-0175 (:izanami) at
> scheduler-e10a1167-62d7-4ded-b932-792b5478ab61@192.168.151.186:38706
> I0303 00:16:01.973351 31791 master.cpp:6517] Sending 6 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0082 (vatmoutputg) at
> scheduler-c4db35be-41e1-45cb-8005-f0f7827a23d0@192.168.151.84:33668
> I0303 00:16:01.975126 31791 master.cpp:6517] Sending 6 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0062 (vatmvalidation)
> at scheduler-44ed1457-a752-4037-89b6-590221db3de5@192.168.151.84:33148
> I0303 00:16:01.976877 31791 master.cpp:6517] Sending 6 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0077 (:izanami) at
> scheduler-c648708f-32f3-44d5-9014-3fd0dbb461f7@192.168.151.186:35345
> I0303 00:16:01.978590 31791 master.cpp:6517] Sending 6 offers to
> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0083 (vatmoutputg) at
> scheduler-fb965e89-5764-4a07-a94a-43de45babc7a@192.168.151.84:39218
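To check how offers are spread across frameworks over a longer window than this excerpt, one option is to total the "Sending N offers to framework" lines in the master log. A minimal sketch, assuming each entry occupies a single line in the raw master log (the excerpt above was wrapped by the mail client) and that the message text matches the excerpt; `count_offers` and the log path are placeholders, not part of Mesos:

```python
import re
import sys
from collections import Counter

# Matches master.cpp entries such as:
#   ... master.cpp:6517] Sending 3 offers to framework <id> (<name>) at <pid>
OFFER_RE = re.compile(r"Sending (\d+) offers? to framework (\S+) \(([^)]*)\)")


def count_offers(lines):
    """Return a Counter mapping (framework_id, name) -> total offers sent."""
    totals = Counter()
    for line in lines:
        match = OFFER_RE.search(line)
        if match:
            count, framework_id, name = match.groups()
            totals[(framework_id, name)] += int(count)
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (path is a placeholder): python count_offers.py path/to/master.log
    with open(sys.argv[1]) as log:
        for (framework_id, name), n in count_offers(log).most_common():
            print(f"{n:6d}  {framework_id}  {name}")
```

Frameworks that never show up in the output over several minutes are the ones not receiving offers at all.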
At this moment we have roughly twice as many frameworks running as
appear in this excerpt; one of them (not shown) has more than 300 tasks
waiting and only 100 CPUs assigned (1 CPU per task).
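For context on why some frameworks receive offers ahead of others: assuming the default hierarchical DRF allocator in Mesos 1.1, with all frameworks in one role and equal weights, frameworks are roughly ordered by dominant share (the largest fraction of any cluster resource already allocated to them), and the lowest share is offered first. A minimal sketch of that ordering, using the 788 CPUs and 100 allocated CPUs from this message, an assumed 2 GB of RAM per CPU, and a hypothetical small framework for comparison; it illustrates the idea and is not the allocator's code:

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    name: str
    cpus: float
    mem_gb: float


def dominant_share(alloc, total_cpus, total_mem_gb):
    """Largest fraction of any single cluster resource held by a framework."""
    return max(alloc.cpus / total_cpus, alloc.mem_gb / total_mem_gb)


def offer_order(allocs, total_cpus, total_mem_gb):
    """Frameworks sorted by ascending dominant share (lowest offered first)."""
    return sorted(allocs, key=lambda a: dominant_share(a, total_cpus, total_mem_gb))


TOTAL_CPUS = 788
TOTAL_MEM_GB = 2 * TOTAL_CPUS  # assumption: exactly 2 GB of RAM per CPU

frameworks = [
    Allocation("big (300+ waiting)", cpus=100, mem_gb=100),  # 1 CPU / 1 GB per task
    Allocation("small (hypothetical)", cpus=6, mem_gb=6),
]
for a in offer_order(frameworks, TOTAL_CPUS, TOTAL_MEM_GB):
    print(f"{a.name}: {dominant_share(a, TOTAL_CPUS, TOTAL_MEM_GB):.3f}")
```

With these numbers the large framework's dominant share is 100/788, about 0.127, so it sorts behind any framework currently holding fewer resources, even though it has the most work queued; the allocator does not see pending demand.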
We think the problem is that the mesos-master does not offer resources
to all the tasks all the time, and declined resources are not re-offered
to other tasks. Any idea how to change this behavior, or the rate at
which resources are offered to the tasks?
FYI, we set --offer_timeout=1sec.
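On the mechanism: `--offer_timeout` only rescinds offers that a scheduler holds without answering. When a scheduler declines an offer, the decline carries a filter (`Filters.refuse_seconds`, 5 seconds by default), and for that period the master does not re-offer those resources to the declining framework; a scheduler with no pending work can also call `suppressOffers()` and later `reviveOffers()`. The sketch below shows that decision in a `resourceOffers`-style handler. It assumes the 1 CPU / 1 GB task size from these tests, picks a hypothetical 60-second filter, and uses stand-in `Filters` and `FakeDriver` classes that only mimic the shape of the Mesos Python bindings (`driver.declineOffer(offer_id, filters)` with a `mesos_pb2.Filters` message), so it runs without Mesos installed:

```python
from dataclasses import dataclass, field

TASK_CPUS = 1.0       # per-task request used in these tests
TASK_MEM_MB = 1024.0  # 1 GB per task


@dataclass
class Filters:
    """Stand-in for mesos_pb2.Filters; only refuse_seconds is modelled."""
    refuse_seconds: float = 5.0


@dataclass
class FakeDriver:
    """Hypothetical stand-in that records calls instead of talking to a master.

    Method names mirror the camelCase SchedulerDriver methods.
    """
    launched: list = field(default_factory=list)
    declined: list = field(default_factory=list)
    suppressed: bool = False

    def launchTasks(self, offer_id, tasks):
        self.launched.append((offer_id, tasks))

    def declineOffer(self, offer_id, filters=None):
        self.declined.append((offer_id, filters))

    def suppressOffers(self):
        self.suppressed = True


def tasks_that_fit(cpus, mem_mb):
    """Number of 1 CPU / 1 GB tasks that fit in one offer."""
    return int(min(cpus // TASK_CPUS, mem_mb // TASK_MEM_MB))


def handle_offer(driver, offer_id, cpus, mem_mb, pending, refuse_seconds=60.0):
    """Launch as many pending tasks as fit; otherwise decline with a long filter.

    `pending` is a list of task names; launched ones are removed from it.
    """
    n = min(tasks_that_fit(cpus, mem_mb), len(pending))
    if n > 0:
        batch = [pending.pop(0) for _ in range(n)]
        driver.launchTasks(offer_id, batch)
        return
    # Nothing to run on this offer: ask the master not to re-offer these
    # resources to this framework for refuse_seconds, so others can get them.
    driver.declineOffer(offer_id, Filters(refuse_seconds=refuse_seconds))
    if not pending:
        # No work left at all: stop receiving offers until reviveOffers().
        driver.suppressOffers()
```

Declining with a longer filter trades latency for this framework (it will not see those resources again for a while) against fairness for the frameworks that still have tasks queued.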
Thanks in advance.
Harold Molina-Bulla Ph.D.
On 02/03/2017 23:28, Benjamin Mahler wrote:
>
> Ben
>
> On Thu, Mar 2, 2017 at 9:00 AM, Harold Molina-Bulla
> <h.mol...@tsc.uc3m.es> wrote:
>
> Hi Everybody,
>
> We are trying to develop a scheduler in Python to distribute
> processes in a Mesos cluster.
>
> I have close to 800 CPUs, but the system does not assign all the
> available resources to all our tasks.
>
> For testing, we define 1 CPU and 1 GB of RAM per process, so
> that all the processes fit on our machines, and we launch several
> scripts simultaneously so that Nprocs > Ncpus (close to 900
> tasks in total).
>
> Our script is based on the test_framework.py example included in
> the Mesos src distribution, with changes such as sending a decline
> message when the list of tasks to launch is empty.
>
> We have deployed Mesos 1.1.0.
>
> Any ideas on how to improve the use of our resources?
>
> Thx in advance!
>
> Harold Molina-Bulla Ph.D.
> --
>
> /"In a time of universal deceit, telling the truth is a
> revolutionary act"/
> George Orwell (1984)
>
> Remember: PRISM is watching you!!! X)
>
> *Harold Molina-Bulla*
> GnuPG key: *189D5144*
>
>