Also, what is the allocation that each framework has when you reach your steady state? Are there frameworks that don't have any more work to do but have a really low share of the cluster?
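
If it helps to answer that, here is a minimal Python 3 sketch for summarizing the steady state. It is not part of Mesos: the function names, the input dictionaries, and the threshold are all hypothetical, and it assumes you have already collected each framework's allocated CPUs and pending task count yourself (for example from the master UI). Only the 788 total CPUs and the 100 CPUs / 300 waiting tasks figures come from this thread; the other numbers are made up for illustration.

```python
def cpu_shares(allocated_cpus, total_cpus):
    """Return each framework's fraction of the cluster's CPUs.

    allocated_cpus: dict mapping a framework name to the CPUs it holds.
    total_cpus: total number of CPUs in the cluster.
    """
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")
    return {name: cpus / total_cpus for name, cpus in allocated_cpus.items()}


def idle_low_share_frameworks(allocated_cpus, pending_tasks, total_cpus, threshold):
    """List frameworks with no pending tasks whose CPU share is below threshold."""
    shares = cpu_shares(allocated_cpus, total_cpus)
    return sorted(
        name
        for name, share in shares.items()
        if pending_tasks.get(name, 0) == 0 and share < threshold
    )


# Hypothetical snapshot: framework names and most numbers are invented.
allocated = {"busy": 100, "idle": 2, "other": 340}
pending = {"busy": 300, "idle": 0, "other": 100}

print(idle_low_share_frameworks(allocated, pending, total_cpus=788, threshold=0.05))
# prints ['idle']
```

Frameworks that show up in that list are the ones the second question is about.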

On Thu, Mar 2, 2017 at 4:29 PM, Benjamin Mahler <[email protected]> wrote:

> Can you upload the full logs somewhere and link to them here?
>
> How many frameworks are you running? Do they all run in the "*" role?
> Are the tasks short lived or long lived?
> Can you update your test to not use the --offer_timeout? The intention of
> that is to mitigate against frameworks that hold on to offers, but it
> sounds like your frameworks decline.
>
> On Thu, Mar 2, 2017 at 3:57 PM, Harold Molina-Bulla <[email protected]>
> wrote:
>
>> Hi,
>>
>> Thanks for your reply.
>>
>> Hi there, more clarification is needed:
>>
>>> I have close to 800 CPUs, but the system does not assign all the
>>> available resources to all our tasks.
>>>
>> What do you mean precisely here? Can you describe what you're seeing?
>> Also, you have more than 800GB of RAM right?
>>
>>
>> Yes, we have at least 2GBytes per CPU, and typically our resource table
>> looks like:
>>
>> In this case 346/788 cpus are available and not assigned to any task, but
>> we have more than 400 tasks waiting to run.
>>
>> Checking the mesos-master log, it does not make offers to all running
>> frameworks all the time, just to a few of them:
>>
>> I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at
>> [email protected]:32899
>> I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at
>> [email protected]:44233
>> I0303 00:16:01.968003 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0084 (vatmoutput) at
>> [email protected]:43023
>> I0303 00:16:01.969828 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0081 (vatmoutput) at
>> [email protected]:43067
>> I0303 00:16:01.971613 31791 master.cpp:6517] Sending 6 offers to
>> framework c5299003-e29d-43cb-8ca7-887ab24c8513-0175 (:izanami) at
>> [email protected]:38706
>> I0303 00:16:01.973351 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0082 (vatmoutputg) at
>> [email protected]:33668
>> I0303 00:16:01.975126 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0062 (vatmvalidation) at
>> [email protected]:33148
>> I0303 00:16:01.976877 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0077 (:izanami) at
>> [email protected]:35345
>> I0303 00:16:01.978590 31791 master.cpp:6517] Sending 6 offers to
>> framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0083 (vatmoutputg) at
>> [email protected]:39218
>>
>> We have close to twice as many frameworks running at this moment, one of
>> them (not included) with more than 300 tasks waiting and just 100 cpus
>> assigned (1 cpu per task).
>>
>> The problem is (we think): the mesos-master does not offer resources to
>> all the tasks all the time and the declined resources are not re-offered to
>> other tasks.
>> Any idea how to change the behavior or the rate at which resources are
>> offered to the tasks?
>>
>> FYI We set the --offer_timeout=1sec
>>
>> Thanks in advance.
>>
>> Harold Molina-Bulla Ph.D.
>> On 02/03/2017 23:28, Benjamin Mahler wrote:
>>
>>
>> Ben
>>
>> On Thu, Mar 2, 2017 at 9:00 AM, Harold Molina-Bulla <[email protected]>
>> wrote:
>>
>>> Hi Everybody,
>>>
>>> We are trying to develop a scheduler in Python to distribute processes
>>> in a Mesos cluster.
>>>
>>> I have close to 800 CPUs, but the system does not assign all the
>>> available resources to all our tasks.
>>>
>>> In order to test, we are defining 1 CPU and 1Gbyte RAM per process so
>>> that all the processes fit on our machines. We launch several scripts
>>> simultaneously in order to have Nprocs > Ncpus (close to 900 tasks in
>>> total).
>>>
>>> Our script is based on the test_framework.py example included in the
>>> Mesos src distribution, with changes such as: if the list of tasks to
>>> launch is empty, send a decline message.
>>>
>>> We have deployed Mesos 1.1.0.
>>>
>>> Any ideas on how to improve the use of our resources?
>>>
>>> Thx in advance!
>>> Harold Molina-Bulla Ph.D.
>>> --
>>>
>>> *"In a time of universal deceit, telling the truth is a revolutionary
>>> act"*
>>> George Orwell (1984)
>>>
>>> Remember: PRISM is watching you!!! X)
>>> *Harold Molina-Bulla*
>>> GnuPG key: *189D5144*
>>>
>>
>>
>> --
>>
>> *"In a time of universal deceit, telling the truth is a revolutionary
>> act"*
>> George Orwell (1984)
>>
>> Remember: PRISM is watching you!!! X)
>> *Harold Molina-Bulla*
>> *[email protected] <[email protected]>*
>> GnuPG key: *189D5144*
>>
>
>

