Can you upload the full logs somewhere and link to them here? How many frameworks are you running? Do they all run in the "*" role? Are the tasks short-lived or long-lived?

Can you update your test to not use --offer_timeout? That flag is intended to guard against frameworks that hold on to offers, but it sounds like your frameworks decline them.
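One rough way to see which frameworks the master is actually sending offers to is to tally the "Sending N offers to framework" lines in the master log. Here is a minimal sketch, assuming the line format matches the excerpt quoted below and that each entry sits on a single line in the real log file. The function name `tally_offers`, the sample lines, and the `scheduler@host` addresses are illustrative placeholders, not anything from Mesos itself.

```python
import re
from collections import Counter

# Matches master log lines of the form seen in the quoted excerpt:
#   ... master.cpp:6517] Sending 3 offers to framework <id> (<name>) at <address>
OFFER_LINE = re.compile(
    r"Sending (?P<count>\d+) offers? to framework (?P<fid>\S+) \((?P<name>.*?)\) at"
)


def tally_offers(lines):
    """Sum the number of offers sent to each framework ID."""
    totals = Counter()
    for line in lines:
        match = OFFER_LINE.search(line)
        if match:
            totals[match.group("fid")] += int(match.group("count"))
    return totals


# Sample lines modelled on the excerpt; the addresses are placeholders.
sample = [
    "I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to framework "
    "4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at scheduler@host:32899",
    "I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to framework "
    "4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at scheduler@host:44233",
    "I0303 00:16:02.964318 31791 master.cpp:6517] Sending 6 offers to framework "
    "4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at scheduler@host:32899",
]

# Print frameworks ordered by how many offers they were sent.
for framework_id, offers in tally_offers(sample).most_common():
    print(framework_id, offers)
```

To run it against a real log, pass an open file instead of `sample`, for example `tally_offers(open(path))` with `path` pointing at your master log. Any registered framework that never shows up in the tally over a long window is one the master is not sending offers to.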
On Thu, Mar 2, 2017 at 3:57 PM, Harold Molina-Bulla <[email protected]> wrote:

> Hi,
>
> Thanks for your reply.
>
> Hi there, more clarification is needed:
>
>> I have close to 800 CPUs, but the system does not assign all the
>> available resources to all our tasks.
>>
> What do you mean precisely here? Can you describe what you're seeing?
> Also, you have more than 800GB of RAM, right?
>
> Yes, we have at least 2 GB per CPU, and typically our resource table
> looks like:
>
> In this case 346/788 cpus are available and not assigned to any task, but
> we have more than 400 tasks waiting to run.
>
> Checking the mesos-master log, it does not make offers to all running
> frameworks all the time, just to a few of them:
>
> I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at [email protected]:32899
> I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at [email protected]:44233
> I0303 00:16:01.968003 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0084 (vatmoutput) at [email protected]:43023
> I0303 00:16:01.969828 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0081 (vatmoutput) at [email protected]:43067
> I0303 00:16:01.971613 31791 master.cpp:6517] Sending 6 offers to framework c5299003-e29d-43cb-8ca7-887ab24c8513-0175 (:izanami) at [email protected]:38706
> I0303 00:16:01.973351 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0082 (vatmoutputg) at [email protected]:33668
> I0303 00:16:01.975126 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0062 (vatmvalidation) at [email protected]:33148
> I0303 00:16:01.976877 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0077 (:izanami) at [email protected]:35345
> I0303 00:16:01.978590 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0083 (vatmoutputg) at [email protected]:39218
>
> We have close to twice as many frameworks running at this moment, one of
> them (not included) with more than 300 tasks waiting and just 100 cpus
> assigned (1 cpu per task).
>
> The problem is (we think): the mesos-master does not offer resources to
> all the tasks all the time, and the declined resources are not re-offered
> to other tasks. Any idea how to change this behavior, or the rate at which
> resources are offered to the tasks?
>
> FYI, we set --offer_timeout=1sec
>
> Thanks in advance.
>
> Harold Molina-Bulla Ph.D.
>
> On 02/03/2017 23:28, Benjamin Mahler wrote:
>
> Ben
>
> On Thu, Mar 2, 2017 at 9:00 AM, Harold Molina-Bulla <[email protected]>
> wrote:
>
>> Hi Everybody,
>>
>> We are trying to develop a scheduler in Python to distribute processes
>> in a Mesos cluster.
>>
>> I have close to 800 CPUs, but the system does not assign all the
>> available resources to all our tasks.
>>
>> In order to test, we are defining 1 CPU and 1 GB of RAM per process so
>> that all the processes fit on our machines, and we launch several scripts
>> simultaneously in order to have Nprocs > Ncpus (close to 900 tasks in
>> total).
>>
>> Our script is based on the test_framework.py example included in the
>> Mesos src distribution, with changes such as: if the list of tasks to
>> launch is empty, send a decline message.
>>
>> We have deployed Mesos 1.1.0.
>>
>> Any ideas on how to improve the use of our resources?
>>
>> Thx in advance!
>> Harold Molina-Bulla Ph.D.
>> --
>>
>> *"In a time of universal deceit, telling the truth is a revolutionary
>> act"*
>> George Orwell (1984)
>>
>> Remember: PRISM is watching you!!! X)
>> *Harold Molina-Bulla*
>> Clave GnuPG: *189D5144*

