Can you upload the full logs somewhere and link to them here? How many frameworks are you running? Do they all run in the "*" role? Are the tasks short-lived or long-lived? Can you also update your test to not use --offer_timeout? That flag is intended to guard against frameworks that hold on to offers, but it sounds like your frameworks decline the offers they don't use.
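If it helps while you gather the logs, here is a rough sketch for counting how many offers each framework receives according to the master log. It assumes every relevant line has the same "Sending N offers to framework <id> (<name>) at ..." shape as the excerpt quoted below; the names `OFFER_RE` and `tally_offers` are made up for illustration and are not part of Mesos.

```python
import re
from collections import Counter

# Assumed line shape, taken from the master log excerpt in this thread:
#   "... Sending 3 offers to framework <id> (<name>) at scheduler-..."
OFFER_RE = re.compile(r"Sending (\d+) offers? to framework (\S+) \((.*?)\) at ")


def tally_offers(log_lines):
    """Return a Counter mapping (framework_id, name) to total offers sent."""
    totals = Counter()
    for line in log_lines:
        match = OFFER_RE.search(line)
        if match:
            count, framework_id, name = match.groups()
            totals[(framework_id, name)] += int(count)
    return totals


# Two lines copied from the quoted excerpt, used here as sample input.
sample = [
    "I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to framework "
    "4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at "
    "scheduler-52a267e9-30d1-4cc8-847e-fa7acfddf855@192.168.151.147:32899",
    "I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to framework "
    "4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at "
    "scheduler-ce746b8b-adac-4a0c-8310-5d312c9ed04f@192.168.151.186:44233",
]

for (framework_id, name), offers in tally_offers(sample).most_common():
    print(f"{offers:5d}  {framework_id}  ({name})")
```

Running this over the whole log (for example `tally_offers(open(path))`, with `path` pointing at your master log) should show quickly whether some registered frameworks never appear in the offer lines at all.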

On Thu, Mar 2, 2017 at 3:57 PM, Harold Molina-Bulla <h.mol...@tsc.uc3m.es> wrote:

> Hi,
>
> Thanks for your reply.
>
> Hi there, more clarification is needed:
>
>> I have close to 800 CPUs, but the system does not assign all the
>> available resources to all our tasks.
>>
> What do you mean precisely here? Can you describe what you're seeing?
> Also, you have more than 800GB of RAM, right?
>
>
> Yes, we have at least 2 GBytes per CPU, and typically our resource table
> looks like:
>
> In this case 346/788 CPUs are available and not assigned to any task, but
> we have more than 400 tasks waiting to run.
>
> Checking the mesos-master log, it does not make offers to all running
> frameworks all the time, just to a few of them:
>
> I0303 00:16:01.964318 31791 master.cpp:6517] Sending 3 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0053 (Ejecucion: FRUS) at scheduler-52a267e9-30d1-4cc8-847e-fa7acfddf855@192.168.151.147:32899
> I0303 00:16:01.966234 31791 master.cpp:6517] Sending 5 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0072 (:izanami) at scheduler-ce746b8b-adac-4a0c-8310-5d312c9ed04f@192.168.151.186:44233
> I0303 00:16:01.968003 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0084 (vatmoutput) at scheduler-078b1978-840a-437e-a23e-5bca8c5e05c8@192.168.151.84:43023
> I0303 00:16:01.969828 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0081 (vatmoutput) at scheduler-d921e4bb-ee23-4e77-93d9-7742264839e5@192.168.151.84:43067
> I0303 00:16:01.971613 31791 master.cpp:6517] Sending 6 offers to framework c5299003-e29d-43cb-8ca7-887ab24c8513-0175 (:izanami) at scheduler-e10a1167-62d7-4ded-b932-792b5478ab61@192.168.151.186:38706
> I0303 00:16:01.973351 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0082 (vatmoutputg) at scheduler-c4db35be-41e1-45cb-8005-f0f7827a23d0@192.168.151.84:33668
> I0303 00:16:01.975126 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0062 (vatmvalidation) at scheduler-44ed1457-a752-4037-89b6-590221db3de5@192.168.151.84:33148
> I0303 00:16:01.976877 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0077 (:izanami) at scheduler-c648708f-32f3-44d5-9014-3fd0dbb461f7@192.168.151.186:35345
> I0303 00:16:01.978590 31791 master.cpp:6517] Sending 6 offers to framework 4d896f23-1ce0-46d6-ae0f-acbe23f2a38c-0083 (vatmoutputg) at scheduler-fb965e89-5764-4a07-a94a-43de45babc7a@192.168.151.84:39218
>
> We have close to twice as many frameworks running at the moment; one of
> them (not included above) has more than 300 tasks waiting and just 100
> CPUs assigned (1 CPU per task).
>
> The problem is (we think): the mesos-master does not offer resources to
> all the tasks all the time, and the declined resources are not re-offered
> to other tasks. Any idea how to change this behavior, or the rate at which
> resources are offered to the tasks?
>
> FYI, we set --offer_timeout=1sec
>
> Thanks in advance.
>
> Harold Molina-Bulla Ph.D.
>
> On 02/03/2017 23:28, Benjamin Mahler wrote:
>
>
> Ben
>
> On Thu, Mar 2, 2017 at 9:00 AM, Harold Molina-Bulla <h.mol...@tsc.uc3m.es>
> wrote:
>
>> Hi Everybody,
>>
>> We are trying to develop a scheduler in Python to distribute processes
>> in a Mesos cluster.
>>
>> I have close to 800 CPUs, but the system does not assign all the
>> available resources to all our tasks.
>>
>> To test, we define 1 CPU and 1 GByte of RAM per process so that all the
>> processes fit on our machines, and we launch several scripts
>> simultaneously so that Nprocs > Ncpus (close to 900 tasks in total).
>>
>> Our script is based on the test_framework.py example included in the
>> Mesos src distribution, with changes such as sending a decline message
>> if the list of tasks to launch is empty.
>>
>> We have deployed Mesos 1.1.0.
>>
>> Any ideas on how to improve the use of our resources?
>>
>> Thanks in advance!
>> Harold Molina-Bulla Ph.D.
>> --
>>
>> *"In a time of universal deceit, telling the truth is a revolutionary
>> act"*
>> George Orwell (1984)
>>
>> Remember: PRISM is watching you!!! X)
>> *Harold Molina-Bulla*
>> GnuPG key: *189D5144*
>>
>
>
> --
>
> *"In a time of universal deceit, telling the truth is a revolutionary
> act"*
> George Orwell (1984)
>
> Remember: PRISM is watching you!!! X)
> *Harold Molina-Bulla*
> *h.mol...@tsc.uc3m.es <h.mol...@tsc.uc3m.es>*
> GnuPG key: *189D5144*
>