It seems strange that you have only 2MB of allocatable memory on your slave
("total allocatable: cpus(*):2; mem(*):2;"). Mesos counts mem in MB, so
try bumping that up to something like 2GB ("mem(*):2048") and I bet you'll
see more tasks able to run.
Even the default executor (no task) needs 32MB, so you won't be able to do
much with a Mesos slave that has less than 64MB of memory.
Are you explicitly setting a --resources flag on your slave? If not, do you
only have tiny VMs available for the slaves?
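
If it helps to check this across several slaves, here is a rough Python
sketch that parses the "total allocatable" string from the master log and
compares mem against the 32MB executor figure mentioned above. The helper
names (parse_resources, fits_executor) and the EXECUTOR_MEM_MB constant are
made up for illustration and are not part of Mesos; the parser only handles
the simple "name(role):value" form seen in this log.

```python
import re

# Assumption: 32 MB is the default-executor memory figure quoted in this thread.
EXECUTOR_MEM_MB = 32


def parse_resources(text):
    """Parse a string like 'cpus(*):2; mem(*):2' into {name: raw value}."""
    resources = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        match = re.fullmatch(r"(\w+)\((\S+?)\):(.+)", part)
        if match is None:
            raise ValueError(f"unrecognized resource: {part!r}")
        name, _role, value = match.groups()
        resources[name] = value.strip()
    return resources


def fits_executor(text, needed_mb=EXECUTOR_MEM_MB):
    """Return True if the advertised mem (in MB) covers the executor."""
    mem_mb = float(parse_resources(text).get("mem", 0))
    return mem_mb >= needed_mb


if __name__ == "__main__":
    logged = "cpus(*):2; mem(*):2; disk(*):12526; ports(*):[31000-32000]"
    print(fits_executor(logged))                     # the slave from the log
    print(fits_executor("cpus(*):2; mem(*):2048"))   # the suggested setting
```

With the values from your log the first check fails and the second passes,
which matches what I'd expect from a slave advertising only 2MB.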


On Thu, Aug 7, 2014 at 7:03 AM, Martin Weindel <martin.wein...@gmail.com>
wrote:

> I'm using Apache Mesos 0.19.0 together with Apache Spark 1.0.2 on a three
> node cluster.
>
> When using the fine-grained task scheduling mode of Spark, I reproducibly
> see some kind of deadlock under high load.
> If multiple jobs are running, after some time the jobs do not submit any
> tasks anymore.
>
> I have added some more log output in the Scheduler implementation of Spark
> and it looks as if Mesos does not make any offers anymore, although there
> are allocatable resources.
>
> Below is the log from Mesos. The last task finishes normally, its
> resources are recovered, and the filters are removed, but the log shows no
> "Sending ... offers to framework" entries after that point.
> I have tried to wake up the offers with a reviveOffers call I have added
> to the Spark code, but with no effect.
> The "Resources" section on the Mesos web UI shows all CPUs as idle, none
> is used or offered.
>
> If I kill all jobs but one, this last job continues and finishes normally.
>
> Is this a bug?
>
> Thanks,
> Martin
>
> I0807 15:17:54.605695 15727 master.cpp:2933] Sending 1 offers to framework 
> 20140717-090825-308511242-5050-15711-0044
> I0807 15:17:54.615705 15732 master.cpp:1889] Processing reply for offers: [ 
> 20140717-090825-308511242-5050-15711-2132 ] on slave 
> 20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051 
> (ustst020-cep-node3.usu.usu.grp) for framework 
> 20140717-090825-308511242-5050-15711-0044
> I0807 15:17:54.615897 15732 master.hpp:655] Adding task 1 with resources 
> cpus(*):1; mem(*):1 on slave 20140717-090821-325288458-5050-2360-1 
> (ustst020-cep-node3.usu.usu.grp)
> I0807 15:17:54.616029 15732 master.cpp:3111] Launching task 1 of framework 
> 20140717-090825-308511242-5050-15711-0044 with resources cpus(*):1; mem(*):1 
> on slave 20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051 
> (ustst020-cep-node3.usu.usu.grp)
> I0807 15:17:54.616325 15732 hierarchical_allocator_process.hpp:589] Framework 
> 20140717-090825-308511242-5050-15711-0044 filtered slave 
> 20140717-090821-325288458-5050-2360-1 for 8secs
> I0807 15:17:58.324476 15728 master.cpp:2628] Status update TASK_RUNNING 
> (UUID: ec5ecf90-7313-4bf1-af9e-b5f6e35189f7) for task 1 of framework 
> 20140717-090825-308511242-5050-15711-0044 from slave 
> 20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051 
> (ustst020-cep-node3.usu.usu.grp)
> I0807 15:17:58.326279 15726 master.cpp:1988] Reviving offers for framework 
> 20140717-090825-308511242-5050-15711-0044
> I0807 15:17:58.326406 15732 hierarchical_allocator_process.hpp:660] Removed 
> filters for framework 20140717-090825-308511242-5050-15711-0044
> I0807 15:18:00.993798 15726 master.cpp:2628] Status update TASK_FINISHED 
> (UUID: ef7a4dfd-c403-483a-a6a7-c2cd995aa64e) for task 1 of framework 
> 20140717-090825-308511242-5050-15711-0044 from slave 
> 20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051 
> (ustst020-cep-node3.usu.usu.grp)
> I0807 15:18:00.994935 15726 master.hpp:673] Removing task 1 with resources 
> cpus(*):1; mem(*):1 on slave 20140717-090821-325288458-5050-2360-1 
> (ustst020-cep-node3.usu.usu.grp)
> I0807 15:18:00.995511 15726 master.cpp:1988] Reviving offers for framework 
> 20140717-090825-308511242-5050-15711-0044
> I0807 15:18:00.995599 15725 hierarchical_allocator_process.hpp:636] Recovered 
> cpus(*):1; mem(*):1 (total allocatable: cpus(*):2; mem(*):2; disk(*):12526; 
> ports(*):[31000-32000]) on slave 20140717-090821-325288458-5050-2360-1 from 
> framework 20140717-090825-308511242-5050-15711-0044
> I0807 15:18:00.995846 15725 hierarchical_allocator_process.hpp:660] Removed 
> filters for framework 20140717-090825-308511242-5050-15711-0044
> I0807 15:18:01.055794 15730 master.cpp:1988] Reviving offers for framework 
> 20140717-090825-308511242-5050-15711-0044
> I0807 15:18:01.055982 15730 hierarchical_allocator_process.hpp:660] Removed 
> filters for framework 20140717-090825-308511242-5050-15711-0044
>
>
