The Spark scheduler always allocates the memory to the executor, so
these few MB are not relevant. In fact, each executor has about 3 GB of
memory in my settings.
While searching for the cause, I added minimal memory resources to the
tasks. Sorry if this caused some confusion.
On 07.08.2014 17:32, Adam Bordelon wrote:
Seems strange that you only have 2MB of allocatable memory on your
slave ("total allocatable: cpus(*):2; mem(*):2;").
Try bumping that up to something like 2GB ("mem(*):2048") and I bet
you'll see more tasks able to run.
Even the default executor (no task) needs 32MB, so you won't be able
to do much with a Mesos slave that has <64MB of memory.
Are you explicitly setting a --resources flag on your slave? If not,
do you only have tiny VMs available for the slaves?
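To make the check above concrete, here is a minimal sketch that parses a
resource string like the one in the log and compares the allocatable
memory with the 32MB executor figure mentioned above. The function names
and the constant are hypothetical and only serve the illustration; this
is not how Mesos itself evaluates resources.

```python
import re

# Executor overhead as quoted in the message above (an assumption here,
# not read from any Mesos configuration).
EXECUTOR_MIN_MEM_MB = 32


def parse_scalar_resources(text):
    """Parse scalar entries such as 'cpus(*):2; mem(*):2' into a dict.

    Range resources such as 'ports(*):[31000-32000]' are skipped,
    because their value does not start with a digit.
    """
    pairs = re.findall(r"(\w+)\(\*\):(\d+(?:\.\d+)?)", text)
    return {name: float(value) for name, value in pairs}


def can_host_executor(text, min_mem_mb=EXECUTOR_MIN_MEM_MB):
    """Return True if the allocatable memory covers the executor minimum."""
    return parse_scalar_resources(text).get("mem", 0.0) >= min_mem_mb


if __name__ == "__main__":
    allocatable = "cpus(*):2; mem(*):2; disk(*):12526; ports(*):[31000-32000]"
    print(parse_scalar_resources(allocatable))
    print(can_host_executor(allocatable))
```

With the string from the log, the memory entry is 2, so the check fails;
with "mem(*):2048" it would pass.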
On Thu, Aug 7, 2014 at 7:03 AM, Martin Weindel
<[email protected]> wrote:
I'm using Apache Mesos 0.19.0 together with Apache Spark 1.0.2 on
a three node cluster.
When using the fine-grained task scheduling mode of Spark, I
reproducibly see some kind of deadlock under high load.
If multiple jobs are running, after some time the jobs stop
submitting tasks.
I have added some more log output in the Scheduler implementation
of Spark and it looks as if Mesos does not make any offers
anymore, although there are allocatable resources.
Below is the log from Mesos. The last task finishes normally, its
resources are recovered, and the filters are removed, but the log
shows no "Sending ... offers to framework" entries after that
point.
I have tried to wake up the offers with a reviveOffers call I have
added to the Spark code, but with no effect.
The "Resources" section on the Mesos web UI shows all CPUs as
idle, none is used or offered.
If I kill all jobs but one, this last job continues and finishes
normally.
Is this a bug?
Thanks,
Martin
I0807 15:17:54.605695 15727 master.cpp:2933] Sending 1 offers to framework
20140717-090825-308511242-5050-15711-0044
I0807 15:17:54.615705 15732 master.cpp:1889] Processing reply for offers: [
20140717-090825-308511242-5050-15711-2132 ] on slave
20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051
(ustst020-cep-node3.usu.usu.grp) for framework
20140717-090825-308511242-5050-15711-0044
I0807 15:17:54.615897 15732 master.hpp:655] Adding task 1 with resources
cpus(*):1; mem(*):1 on slave 20140717-090821-325288458-5050-2360-1
(ustst020-cep-node3.usu.usu.grp)
I0807 15:17:54.616029 15732 master.cpp:3111] Launching task 1 of framework
20140717-090825-308511242-5050-15711-0044 with resources cpus(*):1; mem(*):1 on slave
20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051
(ustst020-cep-node3.usu.usu.grp)
I0807 15:17:54.616325 15732 hierarchical_allocator_process.hpp:589]
Framework 20140717-090825-308511242-5050-15711-0044 filtered slave
20140717-090821-325288458-5050-2360-1 for 8secs
I0807 15:17:58.324476 15728 master.cpp:2628] Status update TASK_RUNNING (UUID:
ec5ecf90-7313-4bf1-af9e-b5f6e35189f7) for task 1 of framework
20140717-090825-308511242-5050-15711-0044 from slave
20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051
(ustst020-cep-node3.usu.usu.grp)
I0807 15:17:58.326279 15726 master.cpp:1988] Reviving offers for framework
20140717-090825-308511242-5050-15711-0044
I0807 15:17:58.326406 15732 hierarchical_allocator_process.hpp:660] Removed
filters for framework 20140717-090825-308511242-5050-15711-0044
I0807 15:18:00.993798 15726 master.cpp:2628] Status update TASK_FINISHED (UUID:
ef7a4dfd-c403-483a-a6a7-c2cd995aa64e) for task 1 of framework
20140717-090825-308511242-5050-15711-0044 from slave
20140717-090821-325288458-5050-2360-1 at slave(1)@10.130.99.20:5051
(ustst020-cep-node3.usu.usu.grp)
I0807 15:18:00.994935 15726 master.hpp:673] Removing task 1 with resources
cpus(*):1; mem(*):1 on slave 20140717-090821-325288458-5050-2360-1
(ustst020-cep-node3.usu.usu.grp)
I0807 15:18:00.995511 15726 master.cpp:1988] Reviving offers for framework
20140717-090825-308511242-5050-15711-0044
I0807 15:18:00.995599 15725 hierarchical_allocator_process.hpp:636]
Recovered cpus(*):1; mem(*):1 (total allocatable: cpus(*):2; mem(*):2;
disk(*):12526; ports(*):[31000-32000]) on slave
20140717-090821-325288458-5050-2360-1 from framework
20140717-090825-308511242-5050-15711-0044
I0807 15:18:00.995846 15725 hierarchical_allocator_process.hpp:660] Removed
filters for framework 20140717-090825-308511242-5050-15711-0044
I0807 15:18:01.055794 15730 master.cpp:1988] Reviving offers for framework
20140717-090825-308511242-5050-15711-0044
I0807 15:18:01.055982 15730 hierarchical_allocator_process.hpp:660] Removed
filters for framework 20140717-090825-308511242-5050-15711-0044
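
The pattern in this log (revive calls and recovered resources, but no
further offers) can be checked mechanically. The following is a rough
sketch, assuming the master log uses the line prefix seen above and that
long entries may be wrapped across lines as in this mail. The helper
names are made up for the illustration and are not part of Mesos or
Spark.

```python
import re

# A new entry starts with a severity letter, MMDD, and a timestamp,
# e.g. "I0807 15:17:54.605695 ". Anything else is treated as the
# continuation of a wrapped entry.
ENTRY_START = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} ")
OFFER = re.compile(r"Sending \d+ offers to framework")


def join_entries(lines):
    """Re-join log entries that were wrapped across several lines."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def summarize(lines):
    """Report the last offer and what the master logged after it."""
    entries = join_entries(lines)
    offer_idx = [i for i, e in enumerate(entries) if OFFER.search(e)]
    last = offer_idx[-1] if offer_idx else -1
    tail = entries[last + 1:]
    return {
        "last_offer_time": entries[last].split()[1] if offer_idx else None,
        "revives_after": sum("Reviving offers" in e for e in tail),
        "recoveries_after": sum("Recovered " in e for e in tail),
    }


SAMPLE = """\
I0807 15:17:54.605695 15727 master.cpp:2933] Sending 1 offers to framework
20140717-090825-308511242-5050-15711-0044
I0807 15:18:00.995511 15726 master.cpp:1988] Reviving offers for framework
20140717-090825-308511242-5050-15711-0044
I0807 15:18:00.995599 15725 hierarchical_allocator_process.hpp:636]
Recovered cpus(*):1; mem(*):1 (total allocatable: cpus(*):2; mem(*):2;
disk(*):12526; ports(*):[31000-32000]) on slave
20140717-090821-325288458-5050-2360-1 from framework
20140717-090825-308511242-5050-15711-0044
I0807 15:18:01.055794 15730 master.cpp:1988] Reviving offers for framework
20140717-090825-308511242-5050-15711-0044
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

A nonzero "revives_after" or "recoveries_after" with no later offer is
the situation described above; the sketch only counts entries and makes
no statement about why the allocator stays silent.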