I haven't seen that much memory overhead. My default is 512MB (just a small test stack) on Spark 1.4.x, and I can run simple Monte Carlo simulations without that 'spike' of RAM usage when they deploy.

I'd assume something you're using is grabbing a lot of VM up front. One option you might want to try is to cap RSS but not virtual memory - on Mesos 0.22.x, last time I tested it, that'll allow tasks to spill into swap if they hit a memory cap. You can do that with the slave CLI if you want to try it.
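Something like this, as a rough sketch of the slave invocation (flag name from memory, so check mesos-slave --help on your version; the point is that --cgroups_limit_swap stays false, which I believe is the default, so only memory.limit_in_bytes is enforced and a task over its cap can page out to swap instead of being OOM-killed):

    # Cap RSS via memory.limit_in_bytes only; leave memory.memsw.limit_in_bytes
    # (RAM + swap) unset so the task can spill into swap past its cap.
    # Other flags copied from your supervisord config below.
    /usr/sbin/mesos-slave \
        --master=zk://prodMesosMaster01:2181,prodMesosMaster02:2181,prodMesosMaster03:2181/mesos \
        --isolation=cgroups/cpu,cgroups/mem \
        --cgroups_hierarchy=/cgroup \
        --cgroups_root=mesos \
        --cgroups_limit_swap=false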
On 23 September 2015 at 12:45, Gary Ogden <gog...@gmail.com> wrote:
> But the thing I don't get is: why is it trying to take all 3GB at
> startup? That seems excessive. So if I want to run a job that only
> needs 512MB, I need to have 3GB free at all times? That doesn't make
> sense.
>
> We are using Spark's native Mesos support. On spark-submit we use:
> --mesos
> mesos://zk://prodMesosMaster01:2181,prodMesosMaster02:2181,prodMesosMaster03:2181/mesos
>
> And we followed the instructions here:
> https://spark.apache.org/docs/1.2.0/running-on-mesos.html
>
> On 23 September 2015 at 08:22, Dick Davies <d...@hellooperator.net> wrote:
>>
>> I've had this working and never needed to mess with cgconfig.conf; in
>> my experience Mesos takes care of that for you.
>>
>> The memory requirement you set during the Marathon submit is what
>> Mesos will cap the task to.
>>
>> Semi-unrelated question: why are you not using Spark's native Mesos
>> support?
>>
>> On 22 September 2015 at 15:19, oggie <gog...@gmail.com> wrote:
>> > I'm using Spark 1.2.2 on Mesos 0.21.
>> >
>> > I have a Java job that is submitted to Mesos from Marathon.
>> >
>> > I also have cgroups configured for Mesos on each node. Even though
>> > the job, when running, uses 512MB, it tries to take over 3GB at
>> > startup and is killed by cgroups.
>> >
>> > When I start mesos-slave, it's started like this (we use supervisord):
>> >
>> > command=/usr/sbin/mesos-slave --disk_watch_interval=10secs
>> > --gc_delay=480mins --isolation=cgroups/cpu,cgroups/mem
>> > --cgroups_hierarchy=/cgroup
>> > --resources="mem(*):3000;cpus(*):2;ports(*):[25000-30000];disk(*):5000"
>> > --cgroups_root=mesos
>> > --master=zk://prodMesosMaster01:2181,prodMesosMaster02:2181,prodMesosMaster03:2181/mesos
>> > --work_dir=/tmp/mesos --log_dir=/var/log/mesos
>> >
>> > In cgconfig.conf:
>> >
>> > memory.limit_in_bytes="3221225472";
>> >
>> > spark-submit from Marathon:
>> >
>> > bin/spark-submit --executor-memory 128m --master
>> > mesos://zk://prodMesosMaster01:2181,prodMesosMaster02:2181,prodMesosMaster03:2181/mesos
>> > --class com.company.alert.AlertConsumer AlertConsumer.jar --zk
>> > prodMesosMaster01:2181,prodMesosMaster02:2181,prodMesosMaster03:2181
>> > --mesos
>> > mesos://zk://prodMesosMaster01:2181,prodMesosMaster02:2181,prodMesosMaster03:2181/mesos
>> > --spark_executor_uri
>> > http://prodmesosfileserver01/spark-dist/1.2.2/spark-dist-1.2.2.tgz
>> >
>> > We increased the cgroup limit to 6GB and the memory resources from
>> > 3000 to 6000 for the startup of Mesos, and now cgroups doesn't kill
>> > the job anymore.
>> >
>> > But the question is: how do I limit the start of the job so it isn't
>> > trying to take 3GB, even if, when running, it's only using 512MB?
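P.S. on the "why is it trying to take all 3GB at startup" question: it would be worth confirming whether that 3GB is resident memory or just reserved virtual address space, since a JVM grabs its address space up front. A quick check on the slave node while the task starts (plain procps, nothing Mesos-specific; AlertConsumer is just the main class from your submit):

    # Compare virtual size (VSZ) with resident set size (RSS) for the executor JVM.
    # A large VSZ but small RSS means reserved address space, not real memory use.
    ps -o pid,vsz,rss,args -C java | grep AlertConsumer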
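And to expand on my earlier point about the Marathon memory requirement: it's the "mem" field (in MB) of the app definition that Mesos caps the task to. A minimal sketch of a submit, assuming Marathon's standard /v2/apps REST endpoint - the host, id and values are placeholders, not taken from your setup:

    # "mem" is the per-task cap in MB that Mesos will enforce via cgroups.
    curl -X POST http://marathonHost:8080/v2/apps \
         -H 'Content-Type: application/json' \
         -d '{
               "id": "alert-consumer",
               "cmd": "bin/spark-submit --class com.company.alert.AlertConsumer AlertConsumer.jar",
               "cpus": 1.0,
               "mem": 512,
               "instances": 1
             }'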