My understanding is that the spark.executor.cores setting controls the number of worker threads in the executor JVM. Each worker thread then communicates with a pyspark daemon process (these are separate processes, not threads) to stream data into Python. There should be one daemon process per worker thread (though, as I mentioned, I sometimes see a low multiple).
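To make the relationship concrete, here is a minimal sketch of how those two settings bound the worst-case Python-side memory per executor. This is illustrative only and not from the original thread; the app name and the 12-core / 1g values are assumptions, not recommendations.

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setAppName("bounded-pyspark-memory")      # hypothetical app name
        .set("spark.executor.cores", "12")         # worker threads per executor JVM
        .set("spark.python.worker.memory", "1g")   # soft per-daemon limit during aggregation
    )
    sc = SparkContext(conf=conf)

    # Rough worst case per executor: about one daemon per worker thread, so
    # 12 threads * 1 GB = ~12 GB of Python memory on top of the JVM heap
    # (the per-daemon limit is not firm, so leave extra headroom).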
Your 4GB limit for Python is fairly high: even with 12 workers that means a
worst case of 48GB (and usage frequently goes beyond that). You will be
better off using a lower number there and instead increasing the parallelism
of your job (i.e. dividing the job into more and smaller partitions); a short
sketch of that follows at the end of this post.

On Sat, Mar 26, 2016 at 7:10 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:

> Thanks, Sven!
>
> I know that I’ve messed up the memory allocation, but I’m trying not to
> think too much about that, because I’ve advertised it to my users as “90GB
> for Spark works!” and that’s how it displays in the Spark UI (totally
> ignoring the python processes). So I’ll need to deal with that at some
> point… especially since I’ve set the max python memory usage to 4GB to
> work around other issues!
>
> The load issue comes in because we have a lot of background cron jobs
> (mostly to clean up after Spark…), and those will stack up behind the high
> load and keep stacking until the whole thing comes crashing down. I will
> look into how to avoid this stacking, as I think one of my predecessors
> had a way, but that’s why the high load nukes the nodes. I don’t have
> spark.executor.cores set, but will setting that to, say, 12 limit the
> pyspark threads, or will it just limit the JVM threads?
>
> Thanks!
> Ken
>
> On Mar 25, 2016, at 9:10 PM, Sven Krasser <kras...@gmail.com> wrote:
>
> Hey Ken,
>
> I also frequently see more pyspark daemons than the configured
> concurrency; often it's a low multiple. (There was an issue pre-1.3.0 that
> caused this to be quite a bit higher, so make sure you at least have a
> recent version; see SPARK-5395.)
>
> Each pyspark daemon tries to stay below the configured memory limit
> during aggregation (which is separate from the JVM heap, as you note).
> Since the number of daemons can be high and the memory limit is per daemon
> (each daemon is actually a process, not a thread, and therefore has its
> own memory that it tracks against the configured per-worker limit), I
> found memory depletion to be the main source of pyspark problems on larger
> data sets. Also, as Sea already noted, the memory limit is not firm and
> individual daemons can grow larger.
>
> With that said, a run queue of 25 on a 16-core machine does not sound
> great, but also not awful enough to knock it offline. I suspect something
> else may be going on. If you want to limit the amount of work running
> concurrently, try reducing spark.executor.cores (under normal
> circumstances this would leave parts of your resources underutilized).
>
> Hope this helps!
> -Sven
>
>
> On Fri, Mar 25, 2016 at 10:41 AM, Carlile, Ken <carli...@janelia.hhmi.org>
> wrote:
>
>> Further data on this.
>>
>> I’m watching another job right now where there are 16 pyspark.daemon
>> threads, all of which are trying to get a full core (remember, this is a
>> 16-core machine). Unfortunately, the java process actually running the
>> spark worker is trying to take several cores of its own, driving the load
>> up. I’m hoping someone has seen something like this.
>>
>> —Ken
>>
>> On Mar 21, 2016, at 3:07 PM, Carlile, Ken <carli...@janelia.hhmi.org>
>> wrote:
>>
>> No further input on this? I discovered today that the pyspark.daemon
>> thread count was actually 48, which makes a little more sense (at least
>> it’s a multiple of 16), and it seems to be happening at the reduce and
>> collect portions of the code.
>>
>> —Ken
>>
>> On Mar 17, 2016, at 10:51 AM, Carlile, Ken <carli...@janelia.hhmi.org>
>> wrote:
>>
>> Thanks!
>> I found that part just after I sent the email… whoops. I’m guessing
>> that’s not an issue for my users, since it’s been set that way for a
>> couple of years now.
>>
>> The thread count is definitely an issue, though, since if enough nodes
>> go down, they can’t schedule their spark clusters.
>>
>> —Ken
>>
>> On Mar 17, 2016, at 10:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> I took a look at docs/configuration.md. Though I didn't find an answer
>> for your first question, I think the following pertains to your second
>> question:
>>
>> <tr>
>>   <td><code>spark.python.worker.memory</code></td>
>>   <td>512m</td>
>>   <td>
>>     Amount of memory to use per python worker process during
>>     aggregation, in the same format as JVM memory strings (e.g.
>>     <code>512m</code>, <code>2g</code>). If the memory used during
>>     aggregation goes above this amount, it will spill the data into
>>     disks.
>>   </td>
>> </tr>
>>
>> On Thu, Mar 17, 2016 at 7:43 AM, Carlile, Ken <carli...@janelia.hhmi.org>
>> wrote:
>>
>>> Hello,
>>>
>>> We have an HPC cluster that we run Spark jobs on using standalone mode
>>> and a number of scripts I’ve built up to dynamically schedule and start
>>> spark clusters within the Grid Engine framework. Nodes in the cluster
>>> have 16 cores and 128GB of RAM.
>>>
>>> My users use pyspark heavily. We’ve been having a number of problems
>>> with nodes going offline with extraordinarily high load. I was able to
>>> look at one of those nodes today before it went truly sideways, and I
>>> discovered that the user was running 50 pyspark.daemon threads (remember,
>>> this is a 16-core box), and the load was somewhere around 25 or so, with
>>> all CPUs maxed out at 100%.
>>>
>>> So while the spark worker is aware it’s only got 16 cores and behaves
>>> accordingly, pyspark seems to be happy to overrun everything like crazy.
>>> Is there a global parameter I can use to limit pyspark threads to a sane
>>> number, say 15 or 16? It would also be interesting to set a memory limit,
>>> which leads to another question.
>>>
>>> How is memory managed when pyspark is used? I have the spark worker
>>> memory set to 90GB, and there is 8GB of system overhead (GPFS caching),
>>> so if pyspark operates outside of the JVM memory pool, that leaves it at
>>> most 30GB to play with, assuming there is no overhead outside the JVM’s
>>> 90GB heap (ha ha.)
>>>
>>> Thanks,
>>> Ken Carlile
>>> Sr. Unix Engineer
>>> HHMI/Janelia Research Campus
>>> 571-209-4363
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>
>
> --
> www.skrasser.com <http://www.skrasser.com/?utm_source=sig>


--
www.skrasser.com <http://www.skrasser.com/?utm_source=sig>
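As referenced above, here is a minimal sketch of trading a large
spark.python.worker.memory value for more, smaller partitions. It is
illustrative only: the input path and the word-count job are made up, the 4x
multiplier is an arbitrary assumption, and sc is the SparkContext from the
earlier snippet.

    # Assumes the SparkContext sc configured in the earlier snippet.
    num_cores = int(sc.getConf().get("spark.executor.cores", "16"))

    lines = sc.textFile("hdfs:///path/to/input")      # hypothetical input path
    counts = (lines.filter(lambda l: l.strip())
                   .map(lambda l: (l.split()[0], 1))
                   .reduceByKey(lambda a, b: a + b,
                                numPartitions=num_cores * 4))  # more, smaller partitions

    # Each pyspark daemon now aggregates less data per partition, so a lower
    # spark.python.worker.memory (e.g. 1g instead of 4g) is less likely to
    # trigger spilling or to deplete memory on the node.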