Swapping is pretty bad here, especially because a JVM-based process won't even feel the memory pressure and try to GC or shrink the heap when the OS starts swapping. It's probably worse than in M/R because Spark leans on memory more heavily. Enough grinding in swap will cause tasks to fail on timeouts, and because those failures are correlated, whole jobs die, messily. For that reason I think you always want to disable swap, all the more so because disk I/O tends to be a bottleneck anyway.
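A rough sketch of what I mean by disabling swap on the worker nodes -- these are the usual Linux commands, adjust for your distro and ops tooling:

  # turn swap off now and keep it off across reboots
  sudo swapoff -a
  sudo sed -i '/ swap / s/^/#/' /etc/fstab
  # or, if you have to keep a swap partition, at least make the kernel
  # very reluctant to touch it
  sudo sysctl -w vm.swappiness=1

Note this is about OS swap, not Spark's own spill: persisting with StorageLevel.MEMORY_AND_DISK is fine, because there the JVM decides which blocks go to local disk instead of the OS paging the heap out from under it.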
If you're using YARN, I do find its design encourages, kind of on purpose, under-subscription of resources. You can probably safely over-subscribe YARN memory without resorting to swap; a rough sketch of the relevant knobs is below the quoted message.

On Mon, Nov 7, 2016 at 5:29 PM Michael Segel <msegel_had...@hotmail.com> wrote:
> This may seem like a silly question, but it really isn’t.
> In terms of Map/Reduce, it’s possible to over-subscribe the cluster because
> there is a lack of sensitivity if the servers swap memory to disk.
>
> In terms of HBase, which is very sensitive, swap doesn’t just kill
> performance, but can also kill HBase. (I’m sure one can tune it to be less
> sensitive…)
>
> But I have to ask: how sensitive is Spark?
> Considering we can cache to disk (local disk), it would imply that it would
> be less sensitive.
> Yet we see some posters facing over-subscription and hitting OOME.
>
> Thoughts?
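To make the over-subscription point a bit more concrete, a rough sketch -- the numbers are invented, and check the property names against your Hadoop/Spark versions:

  # yarn-site.xml on a worker with, say, 48 GB of physical RAM
  yarn.nodemanager.resource.memory-mb = 57344     # advertise ~56 GB to the scheduler
  yarn.nodemanager.pmem-check-enabled = true      # still kill any single container
                                                  # that blows past its own request

  # spark-defaults.conf, sized so executors fit their containers
  spark.executor.memory                6g
  spark.yarn.executor.memoryOverhead   1024       # MB of off-heap headroom per executor

The aggregate of container requests can exceed physical RAM because executors rarely peak at the same time, and the per-container check still catches any single runaway process.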