The spark-shell process alone shouldn't take up that much memory, at least in my experience. Have you dumped the heap to see what's actually in there? What environment are you running Spark in?
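If you haven't taken a dump yet, the stock JDK tools will do it (assuming jps/jmap are available on the machine); something like:

    # find the spark-shell JVM (it shows up as SparkSubmit), then dump its live heap
    jps -lm | grep -i spark
    jmap -dump:live,format=b,file=/tmp/spark-shell.hprof <pid>

Loading the .hprof into Eclipse MAT or jvisualvm will show which objects dominate the heap.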
Doing things like RDD.collect() or .countByKey() will pull a potentially large amount of data into the spark-shell heap. Another thing that can fill up the driver heap (the driver also runs inside the spark-shell process) is running lots of jobs: the logged SparkEvents stick around so the UI can render them. There are options under `spark.ui.retained*` to limit that if it's a problem; a rough sketch of both is below the quoted message.

On Mon, Jan 9, 2017 at 6:00 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> We've had various OOM issues with Spark and have been trying to track them
> down one by one.
>
> Now we have one in spark-shell, which is super surprising.
>
> We currently allocate 6GB to spark-shell, as confirmed via 'ps'.
>
> Why the heck would the *shell* need that much memory?
>
> I'm going to try to give it more, of course, but it would be nice to know
> whether this is a legitimate memory constraint or there is a bug somewhere.
>
> PS: One thought I had was that it would be nice to have Spark keep track
> of where an OOM was encountered, in what component.
>
> Kevin
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
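P.S. To make both of those concrete, here's a rough sketch; `sc` is the SparkContext spark-shell gives you, and the input/output paths are made up:

    // These materialize results in the spark-shell (driver) heap:
    val lines = sc.textFile("hdfs:///some/large/input")
    val everything = lines.collect()                  // whole dataset on the driver
    val perKey = lines.map(l => (l, 1L)).countByKey() // one Map entry per key, on the driver

    // Distributed alternatives that keep the data on the executors:
    val few = lines.take(10)                          // only a handful of records come back
    val reduced = lines.map(l => (l, 1L)).reduceByKey(_ + _)
    reduced.saveAsTextFile("hdfs:///some/output")     // write out rather than collect

And to cap the UI's retained SparkEvent state, you can pass the retained* settings when launching the shell (the values here are illustrative, not recommendations):

    spark-shell \
      --conf spark.ui.retainedJobs=100 \
      --conf spark.ui.retainedStages=100 \
      --conf spark.ui.retainedTasks=10000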