On Wed, Apr 30, 2014 at 1:52 PM, wxhsdp <wxh...@gmail.com> wrote:

> Hi, guys
>
> I want to do some optimization of my Spark code. I use VisualVM to
> monitor the executor when running the app. Here's the snapshot:
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5107/executor.png
>
> From the snapshot, I can get memory usage information about the
> executor, but the executor contains lots of tasks. Is it possible to get
> the memory usage of one single task in the JVM, with GC running in the
> background?
I guess you could run 1-core slaves. That way they would only work on one task at a time.

> By the way, I can see that every time memory consumption reaches about
> 90%, the JVM does a GC operation. I'm a little confused about that. I
> originally thought that 60% of the memory is kept for Spark's memory
> cache (I did not cache any RDDs in my application), so there was only
> 40% left for running the app.

The way I understand it, Spark does not have tight control over memory. Your code running on the executor can easily use more than 40% of the memory. Spark only limits the memory used for RDD caches and shuffles. If its RDD caches are full, taking up 60% of the heap, and your own code takes up more than 40% (after GC), the executor will die with an OOM. I suppose there is not much Spark could do about this: it cannot control how much memory a function you call is allowed to use.
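For what it's worth, here is a rough sketch of both knobs, assuming the standalone deploy mode (the fraction value below is just an illustration, not a recommendation):

    # conf/spark-env.sh on each worker: one core per worker, so each
    # executor runs a single task at a time and the VisualVM numbers
    # roughly reflect that one task plus JVM/framework overhead.
    export SPARK_WORKER_CORES=1

    // In the driver: since you don't cache any RDDs, you can shrink the
    // fraction of the heap Spark reserves for the RDD cache (default 0.6),
    // leaving more headroom for your own code.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("memory-profiling")                  // hypothetical app name
      .set("spark.storage.memoryFraction", "0.2")      // illustrative value
    val sc = new SparkContext(conf)

Lowering spark.storage.memoryFraction doesn't give you per-task accounting, but combined with 1-core workers it narrows down what a single task is actually using.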