Patrick, thanks for the response. May I ask a few more questions?
I'm running a Spark Streaming application which receives data from a socket and does some transformations. The event injection rate is too high, so the processing duration is larger than the batch interval. As a result I see the "Could not compute split, block input-0-1414049609200 not found" issue discussed by others in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-td11186.html#a11237

If my understanding is correct, Spark runs short of storage in this case because of the event pile-up, so it has to drop some blocks to free memory. However, even in this case, I still see a very small number (like 3MB) in the "Memory Used" column, while the total memory appears quite large (like 6GB). So I suspect the number shown in this column may be wrong.

How does Spark calculate the total memory from the allocated JVM heap size? I guess it's related to the "spark.storage.memoryFraction" configuration, but I'd like to know the details (a sketch of how I understand the calculation is appended after the quoted thread below). And why does the driver also use memory to store RDD blocks?

Thanks again for the answer!

________________________________
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: October 23, 2014 14:00
To: Haopu Wang
Cc: user
Subject: Re: About "Memory usage" in the Spark UI

It shows the amount of memory used to store RDD blocks, which are created when you run .cache()/.persist() on an RDD.

On Wed, Oct 22, 2014 at 10:07 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

Hi, please take a look at the attached screen-shot. I wonder what the "Memory Used" column means. I give 2GB memory to the driver process and 12GB memory to the executor process.

Thank you!
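PS - to make my question concrete, here is a minimal Scala sketch of how I understand Spark 1.x derives the total storage memory shown in the UI, based on the static memory management in BlockManager. The defaults of 0.6 for spark.storage.memoryFraction and 0.9 for spark.storage.safetyFraction are my assumptions; please correct me if the actual formula differs:

    // Sketch only: my understanding of how the "total" storage memory in the
    // Storage tab is derived from the JVM heap. Verify against your version.
    object StorageMemorySketch {
      def maxStorageMemory(conf: Map[String, String]): Long = {
        // Fraction of the heap reserved for cached RDD blocks (default 0.6).
        val memoryFraction =
          conf.getOrElse("spark.storage.memoryFraction", "0.6").toDouble
        // Safety margin to absorb size-estimation error (default 0.9).
        val safetyFraction =
          conf.getOrElse("spark.storage.safetyFraction", "0.9").toDouble
        // Maximum heap available to this JVM, i.e. the -Xmx setting.
        val maxHeap = Runtime.getRuntime.maxMemory
        (maxHeap * memoryFraction * safetyFraction).toLong
      }

      def main(args: Array[String]): Unit = {
        // With a 12GB executor heap and the defaults this yields roughly
        // 12 * 0.6 * 0.9 = 6.48 GB, which matches the ~6GB total I observe.
        val gb = maxStorageMemory(Map.empty) / math.pow(1024, 3)
        println(f"max storage memory ~= $gb%.2f GB")
      }
    }

If this is right, it would explain the ~6GB total, but not why "Memory Used" stays near 3MB while blocks are being dropped.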