TD, thanks for the clarification.

 

From the UI, it looks like the driver also allocates memory to store blocks. What is the purpose of that? I think the driver doesn't need to run tasks.

 

________________________________

From: Tathagata Das [mailto:tathagata.das1...@gmail.com] 
Sent: October 24, 2014 8:07
To: Haopu Wang
Cc: Patrick Wendell; user
Subject: Re: About "Memory usage" in the Spark UI

 

The memory usage of blocks of data received through Spark Streaming is not 
reflected in the Spark UI. It only shows the memory usage due to cached RDDs.
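
For reference, the receiver does store those blocks through the block manager (socketTextStream defaults to StorageLevel.MEMORY_AND_DISK_SER_2), so they occupy storage memory even though the UI doesn't count them. A minimal sketch of that default (host and port are placeholders):

    // received blocks are serialized and replicated by default, but their
    // memory usage is currently not shown in the Storage tab
    val lines = ssc.socketTextStream("localhost", 9999,
      StorageLevel.MEMORY_AND_DISK_SER_2)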

I didn't find a JIRA for this, so I opened a new one:

 

https://issues.apache.org/jira/browse/SPARK-4072

 

 

TD

 

On Thu, Oct 23, 2014 at 12:47 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

Patrick, thanks for the response. May I ask more questions?

 

I'm running a Spark Streaming application which receives data from a socket and 
does some transformations.
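
To give a concrete picture, the app is roughly like this (the host, port, batch interval, and the actual transformations are simplified placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("SocketIngest")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // receive text lines over TCP and run a simple aggregation on each batch
    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()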

 

The event injection rate is too high, so the processing duration is larger than 
the batch interval.
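
(If I read the configuration docs correctly, the receiver could be throttled with the setting below, but I have not set it here; the value is just an example.)

    // cap each receiver at roughly 10,000 records per second
    conf.set("spark.streaming.receiver.maxRate", "10000")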

 

So I see the "Could not compute split, block input-0-1414049609200 not found" 
issue discussed by others in this post: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-td11186.html#a11237

 

If my understanding is correct, Spark runs out of storage in this case because 
of the event pile-up, so it needs to drop some blocks in order to free memory.

 

However, even in this case, I still see a very small number (like 3MB) in the 
"Memory Used" column, while the total memory seems to be quite big (like 6GB). 
So I think the number shown in this column may be wrong.

 

How does Spark calculate the total memory from the allocated JVM heap size? I 
guess it's related to the "spark.storage.memoryFraction" configuration, but I'd 
like to know the details.
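
My back-of-the-envelope guess, assuming the defaults spark.storage.memoryFraction = 0.6 and spark.storage.safetyFraction = 0.9, is:

    storage memory ≈ executor heap * 0.6 * 0.9
                   ≈ 12GB * 0.54
                   ≈ 6.5GB

which would roughly match the ~6GB total I see. Please correct me if the formula is different.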

And why does the driver also use memory to store RDD blocks?

 

Thanks again for the answer!

 

________________________________

From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: October 23, 2014 14:00
To: Haopu Wang
Cc: user
Subject: Re: About "Memory usage" in the Spark UI

 

It shows the amount of memory used to store RDD blocks, which are created when 
you run .cache()/.persist() on an RDD.
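
For example (the input path is just a placeholder):

    val rdd = sc.textFile("hdfs://.../input").cache()
    rdd.count()  // the first action materializes the cached blocks,
                 // which then appear under "Memory Used"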

 

On Wed, Oct 22, 2014 at 10:07 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

Hi, please take a look at the attached screenshot. I wonder what the 
"Memory Used" column means.

 

I give 2GB memory to the driver process and 12GB memory to the executor process.

 

Thank you!