Patrick, thanks for the response. May I ask a few more questions?

 

I'm running a Spark Streaming application which receives data from a socket and 
does some transformations.
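
The skeleton of the job is roughly like this (host, port and the transformations are 
just placeholders, not my exact code):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("SocketIngest")
    val ssc  = new StreamingContext(conf, Seconds(1))   // 1-second batch interval (placeholder)

    // receive lines of text from a socket source
    val lines = ssc.socketTextStream("localhost", 9999)

    // some simple transformations, e.g. a word count
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()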

 

The event injection rate is too high, so the processing duration is longer than the 
batch interval.

 

So I see the "Could not compute split, block input-0-1414049609200 not found" issue 
discussed by others in this post: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-td11186.html#a11237

 

If my understanding is correct, Spark runs short of storage memory in this case 
because of the event pile-up, so it needs to drop some blocks (splits) in order to 
free memory.

 

However, even in this case, I still see a very small number (around 3MB) in the 
"Memory Used" column, while the total memory seems quite large (around 6GB). So I 
suspect the number shown in this column may be wrong.

 

How does Spark calculate the total memory from the allocated JVM heap size? I guess 
it's related to the "spark.storage.memoryFraction" configuration, but I would like 
to know the details.
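
My rough guess at the formula (purely my own assumption based on the default values 
I have seen, not the actual Spark source) is something like:

    // guess: storage memory = JVM heap * spark.storage.memoryFraction * a safety fraction
    val heapBytes      = Runtime.getRuntime.maxMemory         // the executor's -Xmx, e.g. 12GB
    val memoryFraction = 0.6                                   // spark.storage.memoryFraction default?
    val safetyFraction = 0.9                                   // assumed safety margin
    val storageBytes   = (heapBytes * memoryFraction * safetyFraction).toLong
    // 12GB * 0.6 * 0.9 is roughly 6.5GB, which is close to the ~6GB total shown in the UI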

And why does the driver also use memory to store RDD blocks?

 

Thanks again for the answer!

 

________________________________

From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: October 23, 2014 14:00
To: Haopu Wang
Cc: user
Subject: Re: About "Memory usage" in the Spark UI

 

It shows the amount of memory used to store RDD blocks, which are created when 
you run .cache()/.persist() on an RDD.
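
For example, something along these lines (just an illustrative sketch, the path is a 
placeholder) would populate that column:

    val data   = sc.textFile("hdfs://...")    // placeholder input path
    val cached = data.cache()                 // mark the RDD for in-memory storage
    cached.count()                            // the first action materializes and stores the blocks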

 

On Wed, Oct 22, 2014 at 10:07 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

Hi, please take a look at the attached screenshot. I wonder what the "Memory Used" 
column means.

 

I gave 2GB of memory to the driver process and 12GB to the executor process.

 

Thank you!

 
