Re: About Memory usage in the Spark UI

2014-10-23 Thread Patrick Wendell
It shows the amount of memory used to store RDD blocks, which are created
when you run .cache()/.persist() on an RDD.
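
For instance, a minimal sketch along these lines (the input path and names are
just illustrative) will show up under Memory Used once an action materializes
the RDD:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(new SparkConf().setAppName("cache-demo"))

    // Transformations are lazy, so nothing is stored yet.
    val words = sc.textFile("hdfs:///tmp/input.txt").flatMap(_.split(" "))

    // Mark the RDD for in-memory storage (equivalent to .cache()).
    words.persist(StorageLevel.MEMORY_ONLY)

    // The first action materializes the partitions; only now does the
    // Memory Used column in the Storage tab grow by their size.
    println(words.count())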

On Wed, Oct 22, 2014 at 10:07 PM, Haopu Wang hw...@qilinsoft.com wrote:

  Hi, please take a look at the attached screenshot. I wonder what the
 Memory Used column means.



  I gave 2GB of memory to the driver process and 12GB of memory to the executor
 process.



 Thank you!






RE: About Memory usage in the Spark UI

2014-10-23 Thread Haopu Wang
Patrick, thanks for the response. May I ask more questions?

 

I'm running a Spark Streaming application which receives data from a socket and 
does some transformations.
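
The setup is roughly like this (a trimmed-down sketch; the host, port, batch
interval and transformations are placeholders, not the real job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("socket-ingest")
    // 2-second batches; when a batch takes longer than this to process,
    // batches start to queue up, which is the situation described below.
    val ssc = new StreamingContext(conf, Seconds(2))

    // Receive lines of text from a socket source.
    val lines = ssc.socketTextStream("ingest-host", 9999)

    // Some simple per-batch transformations.
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()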

 

The event injection rate is too high, so the processing duration is longer than 
the batch interval.

 

So I see the "Could not compute split, block input-0-1414049609200 not found" 
issue discussed by others in this post: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-td11186.html#a11237

 

If my understanding is correct, Spark runs short of storage in this case because 
of the event pile-up, so it needs to drop some blocks in order to free memory.

 

However, even in this case, I still see a very small number (around 3MB) in the 
Memory Used column, while the total memory seems to be quite large (around 6GB). 
So I suspect the number shown in this column may be wrong.

 

How does Spark calculate the total memory from the allocated JVM heap size? I 
guess it's related to the spark.storage.memoryFraction configuration, but I'd 
like to know the details.
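
If my guess is right, the arithmetic would work out roughly like this (a
back-of-the-envelope sketch assuming the 1.x defaults
spark.storage.memoryFraction = 0.6 and spark.storage.safetyFraction = 0.9,
which I have not verified):

    // Rough estimate of the storage capacity reported by the UI.
    val executorHeapGB = 12.0   // heap given to the executor
    val memoryFraction = 0.6    // assumed default spark.storage.memoryFraction
    val safetyFraction = 0.9    // assumed default spark.storage.safetyFraction

    val storageCapacityGB = executorHeapGB * memoryFraction * safetyFraction
    // ~6.5 GB, which is in the same ballpark as the ~6GB total I see in the UI.
    println(f"estimated storage capacity: $storageCapacityGB%.1f GB")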

And why does the driver also use memory to store RDD blocks?

 

Thanks again for the answer!

 





Re: About Memory usage in the Spark UI

2014-10-23 Thread Tathagata Das
The memory usage of blocks of data received through Spark Streaming is not
reflected in the Spark UI. It only shows the memory usage due to cached
RDDs.
I didn't find a JIRA for this, so I opened a new one:

https://issues.apache.org/jira/browse/SPARK-4072
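
Until that is fixed, one rough way to see the actual storage usage, including
the receiver's input blocks, is to query the block manager status from the
driver. A sketch, assuming the 1.x getExecutorStorageStatus API (and that its
memUsed also counts non-RDD blocks, which I haven't double-checked):

    // Print per-executor storage memory usage as seen by the block managers.
    sc.getExecutorStorageStatus.foreach { status =>
      val usedMB = status.memUsed / (1024.0 * 1024.0)
      val maxMB  = status.maxMem / (1024.0 * 1024.0)
      println(f"${status.blockManagerId.host}: $usedMB%.1f MB used of $maxMB%.1f MB")
    }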


TD



RE: About Memory usage in the Spark UI

2014-10-23 Thread Haopu Wang
TD, thanks for the clarification.

 

From the UI, it looks like the driver also allocates memory to store blocks. 
What is the purpose of that, given that the driver doesn't need to run tasks?

 






About Memory usage in the Spark UI

2014-10-22 Thread Haopu Wang
Hi, please take a look at the attached screenshot. I wonder what the
Memory Used column means.

 

I gave 2GB of memory to the driver process and 12GB of memory to the executor
process.

 

Thank you!