Re: About Memory usage in the Spark UI
It shows the amount of memory used to store RDD blocks, which are created when you run .cache()/.persist() on an RDD.

On Wed, Oct 22, 2014 at 10:07 PM, Haopu Wang hw...@qilinsoft.com wrote:

Hi, please take a look at the attached screenshot. I wonder what the Memory Used column means. I give 2GB of memory to the driver process and 12GB of memory to the executor process. Thank you!
RE: About Memory usage in the Spark UI
Patrick, thanks for the response. May I ask a few more questions?

I'm running a Spark Streaming application that receives data from a socket and applies some transformations. The event injection rate is too high, so the processing time is longer than the batch interval. As a result I see the "Could not compute split, block input-0-1414049609200 not found" issue, as discussed by others in this post: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-td11186.html#a11237

If my understanding is correct, Spark runs short of storage in this case because events pile up, so it needs to delete some splits in order to free memory. However, even in this case, I still see a very small number (like 3MB) in the Memory Used column, while the total memory seems to be quite large (like 6GB). So I think the number shown in this column may be wrong.

How does Spark calculate the total memory from the allocated JVM heap size? I guess it is related to the spark.storage.memoryFraction configuration, but I would like to know the details. And why does the driver also use memory to store RDD blocks?

Thanks again for the answer!

From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: October 23, 2014 14:00
To: Haopu Wang
Cc: user
Subject: Re: About Memory usage in the Spark UI

It shows the amount of memory used to store RDD blocks, which are created when you run .cache()/.persist() on an RDD.
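On the question above about how the total is derived from the JVM heap size, here is a rough sketch of the arithmetic, not an answer given in the thread. It assumes the Spark 1.x behaviour in which the storage pool is the maximum memory reported by the JVM multiplied by spark.storage.memoryFraction and by spark.storage.safetyFraction, with assumed defaults of 0.6 and 0.9. The function name storage_memory_bytes is made up for illustration, and the heap sizes are the 2GB and 12GB mentioned in the thread.

```python
GB = 1024 ** 3


def storage_memory_bytes(jvm_max_memory_bytes,
                         memory_fraction=0.6,   # assumed default of spark.storage.memoryFraction
                         safety_fraction=0.9):  # assumed default of spark.storage.safetyFraction
    """Estimate the pool available for cached blocks (hypothetical helper).

    Assumption: pool = JVM max memory * memoryFraction * safetyFraction.
    """
    return int(jvm_max_memory_bytes * memory_fraction * safety_fraction)


if __name__ == "__main__":
    # Heap sizes taken from the thread: 2GB driver, 12GB executor.
    for label, heap_gb in (("driver", 2), ("executor", 12)):
        pool = storage_memory_bytes(heap_gb * GB)
        print(f"{label}: {heap_gb} GB heap -> about {pool / GB:.2f} GB for cached blocks")
```

Under these assumptions a 12GB heap yields a storage pool of somewhat over 6GB, which is in the same range as the total seen in the UI. The JVM usually reports a maximum slightly below the configured heap size, so the real figure would likely be a little lower than this estimate.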
Re: About Memory usage in the Spark UI
The memory used by blocks of data received through Spark Streaming is not reflected in the Spark UI. It only shows the memory used by cached RDDs. I didn't find a JIRA for this, so I opened a new one: https://issues.apache.org/jira/browse/SPARK-4072

TD

On Thu, Oct 23, 2014 at 12:47 AM, Haopu Wang hw...@qilinsoft.com wrote:

Patrick, thanks for the response. May I ask a few more questions?

I'm running a Spark Streaming application that receives data from a socket and applies some transformations. The event injection rate is too high, so the processing time is longer than the batch interval. As a result I see the "Could not compute split, block input-0-1414049609200 not found" issue, as discussed by others in this post: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-td11186.html#a11237

If my understanding is correct, Spark runs short of storage in this case because events pile up, so it needs to delete some splits in order to free memory. However, even in this case, I still see a very small number (like 3MB) in the Memory Used column, while the total memory seems to be quite large (like 6GB). So I think the number shown in this column may be wrong.

How does Spark calculate the total memory from the allocated JVM heap size? I guess it is related to the spark.storage.memoryFraction configuration, but I would like to know the details. And why does the driver also use memory to store RDD blocks?

Thanks again for the answer!
RE: About Memory usage in the Spark UI
TD, thanks for the clarification. From the UI, it looks like the driver also allocates memory to store blocks. What is the purpose of that, since I think the driver doesn't need to run tasks?

From: Tathagata Das [mailto:tathagata.das1...@gmail.com]
Sent: October 24, 2014 8:07
To: Haopu Wang
Cc: Patrick Wendell; user
Subject: Re: About Memory usage in the Spark UI

The memory used by blocks of data received through Spark Streaming is not reflected in the Spark UI. It only shows the memory used by cached RDDs. I didn't find a JIRA for this, so I opened a new one: https://issues.apache.org/jira/browse/SPARK-4072

TD
About Memory usage in the Spark UI
Hi, please take a look at the attached screenshot. I wonder what the Memory Used column means. I give 2GB of memory to the driver process and 12GB of memory to the executor process. Thank you!