I reduced the 'state timeout' from 10 minutes to 2 minutes so that memory
would be released quicker & the new numbers for Storage Memory are: 54.7GB
out of 598.5GB BUT I still don't trust these numbers. As Amit pointed out,
it seems there's a bug in the Spark 2.4 UI.
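
One way to cross-check the totals shown in the UI is to read the raw
per-executor values from Spark's monitoring REST API and add them up
independently. A minimal Python sketch, assuming the driver UI is reachable
at a base URL such as http://driver-host:4040 (hypothetical host) and that
the `memoryUsed` and `maxMemory` fields returned by
`/api/v1/applications/[app-id]/executors` are byte counts. The helper names
are made up for illustration:

```python
import json
from urllib.request import urlopen

GIB = 1024 ** 3  # assumption: convert bytes with base 1024


def fetch_executors(base_url, app_id):
    """Fetch the executor list (parsed JSON) from the monitoring REST API."""
    url = f"{base_url}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as resp:
        return json.load(resp)


def summarize_storage_memory(executors):
    """Return per-executor (id, used GiB, max GiB) rows plus both totals."""
    rows = [
        (ex["id"], ex["memoryUsed"] / GIB, ex["maxMemory"] / GIB)
        for ex in executors
    ]
    total_used = sum(used for _, used, _ in rows)
    total_max = sum(limit for _, _, limit in rows)
    return rows, total_used, total_max


# Usage (needs a running application; host and app id are placeholders):
# executors = fetch_executors("http://driver-host:4040", "<app-id>")
# rows, used, limit = summarize_storage_memory(executors)
# print(f"storage memory: {used:.1f} GiB used of {limit:.1f} GiB")
```

If the totals computed this way disagree with the Executors page, that would
point at a display problem rather than at the cluster.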

I am requesting 2TB of memory but the UI keeps showing 598.5GB. I am not
exactly sure whether it's a BUG in the Spark 2.4 UI OR our cluster is indeed
not giving my job enough memory!
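
For reference, the Spark tuning guide describes the unified memory pool as
(JVM heap - 300 MiB) * spark.memory.fraction, with a default fraction of
0.6, so the total shown under 'Storage Memory' is expected to be well below
the memory requested. A rough Python sketch of that arithmetic follows. The
executor count and heap size are hypothetical, since this thread does not
state them, and off-heap memory and the JVM's own overhead are ignored:

```python
RESERVED_MIB = 300  # reserved memory described in the tuning guide


def unified_memory_gib(executor_heap_gib, memory_fraction=0.6):
    """Approximate unified (execution + storage) pool for one executor."""
    heap_mib = executor_heap_gib * 1024
    return (heap_mib - RESERVED_MIB) * memory_fraction / 1024


def expected_cluster_storage_gib(num_executors, executor_heap_gib,
                                 memory_fraction=0.6):
    """Approximate aggregate pool size across all executors."""
    per_executor = unified_memory_gib(executor_heap_gib, memory_fraction)
    return num_executors * per_executor


# Hypothetical request: 64 executors with 32 GiB of heap each (2 TiB total).
requested_gib = 64 * 32
pool_gib = expected_cluster_storage_gib(64, 32)
print(f"requested {requested_gib} GiB, unified pool about {pool_gib:.1f} GiB")
```

With the default fraction the pool is always under 60% of the requested
heap, so a gap between requested and displayed memory is not by itself proof
of a UI bug or of a cluster problem. Plugging in the job's real executor
count and spark.executor.memory would show how much of the gap this explains.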





On Sun, Jan 10, 2021 at 12:32 AM Amit Sharma <resolve...@gmail.com> wrote:

> I believe it’s a Spark UI issue: it does not display the correct value. I
> believe it is resolved in Spark 3.0.
>
> Thanks
> Amit
>
> On Fri, Jan 8, 2021 at 4:00 PM Luca Canali <luca.can...@cern.ch> wrote:
>
>> You report 'Storage Memory': 3.3TB/ 598.5 GB -> The first number is the
>> memory used for storage, the second one is the available memory (for
>> storage) in the unified memory pool.
>>
>> The used memory shown in your web UI snippet is indeed quite high (higher
>> than the available memory!?). You can probably benefit from drilling down
>> on that to better understand what is happening.
>>
>> For example look at the details per executor (the numbers you reported
>> are aggregated values), then also look at the “storage tab” for a list of
>> cached RDDs with details.
>>
>> In any case, Spark 3.0 has improved memory instrumentation and improved
>> instrumentation for streaming, so you can profit from testing there too.
>>
>>
>>
>>
>>
>> *From:* Eric Beabes <mailinglist...@gmail.com>
>> *Sent:* Friday, January 8, 2021 04:23
>> *To:* Luca Canali <luca.can...@cern.ch>
>> *Cc:* spark-user <user@spark.apache.org>
>> *Subject:* Re: Understanding Executors UI
>>
>>
>>
>> So when I see this for 'Storage Memory': *3.3TB/ 598.5 GB* *- it's
>> telling me that Spark is using 3.3 TB of memory & 598.5 GB is used for
>> caching data, correct?* What I am surprised about is that these numbers
>> don't change at all throughout the day even though the load on the system
>> is low after 5pm PST.
>>
>>
>>
>> I would expect the "Memory used" to be lower than 3.3 TB after 5pm PST.
>>
>>
>>
>> Does Spark 3.0 do a better job of memory management? Wondering if
>> upgrading to Spark 3.0 would improve performance?
>>
>>
>>
>>
>>
>> On Wed, Jan 6, 2021 at 2:29 PM Luca Canali <luca.can...@cern.ch> wrote:
>>
>> Hi Eric,
>>
>>
>>
>> A few links, in case they can be useful for your troubleshooting:
>>
>>
>>
>> The Spark Web UI is documented in Spark 3.x documentation, although you
>> can use most of it for Spark 2.4 too:
>> https://spark.apache.org/docs/latest/web-ui.html
>>
>>
>>
>> Spark memory management is documented at
>> https://spark.apache.org/docs/latest/tuning.html#memory-management-overview
>>
>>
>> Additional resource: see also this diagram
>> https://canali.web.cern.ch/docs/SparkExecutorMemory.png  and
>> https://db-blog.web.cern.ch/blog/luca-canali/2020-08-spark3-memory-monitoring
>>
>>
>>
>> Best,
>>
>> Luca
>>
>>
>>
>> *From:* Eric Beabes <mailinglist...@gmail.com>
>> *Sent:* Wednesday, January 6, 2021 00:20
>> *To:* spark-user <user@spark.apache.org>
>> *Subject:* Understanding Executors UI
>>
>>
>>
>> [image: image.png]
>>
>>
>>
>>
>>
>> Not sure if this image will go through. (Never sent an email to this
>> mailing list with an image).
>>
>>
>>
>> I am trying to understand this 'Executors' UI in Spark 2.4. I have a
>> Stateful Structured Streaming job with 'State timeout' set to 10 minutes.
>> When the load on the system is low a message gets written to Kafka
>> immediately after the State times out BUT under heavy load it takes over 40
>> minutes to get a message on the output topic. Trying to debug this issue &
>> see if performance can be improved.
>>
>>
>>
>> Questions:
>>
>>
>>
>> 1) I am requesting 3.2 TB of memory but it seems the job keeps using only
>> 598.5 GB as per the values in 'Storage Memory' as well as 'On Heap Storage
>> Memory'. Wondering if this is a Cluster issue OR am I not setting values
>> correctly?
>>
>> 2) Where can I find documentation to understand different 'Tabs' in the
>> Spark UI? (Sorry, Googling didn't help. I will keep searching.)
>>
>>
>>
>> Any pointers would be appreciated. Thanks.
>>
>>
>>
>>
