It's quite impossible for anyone to say what is eating your memory
without even knowing which language you are using.

If you are using C, it is almost always pointers that cause the memory issue.
If you are using Python, one common cause is opening resources without a
context manager (Python's with statement).
Another is not closing resources after use.
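
To make the point about context managers concrete, here is a minimal sketch. DemoResource and opened() are made-up names for illustration, not part of any library; the standard library's contextlib.closing does roughly the same job for any object with a close() method.

```python
from contextlib import contextmanager


class DemoResource:
    """Hypothetical stand-in for a file, a connection, or a session."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def opened(resource):
    """Yield the resource and close it on exit, even if the body raises."""
    try:
        yield resource
    finally:
        resource.close()


res = DemoResource()
try:
    with opened(res):
        raise RuntimeError("simulated failure in the middle of the job")
except RuntimeError:
    pass

print(res.closed)  # True: close() ran despite the exception
```

Without the with block, the exception would skip any close() call placed after the failing line, and the resource would stay open.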

In my experience you can process 3 years or more of data, as long as you
close the resources you open.
I use the web GUI http://spark:4040 to follow what Spark is doing.
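
As a rough sketch of processing a long date range one day at a time while releasing each day's cache: each_day(), process_range(), load_day and handle_day are hypothetical names, and the only assumption about the loaded object is that it has cache() and unpersist() methods like a PySpark DataFrame. Treat it as an illustration, not a drop-in fix for your job.

```python
from datetime import date, timedelta


def each_day(start, end):
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def process_range(start, end, load_day, handle_day):
    """Process one day at a time, releasing that day's cache before the next."""
    processed = 0
    for day in each_day(start, end):
        # Assumption: cache() returns the cached object, as it does in PySpark.
        df = load_day(day).cache()
        try:
            handle_day(day, df)
        finally:
            # Runs even if handle_day raises, so cached data does not pile up.
            df.unpersist()
        processed += 1
    return processed
```

A call could look like process_range(date(2019, 1, 1), date(2021, 12, 31), load_day, handle_day), where load_day(day) returns the data for that day and handle_day(day, df) does the heavy work.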

Wed, 30 Mar 2022

> Thanks for the answer, much appreciated! This forum is very useful :-)
> I didn't know the SparkContext stays alive. I guess this is what is eating up
> memory. Eviction means that Spark knows it should clear some of the
> old cached data to make room for new data. If anyone has good
> articles about memory leaks, I would be interested to read them.
> I will try to add the following lines at the end of my job (as I cached the
> table in Spark SQL):
> *sqlContext.sql("UNCACHE TABLE mytableofinterest ")*
> *spark.stop()*
> Wrt looping: if I want to process 3 years of data, my modest cluster will
> never do it in one go, I would expect. I have to break it down into smaller
> pieces and run those in a loop (1 day is already a lot of data).
> Thanks!
On 30 Mar 2022
> The Spark context does not stop when a job does. It stops when you stop
> it. There could be many ways mem can leak. Caching maybe - but it will
> evict. You should be clearing caches when no longer needed.
> I would guess it is something else your program holds on to in its logic.
> Also consider not looping; there is probably a faster way to do it in one
> go.
On Wed, Mar 30, 2022
> wrote:
>> Hi,
>> I have a pyspark job submitted through spark-submit that does some heavy
>> processing for 1 day of data. It runs with no errors. I have to loop over
>> many days, so I run this spark job in a loop. I notice after a couple of
>> executions the memory is increasing on all worker nodes and eventually this
>> leads to failures. My job does some caching, but I understand that when
>> the job ends successfully, then the sparkcontext is destroyed and the cache
>> should be cleared. However it seems that something keeps on filling the
>> memory a bit more and more after each run. This is the memory behaviour
>> over time, which in the end will start leading to failures:
>> (what we see is: green=physical memory used, green-blue=physical memory
>> cached, grey=memory capacity =straight line around 31GB )
>> This runs on a healthy Spark 2.4 and was already optimized to get to a
>> stable job in terms of spark-submit resource parameters
>> (driver-memory/num-executors/executor-memory/executor-cores/spark.locality.wait).
>> Any clue how to “really” clear the memory in between jobs? So basically
>> currently I can loop 10x and then need to restart my cluster so all memory
>> is cleared completely.
>> Thanks for any info!
>> <Screenshot 2022-03-30 at 15.28.24.png>

