Hello everyone:

I have been facing a problem with Spark Streaming memory usage.

I have been running two Spark Streaming jobs concurrently. Each job reads
data from Kafka with a batch interval of 1 minute, performs an aggregation,
and sinks the computed data to MongoDB using the stratio-mongodb connector.
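
To make the pipeline concrete, here is a simplified sketch of what each job
does (Spark 1.x APIs; the topic name, key extraction, and MongoDB write are
placeholders, since the exact stratio-mongodb call is not where I suspect
the problem is):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Minutes, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Placeholder key extraction; the real job parses the JSON and groups on a field.
    def extractKey(json: String): String = json.take(16)

    // Placeholder for the MongoDB sink; the real job uses the stratio-mongodb connector.
    def saveToMongo(rdd: RDD[(String, Long)]): Unit = rdd.foreachPartition(_ => ())

    val conf = new SparkConf().setAppName("events-aggregation")
    val ssc  = new StreamingContext(conf, Minutes(1))       // 1-minute batch interval

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics      = Set("events")                          // placeholder topic name

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Parse each JSON event into (key, 1) and aggregate the counts per batch.
    val aggregated = stream
      .map { case (_, json) => (extractKey(json), 1L) }
      .reduceByKey(_ + _)

    // Sink every aggregated batch to MongoDB.
    aggregated.foreachRDD(rdd => saveToMongo(rdd))

    ssc.start()
    ssc.awaitTermination()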

I have set up a Spark standalone cluster on AWS, configured as follows: it
is a four-node cluster, with one node as the master and the remaining three
nodes as workers, and each worker runs a single executor with 2 cores and
8 GB of RAM.
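
In Spark terms, that resource layout corresponds roughly to the following
settings, applied per job (the master host name is a placeholder):

    // Resource settings matching the cluster layout described above.
    val conf = new SparkConf()
      .setMaster("spark://master-node:7077")   // placeholder master host
      .set("spark.executor.memory", "8g")      // one 8 GB executor per worker
      .set("spark.cores.max", "6")             // 2 cores x 3 workers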

Currently, I am processing seven hundred thousand JSON events every minute.
After running the jobs for 3-4 hours, I have observed that memory
consumption keeps growing until one of the jobs eventually exits.

Despite setting spark.cleaner.ttl to 600 seconds and calling rdd.unpersist()
at the end of the job, I am not able to understand why the memory
consumption keeps growing over time, and I have not been able to solve this
problem. I would appreciate it if someone could help me solve it or point me
toward why this is happening.
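
For reference, this is roughly where those two things are applied,
continuing the sketch above (a simplified view, assuming the batch RDD is
cached because it is reused before the write):

    // The cleaner TTL is set on the SparkConf before the StreamingContext is created.
    val conf = new SparkConf()
      .setAppName("events-aggregation")
      .set("spark.cleaner.ttl", "600")         // clean metadata/RDDs older than 600 s

    // Unpersist the batch RDD once it has been written out.
    aggregated.foreachRDD { rdd =>
      rdd.cache()                              // assumed: cached because it is reused in the batch
      saveToMongo(rdd)                         // placeholder write from the sketch above
      rdd.unpersist()                          // explicitly release the cached batch data
    }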

Thank you.



