Hello everyone: I have been facing a problem with Spark Streaming memory usage.
I have been running two Spark Streaming jobs concurrently. The jobs read data from Kafka with a batch interval of 1 minute, perform an aggregation, and sink the computed data to MongoDB using the Stratio MongoDB connector. I have set up a Spark standalone cluster on AWS, configured as follows: a four-node cluster, with one node as the master and the remaining three nodes as workers, where each worker runs a single executor with 2 cores and 8 GB of RAM. I am currently processing about 700,000 JSON events per minute.

After running the jobs for 3-4 hours, I have observed that memory consumption keeps growing until one of the jobs exits. This happens despite setting /spark.cleaner.ttl/ to 600 seconds and calling the /rdd.unpersist/ method at the end of the job. I do not understand why memory consumption keeps growing over time, and I have been unable to solve this problem. I would appreciate it if someone could help me solve it, or point me toward why this might be happening. Thank you.
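For reference, here is a minimal sketch of the shape of each job, assuming the direct Kafka stream API from the Spark 1.x Kafka 0.8 connector. The broker address, topic name, keying logic, and the println stand-in for the Stratio MongoDB sink are all placeholders, not my actual code:

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  object AggregateToMongo {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("kafka-aggregate-to-mongo")
      val ssc = new StreamingContext(conf, Seconds(60)) // 1-minute batch interval

      val kafkaParams = Map("metadata.broker.list" -> "broker:9092") // placeholder broker
      val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("events")) // placeholder topic

      stream.foreachRDD { rdd =>
        // Aggregate the batch; the real job parses each JSON payload here.
        val counts = rdd.map { case (_, json) => (json.take(1), 1L) } // stand-in keying
          .reduceByKey(_ + _)
        counts.collect().foreach(println) // stand-in for the MongoDB sink
        rdd.unpersist(blocking = true)    // the explicit cleanup described above
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }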