Hello all, I'm running a PySpark script that uses a for loop to process my main dataset in smaller chunks.
Some example code:

```python
for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    # do some stuff with my_rdd
    my_df = make_df(my_rdd)
    # do some stuff with my_df
    my_df.write.parquet('./some/path')
```

After a couple of iterations I always start to lose executors because of out-of-memory errors. Is there a way to free up memory after each loop iteration? Do I have to do it in Python or through Spark? Thanks
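For reference, this is roughly the kind of per-iteration cleanup I've been wondering about; the `unpersist()`/`del` calls are just my guesses (I don't know if they actually help when nothing was explicitly cached), and `chunks`, `somefunc` and `make_df` are the same placeholders as above:

```python
for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    my_df = make_df(my_rdd)
    my_df.write.parquet('./some/path')

    # Attempted cleanup at the end of each iteration:
    my_df.unpersist()    # no-op unless the DataFrame was cached/persisted?
    my_rdd.unpersist()   # same question for the RDD
    del my_rdd, my_df    # drop the Python references so the driver can GC them
```

Is this the right direction, or is there a better way to release executor memory between iterations?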