Hello all,

I'm running a PySpark script that uses a for loop to process my main dataset in
smaller chunks.

Some example code:

for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    # do some stuff with my_rdd

    my_df = make_df(my_rdd)
    # do some stuff with my_df
    my_df.write.parquet('./some/path')

After a couple of loops I always start to lose executors because of
out-of-memory errors. Is there a way to free up memory after each loop
iteration? Do I have to do it in Python or in Spark?
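
To make the question concrete, here is a rough sketch of the kind of
per-iteration cleanup I have in mind (sc, chunks, somefunc and make_df are the
same as in the example above; I'm not sure whether the Spark-side unpersist()
calls or the Python-side del / gc.collect() are actually the right tool):

import gc

for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    # do some stuff with my_rdd

    my_df = make_df(my_rdd)
    # do some stuff with my_df
    my_df.write.parquet('./some/path')  # placeholder path as in the example above

    # Spark side: drop any cached blocks held for this iteration's RDD/DataFrame
    my_df.unpersist()
    my_rdd.unpersist()

    # Python side: drop the driver-side references and force a GC cycle
    del my_df, my_rdd
    gc.collect()

Is something along these lines the right approach?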

Thanks
