Hello all,

I'm running a PySpark script that uses a for loop to process my main dataset in
smaller chunks.

Some example code:

for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    # do some stuff with my_rdd

    my_df = make_df(my_rdd)
    # do some stuff with my_df
    my_df.write.parquet('./some/path')

After a couple of loops I always start to lose executors because of
out-of-memory errors. Is there a way to free up memory after each loop
iteration? Do I have to do it in Python or in Spark?
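
To make the question concrete, here is a rough sketch of the kind of
per-iteration cleanup I have in mind (sc, chunks, somefunc and make_df are the
same as in the example above; I'm not sure whether the Spark-side unpersist()
calls or the Python-side del / gc.collect() are actually the right tool):

import gc

for chunk in chunks:
    my_rdd = sc.parallelize(chunk).flatMap(somefunc)
    # do some stuff with my_rdd

    my_df = make_df(my_rdd)
    # do some stuff with my_df
    my_df.write.parquet('./some/path')  # placeholder path as in the example above

    # Spark side: drop any cached blocks held for this iteration's RDD/DataFrame
    my_df.unpersist()
    my_rdd.unpersist()

    # Python side: drop the driver-side references and force a GC cycle
    del my_df, my_rdd
    gc.collect()

Is something along these lines the right approach?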

Thanks
