Hello,

I currently have a task that consistently fails with "java.io.FileNotFoundException:
[...]/shuffle_0_257_2155 (Too many open files)" when I run shuffle
operations such as distinct, sortByKey, or reduceByKey on a large number of
partitions.
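
The failing calls look roughly like this ("pairs" is a stand-in for my actual
(key, value) RDD, and the reduce function is just an example):

    // Illustrative only -- "pairs" stands in for my real key/value RDD.
    pairs.distinct(5959).count()
    pairs.sortByKey(true, 5959).count()
    pairs.reduceByKey(_ + _, 5959).count()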

I'm working with 365 GB of data, which is split into 5959 partitions.
The cluster I'm using has over 1000 GB of total memory, with 20 GB per node.

I have tried adding .set("spark.shuffle.consolidate.files", "true") when
building my SparkContext, but it doesn't seem to make a difference.
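
For reference, here is roughly how I'm constructing the context (the app name
and master URL are placeholders, not my exact setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder app name and master URL; the shuffle setting is the one in question.
    val conf = new SparkConf()
      .setAppName("shuffle-test")
      .setMaster("spark://master:7077")
      .set("spark.shuffle.consolidate.files", "true")

    val sc = new SparkContext(conf)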

Has anyone else had similar problems?

Best regards,

Matt
