Hello, I've seen a couple of emails on this issue but could not find anything that solves my situation.
I tried reducing the partitioning level, enabling consolidateFiles, and increasing the sizeInFlight limit, but none of it helped. The shuffle manager is sort, which is the default. Any advice?

15/07/29 10:37:01 WARN TaskSetManager: Lost task 34.0 in stage 11.0 (TID 331, localhost): FetchFailed(BlockManagerId(driver, localhost, 43437), shuffleId=3, mapId=0, reduceId=34, message=
org.apache.spark.shuffle.FetchFailedException: /tmp/spark-71109b28-0f89-4e07-a521-5ff0a943472a/blockmgr-eda0751d-fd21-4229-93b0-2ee2546edf5a/0d/shuffle_3_0_0.index (Too many open files)
..
..
15/07/29 10:37:01 INFO Executor: Executor is trying to kill task 9.0 in stage 11.0 (TID 306)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 11.0 failed 1 times, most recent failure: Lost task 20.0 in stage 11.0 (TID 317, localhost): java.io.FileNotFoundException: /tmp/spark-71109b28-0f89-4e07-a521-5ff0a943472a/blockmgr-eda0751d-fd21-4229-93b0-2ee2546edf5a/1b/temp_shuffle_a3a9815a-677a-4342-94a2-1e083d758bcc (Too many open files)

My filesystem is ext4, and ulimit -n is currently 1024.

Thanks,
Saif
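
For reference, here is a minimal sketch of how the open-file limit quoted above can be checked from Python on the same machine. It assumes a Unix-like system; the helper name nofile_limits is only illustrative, and the values it reports apply to the process that runs it, which is not necessarily the Spark JVM.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors;
    # the soft value is what `ulimit -n` reports in a shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", soft)
    print("hard limit:", hard)
```

The soft limit is the one that triggers "Too many open files"; it can be raised up to the hard limit without root privileges.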