Thank you both, I will take a look, but:

1. For high-shuffle tasks, is it right for the system to need such high limits and thresholds? I hope there are no bad consequences.
2. I will try to get around the need for admin access and see whether I can raise the limit with only user rights.

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, July 29, 2015 12:59 PM
To: Ellafi, Saif A.
Cc: <user@spark.apache.org>
Subject: Re: Too many open files

Please increase the limit for open files:
http://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux

On Jul 29, 2015, at 8:39 AM, <saif.a.ell...@wellsfargo.com> wrote:

Hello,

I’ve seen a couple of emails on this issue but could not find anything that solves my situation. I tried reducing the partitioning level, enabling consolidateFiles, and increasing the sizeInFlight limit, but none of it helped. The spill manager is sort, which is the default. Any advice?

15/07/29 10:37:01 WARN TaskSetManager: Lost task 34.0 in stage 11.0 (TID 331, localhost): FetchFailed(BlockManagerId(driver, localhost, 43437), shuffleId=3, mapId=0, reduceId=34, message=
org.apache.spark.shuffle.FetchFailedException: /tmp/spark-71109b28-0f89-4e07-a521-5ff0a943472a/blockmgr-eda0751d-fd21-4229-93b0-2ee2546edf5a/0d/shuffle_3_0_0.index (Too many open files)
..
..
15/07/29 10:37:01 INFO Executor: Executor is trying to kill task 9.0 in stage 11.0 (TID 306)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 11.0 failed 1 times, most recent failure: Lost task 20.0 in stage 11.0 (TID 317, localhost): java.io.FileNotFoundException: /tmp/spark-71109b28-0f89-4e07-a521-5ff0a943472a/blockmgr-eda0751d-fd21-4229-93b0-2ee2546edf5a/1b/temp_shuffle_a3a9815a-677a-4342-94a2-1e083d758bcc (Too many open files)

My filesystem is ext4, and ulimit -n is currently 1024.

Thanks
Saif
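
For anyone reproducing this, a minimal Scala sketch of the three tuning attempts described above, assuming Spark 1.4-era property names: "consolidateFiles" here is spark.shuffle.consolidateFiles (it only affects the hash shuffle manager) and the "sizeInFlight" limit is spark.reducer.maxSizeInFlight. The values are illustrative, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-tuning-sketch")
  // Fewer partitions -> fewer shuffle files open at once per JVM.
  .set("spark.default.parallelism", "64")
  // Merge map outputs into fewer files (hash shuffle manager only;
  // a no-op under the sort shuffle manager used in this thread).
  .set("spark.shuffle.consolidateFiles", "true")
  // Cap on how much shuffle data a reduce task fetches concurrently
  // (default 48m in Spark 1.4).
  .set("spark.reducer.maxSizeInFlight", "96m")

val sc = new SparkContext(conf)

Since the shuffle manager in play is already sort, consolidateFiles should have no effect here; the other settings can lower the number of concurrently open shuffle files but do not bound it, which is why raising the open-files limit is the more direct fix.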
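
On the admin-rights question: raising the soft limit up to the existing hard limit normally does not require root. For example, check the hard limit with ulimit -Hn, then run ulimit -n 4096 in the shell that launches Spark (assuming the hard limit allows it). Only raising the hard limit itself typically needs root, e.g. via an entry in /etc/security/limits.conf.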