You should probably increase the file handle limit for the user that all the processes (Spark master & workers) run as, e.g. http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
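To make that persistent, the nofile limits can be raised in /etc/security/limits.conf and checked with ulimit after logging back in. A minimal sketch, assuming the daemons run as a user named "spark" and that 65536 is enough for your workload:

    # /etc/security/limits.conf
    spark  soft  nofile  65536
    spark  hard  nofile  65536

    # verify in a fresh session as that user:
    $ ulimit -n
    65536

The new limit only applies to sessions started after the change, so the master and workers need to be restarted to pick it up.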
On 29 July 2015 at 18:39, <saif.a.ell...@wellsfargo.com> wrote:
> Hello,
>
> I've seen a couple of emails on this issue but could not find anything to
> solve my situation.
>
> I tried to reduce the partitioning level, enable consolidateFiles and
> increase the sizeInFlight limit, but still no help. The spill manager is
> sort, which is the default. Any advice?
>
> 15/07/29 10:37:01 WARN TaskSetManager: Lost task 34.0 in stage 11.0 (TID
> 331, localhost): FetchFailed(BlockManagerId(driver, localhost, 43437),
> shuffleId=3, mapId=0, reduceId=34, message=
> org.apache.spark.shuffle.FetchFailedException:
> /tmp/spark-71109b28-0f89-4e07-a521-5ff0a943472a/blockmgr-eda0751d-fd21-4229-93b0-2ee2546edf5a/0d/shuffle_3_0_0.index
> (Too many open files)
> ..
> ..
> 15/07/29 10:37:01 INFO Executor: Executor is trying to kill task 9.0 in
> stage 11.0 (TID 306)
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 20
> in stage 11.0 failed 1 times, most recent failure: Lost task 20.0 in stage
> 11.0 (TID 317, localhost): java.io.FileNotFoundException:
> /tmp/spark-71109b28-0f89-4e07-a521-5ff0a943472a/blockmgr-eda0751d-fd21-4229-93b0-2ee2546edf5a/1b/temp_shuffle_a3a9815a-677a-4342-94a2-1e083d758bcc
> (Too many open files)
>
> My fs is ext4 and currently ulimit -n is 1024.
>
> Thanks,
> Saif
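For reference, the shuffle settings mentioned above can go in conf/spark-defaults.conf or be passed per job with --conf. A sketch with illustrative values, not tuned recommendations; note that spark.shuffle.consolidateFiles only affects the hash shuffle manager, so it is a no-op under the default sort manager:

    # conf/spark-defaults.conf (example values only)
    spark.shuffle.manager            sort
    spark.reducer.maxSizeInFlight    96m
    # consolidateFiles only matters with the hash shuffle manager:
    spark.shuffle.consolidateFiles   true

    # or per job:
    $ spark-submit --conf spark.reducer.maxSizeInFlight=96m ...

Either way, the sort manager already writes one data file plus one index file per map task per shuffle, so the per-process file handle limit is usually the real constraint here.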