Assuming you are on Linux, what does your /etc/security/limits.conf set for
the nofile soft limit (the maximum number of open file handles)?
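
For example, you can check the limit the Spark process actually sees with:

    ulimit -n

and raise it with entries along these lines in /etc/security/limits.conf
(the user name and values here are only illustrative; pick what fits your
workload):

    # illustrative example: run Spark as "sparkuser" with a higher nofile limit
    sparkuser  soft  nofile  65535
    sparkuser  hard  nofile  65535

You would need to log in again (or restart the Spark daemons) for the new
limits to take effect.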

On Fri, Mar 20, 2015 at 3:29 PM Shuai Zheng <szheng.c...@gmail.com> wrote:

> Hi All,
>
>
>
> I tried to run a simple sortBy on Spark 1.2.1, and it always gives me the
> two errors below:
>
>
>
> 1, 15/03/20 17:48:29 WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID
> 35, ip-10-169-217-47.ec2.internal): java.io.FileNotFoundException:
> /tmp/spark-e40bb112-3a08-4f62-9eaa-cd094fcfa624/spark-58f72d53-8afc-41c2-ad6b-e96b479b51f5/spark-fde6da79-0b51-4087-8234-2c07ac6d7586/spark-dd7d6682-19dd-4c66-8aa5-d8a4abe88ca2/16/temp_shuffle_756b59df-ef3a-4680-b3ac-437b53267826
> (Too many open files)
>
>
>
> And then I switched to:
>
> conf.set("spark.shuffle.consolidateFiles", "true")
>     .set("spark.shuffle.manager", "SORT")
>
>
>
> Then I get the error:
>
>
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 5 in stage 1.0 failed 4 times, most recent
> failure: Lost task 5.3 in stage 1.0 (TID 36,
> ip-10-169-217-47.ec2.internal): com.esotericsoftware.kryo.KryoException:
> java.io.IOException: File too large
>
>         at com.esotericsoftware.kryo.io.Output.flush(Output.java:157)
>
>
>
> I roughly know the first issue occurs because the Spark shuffle creates too
> many local temp files (and I don't know the solution, because it looks like
> my attempted fix causes other issues), but I am not sure what the second
> error means.
>
>
>
> Does anyone know the solution for both cases?
>
>
>
> Regards,
>
>
>
> Shuai
>

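For completeness, here is a minimal, self-contained sketch of the
configuration quoted above, assuming Spark 1.2.x. The app name, master URL,
and generated data are placeholders; note that
spark.shuffle.consolidateFiles only affects the hash shuffle manager, and
"sort" is already the default shuffle manager as of 1.2:

    import org.apache.spark.{SparkConf, SparkContext}

    object SortByExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("sort-by-example")                  // placeholder app name
          .setMaster("local[*]")                          // placeholder master URL
          .set("spark.shuffle.manager", "sort")           // sort-based shuffle (default since 1.2)
          .set("spark.shuffle.consolidateFiles", "true")  // only applies to the hash shuffle manager
        val sc = new SparkContext(conf)

        // A simple sortBy over generated data, standing in for the original job.
        val sorted = sc.parallelize(1 to 1000000).sortBy(identity)
        println(sorted.take(5).mkString(", "))

        sc.stop()
      }
    }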