Hi all – I could use some help figuring out a couple of exceptions I’ve
been getting regularly.

I have been running on a fairly large dataset (~150 GB). With smaller
datasets I don't have any issues.

My sequence of operations is as follows – unless otherwise specified, I am
not caching:

Map a 30 million row x 70 column string table down to approx. 30 million
rows x 5 string columns (for the textFile read I am using 1500 partitions)
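
In code, that step looks roughly like this (a sketch only; the path, the
tab delimiter, and the column indices are placeholders, not my actual
layout):

val raw = sc.textFile("hdfs:///path/to/input", 1500)
val projected = raw.map { line =>
  val cols = line.split("\t")
  // keep only the ~5 columns of interest out of the original 70
  (cols(0), cols(1), cols(2), cols(3), cols(4))
}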

From that, map to ((a, b), score) and reduceByKey, numPartitions = 180
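
Roughly, continuing the sketch above (the score column position and the
sum as the reduce function are assumptions for illustration):

val scored = projected
  .map { case (a, b, _, _, s) => ((a, b), s.toDouble) }
  .reduceByKey(_ + _, 180)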

Then, extract the distinct values for A and the distinct values for B (I
cache the output of distinct), numPartitions = 180
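
That is, roughly:

// pull out the distinct A and B keys and cache them, since both are
// reused by the zipWithIndex and join steps below
val distinctA = scored.map { case ((a, _), _) => a }.distinct(180).cache()
val distinctB = scored.map { case ((_, b), _) => b }.distinct(180).cache()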

zipWithIndex for A and for B (to remap the strings to ints)
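
Something like this (note zipWithIndex assigns Long indices):

val aIds = distinctA.zipWithIndex()   // RDD[(String, Long)]
val bIds = distinctB.zipWithIndex()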

Join the remapped IDs back onto the original table
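
The remapping join, roughly (again a sketch, following the variable names
above):

val remapped = scored
  .map { case ((a, b), score) => (a, (b, score)) }
  .join(aIds)                                       // (a, ((b, score), aId))
  .map { case (_, ((b, score), aId)) => (b, (aId, score)) }
  .join(bIds)                                       // (b, ((aId, score), bId))
  .map { case (_, ((aId, score), bId)) => (aId, bId, score) }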

This is then fed into MLlib's ALS algorithm.
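
i.e. something like the following (the rank/iterations/lambda values are
placeholders, not what I actually use; note that ALS wants Int ids, hence
the toInt on the zipWithIndex Longs):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

val ratings = remapped.map { case (aId, bId, score) =>
  Rating(aId.toInt, bId.toInt, score)
}
val model = ALS.train(ratings, 20, 10, 0.01)  // rank, iterations, lambda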

I am running with:

Spark 1.0.2 with CDH 5.1

numExecutors = 8, numCores = 14

Memory = 12g

MemoryFraction = 0.7

Kryo serialization
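
For reference, the rough SparkConf equivalent of the above (the app name
is a placeholder, and on YARN the executor count is normally passed to
spark-submit via --num-executors rather than set in the conf):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("als-pipeline")
  .set("spark.executor.memory", "12g")
  .set("spark.executor.cores", "14")
  .set("spark.storage.memoryFraction", "0.7")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)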

My issue is that the code runs fine for a while but then crashes
non-deterministically, either with file IOExceptions or with the following
obscure error:

14/10/08 13:29:59 INFO TaskSetManager: Loss was due to java.io.IOException:
Filesystem closed [duplicate 10]

14/10/08 13:30:08 WARN TaskSetManager: Loss was due to
java.io.FileNotFoundException

java.io.FileNotFoundException:
/opt/cloudera/hadoop/1/yarn/nm/usercache/zjb238/appcache/application_1412717093951_0024/spark-local-20141008131827-c082/1c/shuffle_3_117_354
(No such file or directory)

Looking through the logs, I see the IOException in other places, where it
appears to be non-catastrophic. The FileNotFoundException, however, is
fatal. I have found the following Stack Overflow question, which at least
seems to address the IOException:

http://stackoverflow.com/questions/24038908/spark-fails-on-big-shuffle-jobs-with-java-io-ioexception-filesystem-closed

But I have not found anything useful at all with regard to the appcache
error.

Any help would be much appreciated.
