"Failed to connect" implies that the executor at that host died; please check its logs as well.
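For readers hitting the same symptom (fetch failures after an executor dies, and "will not retry (0 retries)" in the logs), a sketch of shuffle-related settings in spark-defaults.conf that may help. These properties exist in Spark 1.2's Netty block transfer service; the values below are illustrative, not recommendations:

```
# (Netty only) Retry failed shuffle block fetches instead of
# failing the task immediately. Default is 3 retries; a value
# of 0 disables retrying entirely.
spark.shuffle.io.maxRetries      6

# (Netty only) Seconds to wait between fetch retries. Default 5.
spark.shuffle.io.retryWait       10

# Serve shuffle files from an external shuffle service so they
# survive an executor crash (requires the service to be set up
# on each node, e.g. via the YARN aux service).
spark.shuffle.service.enabled    true
```

Raising the retry settings only masks transient failures; if the executor actually died (OOM, container killed by YARN), the executor and NodeManager logs are still the place to find the root cause.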
On Tue, Mar 3, 2015 at 11:03 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
> Sorry that I forgot the subject.
>
> And in the driver, I got many FetchFailedException. The error messages are
>
> 15/03/03 10:34:32 WARN TaskSetManager: Lost task 31.0 in stage 2.2 (TID 7943, xxxx): FetchFailed(BlockManagerId(86, xxxx, 43070), shuffleId=0, mapId=24, reduceId=1220, message=
> org.apache.spark.shuffle.FetchFailedException: Failed to connect to xxxx/xxxx:43070
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>         at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
>
> Jianshi
>
> On Wed, Mar 4, 2015 at 2:55 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>> Hi,
>>
>> I got this error message:
>>
>> 15/03/03 10:22:41 ERROR OneForOneBlockFetcher: Failed while starting block fetches
>> java.lang.RuntimeException: java.io.FileNotFoundException: /hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index (No such file or directory)
>>         at java.io.FileInputStream.open(Native Method)
>>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>>         at org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
>>         at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305)
>>         at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>>         at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>         at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>         at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
>>
>> And then for the same index file and executor, I got the following errors multiple times:
>>
>> 15/03/03 10:22:41 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from host-xxxx:39534
>> java.lang.RuntimeException: java.io.FileNotFoundException: /hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index (No such file or directory)
>>
>> 15/03/03 10:22:41 ERROR RetryingBlockFetcher: Failed to fetch block shuffle_0_13_1228, and will not retry (0 retries)
>> java.lang.RuntimeException: java.io.FileNotFoundException: /hadoop01/scratch/local/usercache/jianshuang/appcache/application_1421268539738_202330/spark-local-20150303100549-fc3b/02/shuffle_0_1458_0.index (No such file or directory)
>>
>> ...
>> Caused by: java.net.ConnectException: Connection refused: host-xxxx....
>>
>> What's the problem?
>>
>> BTW, I'm using a Spark 1.2.1-SNAPSHOT I built around Dec. 20. Are there any bug fixes related to shuffle block fetching or index files after that?
>>
>> Thanks,
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/