Hello All,

I dug a little deeper and found this error:
15/04/27 16:05:39 WARN TransportChannelHandler: Exception in connection from /10.1.0.90:40590
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)
15/04/27 16:05:39 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=45314884029, chunkIndex=0}, buffer=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=26227673 cap=26227673]}} to /10.1.0.90:40590; closing connection
java.nio.channels.ClosedChannelException
15/04/27 16:05:39 ERROR TransportRequestHandler: Error sending result RpcResponse{requestId=8439869725098873668, response=[B@1bdcdf63} to /10.1.0.90:40590; closing connection
java.nio.channels.ClosedChannelException
15/04/27 16:05:39 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkexecu...@master.spark.com:60802] -> [akka.tcp://sparkdri...@master.spark.com:37195] disassociated! Shutting down.
15/04/27 16:05:39 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkdri...@master.spark.com:37195] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

On Mon, Apr 27, 2015 at 8:35 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> The configuration key should be "spark.akka.askTimeout" for this timeout.
> The time unit is seconds.
>
> Best Regards,
> Shixiong(Ryan) Zhu
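(For reference, a minimal runnable form of the advice above. This is a sketch only: it assumes the Spark 1.3-era Java API used elsewhere in this thread, the class and app names are hypothetical, and the 6000-second values simply mirror the ones under test below. Both keys take plain seconds.)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical example class: set both Akka timeout keys (in seconds)
// on the SparkConf before the context is created.
public class AkkaTimeoutExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("akka-timeout-test")        // hypothetical app name
            .set("spark.akka.timeout", "6000")      // default is 100 seconds
            .set("spark.akka.askTimeout", "6000");  // default is 30 seconds

        JavaSparkContext sc = new JavaSparkContext(conf);
        // Akka asks made through this context should now wait 6000 s
        // before failing, instead of "Futures timed out after [30 seconds]".
        sc.stop();
    }
}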
>> >> .set("spark.executor.memory", "10g") >> >> .set("spark.driver.memory", "20g") >> >> .set("spark.akka.timeout","6000") >> >> PS : I understand that 6000 is quite large, but I'm just trying to see if >> it actually changes >> >> >> Here is the command that I'm running >> >> sudo MASTER=spark://master.spark.com:7077 >> /opt/spark/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class >> "<class-name>" --executor-memory 20G --driver-memory 10G --deploy-mode >> client --conf spark.akka.timeout=6000 --conf spark.akka.askTimeout=6000 >> <jar file path> >> >> >> and here is how I load the file JavaPairRDD<String, String> >> learningRdd=sc.wholeTextFiles(filePath,10); >> Thanks >> >> On Mon, Apr 27, 2015 at 3:36 AM, Bryan Cutler <cutl...@gmail.com> wrote: >> >>> I'm not sure what the expected performance should be for this amount of >>> data, but you could try to increase the timeout with the property >>> "spark.akka.timeout" to see if that helps. >>> >>> Bryan >>> >>> On Sun, Apr 26, 2015 at 6:57 AM, Deepak Gopalakrishnan <dgk...@gmail.com >>> > wrote: >>> >>>> Hello All, >>>> >>>> I'm trying to process a 3.5GB file on standalone mode using spark. I >>>> could run my spark job succesfully on a 100MB file and it works as >>>> expected. But, when I try to run it on the 3.5GB file, I run into the below >>>> error : >>>> >>>> >>>> 15/04/26 12:45:50 INFO BlockManagerMaster: Updated info of block >>>> taskresult_83 >>>> 15/04/26 12:46:46 WARN AkkaUtils: Error sending message [message = >>>> Heartbeat(2,[Lscala.Tuple2;@790223d3,BlockManagerId(2, master.spark.com, >>>> 39143))] in 1 attempts >>>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] >>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) >>>> at >>>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) >>>> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) >>>> at >>>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) >>>> at scala.concurrent.Await$.result(package.scala:107) >>>> at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195) >>>> at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427) >>>> 15/04/26 12:47:15 INFO MemoryStore: ensureFreeSpace(26227673) called with >>>> curMem=265897, maxMem=5556991426 >>>> 15/04/26 12:47:15 INFO MemoryStore: Block taskresult_92 stored as bytes in >>>> memory (estimated size 25.0 MB, free 5.2 GB) >>>> 15/04/26 12:47:16 INFO MemoryStore: ensureFreeSpace(26272879) called with >>>> curMem=26493570, maxMem=5556991426 >>>> 15/04/26 12:47:16 INFO MemoryStore: Block taskresult_94 stored as bytes in >>>> memory (estimated size 25.1 MB, free 5.1 GB) >>>> 15/04/26 12:47:18 INFO MemoryStore: ensureFreeSpace(26285327) called with >>>> curMem=52766449, maxMem=5556991426 >>>> >>>> >>>> and the job fails. >>>> >>>> >>>> I'm on AWS and have opened all ports. Also, since the 100MB file works, >>>> it should not be a connection issue. I've a r3 xlarge and 2 m3 large. >>>> >>>> Can anyone suggest a way to fix this? >>>> >>>> -- >>>> Regards, >>>> *Deepak Gopalakrishnan* >>>> *Mobile*:+918891509774 >>>> *Skype* : deepakgk87 >>>> http://myexps.blogspot.com >>>> >>>> >>> >> >> >> -- >> Regards, >> *Deepak Gopalakrishnan* >> *Mobile*:+918891509774 >> *Skype* : deepakgk87 >> http://myexps.blogspot.com >> >> > -- Regards, *Deepak Gopalakrishnan* *Mobile*:+918891509774 *Skype* : deepakgk87 http://myexps.blogspot.com