Hey, I've been struggling to set up a workflow with Spark. I'm using the AMI from the amplab3 tutorials, but I added a couple of packages (R and rJava) plus some of my own jars. It's essentially Spark 0.7.3 in standalone mode. (I can't get Mesos running, but that's a question for another time.)
I read data from S3 and run a cascade of filters, maps, joins and reduces on it. With a smallish data set (<1,000 rows) the job succeeds, but with a data set of more than 1.5M rows I keep getting the following error when I call collect on the RDD:

    13/09/21 00:41:45 INFO master.Master: Removing app app-20130921004115-0000
    13/09/21 00:41:45 ERROR actor.ActorSystemImpl: RemoteClientError@akka:// [email protected]:44283: Error[java.net.ConnectException:Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:404)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:366)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    ]

I'm at a loss as to where to start debugging. Is it a configuration issue on my part, a Scala error, or a Spark error? I've attached the log files from the master and the worker. If anyone has ideas on how to start debugging this, please let me know; I'd really appreciate it.

Thanks,
Shay
spark-root-spark.deploy.master.Master-1-ip-10-232-35-179.ec2.internal.out
spark-root-spark.deploy.worker.Worker-1-ip-10-168-42-45.ec2.internal.out
stderr
