Hey Andrew, I tried the following approach: I modified my Spark build on my local machine. I downloaded the Spark 1.4.0 source code and made a change to ResultTask.scala (a simple change, just adding a print statement, to see if it works). I then built Spark with:
mvn -Dhadoop.version=1.0.4 -Phadoop-1 -DskipTests -Dscala-2.10 clean package

Once the new assembly jar was built, I started my EC2 cluster with:

./ec2/spark-ec2 -k key -i ../aggr/key.pem --instance-type=m3.medium --zone=us-east-1b -s 9 launch spark-cluster

I initially launched my application jar and it worked fine. After that I scp'd the new assembly jar to the Spark lib directory of all my EC2 nodes (the copy step looked roughly like the sketch below). When I ran the jar again, it could no longer connect to the master; the full log follows the sketch.
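This is the rough shape of the copy step. The host names and key path are placeholders, and the jar name assumes what the -Dhadoop.version=1.0.4 build produces under assembly/target/scala-2.10/, so adjust both to your setup:

# Fan the rebuilt assembly jar out to every node's Spark lib directory.
# Placeholder host names; the jar name is what I believe "mvn package"
# produces for a 1.4.0 / Hadoop 1.0.4 build -- check assembly/target/scala-2.10/.
JAR=assembly/target/scala-2.10/spark-assembly-1.4.0-hadoop1.0.4.jar
for HOST in ec2-master.compute-1.amazonaws.com \
            ec2-slave1.compute-1.amazonaws.com \
            ec2-slave2.compute-1.amazonaws.com; do   # ...and the remaining slaves
  scp -i ../aggr/key.pem "$JAR" root@"$HOST":/root/spark/lib/
done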
15/06/21 00:42:51 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:42:52 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: ec2-xxx.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:42:52 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:11 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:43:11 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:11 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: ec2-xxx.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:43:31 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:43:31 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:31 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkmas...@ec2-xxx.compute-1.amazonaws.com:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: ec2-xxx.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:43:51 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/06/21 00:43:51 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
15/06/21 00:43:51 INFO SparkUI: Stopped Spark web UI at http://ec2-xxx.compute-1.amazonaws.com:4040
15/06/21 00:43:51 INFO DAGScheduler: Stopping DAGScheduler
15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/06/21 00:43:51 ERROR OneForOneStrategy: java.lang.NullPointerException
	at org.apache.spark.deploy.client.AppClient$ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(AppClient.scala:160)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
	at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.spark.deploy.client.AppClient$ClientActor.aroundReceive(AppClient.scala:61)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Also, the error above says "Connection refused: ec2-xxx.compute-1.amazonaws.com/10.165.103.16:7077". I don't understand where it gets the 10.165.103.16 from; I never specify that address in the master URL command-line parameter. Any ideas on what I might be doing wrong?
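In case it's relevant, this is how I've been checking which address the standalone master actually binds to. The paths assume the stock spark-ec2 layout under /root on the master node, and I suspect the 10.x address is just the instance's EC2-internal IP that the public DNS name resolves to from inside the cluster:

# Run on the EC2 master node; paths assume a stock spark-ec2 install.
# See which host/IP the master was configured with, if any:
grep -E 'SPARK_MASTER_IP|SPARK_PUBLIC_DNS' /root/spark/conf/spark-env.sh

# The master logs the exact spark:// URL it bound to at startup:
grep 'Starting Spark master' /root/spark/logs/*Master*.out

# After swapping in a new assembly jar, restart the cluster so the
# master and workers pick up the new jar:
/root/spark/sbin/stop-all.sh
/root/spark/sbin/start-all.sh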
> On Jun 19, 2015, at 7:19 PM, Andrew Or <and...@databricks.com> wrote:
>
> Hi Raghav,
>
> I'm assuming you're using standalone mode. When using the Spark EC2 scripts you need to make sure that every machine has the most updated jars. Once you have built on one of the nodes, you must rsync the Spark directory to the rest of the nodes (see /root/spark-ec2/copy-dir).
>
> That said, I usually build it locally on my laptop and scp the assembly jar to the cluster instead of building it there. The EC2 machines often take much longer to build for some reason. Also, it's cumbersome to set up a proper IDE there.
>
> -Andrew
>
> 2015-06-19 19:11 GMT-07:00 Raghav Shankar <raghav0110...@gmail.com>:
> Thanks Andrew! Is this all I have to do when using the Spark EC2 script to set up a Spark cluster? It seems to be getting an assembly jar that is not from my project (perhaps from a Maven repo). Is there a way to make the EC2 script use the assembly jar that I created?
>
> Thanks,
> Raghav
>
> On Friday, June 19, 2015, Andrew Or <and...@databricks.com> wrote:
> Hi Raghav,
>
> If you want to make changes to Spark and run your application with it, you may follow these steps.
>
> 1. git clone g...@github.com:apache/spark
> 2. cd spark; build/mvn clean package -DskipTests [...]
> 3. make local changes
> 4. build/mvn package -DskipTests [...] (no need to clean again here)
> 5. bin/spark-submit --master spark://[...] --class your.main.class your.jar
>
> No need to pass in extra --driver-java-options or --driver-extra-classpath as others have suggested. When using spark-submit, the main jar comes from assembly/target/scala-2.10, which is prepared through "mvn package". You just have to make sure that you re-package the assembly jar after each modification.
>
> -Andrew
>
> 2015-06-18 16:35 GMT-07:00 maxdml <max...@cs.duke.edu>:
> You can specify the jars of your application to be included with spark-submit with the --jars switch.
>
> Otherwise, are you sure that your newly compiled Spark jar assembly is in assembly/target/scala-2.10/?
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-Applications-using-Spark-Submit-tp23352p23400.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.