Hey Andrew, 

I tried the following approach: I modified my Spark build on my local machine. 
I downloaded the Spark 1.4.0 source code and made a change to 
ResultTask.scala (a simple change, adding a print statement, to see if it 
works). I then built Spark using

mvn -Dhadoop.version=1.0.4 -Phadoop-1 -DskipTests -Dscala-2.10 clean package
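
(For reference: in the standard build layout the new assembly jar lands under 
assembly/target/scala-2.10/, so the build can be sanity-checked with something 
like

ls assembly/target/scala-2.10/spark-assembly-*.jar

The exact jar file name depends on the Spark and Hadoop versions used.)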

Once the new assembly jar was built, I started my EC2 cluster using this 
command:

./ec2/spark-ec2 -k key -i ../aggr/key.pem --instance-type=m3.medium 
--zone=us-east-1b -s 9 launch spark-cluster

I initially launched my application jar and it worked fine. After that, I 
scp’d the new assembly jar to the Spark lib directory on all of my EC2 nodes. 
When I ran the jar again, I got the following error:

15/06/21 00:42:51 INFO AppClient$ClientActor: Connecting to master 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:42:52 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077]. Address 
is now gated for 5000 ms, all messages to this address will be delivered to 
dead letters. Reason: Connection refused: 
ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:42:52 WARN AppClient$ClientActor: Could not connect to 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077: 
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:11 INFO AppClient$ClientActor: Connecting to master 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:43:11 WARN AppClient$ClientActor: Could not connect to 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077: 
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:11 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077]. Address 
is now gated for 5000 ms, all messages to this address will be delivered to 
dead letters. Reason: Connection refused: 
ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:43:31 INFO AppClient$ClientActor: Connecting to master 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077/user/Master...
15/06/21 00:43:31 WARN AppClient$ClientActor: Could not connect to 
akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077: 
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://sparkMaster@xxx.compute-1.amazonaws.com:7077
15/06/21 00:43:31 WARN Remoting: Tried to associate with unreachable remote 
address [akka.tcp://sparkMaster@ec2-xxx.compute-1.amazonaws.com:7077]. Address 
is now gated for 5000 ms, all messages to this address will be delivered to 
dead letters. Reason: Connection refused: 
XXX.compute-1.amazonaws.com/10.165.103.16:7077
15/06/21 00:43:51 ERROR SparkDeploySchedulerBackend: Application has been 
killed. Reason: All masters are unresponsive! Giving up.
15/06/21 00:43:51 WARN SparkDeploySchedulerBackend: Application ID is not 
initialized yet.
15/06/21 00:43:51 INFO SparkUI: Stopped Spark web UI at 
http://XXX.compute-1.amazonaws.com:4040
15/06/21 00:43:51 INFO DAGScheduler: Stopping DAGScheduler
15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Asking each executor to 
shut down
15/06/21 00:43:51 ERROR OneForOneStrategy: 
java.lang.NullPointerException
        at 
org.apache.spark.deploy.client.AppClient$ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(AppClient.scala:160)
        at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
        at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
        at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
        at 
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
        at 
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
        at 
org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at 
org.apache.spark.deploy.client.AppClient$ClientActor.aroundReceive(AppClient.scala:61)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Also, the above error says "Connection refused: 
ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077". I don’t understand where 
it gets the 10.165.103.16 from, since I never specify that in the master URL 
command-line parameter. Any ideas on what I might be doing wrong?

> On Jun 19, 2015, at 7:19 PM, Andrew Or <and...@databricks.com> wrote:
> 
> Hi Raghav,
> 
> I'm assuming you're using standalone mode. When using the Spark EC2 scripts, 
> you need to make sure that every machine has the most up-to-date jars. Once 
> you have built on one of the nodes, you must rsync the Spark directory to the 
> rest of the nodes (see /root/spark-ec2/copy-dir).
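> 
> For example, after building in /root/spark on the master (the path assumes 
> the default spark-ec2 layout):
> 
> /root/spark-ec2/copy-dir /root/spark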
> 
> That said, I usually build it locally on my laptop and scp the assembly jar 
> to the cluster instead of building it there. The EC2 machines often take much 
> longer to build for some reason, and it's cumbersome to set up a proper IDE 
> there.
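> 
> For example, something like this (the host is a placeholder, and the exact 
> jar name depends on your build):
> 
> scp assembly/target/scala-2.10/spark-assembly-*.jar root@<master>:/root/spark/lib/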
> 
> -Andrew
> 
> 
> 2015-06-19 19:11 GMT-07:00 Raghav Shankar <raghav0110...@gmail.com>:
> Thanks, Andrew! Is this all I have to do when using the spark-ec2 script to 
> set up a Spark cluster? It seems to be getting an assembly jar that is not 
> from my project (perhaps from a Maven repo). Is there a way to make the 
> spark-ec2 script use the assembly jar that I created?
> 
> Thanks,
> Raghav 
> 
> 
> On Friday, June 19, 2015, Andrew Or <and...@databricks.com> wrote:
> Hi Raghav,
> 
> If you want to make changes to Spark and run your application with it, you 
> may follow these steps.
> 
> 1. git clone git@github.com:apache/spark
> 2. cd spark; build/mvn clean package -DskipTests [...]
> 3. make local changes
> 4. build/mvn package -DskipTests [...] (no need to clean again here)
> 5. bin/spark-submit --master spark://[...] --class your.main.class your.jar
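> 
> Concretely, step 5 might look like this (the master URL, class, and jar 
> names are placeholders):
> 
> bin/spark-submit --master spark://ec2-xxx.compute-1.amazonaws.com:7077 \
>   --class com.example.YourApp your-app.jar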
> 
> No need to pass in extra --driver-java-options or --driver-extra-classpath as 
> others have suggested. When using spark-submit, the main jar comes from 
> assembly/target/scala-2.10, which is prepared through "mvn package". You just 
> have to make sure that you re-package the assembly jar after each 
> modification.
> 
> -Andrew
> 
> 2015-06-18 16:35 GMT-07:00 maxdml <max...@cs.duke.edu>:
> You can specify the jars of your application to be included with spark-submit 
> with the --jars switch.
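> 
> For example (the dependency jar names are placeholders):
> 
> bin/spark-submit --jars dep1.jar,dep2.jar --class your.main.class your.jar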
> 
> Otherwise, are you sure that your newly compiled spark jar assembly is in
> assembly/target/scala-2.10/?
> 
> 
> 