OK, the problem was a very silly mistake: I launched my EC2 instances with spark-0.8.1-incubating, but my fat jar was still being compiled against spark-0.7.3, so the driver and the master were presumably speaking incompatible protocol versions and the connection was dropped. Oops!
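In case anyone else hits this: whatever Spark version the spark-ec2 cluster is running is the version the fat jar has to be built against. A minimal sketch of the relevant build.sbt lines (the "provided" scope is my usual setup, not something the thread requires; Scala 2.9.3 is the version the 0.8.1-incubating artifacts are published for):

    // build.sbt -- sketch; the Spark version here must match the cluster's
    scalaVersion := "2.9.3"

    // "provided" keeps Spark's own classes out of the assembly, so the
    // cluster's copy of Spark is the one that actually runs.
    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "0.8.1-incubating" % "provided"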
On Wed, Jan 1, 2014 at 3:36 PM, Jeff Higgens <[email protected]> wrote:

> Thanks for the suggestions.
>
> Unfortunately I am still unable to run my fat jar on EC2 (even using
> runExample, and SPARK_CLASSPATH is blank). Here is the full output:
>
> root@ip-172-31-21-60 ~]$ java -jar Crunch-assembly-0.0.1.jar
> 14/01/01 22:34:40 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
> 14/01/01 22:34:40 INFO spark.SparkEnv: Registering BlockManagerMaster
> 14/01/01 22:34:40 INFO storage.MemoryStore: MemoryStore started with capacity 1093.6 MB.
> 14/01/01 22:34:41 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20140101223440-a6bb
> 14/01/01 22:34:41 INFO network.ConnectionManager: Bound socket to port 56274 with id = ConnectionManagerId(ip-172-31-21-60,56274)
> 14/01/01 22:34:41 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 14/01/01 22:34:41 INFO storage.BlockManagerMaster: Registered BlockManager
> 14/01/01 22:34:41 INFO server.Server: jetty-7.x.y-SNAPSHOT
> 14/01/01 22:34:41 INFO server.AbstractConnector: Started [email protected]:46111
> 14/01/01 22:34:41 INFO broadcast.HttpBroadcast: Broadcast server started at http://172.31.21.60:46111
> 14/01/01 22:34:41 INFO spark.SparkEnv: Registering MapOutputTracker
> 14/01/01 22:34:41 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-227ad744-5d0d-4e1a-aacd-9c0c73876b31
> 14/01/01 22:34:41 INFO server.Server: jetty-7.x.y-SNAPSHOT
> 14/01/01 22:34:41 INFO server.AbstractConnector: Started [email protected]:44012
> 14/01/01 22:34:41 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
> 14/01/01 22:34:41 INFO server.HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:45098
> 14/01/01 22:34:41 INFO storage.BlockManagerUI: Started BlockManager web UI at http://ip-172-31-21-60:45098
> 14/01/01 22:34:42 INFO spark.SparkContext: Added JAR /root/Crunch-assembly-0.0.1.jar at http://172.31.21.60:44012/jars/Crunch-assembly-0.0.1.jar with timestamp 1388615682294
> 14/01/01 22:34:42 INFO client.Client$ClientActor: Connecting to master spark://ec2-54-193-16-137.us-west-1.compute.amazonaws.com:7077
> 14/01/01 22:34:42 ERROR client.Client$ClientActor: Connection to master failed; stopping client
> 14/01/01 22:34:42 ERROR cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster!
> 14/01/01 22:34:42 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Disconnected from Spark cluster
>
> Interestingly, running one of the examples (SparkPi) works fine. The only
> thing that looked different from the output of SparkPi was this line:
> 14/01/01 23:27:55 INFO network.ConnectionManager: Bound socket to port 41806 with id = ConnectionManagerId(ip-172-31-29-197.us-west-1.compute.internal,41806)
>
> Whereas my (not working) jar looked like this on that line:
> 14/01/01 22:34:41 INFO network.ConnectionManager: Bound socket to port 56274 with id = ConnectionManagerId(ip-172-31-21-60,56274)
>
> On Fri, Dec 20, 2013 at 8:54 PM, Evan Sparks <[email protected]> wrote:
>
>> I ran into a similar issue a few months back - pay careful attention to
>> the order in which spark decides to look for your jars. The root of my
>> problem was a stale jar in SPARK_CLASSPATH on the worker nodes, which took
>> precedence (IIRC) over jars passed in with the SparkContext constructor.
>>
>> On Dec 20, 2013, at 8:49 PM, "K. Shankari" <[email protected]> wrote:
>>
>> I don't think that you need to copy the jar to the rest of the cluster -
>> you should be able to do addJar() in the SparkContext and spark should
>> automatically push the jars to the client for you.
>>
>> I don't know how set you are on running code through checking out and
>> compiling, but here's what I do instead to get my own application to run:
>> - compile my code on my desktop and generate a jar
>> - scp the jar to the master
>> - modify runExample to include the jar in the classpath. I think that you
>> can also just modify SPARK_CLASSPATH
>> - run using something like:
>>
>> $ runExample my.class.name arg1 arg2 arg3
>>
>> Hope this helps!
>> Shankari
>>
>> On Tue, Dec 10, 2013 at 12:15 PM, Jeff Higgens <[email protected]> wrote:
>>
>>> I'm having trouble running my Spark program as a "fat jar" on EC2.
>>>
>>> This is the process I'm using:
>>> (1) spark-ec2 script to launch cluster
>>> (2) ssh to master, install sbt and git clone my project's source code
>>> (3) update source to reference correct master and jar
>>> (4) sbt assembly
>>> (5) copy-dir to copy the jar to the rest of the cluster
>>>
>>> I tried both running the jar (java -jar ...) and using sbt run, but I
>>> always end up with this error:
>>>
>>> 18:58:59.556 [spark-akka.actor.default-dispatcher-4] INFO o.a.s.d.client.Client$ClientActor - Connecting to master spark://ec2-50-16-80-0.compute-1.amazonaws.com:7077
>>> 18:58:59.838 [spark-akka.actor.default-dispatcher-4] ERROR o.a.s.d.client.Client$ClientActor - Connection to master failed; stopping client
>>> 18:58:59.839 [spark-akka.actor.default-dispatcher-4] ERROR o.a.s.s.c.SparkDeploySchedulerBackend - Disconnected from Spark cluster!
>>> 18:58:59.840 [spark-akka.actor.default-dispatcher-4] ERROR o.a.s.s.cluster.ClusterScheduler - Exiting due to error from cluster scheduler: Disconnected from Spark cluster
>>> 18:58:59.844 [delete Spark local dirs] DEBUG org.apache.spark.storage.DiskStore - Shutdown hook called
>>>
>>> But when I use spark-shell it has no problems connecting to the master
>>> using the exact same url:
>>>
>>> 13/12/10 18:59:40 INFO client.Client$ClientActor: Connecting to master spark://ec2-50-16-80-0.compute-1.amazonaws.com:7077
>>> Spark context available as sc.
>>>
>>> I'm probably missing something obvious so any tips are very appreciated.
