Hi,

So I'm super confused about how to take my Spark code and actually deploy
and run it on a cluster.

Let's assume I'm writing in Java, and take a simple example such as
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaLogQuery.java
Say this is a process I want to run quite regularly (more than once a
minute).

From the documentation (
http://spark.apache.org/docs/1.1.0/submitting-applications.html), it reads
as if I need to create a jar from the above code, and every time I want to
run this code, I use ./bin/spark-submit to upload it to the cluster, which
would then run it straight away.

This would mean that every time I want to run my process, a .jar file has
to travel over the network. Is this correct? (It seems like this would be
very slow, though I should try it.)
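
To make that concrete, here is a rough sketch of what my reading of the
docs amounts to: a small Java process that shells out to spark-submit on a
timer. The master URL, the jar path, the class name PeriodicSubmit, and the
30 second interval are all placeholders I made up; only the --class and
--master options and the trailing jar argument come from the
submitting-applications page. I'm not claiming this is the recommended way,
it's just the workflow as I currently understand it.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: re-submits the same application jar on a fixed schedule.
final class PeriodicSubmit {

    // Builds the command line: spark-submit --class <main> --master <url> <jar>
    static List<String> buildCommand(String master, String mainClass, String jarPath) {
        return Arrays.asList(
                "./bin/spark-submit",
                "--class", mainClass,
                "--master", master,
                jarPath);
    }

    public static void main(String[] args) {
        final List<String> command = buildCommand(
                "spark://master-host:7077",               // placeholder master URL
                "org.apache.spark.examples.JavaLogQuery", // the example class from above
                "/path/to/my-app.jar");                   // placeholder jar location

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Every run launches a fresh spark-submit, so the jar is handed over again each time.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
                System.out.println("spark-submit exited with code " + exitCode);
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the executor
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}
```

If this is really how it is meant to be done, each run pays for a new
spark-submit launch plus the jar transfer, which is what worries me.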

Digging around the JavaDocs, I can see that SparkContext (and
JavaSparkContext) has an .addJar() method, but I can't find any
documentation that actually outlines how it is meant to be used. If someone
can point me towards an article or tutorial on how this is meant to work,
I'd greatly appreciate it.

It would *seem* like I could write a simple process, quite probably running
on the same machine as the master, that adds a Jar through the
SparkContext... but then, how do I run the code from that Jar?

Or does the Jar include the code that I would run, which would then create
the SparkContext that calls addJar on itself? (now my head hurts).
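
Here is roughly what I imagine that second option would look like, in case
it helps someone tell me where I've gone wrong. The class name
LongRunningDriver, the master URL, and the jar path are placeholders of
mine, and the tiny parallelize/count job is only there so the sketch does
something. Whether this is the intended use of addJar is exactly what I'm
unsure about.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical driver: creates its own context and registers its own jar.
final class LongRunningDriver {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("LongRunningDriver")
                .setMaster("spark://master-host:7077"); // placeholder master URL

        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Per the Javadoc, this adds a JAR dependency for tasks executed
            // on this context in the future. The path is a placeholder.
            sc.addJar("/path/to/my-app.jar");

            // A trivial job, just to have something run on the cluster.
            long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
            System.out.println("count = " + count);
        } finally {
            sc.stop(); // release cluster resources when the driver is done
        }
    }
}
```

If that is the right shape, then I assume the driver could stay up and run
the job repeatedly on the same context instead of being resubmitted each
time, but I'd appreciate confirmation.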

Would Spark also be smart enough to know that the JAR was already uploaded,
if addJar were called again after it had already been uploaded?

I'm not seeing this shown in the examples either.

I'm really excited by what I see in Spark, but I am totally confused by how
to actually get code up on Spark and make it run, and nothing I read seems
to explain this aspect very well (at least to my thick head).

I have seen https://github.com/spark-jobserver/spark-jobserver, but from
initial review, it *looks* like it will only work with Scala (because you
need to use the ScalaJob trait), and I have a Java dependency.

Any help on this aspect would be greatly appreciated!

Mark


-- 
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

2 Devs from Down Under Podcast
http://www.2ddu.com/
