Hey Jeremy,

The issue is that you are using one of the external libraries
(spark-streaming-twitter), and those aren't actually packaged with Spark
on the cluster, so you need to create an uber jar that includes them.

You can look at the example here (I recently did this for a kafka
project and the idea is the same):

https://github.com/pwendell/kafka-spark-example

You'll want to make an uber jar that includes these packages (run "sbt
assembly") and then pass that jar to spark-submit. Also, I'd try
running it locally first (if you aren't already) just to make the
debugging simpler.
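
A rough sketch of what that can look like (the plugin version and the
assembly jar name below are guesses; the kafka example above shows the
exact setup):

project/plugins.sbt
------------------------------------------
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
------------------------------------------

simple.sbt
------------------------------------------
import AssemblyKeys._

assemblySettings

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

// Spark itself is already on the cluster, so mark it "provided" to keep
// it out of the uber jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

// The external modules are not part of the Spark assembly, so leave them
// (and twitter4j) as normal compile dependencies to get them bundled in.
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
------------------------------------------

After "sbt assembly" finishes, point spark-submit at the assembly jar it
writes under target/scala-2.10/ (the name will be something like
simple-project-assembly-1.0.jar; check the sbt output for the exact file).
If assembly complains about duplicate META-INF entries you may need a
mergeStrategy, but try the simple version first.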

- Patrick


On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <so...@cloudera.com> wrote:
> Ah sorry, this may be the thing I learned for the day. The issue is
> that classes from that particular artifact are missing, though. Worth
> interrogating the resulting .jar file with "jar tf" to see if they
> made it in?
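>
> For example, something along these lines (using the jar from your
> spark-submit command):
>
> jar tf SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar | grep TwitterUtils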
>
> On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <nick.pentre...@gmail.com> 
> wrote:
>> @Sean, the %% syntax in SBT should automatically append the Scala binary
>> version suffix (_2.10, _2.11, etc.) for you, so that does appear to be the
>> correct syntax for the build.
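>>
>> In other words (just to illustrate the equivalence), these two lines should
>> pull the same artifact:
>>
>> libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>
>> libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "1.0.0"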
>>
>> I seemed to run into this issue with some missing Jackson deps, and solved
>> it by including the jar explicitly on the driver class path:
>>
>> bin/spark-submit --driver-class-path
>> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class "SimpleApp"
>> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>
>> Seems redundant to me, since I thought the JAR passed as the application
>> argument is copied to the driver and made available anyway. But this solved
>> it for me, so perhaps give it a try?
>>
>>
>>
>> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>> Those aren't the names of the artifacts:
>>>
>>>
>>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>>
>>> The name is "spark-streaming-twitter_2.10"
>>>
>>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>> <unorthodox.engine...@gmail.com> wrote:
>>> > Man, this has been hard going. Six days, and I finally got a "Hello
>>> > World" app working that I wrote myself.
>>> >
>>> > Now I'm trying to make a minimal streaming app based on the twitter
>>> > examples (running standalone right now while learning), and when I run
>>> > it like this:
>>> >
>>> > bin/spark-submit --class "SimpleApp"
>>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>> >
>>> > I'm getting this error:
>>> >
>>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>> >
>>> > Which I'm guessing is because I haven't put in a dependency on
>>> > "external/twitter" in the .sbt, but _how_? I can't find any docs on it.
>>> > Here's my build file so far:
>>> >
>>> > simple.sbt
>>> > ------------------------------------------
>>> > name := "Simple Project"
>>> >
>>> > version := "1.0"
>>> >
>>> > scalaVersion := "2.10.4"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" %
>>> > "1.0.0"
>>> >
>>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>> >
>>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>> > ------------------------------------------
>>> >
>>> > I've tried a few obvious things like adding:
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>> >
>>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" %
>>> > "1.0.0"
>>> >
>>> > because, well, that would match the naming scheme implied so far, but it
>>> > errors.
>>> >
>>> >
>>> > Also, I just realized I don't completely understand if:
>>> > (a) the "spark-submit" command _sends_ the .jar to all the workers, or
>>> > (b) the "spark-submit" command sends a _job_ to the workers, which are
>>> > supposed to already have the jar file installed (or in hdfs), or
>>> > (c) the Context is supposed to list the jars to be distributed. (is that
>>> > deprecated?)
>>> >
>>> > One part of the documentation says:
>>> >
>>> >  "Once you have an assembled jar you can call the bin/spark-submit
>>> > script as
>>> > shown here while passing your jar."
>>> >
>>> > but another says:
>>> >
>>> > "application-jar: Path to a bundled jar including your application and
>>> > all
>>> > dependencies. The URL must be globally visible inside of your cluster,
>>> > for
>>> > instance, an hdfs:// path or a file:// path that is present on all
>>> > nodes."
>>> >
>>> > I suppose both could be correct if you take a certain point of view.
>>> >
>>> > --
>>> > Jeremy Lee  BCompSci(Hons)
>>> >   The Unorthodox Engineers
>>
>>
