Man, this has been hard going. Six days, and I finally got a "Hello World"
app that I wrote myself up and running.

Now I'm trying to make a minimal streaming app based on the Twitter
examples (running standalone right now while I learn), and when I run it
like this:

bin/spark-submit --class "SimpleApp"
SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar

I'm getting this error:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/streaming/twitter/TwitterUtils$
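
(For context, the app itself is the twitter example trimmed down to
almost nothing. Roughly this, so you can see where TwitterUtils comes in;
the names are mine:)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Project")
    val ssc = new StreamingContext(conf, Seconds(10))
    // twitter4j credentials are picked up from system properties
    val stream = TwitterUtils.createStream(ssc, None)
    stream.map(_.getText).print()
    ssc.start()
    ssc.awaitTermination()
  }
}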

That error, I'm guessing, is because I haven't declared a dependency on
"external/twitter" in the .sbt, but _how_? I can't find any docs on it.
Here's my build file so far:

simple.sbt
------------------------------------------
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/";
------------------------------------------

I've tried a few obvious things like adding:

libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"

because, well, those would match the naming scheme implied so far, but
both of them error out.


Also, I just realized I don't completely understand whether:
(a) the "spark-submit" command _sends_ the .jar to all the workers, or
(b) the "spark-submit" command sends a _job_ to the workers, which are
supposed to already have the jar file installed (or in HDFS), or
(c) the Context is supposed to list the jars to be distributed, as in the
sketch just below (is that deprecated?).
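
For (c), I mean something like this in the driver code (just a sketch;
the path is my local jar, and I don't know if this is still the blessed
way to do it):

val conf = new SparkConf()
  .setAppName("Simple Project")
  .setJars(Seq("SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar"))
val ssc = new StreamingContext(conf, Seconds(10))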

One part of the documentation says:

 "Once you have an assembled jar you can call the bin/spark-submit script
as shown here while passing your jar."

but another says:

"application-jar: Path to a bundled jar including your application and all
dependencies. The URL must be globally visible inside of your cluster, for
instance, an hdfs:// path or a file:// path that is present on all nodes."

I suppose both could be correct if you take a certain point of view.
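
If the second quote is the one that matters, then maybe what I'm missing
is an assembly ("fat") jar that bundles the twitter classes, rather than
the plain jar that "sbt package" produces. My guess at the sbt-assembly
setup (the plugin version is just the latest one I found, and I haven't
verified any of this):

project/assembly.sbt
------------------------------------------
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
------------------------------------------

then run "sbt assembly" and point spark-submit at the *-assembly-*.jar
that lands in target/scala-2.10/ instead of the plain one.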

-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers
