Man, this has been hard going. Six days, and I finally got a "Hello World" app working that I wrote myself.
Now I'm trying to make a minimal streaming app based on the Twitter examples (running standalone right now while learning). When I run it like this:

bin/spark-submit --class "SimpleApp" SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar

I get this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$

which I'm guessing is because I haven't put in a dependency on "external/twitter" in the .sbt, but _how_? I can't find any docs on it. Here's my build file so far:

simple.sbt
------------------------------------------
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
------------------------------------------

I've tried a few obvious things like adding:

libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"

because, well, that would match the naming scheme implied so far, but both of them error out.

Also, I just realized I don't completely understand whether:

(a) the "spark-submit" command _sends_ the .jar to all the workers, or
(b) the "spark-submit" command sends a _job_ to the workers, which are supposed to already have the jar file installed (or in HDFS), or
(c) the Context is supposed to list the jars to be distributed. (Is that deprecated?)

One part of the documentation says:

"Once you have an assembled jar you can call the bin/spark-submit script as shown here while passing your jar."

but another says:

"application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes."

I suppose both could be correct if you take a certain point of view.

--
Jeremy Lee  BCompSci(Hons)
The Unorthodox Engineers
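P.S. After poking around a bit more, my current guess is that the jar I'm submitting only contains my own classes, and that spark-submit doesn't ship the library dependencies (spark-streaming-twitter, twitter4j) along with it. So the next thing I plan to try is building an assembly ("fat") jar with the sbt-assembly plugin. This is only a sketch pieced together from the plugin's README, so the plugin version, the "provided" scoping and the jar name are my guesses:

project/plugins.sbt
------------------------------------------
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
------------------------------------------

simple.sbt
------------------------------------------
import AssemblyKeys._

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

// Spark itself is already on the cluster, so keep it out of the fat jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

// These are NOT part of the Spark assembly, so they have to go into my jar
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

assemblySettings

jarName in assembly := "simple-project-assembly-1.0.jar"
------------------------------------------

If I've read the README right, "sbt assembly" should then produce
SimpleApp/target/scala-2.10/simple-project-assembly-1.0.jar, and I'd submit that jar instead of the plain one in my command above. (I gather I may also need a "mergeStrategy in assembly" setting if duplicate META-INF files make the merge fail, but I'll cross that bridge when I get to it.)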
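P.P.S. On my question (c): SparkConf does still have a setJars() method in the 1.0.0 API, so listing the extra jars on the context looks possible, though I don't know whether that or the fat jar is the preferred way now. For what it's worth, here's a minimal sketch of the kind of app I mean, with the setJars call I'm considering (the jar paths are placeholders I made up, not real paths on my machine):

------------------------------------------
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Simple Project")
      // Option (c): tell the context which jars to ship to the workers.
      // These paths are placeholders; with an assembly jar this call
      // presumably wouldn't be needed at all.
      .setJars(Seq(
        "lib/spark-streaming-twitter_2.10-1.0.0.jar",
        "lib/twitter4j-stream-3.0.3.jar",
        "lib/twitter4j-core-3.0.3.jar"))

    val ssc = new StreamingContext(conf, Seconds(2))

    // TwitterUtils picks up twitter4j.oauth.* credentials from system properties
    val tweets = TwitterUtils.createStream(ssc, None)
    tweets.map(status => status.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
------------------------------------------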