I came across this: https://github.com/xerial/sbt-pack
Until I found this, I was simply using the sbt-assembly plugin (sbt clean assembly).

On Sep 4, 2014, at 2:46 PM, Aris <arisofala...@gmail.com> wrote:

> Thanks for answering Daniil -
>
> I have SBT version 0.13.5, is that an old version? Seems pretty up-to-date.
>
> It turns out I figured out a way around this entire problem: just use 'sbt
> package', and when using bin/spark-submit, pass it the "--jars" option and
> give it all the JARs from the local ivy2 cache. Pretty inelegant, but at
> least I am able to develop, and when I want to make a super JAR with sbt
> assembly I can use the painfully slow method.
>
> Here is the important snippet for grabbing all the JARs from the local
> ivy2 cache:
>
>     --jars $(find ~/.ivy2/cache/ -iname '*.jar' | tr '\n' ,)
>
> Here's the entire running command:
>
>     bin/spark-submit --master local[*] \
>       --jars $(find /home/data/.ivy2/cache/ -iname '*.jar' | tr '\n' ,) \
>       --class KafkaStreamConsumer \
>       ~/code_host/data/scala/streamingKafka/target/scala-2.10/streamingkafka_2.10-1.0.jar \
>       node1:2181 my-consumer-group aris-topic 1
>
> This is fairly bad, but it works around sbt assembly being incredibly slow.
>
> On Tue, Sep 2, 2014 at 2:13 PM, Daniil Osipov <daniil.osi...@shazam.com> wrote:
> > What version of sbt are you using? There is a bug in early versions of
> > 0.13 that causes assembly to be extremely slow - make sure you're using
> > the latest one.
> >
> > On Fri, Aug 29, 2014 at 1:30 PM, Aris <> wrote:
> > > Hi folks,
> > >
> > > I am trying to use Kafka with Spark Streaming, and it appears I cannot
> > > do the normal 'sbt package' as I do with other Spark applications,
> > > such as Spark alone or Spark with MLlib. I learned I have to build
> > > with the sbt-assembly plugin.
> > >
> > > OK, so here is my build.sbt file for my extremely simple test
> > > Kafka/Spark Streaming project. It takes almost 30 minutes to build!
> > > This is a CentOS Linux machine on SSDs with 4 GB of RAM; it's never
> > > been slow for me.
> > > To compare, sbt assembly for the entire Spark project itself takes
> > > less than 10 minutes.
> > >
> > > At the bottom of this file I am trying to play with the 'cacheOutput'
> > > option, because I read online that maybe I am calculating SHA-1 hashes
> > > for all the *.class files in this super JAR.
> > >
> > > I also copied the mergeStrategy from Spark contributor TD's Spark
> > > Streaming tutorial from Spark Summit 2014.
> > >
> > > Again, is there some better way to build this JAR file, just using
> > > sbt package? This process works, but it is very slow.
> > >
> > > Any help with speeding up this compilation is really appreciated!!
> > >
> > > Aris
> > >
> > > -----------------------------------------
> > >
> > > import AssemblyKeys._ // put this at the top of the file
> > >
> > > name := "streamingKafka"
> > >
> > > version := "1.0"
> > >
> > > scalaVersion := "2.10.4"
> > >
> > > libraryDependencies ++= Seq(
> > >   "org.apache.spark" %% "spark-core" % "1.0.1" % "provided",
> > >   "org.apache.spark" %% "spark-streaming" % "1.0.1" % "provided",
> > >   "org.apache.spark" %% "spark-streaming-kafka" % "1.0.1"
> > > )
> > >
> > > assemblySettings
> > >
> > > jarName in assembly := "streamingkafka-assembly.jar"
> > >
> > > mergeStrategy in assembly := {
> > >   case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
> > >   case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
> > >   case "log4j.properties" => MergeStrategy.discard
> > >   case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
> > >   case "reference.conf" => MergeStrategy.concat
> > >   case _ => MergeStrategy.first
> > > }
> > >
> > > assemblyOption in assembly ~= { _.copy(cacheOutput = false) }
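For anyone reading the mergeStrategy block in that build.sbt: it routes each conflicting archive entry to a handling rule (discard manifests and signature files, deduplicate service registrations, concatenate reference.conf, and otherwise keep the first copy). As a minimal sketch (plain Scala, not the actual sbt-assembly API - the MergeRouting object and string labels here are illustrative only), the same pattern match can be modeled as an ordinary function over entry paths, which makes the routing easy to see and test:

```scala
// Illustrative model of the build.sbt mergeStrategy above, NOT sbt-assembly
// itself: each archive entry path is mapped to the name of the strategy the
// real build would apply.
object MergeRouting {
  def strategyFor(path: String): String = path match {
    case m if m.toLowerCase.endsWith("manifest.mf")          => "discard"
    case m if m.toLowerCase.matches("meta-inf.*\\.sf$")      => "discard"
    case "log4j.properties"                                  => "discard"
    case m if m.toLowerCase.startsWith("meta-inf/services/") => "filterDistinctLines"
    case "reference.conf"                                    => "concat"
    case _                                                   => "first" // everything else: keep first copy
  }

  def main(args: Array[String]): Unit = {
    println(strategyFor("META-INF/MANIFEST.MF"))              // discard
    println(strategyFor("META-INF/SIGN.SF"))                  // discard
    println(strategyFor("META-INF/services/java.sql.Driver")) // filterDistinctLines
    println(strategyFor("reference.conf"))                    // concat
    println(strategyFor("com/example/App.class"))             // first
  }
}
```

Note that the cases are tried in order, so the catch-all `MergeStrategy.first` only applies to entries no earlier case claimed.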