Hi folks,

I am trying to use Kafka with Spark Streaming, and it appears I cannot just
run the usual 'sbt package' as I do with other Spark applications, such as
plain Spark or Spark with MLlib. I learned I have to build with the
sbt-assembly plugin.
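For anyone following along, the plugin itself gets enabled in project/plugins.sbt. This is a minimal sketch, assuming version 0.11.2 (my post does not say which version I use); the 0.11.x line is the one that still provides the AssemblyKeys import and assemblySettings used in my build.sbt.

```scala
// project/plugins.sbt (assumed file; the version number is an assumption).
// sbt-assembly 0.11.x still exposes AssemblyKeys._ and assemblySettings,
// which the build.sbt in this message relies on.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
```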

OK, so here is my build.sbt file for my extremely simple test Kafka/Spark
Streaming project. It takes almost 30 minutes to build! This is a CentOS
Linux machine with SSDs and 4GB of RAM, and it has never been slow for me.
For comparison, sbt assembly for the entire Spark project itself takes less
than 10 minutes.

At the bottom of this file I am experimenting with the 'cacheOutput' option,
because I read online that sbt-assembly may be calculating a SHA-1 hash of
every *.class file in this fat JAR.
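To make the SHA-1 concern concrete: as far as I understand the sbt-assembly README, caching the output means hashing the inputs so an unchanged JAR can be skipped. The snippet is a hypothetical illustration in plain Scala, not sbt-assembly's actual code; ClassHashSketch, classFiles and sha1Of are made-up names. It only shows why the cost grows with the total size of every class file fed into the digest.

```scala
import java.io.File
import java.nio.file.Files
import java.security.MessageDigest

// Hypothetical sketch, NOT sbt-assembly's implementation: it only models
// the kind of work that "hash every *.class file" implies.
object ClassHashSketch {
  // Recursively collect all *.class files under root, sorted by name so
  // the combined hash is stable from run to run.
  def classFiles(root: File): Seq[File] =
    if (root.isDirectory)
      Option(root.listFiles).map(_.toSeq).getOrElse(Seq.empty)
        .sortBy(_.getName)
        .flatMap(classFiles)
    else if (root.getName.endsWith(".class")) Seq(root)
    else Seq.empty

  // Feed every file's bytes into one SHA-1 digest; every byte of every
  // class is read, so a large fat JAR means a lot of hashing.
  def sha1Of(files: Seq[File]): String = {
    val md = MessageDigest.getInstance("SHA-1")
    files.foreach(f => md.update(Files.readAllBytes(f.toPath)))
    md.digest().map(b => f"${b & 0xff}%02x").mkString
  }
}
```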

I also copied the mergeStrategy from the Spark Streaming tutorial that Spark
contributor TD gave at Spark Summit 2014.
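As I understand it, that mergeStrategy block is just a Scala function from a path inside the JAR to a strategy, and the first matching case wins. Here is a small hypothetical model of it: the Strategy case objects and MergeSketch are stand-ins I made up for sbt-assembly's MergeStrategy values, and they only model which strategy gets picked, not the actual merging.

```scala
// Hypothetical stand-ins for sbt-assembly's MergeStrategy values; they only
// model the decision, not the merging of file contents.
sealed trait Strategy
case object Discard extends Strategy
case object FilterDistinctLines extends Strategy
case object Concat extends Strategy
case object First extends Strategy

object MergeSketch {
  // Same cases as the mergeStrategy in my build.sbt, tried top to bottom.
  def strategyFor(path: String): Strategy = path match {
    case m if m.toLowerCase.endsWith("manifest.mf")          => Discard
    case m if m.toLowerCase.matches("meta-inf.*\\.sf$")      => Discard
    case "log4j.properties"                                  => Discard
    case m if m.toLowerCase.startsWith("meta-inf/services/") => FilterDistinctLines
    case "reference.conf"                                    => Concat
    case _                                                   => First
  }
}
```

Note that the literal cases such as "log4j.properties" and "reference.conf" only match a file at the root of the JAR; a copy at a nested path falls through to the catch-all First case.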

Again, is there a better way to build this JAR file, perhaps just using sbt
package? This process works, but it is very slow.

Any help with speeding up this compilation is really appreciated!!

Aris

-----------------------------------------

import AssemblyKeys._ // put this at the top of the file

name := "streamingKafka"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.0.1"
)

assemblySettings

jarName in assembly := "streamingkafka-assembly.jar"

mergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")          => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")      => MergeStrategy.discard
  case "log4j.properties"                                  => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
  case "reference.conf"                                    => MergeStrategy.concat
  case _                                                   => MergeStrategy.first
}

assemblyOption in assembly ~= { _.copy(cacheOutput = false) }
