Re: Spark and KafkaUtils

2016-03-15 Thread Vinti Maheshwari
Hi Cody,

I wanted to share my updated build.sbt, which now works with Kafka without any
errors; it may help other users who face a similar issue.

name := "NetworkStreaming"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0", // kafka
  "org.apache.spark" %% "spark-mllib" % "1.6.0",
  "org.codehaus.groovy" % "groovy-all" % "1.8.6",
  "org.apache.hbase" % "hbase-server" % "1.1.2",
  "org.apache.spark" %% "spark-sql" % "1.6.0",
  "org.apache.hbase" % "hbase-common" % "1.1.2" excludeAll(
    ExclusionRule(organization = "javax.servlet", name = "javax.servlet-api"),
    ExclusionRule(organization = "org.mortbay.jetty", name = "jetty"),
    ExclusionRule(organization = "org.mortbay.jetty", name = "servlet-api-2.5")),
  "org.apache.hbase" % "hbase-client" % "1.1.2" excludeAll(
    ExclusionRule(organization = "javax.servlet", name = "javax.servlet-api"),
    ExclusionRule(organization = "org.mortbay.jetty", name = "jetty"),
    ExclusionRule(organization = "org.mortbay.jetty", name = "servlet-api-2.5"))
)


assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")           => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")       => MergeStrategy.discard
  case "log4j.properties"                                   => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/")  => MergeStrategy.filterDistinctLines
  case "reference.conf"                                     => MergeStrategy.concat
  case _                                                    => MergeStrategy.first
}
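
For completeness: sbt-assembly itself is enabled through a one-line plugin
file. A minimal sketch, assuming sbt-assembly 0.14.x (the exact version used
in this project is an assumption):

// project/assembly.sbt -- enables the sbt-assembly plugin
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")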

Thanks & Regards,

Vinti



Re: Spark and KafkaUtils

2016-02-24 Thread Cody Koeninger
Looks like conflicting versions of the same dependency.
If you look at the mergeStrategy section of the build file I posted, you
can add additional lines for whatever dependencies are causing issues, e.g.

  case PathList("org", "jboss", "netty", _*) => MergeStrategy.first


Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
The error msg is:

[error] deduplicate: different file contents found in the following:
[error] /Users/vintim/.ivy2/cache/org.jruby/jruby-complete/jars/jruby-complete-1.6.5.jar:org/joda/time/tz/data/Europe/Bucharest
[error] /Users/vintim/.ivy2/cache/joda-time/joda-time/jars/joda-time-2.6.jar:org/joda/time/tz/data/Europe/Bucharest

I tried adding the block below, from this Stack Overflow answer, but still no luck:

http://stackoverflow.com/questions/20393283/deduplication-error-with-sbt-assembly-plugin?rq=1

excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
  cp filter { x => x.data.getName.matches("sbt.*") || x.data.getName.matches(".*macros.*") }
}
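
Following the mergeStrategy approach suggested earlier instead, a case that
picks one copy of the conflicting joda-time tz data would look like this — a
sketch, not yet tested against this build:

// Hypothetical addition to the assemblyMergeStrategy block:
// keep the first copy of the duplicated org/joda/time/tz/** entries.
case PathList("org", "joda", "time", "tz", _*) => MergeStrategy.first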

Thanks,
~Vinti


Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
Thanks very much, Cody. I added assembly.sbt and modified build.sbt with the
ivy-bug-related content.

It's giving lots of errors related to ivy:

[error] /Users/vintim/.ivy2/cache/javax.activation/activation/jars/activation-1.1.jar:javax/activation/ActivationDataFlavor.class

Here is the complete error log:
https://gist.github.com/Vibhuti/07c24d2893fa6e520d4c


Regards,
~Vinti


Re: Spark and KafkaUtils

2016-02-24 Thread Cody Koeninger
OK, the build file I linked earlier has a minimal example of use. Just
running 'sbt assembly' with a similar build file should build a jar with
all the dependencies.
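
A quick sketch of the step (the output path is an assumption based on the
name/version/scalaVersion settings posted elsewhere in this thread):

$ sbt assembly
# fat jar written to: target/scala-2.10/NetworkStreaming-assembly-1.0.jar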


Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
I am not using sbt assembly currently; I need to check how to use it.

Regards,
~Vinti


Re: Spark and KafkaUtils

2016-02-24 Thread Cody Koeninger
Are you using sbt assembly?  That's what will include all of the
non-provided dependencies in a single jar along with your code.  Otherwise
you'd have to specify each separate jar in your spark-submit line, which is
a pain.
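
Concretely, that is the difference between the two invocations sketched below
(paths, jar names, and the main class are hypothetical, not taken from this
thread):

# with a fat jar from 'sbt assembly', dependencies are already bundled:
spark-submit --class NetworkStreaming target/scala-2.10/NetworkStreaming-assembly-1.0.jar

# without assembly, every non-provided jar must be listed by hand:
spark-submit --class NetworkStreaming \
  --jars spark-streaming-kafka_2.10-1.5.2.jar,kafka_2.10-0.8.2.1.jar,... \
  target/scala-2.10/networkstreaming_2.10-1.0.jar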


Re: Spark and KafkaUtils

2016-02-24 Thread Vinti Maheshwari
Hi Cody,

I tried the build file you provided, but it's not working for me;
I am getting the same error:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/streaming/kafka/KafkaUtils$

I am not getting this error while building (sbt package); I am getting it
when I run my spark-streaming program.
Do I need to specify the kafka jar path manually with the spark-submit --jars flag?

My build.sbt:

name := "NetworkStreaming"
libraryDependencies += "org.apache.hbase" % "hbase" % "0.92.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.0.2"

libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.0.0"

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.10" % "1.5.2",
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.5.2"
)



Regards,
~Vinti


Re: Spark and KafkaUtils

2016-02-24 Thread Cody Koeninger
Spark Streaming is provided; Kafka is not.

This build file

https://github.com/koeninger/kafka-exactly-once/blob/master/build.sbt

includes some hacks for ivy issues that may no longer be strictly
necessary, but try that build and see if it works for you.
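
In build.sbt terms, that distinction looks like this — a sketch using the
versions from this thread, not a copy of the linked file:

libraryDependencies ++= Seq(
  // provided: supplied by the Spark runtime at spark-submit time, kept out of the fat jar
  "org.apache.spark" %% "spark-streaming" % "1.5.2" % "provided",
  // not provided: the Kafka integration must be bundled into the assembly
  "org.apache.spark" %% "spark-streaming-kafka" % "1.5.2"
)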


On Wed, Feb 24, 2016 at 11:14 AM, Vinti Maheshwari 
wrote:

> Hello,
>
> I have tried multiple different settings in build.sbt, but it seems like
> nothing is working.
> Can anyone suggest the right syntax/way to include kafka with spark?
>
> Error
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/streaming/kafka/KafkaUtils$
>
> build.sbt
> libraryDependencies += "org.apache.hbase" % "hbase" % "0.92.1"
> libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.0.2"
> libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.0.0"
> libraryDependencies ++= Seq(
>   "org.apache.spark" % "spark-streaming_2.10" % "1.5.2",
>   "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.5.2",
>   "org.apache.spark" %% "spark-streaming" % "1.5.2" % "provided",
>   "org.apache.spark" %% "spark-streaming-kafka" % "1.5.2" % "provided"
> )
>
>
> Thanks,
> Vinti
>
>