I mean the actual kafka client:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.1</version>
</dependency>
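If your build uses sbt rather than Maven, a minimal sketch of the equivalent (assuming an sbt-based build) would be:

// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0",
  // pulled in transitively by the line above; listed explicitly here
  // to mirror the Maven snippet
  "org.apache.kafka" % "kafka-clients" % "0.10.0.1"
)

Note that copying jars into SPARK_HOME/jars by hand does not pull in transitive dependencies, which is why kafka-clients has to be added separately in that setup.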
On Tue, May 16, 2017 at 4:29 PM, kant kodali <kanth...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for the catch. I assume you meant
> *spark-streaming-kafka-0-10_2.11-2.1.0.jar*
>
> I added this on all Spark machines under SPARK_HOME/jars.
>
> The same error still persists. Is that the right jar, or is there
> anything else I need to add?
>
> Thanks!
>
> On Tue, May 16, 2017 at 1:40 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> Looks like you are missing the kafka dependency.
>>
>> On Tue, May 16, 2017 at 1:04 PM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> Looks like I am getting the following runtime exception. I am using
>>> Spark 2.1.0 and the following jars:
>>>
>>> *spark-sql_2.11-2.1.0.jar*
>>> *spark-sql-kafka-0-10_2.11-2.1.0.jar*
>>> *spark-streaming_2.11-2.1.0.jar*
>>>
>>> Exception in thread "stream execution thread for [id =
>>> fcfe1fa6-dab3-4769-9e15-e074af622cc1, runId =
>>> 7c54940a-e453-41de-b256-049b539b59b1]"
>>> java.lang.NoClassDefFoundError:
>>> org/apache/kafka/common/serialization/ByteArrayDeserializer
>>>   at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:74)
>>>   at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:245)
>>>   at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:127)
>>>   at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:123)
>>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>>>
>>> On Tue, May 16, 2017 at 10:30 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>>>
>>>> The default "startingOffsets" is "latest". If you don't push any data
>>>> after starting the query, it won't fetch anything. You can set it to
>>>> "earliest", e.g. .option("startingOffsets", "earliest"), to start the
>>>> stream from the beginning.
>>>>
>>>> On Tue, May 16, 2017 at 12:36 AM, kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have the following code:
>>>>>
>>>>> val ds = sparkSession.readStream
>>>>>   .format("kafka")
>>>>>   .option("kafka.bootstrap.servers", bootstrapServers)
>>>>>   .option("subscribe", topicName)
>>>>>   .option("checkpointLocation", hdfsCheckPointDir)
>>>>>   .load()
>>>>>
>>>>> val ds1 = ds.select($"value")
>>>>> val query = ds1.writeStream.outputMode("append").format("console").start()
>>>>> query.awaitTermination()
>>>>>
>>>>> There are no errors when I execute this code; however, I don't see any
>>>>> data being printed to the console. When I run my standalone test Kafka
>>>>> consumer jar, I can see that it is receiving messages, so I am not sure
>>>>> what is going on with the code above. Any ideas?
>>>>>
>>>>> Thanks!
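For reference, putting the suggestions above together, here is a minimal sketch of the full query (assuming bootstrapServers, topicName, and hdfsCheckPointDir are defined as in the original snippet, and that kafka-clients is now on the classpath):

val ds = sparkSession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("subscribe", topicName)
  // read from the beginning of the topic instead of the default "latest"
  .option("startingOffsets", "earliest")
  .load()

// The value column holds raw bytes; cast it to a string so the
// console output is readable.
val query = ds.selectExpr("CAST(value AS STRING)")
  .writeStream
  .outputMode("append")
  // checkpointLocation is documented as a write-side option,
  // so it is set on writeStream here rather than on readStream
  .option("checkpointLocation", hdfsCheckPointDir)
  .format("console")
  .start()

query.awaitTermination()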