I mean the actual kafka client:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.1</version>
</dependency>
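If your build uses sbt rather than Maven, a minimal sketch of the equivalent (assuming an sbt-based build) would be:

// build.sbt
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0",
  // pulled in transitively by the line above; listed explicitly here
  // to mirror the Maven snippet
  "org.apache.kafka" % "kafka-clients" % "0.10.0.1"
)

Note that copying jars into SPARK_HOME/jars by hand does not pull in transitive dependencies, which is why kafka-clients has to be added separately in that setup.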
On Tue, May 16, 2017 at 4:29 PM, kant kodali <kanth...@gmail.com> wrote:

> Hi Michael,
>
> Thanks for the catch. I assume you meant
> *spark-streaming-kafka-0-10_2.11-2.1.0.jar*
>
> I added this on all Spark machines under SPARK_HOME/jars.
>
> The same error still persists. Is that the right jar, or is there
> anything else I need to add?
>
> Thanks!
>
> On Tue, May 16, 2017 at 1:40 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> Looks like you are missing the kafka dependency.
>>
>> On Tue, May 16, 2017 at 1:04 PM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> Looks like I am getting the following runtime exception. I am using
>>> Spark 2.1.0 and the following jars:
>>>
>>> *spark-sql_2.11-2.1.0.jar*
>>> *spark-sql-kafka-0-10_2.11-2.1.0.jar*
>>> *spark-streaming_2.11-2.1.0.jar*
>>>
>>> Exception in thread "stream execution thread for [id =
>>> fcfe1fa6-dab3-4769-9e15-e074af622cc1, runId =
>>> 7c54940a-e453-41de-b256-049b539b59b1]"
>>> java.lang.NoClassDefFoundError:
>>> org/apache/kafka/common/serialization/ByteArrayDeserializer
>>>   at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:74)
>>>   at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:245)
>>>   at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:127)
>>>   at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:123)
>>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>>>
>>> On Tue, May 16, 2017 at 10:30 AM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>>>
>>>> The default "startingOffsets" is "latest". If you don't push any data
>>>> after starting the query, it won't fetch anything. You can set it to
>>>> "earliest", e.g. .option("startingOffsets", "earliest"), to start the
>>>> stream from the beginning.
>>>>
>>>> On Tue, May 16, 2017 at 12:36 AM, kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have the following code:
>>>>>
>>>>> val ds = sparkSession.readStream
>>>>>   .format("kafka")
>>>>>   .option("kafka.bootstrap.servers", bootstrapServers)
>>>>>   .option("subscribe", topicName)
>>>>>   .option("checkpointLocation", hdfsCheckPointDir)
>>>>>   .load()
>>>>>
>>>>> val ds1 = ds.select($"value")
>>>>> val query = ds1.writeStream.outputMode("append").format("console").start()
>>>>> query.awaitTermination()
>>>>>
>>>>> There are no errors when I execute this code; however, I don't see any
>>>>> data being printed to the console. When I run my standalone test Kafka
>>>>> consumer jar, I can see that it is receiving messages, so I am not sure
>>>>> what is going on with the code above. Any ideas?
>>>>>
>>>>> Thanks!
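For reference, putting the suggestions above together, here is a minimal sketch of the full query (assuming bootstrapServers, topicName, and hdfsCheckPointDir are defined as in the original snippet, and that kafka-clients is now on the classpath):

val ds = sparkSession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("subscribe", topicName)
  // read from the beginning of the topic instead of the default "latest"
  .option("startingOffsets", "earliest")
  .load()

// The value column holds raw bytes; cast it to a string so the
// console output is readable.
val query = ds.selectExpr("CAST(value AS STRING)")
  .writeStream
  .outputMode("append")
  // checkpointLocation is documented as a write-side option,
  // so it is set on writeStream here rather than on readStream
  .option("checkpointLocation", hdfsCheckPointDir)
  .format("console")
  .start()

query.awaitTermination()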