Thanks Mike & Ryan. Now I can finally see my 5KB messages. However, I am
running into the following error:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000734700000, 530579456, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 530579456 bytes for committing reserved memory.
# An error report file with more information is saved as:


I am running the Spark driver program in client mode on a standalone
cluster using Spark 2.1.1. When something like this happens, how do I tell
which memory I need to increase, and how do I increase it? Should I
increase the driver JVM memory or the executor JVM memory?
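
For reference, this is the kind of spark-submit invocation I would try in
order to raise them (a sketch only; the 4g/2g values and the class/jar
names are placeholders, not recommendations):

spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode client \
  --driver-memory 4g \
  --executor-memory 2g \
  --class com.example.MyStreamingApp my-app.jar

My understanding is that in client mode the driver JVM is already running
by the time the application code executes, so spark.driver.memory has to
be set on the command line (or in spark-defaults.conf) rather than in the
code itself.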

On Tue, May 16, 2017 at 4:34 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> I mean the actual kafka client:
>
> <dependency>
>   <groupId>org.apache.kafka</groupId>
>   <artifactId>kafka-clients</artifactId>
>   <version>0.10.0.1</version>
> </dependency>
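>
> If you are using spark-submit, an alternative (a sketch; adjust the
> version to match your build) is to let it pull kafka-clients in
> transitively:
>
> spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 ...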
>
>
> On Tue, May 16, 2017 at 4:29 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Hi Michael,
>>
>> Thanks for the catch. I assume you meant
>> *spark-streaming-kafka-0-10_2.11-2.1.0.jar*
>>
>> I added this on all Spark machines under SPARK_HOME/jars.
>>
>> The same error still persists. Is that the right jar, or is there
>> anything else I need to add?
>>
>> Thanks!
>>
>>
>>
>> On Tue, May 16, 2017 at 1:40 PM, Michael Armbrust <mich...@databricks.com
>> > wrote:
>>
>>> Looks like you are missing the Kafka dependency.
>>>
>>> On Tue, May 16, 2017 at 1:04 PM, kant kodali <kanth...@gmail.com> wrote:
>>>
>>>> Looks like I am getting the following runtime exception. I am using
>>>> Spark 2.1.0 and the following jars:
>>>>
>>>> *spark-sql_2.11-2.1.0.jar*
>>>>
>>>> *spark-sql-kafka-0-10_2.11-2.1.0.jar*
>>>>
>>>> *spark-streaming_2.11-2.1.0.jar*
>>>>
>>>>
>>>> Exception in thread "stream execution thread for [id = fcfe1fa6-dab3-4769-9e15-e074af622cc1, runId = 7c54940a-e453-41de-b256-049b539b59b1]"
>>>> java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
>>>>     at org.apache.spark.sql.kafka010.KafkaSourceProvider.createSource(KafkaSourceProvider.scala:74)
>>>>     at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:245)
>>>>     at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:127)
>>>>     at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$logicalPlan$1.applyOrElse(StreamExecution.scala:123)
>>>>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>>>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>>>>     at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>>>>
>>>>
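>>>> As a quick sanity check (a hypothetical snippet, not from my app),
>>>> this line throws a ClassNotFoundException immediately if
>>>> kafka-clients is missing from the classpath:
>>>>
>>>> Class.forName("org.apache.kafka.common.serialization.ByteArrayDeserializer")
>>>>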
>>>> On Tue, May 16, 2017 at 10:30 AM, Shixiong(Ryan) Zhu <
>>>> shixi...@databricks.com> wrote:
>>>>
>>>>> The default "startingOffsets" is "latest". If you don't push any data
>>>>> after starting the query, it won't fetch anything. You can set it to
>>>>> "earliest" like ".option("startingOffsets", "earliest")" to start the
>>>>> stream from the beginning.
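>>>>>
>>>>> For example, a minimal sketch of your read side with only that
>>>>> option changed:
>>>>>
>>>>> sparkSession.readStream
>>>>>   .format("kafka")
>>>>>   .option("kafka.bootstrap.servers", bootstrapServers)
>>>>>   .option("subscribe", topicName)
>>>>>   .option("startingOffsets", "earliest")
>>>>>   .load()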
>>>>>
>>>>> On Tue, May 16, 2017 at 12:36 AM, kant kodali <kanth...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have the following code.
>>>>>>
>>>>>>  val ds = sparkSession.readStream
>>>>>>    .format("kafka")
>>>>>>    .option("kafka.bootstrap.servers", bootstrapServers)
>>>>>>    .option("subscribe", topicName)
>>>>>>    .option("checkpointLocation", hdfsCheckPointDir)
>>>>>>    .load()
>>>>>>
>>>>>>  val ds1 = ds.select($"value")
>>>>>>  val query = ds1.writeStream.outputMode("append").format("console").start()
>>>>>>  query.awaitTermination()
>>>>>>
>>>>>> There are no errors when I execute this code; however, I don't see
>>>>>> any data being printed to the console. When I run my standalone test
>>>>>> Kafka consumer jar, I can see that it is receiving messages, so I am
>>>>>> not sure what is going on with the code above. Any ideas?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
