Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-17 Thread kant kodali
Thanks Mike & Ryan. Now I can finally see my 5KB messages. However, I am running into the following error: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00073470, 530579456, 0) failed; error='Cannot allocate memory' (errno=12) # There is insufficient memory for the Java
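
The errno=12 from os::commit_memory means the JVM could not reserve the roughly 500 MB of heap it asked for on that machine. This isn't resolved further in the thread, but a minimal sketch of the usual knob, assuming the job is launched with spark-submit and that the sizes, class name, and jar name below are placeholders:

    # shrink (or grow, if the host has room) the requested heap sizes
    spark-submit \
      --driver-memory 1g \
      --executor-memory 1g \
      --class com.example.MyStreamingApp \
      my-streaming-app.jar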

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread Michael Armbrust
I mean the actual kafka client: org.apache.kafka kafka-clients 0.10.0.1 On Tue, May 16, 2017 at 4:29 PM, kant kodali wrote: > Hi Michael, > > Thanks for the catch. I assume you meant > *spark-streaming-kafka-0-10_2.11-2.1.0.jar* > > I add this in all spark machines
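
In build-file form, the coordinate above corresponds to the following sbt line (the sbt syntax itself is an assumption; the groupId, artifactId, and version are exactly the ones Michael names):

    // the Kafka client library referred to above
    libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.10.0.1"

Equivalently, kafka-clients-0.10.0.1.jar can be dropped under SPARK_HOME/jars, following the approach already being used earlier in the thread.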

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
Hi Michael, Thanks for the catch. I assume you meant *spark-streaming-kafka-0-10_2.11-2.1.0.jar*. I added it on all Spark machines under SPARK_HOME/jars, but the same error persists. Is that the right jar, or is there anything else I need to add? Thanks! On Tue, May 16, 2017 at 1:40 PM,

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread Michael Armbrust
Looks like you are missing the kafka dependency. On Tue, May 16, 2017 at 1:04 PM, kant kodali wrote: > Looks like I am getting the following runtime exception. I am using Spark > 2.1.0 and the following jars > > *spark-sql_2.11-2.1.0.jar* > >
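
A common way to satisfy this at submit time, sketched here rather than quoted from the thread (the application jar name is a placeholder), is to let spark-submit resolve the Kafka source and its transitive kafka-clients dependency:

    spark-submit \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0 \
      my-streaming-app.jar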

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
Looks like I am getting the following runtime exception. I am using Spark 2.1.0 and the following jars *spark-sql_2.11-2.1.0.jar* *spark-sql-kafka-0-10_2.11-2.1.0.jar* *spark-streaming_2.11-2.1.0.jar* Exception in thread "stream execution thread for [id = fcfe1fa6-dab3-4769-9e15-e074af622cc1,

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread Shixiong(Ryan) Zhu
The default "startingOffsets" is "latest". If you don't push any data after starting the query, it won't fetch anything. You can set it to "earliest" like ".option("startingOffsets", "earliest")" to start the stream from the beginning. On Tue, May 16, 2017 at 12:36 AM, kant kodali
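
A minimal sketch of Ryan's suggestion applied to the reader from the original post (bootstrapServers and topicName are the variables used there):

    val ds = sparkSession.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topicName)
      .option("startingOffsets", "earliest")  // default is "latest", which skips data already in the topic
      .load()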

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
This isn't Structured Streaming, right? On Tue, May 16, 2017 at 4:15 AM, Didac Gil wrote: > From what I know, you would have to iterate on each RDD. When you are > reading from the Stream, Spark actually collects the data as a miniRDD for > each period of time. > > I hope

Re: How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread Didac Gil
From what I know, you would have to iterate on each RDD. When you are reading from the Stream, Spark actually collects the data as a miniRDD for each period of time. I hope this helps. ds.foreachRDD{ rdd => val newNames = Seq("Field1", "Field2", "Field3") val mydataDF = rdd.toDF(newNames: _*)
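
A rough completion of the DStream pattern Didac sketches (this is the older streaming API rather than Structured Streaming; the field names are placeholders, and toDF assumes the RDD elements are tuples or case classes and that spark.implicits._ is in scope):

    import spark.implicits._  // required for rdd.toDF

    ds.foreachRDD { rdd =>
      val newNames = Seq("Field1", "Field2", "Field3")
      val mydataDF = rdd.toDF(newNames: _*)  // assumes three-field tuples/case classes in the RDD
      mydataDF.show()                        // prints each micro-batch to the console
    }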

How to print data to console in structured streaming using Spark 2.1.0?

2017-05-16 Thread kant kodali
Hi All, I have the following code. val ds = sparkSession.readStream .format("kafka") .option("kafka.bootstrap.servers", bootstrapServers) .option("subscribe", topicName) .option("checkpointLocation", hdfsCheckPointDir)
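
Putting the thread's answers together, a minimal sketch of printing the Kafka stream to the console with Structured Streaming (variable names reused from the post above; startingOffsets follows Ryan's reply; checkpointLocation is omitted since the console sink is only for debugging):

    val ds = sparkSession.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topicName)
      .option("startingOffsets", "earliest")
      .load()

    val query = ds.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()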