Thanks Mike & Ryan. Now I can finally see my 5KB messages. However, I am
running into the following error:
OpenJDK 64-Bit Server VM warning: INFO:
os::commit_memory(0x00073470, 530579456, 0) failed; error='Cannot
allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
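errno=12 means the JVM asked the OS for roughly 530 MB that wasn't
available. A minimal sketch of one way to shrink that request, assuming the
default heap is simply larger than the host's free memory (the 1g value is
illustrative; driver memory normally has to be set at submit time, before
the JVM starts):

// Ask for a smaller executor heap so os::commit_memory can succeed.
val sparkSession = org.apache.spark.sql.SparkSession.builder()
  .appName("kafka-structured-streaming") // illustrative app name
  .config("spark.executor.memory", "1g")
  .getOrCreate()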
I mean the actual kafka client:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.1</version>
</dependency>
On Tue, May 16, 2017 at 4:29 PM, kant kodali wrote:
Hi Michael,
Thanks for the catch. I assume you meant
*spark-streaming-kafka-0-10_2.11-2.1.0.jar*
I added this on all Spark machines under SPARK_HOME/jars.
Still, the same error seems to persist. Is that the right jar, or is there
anything else I need to add?
Thanks!
On Tue, May 16, 2017 at 1:40 PM,
Looks like you are missing the kafka dependency.
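For reference, a build.sbt sketch of that dependency set (assuming sbt; the
versions mirror the jars listed below, and spark-sql-kafka-0-10 should pull
in the matching kafka-clients transitively):

libraryDependencies ++= Seq(
  // Spark SQL / Structured Streaming core
  "org.apache.spark" %% "spark-sql" % "2.1.0",
  // Kafka source for Structured Streaming
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0"
)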
On Tue, May 16, 2017 at 1:04 PM, kant kodali wrote:
Looks like I am getting the following runtime exception. I am using Spark
2.1.0 and the following jars:
*spark-sql_2.11-2.1.0.jar*
*spark-sql-kafka-0-10_2.11-2.1.0.jar*
*spark-streaming_2.11-2.1.0.jar*
Exception in thread "stream execution thread for [id =
fcfe1fa6-dab3-4769-9e15-e074af622cc1,
The default for "startingOffsets" is "latest". If you don't push any data
after starting the query, it won't fetch anything. You can set it to
"earliest", e.g. .option("startingOffsets", "earliest"), to start the stream
from the beginning, as in the sketch below.
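A minimal sketch of that option in context (bootstrapServers and topicName
are the placeholders from the code at the bottom of this thread):

// Read the topic from its earliest retained offsets instead of only new data.
val ds = sparkSession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("subscribe", topicName)
  .option("startingOffsets", "earliest") // default is "latest"
  .load()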
On Tue, May 16, 2017 at 12:36 AM, kant kodali wrote:
This isn't Structured Streaming, right?
On Tue, May 16, 2017 at 4:15 AM, Didac Gil wrote:
From what I know, you would have to iterate over each RDD. When you are
reading from the stream, Spark actually collects the data as a mini RDD for
each batch interval.
I hope this helps.
import sparkSession.implicits._ // needed for rdd.toDF
ds.foreachRDD { rdd =>
  val newNames = Seq("Field1", "Field2", "Field3")
  val mydataDF = rdd.toDF(newNames: _*)
}
Hi All,
I have the following code.
val ds = sparkSession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrapServers)
  .option("subscribe", topicName)
  .option("checkpointLocation", hdfsCheckPointDir)
  .load()