With the direct stream, the checkpoint data is not recoverable if you modify your
driver code. So if you rely solely on the checkpoint to commit offsets, you can
possibly lose messages if you modify the driver code and then restart from the
"largest" offset. If you do not want to lose messages, you need to commit the
offsets yourself.
Kafka Receiver-based approach:
This will maintain the consumer offsets in ZK for you.
Kafka Direct approach:
You can use checkpointing and that will maintain consumer offsets for you.
You'll want to checkpoint to a highly available file system like HDFS or S3.
http://spark.apache.org/docs/latest/s
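A minimal sketch of the checkpoint-based recovery pattern, assuming Spark 1.x Streaming APIs (the HDFS path and application name are hypothetical placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical HDFS location; use your own highly available path.
val checkpointDir = "hdfs:///user/spark/streaming-checkpoint"

// Called only when no checkpoint exists yet. The direct stream and all
// transformations must be set up inside this function so they can be
// restored from the checkpoint on restart.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-direct-checkpoint")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... create the Kafka direct stream and transformations here ...
  ssc
}

// Recovers from the checkpoint if present, otherwise builds a fresh context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

Note the caveat above: if the driver code changes, the serialized checkpoint may no longer deserialize, which is why many deployments also store offsets externally.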
You need to maintain the offsets yourself, and rightly so, in something like
ZooKeeper.
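A sketch of that pattern, assuming the Kafka 0.8 direct API from spark-streaming-kafka and an existing StreamingContext `ssc`; the helpers `readOffsetsFromZk` and `saveOffsetsToZk` are hypothetical placeholders for your own ZooKeeper read/write code:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

// Hypothetical: load the last committed offsets for this consumer group
// from ZooKeeper (or any durable store).
val fromOffsets: Map[TopicAndPartition, Long] = readOffsetsFromZk("my-group")

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

val messageHandler =
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

// Start the direct stream exactly where the previous run left off.
val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets, messageHandler)

stream.foreachRDD { rdd =>
  // The direct stream exposes the exact offset ranges of each batch.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process rdd ...
  // Hypothetical: persist (topic, partition, untilOffset) back to
  // ZooKeeper only after processing succeeds, so a restart resumes here.
  saveOffsetsToZk("my-group", offsetRanges)
}
```

Committing the offsets after the batch's work completes gives at-least-once semantics even across driver-code changes, since nothing depends on the checkpoint surviving.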
From: Tao Li [mailto:litao.bupt...@gmail.com]
Sent: Tuesday, December 08, 2015 5:36 PM
To: user@spark.apache.org
Subject: Need to maintain the consumer offset by myself when using spark
streaming kafka direct
I am using the Spark Streaming Kafka direct approach these days. I found that
when I start the application, it always starts consuming from the latest
offset. I would like the application, on startup, to consume from the offset
where the last run with the same Kafka consumer group left off. It means I
have to maintain the consumer offsets myself.