Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4805#discussion_r26120607
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
---
@@ -84,6 +83,11 @@ class DirectKafkaInputDStream[
protected var currentOffsets = fromOffsets
+ // Map to manage the time -> topic/partition+offset
+ private val offsetMap = new mutable.HashMap[Time, Map[TopicAndPartition, Long]]()
+ // Add to the listener bus for job completion hook
+ context.addStreamingListener(new DirectKafkaStreamingListener)
+
@tailrec
protected final def latestLeaderOffsets(retries: Int): Map[TopicAndPartition, LeaderOffset] = {
--- End diff --
I don't agree that group.id has anything at all to do with tracking offsets
in ZooKeeper. Its purpose is to identify a group of related consumers. From
the Kafka docs:
"A string that uniquely identifies the group of consumer processes to which
this consumer belongs. By setting the same group id multiple processes indicate
that they are all part of the same consumer group."
Those consumers may or may not be committing offsets to ZK.
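To make the distinction concrete, here is a minimal, self-contained Scala sketch. It is not Kafka's or Spark's API: `OffsetStore`, `NoOpOffsetStore`, `InMemoryOffsetStore`, `GroupMember` and `GroupIdDemo` are hypothetical names, and the in-memory store merely stands in for a ZooKeeper-backed one. The group id is only an identity shared by consumers; whether progress is committed anywhere, and where, is a separate choice.

```scala
import scala.collection.mutable

// Hypothetical abstraction (not a Kafka API): where, if anywhere,
// a consumer records how far it has read.
trait OffsetStore {
  def commit(group: String, topic: String, partition: Int, offset: Long): Unit
  def lastCommitted(group: String, topic: String, partition: Int): Option[Long]
}

// Records nothing: the application tracks offsets itself
// (e.g. alongside its output), so the group id serves only as identity.
object NoOpOffsetStore extends OffsetStore {
  def commit(group: String, topic: String, partition: Int, offset: Long): Unit = ()
  def lastCommitted(group: String, topic: String, partition: Int): Option[Long] = None
}

// In-memory stand-in for a ZooKeeper-backed store. Offsets are keyed by
// (group, topic, partition), so the group id is just part of a lookup key.
class InMemoryOffsetStore extends OffsetStore {
  private val offsets = mutable.HashMap.empty[(String, String, Int), Long]
  def commit(group: String, topic: String, partition: Int, offset: Long): Unit =
    offsets((group, topic, partition)) = offset
  def lastCommitted(group: String, topic: String, partition: Int): Option[Long] =
    offsets.get((group, topic, partition))
}

// A consumer process: its group id says which group it belongs to;
// the store it is given decides whether progress is committed anywhere.
final case class GroupMember(groupId: String, store: OffsetStore) {
  def processed(topic: String, partition: Int, offset: Long): Unit =
    store.commit(groupId, topic, partition, offset)
}

object GroupIdDemo {
  def main(args: Array[String]): Unit = {
    val zkLike = new InMemoryOffsetStore
    val committing = GroupMember("analytics", zkLike)
    val nonCommitting = GroupMember("analytics", NoOpOffsetStore)

    committing.processed("events", 0, 42L)
    nonCommitting.processed("events", 1, 7L)

    println(committing.groupId == nonCommitting.groupId)             // true
    println(zkLike.lastCommitted("analytics", "events", 0))          // Some(42)
    println(NoOpOffsetStore.lastCommitted("analytics", "events", 1)) // None
  }
}
```

Both members belong to the same group "analytics", yet only one of them ever writes an offset.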