[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offset p...

koeninger Tue, 10 Mar 2015 06:27:37 -0700

Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4805#discussion_r26120607
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala ---
    @@ -84,6 +83,11 @@ class DirectKafkaInputDStream[
     
       protected var currentOffsets = fromOffsets
     
    +  // Map to manage the time -> topic/partition+offset
    +  private val offsetMap = new mutable.HashMap[Time, Map[TopicAndPartition, Long]]()
    +  // Add to the listener bus for job completion hook
    +  context.addStreamingListener(new DirectKafkaStreamingListener)
    +
       @tailrec
       protected final def latestLeaderOffsets(retries: Int): Map[TopicAndPartition, LeaderOffset] = {
    --- End diff --
    
    I don't agree that group.id has anything at all to do with tracking
    offsets in ZooKeeper. Its purpose is to identify a group of related
    consumers. From the Kafka docs:
    
    "A string that uniquely identifies the group of consumer processes to
    which this consumer belongs. By setting the same group id multiple
    processes indicate that they are all part of the same consumer group."
    
    Those consumers may or may not be committing offsets to ZK.
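
To illustrate the distinction, here is a minimal Scala sketch. It assumes the Kafka 0.8 high-level consumer property names `group.id`, `auto.commit.enable`, and `zookeeper.connect`; the object name, group name, and ZooKeeper address are hypothetical placeholders, not taken from this pull request. It models two consumers that belong to the same group while only one of them is configured to commit offsets.

```scala
object GroupIdSketch {
  // Settings shared by both consumers. The values are hypothetical placeholders.
  val shared: Map[String, String] = Map(
    "zookeeper.connect" -> "localhost:2181",
    "group.id" -> "example-group"
  )

  // Consumer A: periodically commits its offsets
  // (assumed to go to ZooKeeper by default in Kafka 0.8).
  val committing: Map[String, String] =
    shared + ("auto.commit.enable" -> "true")

  // Consumer B: same group, but never commits offsets automatically.
  val nonCommitting: Map[String, String] =
    shared + ("auto.commit.enable" -> "false")

  def main(args: Array[String]): Unit = {
    val sameGroup = committing("group.id") == nonCommitting("group.id")
    val sameCommitSetting =
      committing("auto.commit.enable") == nonCommitting("auto.commit.enable")
    println(s"same group: $sameGroup")
    println(s"same commit setting: $sameCommitSetting")
  }
}
```

Both configurations carry the same group.id, yet their offset-commit behaviour differs, so under these assumptions group membership by itself says nothing about whether offsets end up in ZK.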


