On Thu, Feb 05, 2015 at 11:57:15PM -0800, Joel Koshy wrote:
> On Fri, Feb 06, 2015 at 12:43:37AM -0500, Jason Rosenberg wrote:
> > I'm not sure what you mean by 'default' behavior 'only if' offset.storage
> > is kafka.  Does that mean the 'default' behavior is 'false' if
> > offset.storage is 'zookeeper'?  Can that be clarified in the config
> > documentation section?
> > 
> > In section 5.6 where offset management is described, there is this:
> > "A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be
> > performed using the above steps if you set offsets.storage=zookeeper."
> > 
> > This implies that dual commit will work also if offsets.storage=zookeeper,
> > no?  Just not by default?  Perhaps there needs to be clarification there
> > (and in the config section for offsets.storage & dual.commit.enabled).
> 
> Actually I think there may be a bug here if someone needs to roll back
> from Kafka-based offsets to zookeeper. Will reply tomorrow on this.
> 
> > 

Never mind - I think we are fine here.  The scenario I was thinking
about is the following: 
- Suppose three consumer instances c0, c1, c2 are consuming
  partitions pX, pY, ..., are committing offsets to Kafka, and you
  want to migrate to ZooKeeper.
- Do a rolling bounce to turn on dual-commit (and keep offsets.storage
  = kafka).
- Do another rolling bounce to set offsets.storage to zookeeper:
  - Say you bounce c0 to commit offsets to ZooKeeper; it comes back up
    and then owns pX. It begins to commit offsets for pX to ZooKeeper
    only.
  - You then bounce c1; after it goes down, say our partition
    assignment strategy assigns pX to c2 (which has not yet been
    bounced).
  - c2 still uses offsets.storage=kafka, so it would fetch a
    potentially stale offset for pX from Kafka, which would be an
    issue.
  - So we explicitly handle this case: if dual.commit.enabled is
    turned on and offsets.storage is kafka, then the broker fetches
    offsets from both Kafka and ZooKeeper and selects the maximum of
    the two.
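
To make the last step concrete, here is a rough Python sketch of the
selection rule. It is not the actual consumer or broker code: the
function name fetch_offset and the dict-based stand-ins for the two
offset stores are made up for illustration, and reading from ZooKeeper
only when offsets.storage=zookeeper is an assumption on my part.

```python
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the two offset stores,
# mapping a partition name to its last committed offset.
OffsetStore = Dict[str, int]


def fetch_offset(
    partition: str,
    offsets_storage: str,
    dual_commit_enabled: bool,
    kafka_offsets: OffsetStore,
    zk_offsets: OffsetStore,
) -> Optional[int]:
    """Pick the offset a consumer should resume from (illustrative only)."""
    kafka_offset = kafka_offsets.get(partition)
    zk_offset = zk_offsets.get(partition)

    if offsets_storage == "zookeeper":
        # Assumption: a consumer configured for ZooKeeper reads ZooKeeper only.
        return zk_offset

    if not dual_commit_enabled:
        # Kafka-only consumer: this is where a stale offset could come from.
        return kafka_offset

    # offsets.storage=kafka with dual commit on: consult both stores
    # and take the larger of whatever offsets exist.
    candidates = [o for o in (kafka_offset, zk_offset) if o is not None]
    return max(candidates) if candidates else None


# c0 was already bounced and has committed pX up to 150 in ZooKeeper only;
# the last offset for pX in Kafka is still 100.
kafka_offsets = {"pX": 100}
zk_offsets = {"pX": 150}

# c2 (not yet bounced, offsets.storage=kafka, dual commit on) takes over pX.
resumed = fetch_offset("pX", "kafka", True, kafka_offsets, zk_offsets)
```

With dual commit on, c2 resumes pX from 150 rather than the stale 100
it would get from Kafka alone.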

Let me know if you see any holes in the above.

The name dual.commit.enabled is confusing; it would have been (slightly)
less confusing if it were called offset.migration.in.progress or something
similar. Still, I think we can document the process carefully and state
clearly that it is intended for use during migration/roll-back only.
