I have a branch which more or less achieves what is described above by
way of a new dead server retry timeout behaviour.

As with the current behaviour under consistent distribution and auto
ejection, keys on the dead server are moved once we hit the initial
failure limit, by taking that host out of the continuum.  We then reset
the continuum to force a retry each time the dead server retry timeout
expires, in the same manner as is done for standard connection retries.
Each dead server retry will result in a miss if the host is still
unavailable, which isn't ideal, but I wanted to achieve this whilst
maintaining compatibility with current behaviour, so I have kept the
changes to a bare minimum.
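
To illustrate the timing described above, here is a minimal standalone
sketch in C.  It is not code from the branch and does not use
libmemcached's real structures: sketch_server, sketch_config and the
sketch_* functions are all hypothetical names, and treating a timeout
of 0 as "dead retry disabled" is an assumption made for the example.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of one host; not a libmemcached type. */
typedef struct {
    unsigned failures;       /* consecutive connection failures */
    bool     ejected;        /* true while the host is out of the continuum */
    time_t   next_dead_retry;/* earliest time an ejected host is retried */
} sketch_server;

/* Hypothetical settings; not libmemcached behaviours. */
typedef struct {
    unsigned failure_limit;      /* failures tolerated before ejection */
    time_t   dead_retry_timeout; /* seconds between dead retries, 0 = off */
} sketch_config;

/* Record a failed attempt at time `now`; eject once the limit is hit. */
void sketch_on_failure(sketch_server *s, const sketch_config *c, time_t now)
{
    s->failures++;
    if (s->failures >= c->failure_limit) {
        s->ejected = true;
        s->next_dead_retry = now + c->dead_retry_timeout;
    }
}

/* Record a successful attempt; the host is healthy again. */
void sketch_on_success(sketch_server *s)
{
    s->failures = 0;
    s->ejected = false;
}

/* Called when the continuum is rebuilt: should this host be included? */
bool sketch_in_continuum(sketch_server *s, const sketch_config *c, time_t now)
{
    if (!s->ejected) {
        return true;
    }
    if (c->dead_retry_timeout == 0) {
        return false; /* assumption: no dead retry configured */
    }
    if (now >= s->next_dead_retry) {
        /* Put the host back for one attempt.  The failure count is left
         * alone, so a further failure ejects it again immediately and
         * pushes the next retry out by another timeout. */
        s->ejected = false;
        return true;
    }
    return false;
}
```

In this model a host that is still down costs one miss per timeout
period, which is the trade-off mentioned above.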

It's worth noting here that there are a couple of places where an IO
failure would incorrectly reset a server's state to new even if it was
already in timeout.  I've corrected this where the state is set, although
I suspect that in some cases the IO in question shouldn't be attempted in
the first place.
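
The shape of that correction can be sketched as a simple guard.  Again
this is only an illustration: the sketch_state enum and the function
name are hypothetical and are not taken from the patch.

```c
/* Hypothetical server states for illustration only. */
typedef enum {
    SKETCH_STATE_NEW,
    SKETCH_STATE_CONNECTED,
    SKETCH_STATE_IN_TIMEOUT
} sketch_state;

/* State a server should move to after an IO failure.  A server that is
 * already in timeout stays there instead of being reset to new. */
sketch_state sketch_state_after_io_failure(sketch_state current)
{
    if (current == SKETCH_STATE_IN_TIMEOUT) {
        return SKETCH_STATE_IN_TIMEOUT;
    }
    return SKETCH_STATE_NEW;
}
```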

I've made no attempt to leave keys on their newly allocated servers once
the dead server is brought back to life, and I don't believe it would be
sensible to do so.  With multiple clients running, network flapping
would result in effectively random distribution if we attempted to do
this, negating the point of using consistent distribution.

Bar the correction to the server state reset on IO failure when in
timeout, the changes introduced do not alter the behaviour currently seen
if the new dead retry timeout is not used, so they should be completely
backwards compatible.

The branch is available at
https://code.launchpad.net/~trevor/libmemcached/dead-retry and I've
attached a patch which will apply to 1.0.2.

Feedback would be welcome as ideally this isn't something I want to have
to maintain separately.

** Attachment added: "Add dead server retry"
   https://bugs.launchpad.net/libmemcached/+bug/881983/+attachment/2629812/+files/backoff-dead-reconnect

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/881983

Title:
  libmemcached resets continuum with dead server

To manage notifications about this bug go to:
https://bugs.launchpad.net/libmemcached/+bug/881983/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
