--On Friday, September 03, 2010 01:37:43 PM -0500 Andrew Deason <[email protected]> wrote:

On Mon, 30 Aug 2010 13:12:41 -0500
Andrew Deason <[email protected]> wrote:

So, based on all of this, I propose to do three things:

 1. Set writeTidCounter = tidCounter on beginning write transactions
    instead of incrementing writeTidCounter by 2.

After much discussion and such on jabber... this is gerrit 2647, if
people want to look.

After the Jabber discussion, I had an offline discussion with another developer who, like myself, has spent more time studying this code than most. I believe we came to the conclusion that setting writeTidCounter as you describe has always been the right thing to do, and that this change is neither harmful nor backward-incompatible, given the current behavior of urecovery_CheckTid.

The rest of my comments here are colored by that offline discussion, but are mostly my opinion and interpretation and may not reflect what the other developer thinks about the topic, or may even be wrong. :-)

 2. Change urecovery_CheckTid to strictly check if the given
    transaction id is _equal_ to the current transaction id, not just
    if it is equal or happened before the current transaction.

While this may be worthwhile to do everywhere except SVOTE_Beacon, I
think this requires more thinking about.

It _might_ be this way to avoid the case where a beacon is sent, and
then a write transaction is created, and the beacon might have an older
(smaller) transaction id, causing the transaction to abort. As far as I
can tell that would only need to be a special case for SVOTE_Beacon
processing, but... it'll take more thought.

There are actually three cases here:

1) SVOTE_Beacon
  This is the most difficult case, and will require further study.
  The current test may be subject to a race that can cause it to
  abort a transaction when it shouldn't, but I'm not positive.  It
  may also be necessary to prevent corruption in certain cases when
  there is poor connectivity, or it may not; again, I'm not sure.
  I do believe a strict check would be worse, as it could result
  in a different race, in whose existance I am more confident.

2) SDISK_Begin
  The comparison should not be done at all here; SDISK_Begin should
  _always_ abort any in-progress transaction.  In fact, this was
  already the case, because SDISK_Begin would abort any active
  transaction if urecovery_CheckTid had not killed it.

  There was some duplication of code here which was eliminated in
  #2628 (already merged), though in retrospect I think that code
  might be better if abortalways were checked _first_, rather than
  after checking the epoch and counter.

3) Everywhere else (that is, other SDISK_*)
  These should perform an exact check; it is never OK for a DISK
  operation intended for one transaction to be applied to another.
  This should not be a problem, as ubik should not make these RPC's
  to a server on which the DISK_Begin() call did not succeed.
  Still, it would probably be good to fix this.

 3. To accomodate ubik sites without the change in "1.", change
    SVOTE_Beacon to only check the epoch of the given transaction id,
    and ignore the tid counter. Otherwise, we will abort valid
    transactions due to other sites giving us a bogus transaction id
    in VOTE_Beacon messages. This relaxes the supposed safety check
    somewhat, but if other sites are sending bogus trans id counters,
    we can't really check them.

After discussing and thinking about this, this may not be a good idea.
The problem is with the sender of the VOTE_Beacon messages, so... we
should just fix the VOTE_Beacon side, and not tighten the VOTE_Beacon
trans id check.

What you had proposed didn't tighten the check; it relaxed it. I'm fairly sure that if the check is needed at all, then checking only the epoch is not sufficient. I'm less sure at the moment about whether the check is needed. However, my original conservatism holds - until we're sure that it not only isn't needed but is actively harmful, we shouldn't remove it.


Finally, I'm concerned about the way ubeacon_Interact() decides what to send as the transaction ID. Currently it sends writeTidCounter if the DBWRITING flag is set, and tidCounter + 1 otherwise. The problem here is that I believe there may be a race which could case a value to be sent which is "right" but doesn't match the remote server's idea of the correct transaction ID because of a race between ubeacon_Interact and when the DISK_Begin or DISK_{Commit,ReleaseLocks,Abort} happens on that particular server. I'm somewhat surprised we're not already seeing a problem, so it may be that the race doesn't actually exist; I'll have to work through how the locking works to decide.

-- Jeff
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel

Reply via email to