On Fri, 03 Sep 2010 17:49:26 -0400 Jeffrey Hutzelman <[email protected]> wrote:
> --On Friday, September 03, 2010 04:00:00 PM -0500 Andrew Deason
> <[email protected]> wrote:
>
> > If DBWRITING is not set, we send tidCounter+1, as you mention. If
> > there is still no write transaction when it arrives, the trans id is
> > not checked. If a write transaction has started in the meantime, it
> > will have a higher transaction id than the one sent since it began
> > after we sent the beacon. (Otherwise the sync site would have
> > detected DBWRITING and would have sent writeTidCounter).
>
> No, I think you're making an assumption of atomicity that is not true.
> "It began" is a distributed state change which may not take effect
> everywhere at once, with respect to when our beacon is sent. Moreso
> for the _end_ of a transaction, where we're transitioning in the
> opposite direction. Fixing writeTidCounter may make this problem
> worse, as it will no longer tend to be much lower than tidCounter.

The atomic operation I mean is when DBWRITING is set on the sync site.
Assuming no in-flight prior write transaction...

DBWRITING is set before we contact any remote sites, so if it is not
set, there is no remote in-flight transaction. So we send tidCounter+1
in VOTE_Beacon. If the remote site gains a write transaction while that
VOTE_Beacon message is being sent, that write transaction will have at
least a counter of tidCounter+2.

On the other side, we don't clear DBWRITING until we've contacted remote
sites. So if DBWRITING is not set, we must have ended the previous
transaction.

In the above, by "a remote site" or "remote sites" I mean sites that we
have successfully contacted. If we have not contacted a particular
remote site but still contacted enough for quorum, then we're not racing
for a false-positive transaction abort, since the transaction should be
aborted in such a case.

And I know this is fuzzy; I'm not trying to make a proof, just
explaining why I don't think this is seen currently.
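
To make the ordering argument above concrete, here is a rough sketch in
C. It is a toy model, not the ubik code: struct sync_site, beacon_tid(),
begin_write() and remote_would_abort() are made-up names, the "+2" step
is just the lower bound mentioned above, and the abort rule (a remote
write transaction counts as stale only if its counter is lower than the
one in the beacon) is an assumption taken from this discussion.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the sync site's state. */
struct sync_site {
    bool db_writing;        /* models the DBWRITING flag */
    int tid_counter;        /* models tidCounter */
    int write_tid_counter;  /* models writeTidCounter */
};

/* Counter the sync site would place in the beacon, per the discussion:
 * writeTidCounter while a write is in progress, else tidCounter+1. */
static int beacon_tid(const struct sync_site *s)
{
    return s->db_writing ? s->write_tid_counter : s->tid_counter + 1;
}

/* Start a write transaction. The flag is set first, before any remote
 * site would be contacted. Assumption: the new transaction's counter is
 * tidCounter+2 (the text only says "at least" that). */
static int begin_write(struct sync_site *s)
{
    s->db_writing = true;
    s->tid_counter += 2;
    s->write_tid_counter = s->tid_counter;
    return s->write_tid_counter;
}

/* Assumed remote-side check: remote_tid == 0 means "no write
 * transaction", in which case the beacon's counter is not checked.
 * Otherwise the transaction is stale only if it is older (lower). */
static bool remote_would_abort(int remote_tid, int beacon)
{
    if (remote_tid == 0)
        return false;
    return remote_tid < beacon;
}
```

With tid_counter at 10 and DBWRITING clear, beacon_tid() gives 11; a
write started after that gets 12, so remote_would_abort(12, 11) is
false. This only models the single-threaded ordering; it says nothing
about the non-atomic reads raised below.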

> In addition, as we discussed on jabber, there are some rather
> significant thread-safety issues with pthreaded ubik. One of those is
> that our examination of DBWRITING, tidCounter, and writeTidCounter are
> not atomic, and neither is the starting of a new local transaction
> atomic with respect to the main body of ubeacon_Interact().

Yes, but this issue is not tubik-specific.

-- 
Andrew Deason
[email protected]
