Hartmut Reuter wrote:
> I can't use the vnode->lock for this kind of locking, anyway, because
> the End-of-I/O-rpc wouldn't run in the same thread. So I have planned a
> counter for ongoing reads (write can only start if that came down to 0)
> a counter for waiters (to know whether End-of-I/O-rpc has to wake
> someone or just can free the struct) and, of course a writer field which
> contains the ip-address of the writing client or 0 if there is no write
> in progress.
>
> But all these are implementation details which have nothing to do with
> the AFS3 protocol and can be changed later if it seems appropriate.
>
> -Hartmut

Hartmut:

The issue to which Jeff Hutzelman is referring is RXAFS_SetLock, RXAFS_ReleaseLock, and RXAFS_ExtendLock. As you know, these RPCs are used to manage the CM-FS transactions for file locks. A CM requests a lock with SetLock, then extends the lifetime of the lock every five minutes with ExtendLock, and finally releases the lock with ReleaseLock.

The problem is that no magic cookie, lockId, or transactionId is returned as part of the SetLock call. Therefore, when the FS receives an ExtendLock or ReleaseLock call, it does not know whether the request came from the CM that issued the original SetLock. An ExtendLock can be issued and will succeed as long as the lock count is non-zero. If a client that holds no lock is issuing ExtendLock calls on a FID, those calls will fail until another client obtains a read lock, at which point the lock will be successfully extended even though it was never issued to the client extending it. In the same way, a ReleaseLock can be issued and will succeed on a FID even when there is no outstanding lock issued to the CM performing the release.

We have seen these problems in practice. A CM is issued a lock and then gets disconnected from the network for longer than five minutes (perhaps due to a suspend). The lock for that CM should have been dropped, but the CM is unaware of this; when it wakes, it attempts to ExtendLock and eventually ReleaseLock, causing the lock counts to get out of sync. We have also seen buggy clients that issue ExtendLocks and never stop, even after the client has issued a ReleaseLock.

Now that we have UUIDs for most clients (UUIDs are not required), we can mitigate the problem by tracking which clients currently hold locks and when those locks will expire. However, it cannot be fixed entirely.

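To make the failure mode concrete, here is a minimal sketch in C of a server that tracks only a per-FID lock count and expiry time. It is a toy model, not the OpenAFS file server code: struct fid_lock, LOCK_LIFETIME, and the functions set_lock, extend_lock, release_lock, expire_if_due, and stale_client_scenario are all hypothetical names chosen for illustration, and time is passed in as plain seconds. The point it tries to show is that none of the calls carries anything identifying the lock instance or the caller.

```c
#include <assert.h>

/* Hypothetical, simplified model of a file server's per-FID lock state.
 * Names and layout are illustrative assumptions, not OpenAFS source. */
#define LOCK_LIFETIME 300L   /* seconds; the "five minutes" from the text */

struct fid_lock {
    int  count;     /* read locks the server believes are outstanding */
    long expires;   /* time (in seconds) at which those locks lapse */
};

/* Drop all locks on the FID if nobody extended them in time. */
void expire_if_due(struct fid_lock *fl, long now)
{
    if (fl->count > 0 && now >= fl->expires)
        fl->count = 0;
}

/* None of the three calls below takes a lock ID or a caller identity. */
int set_lock(struct fid_lock *fl, long now)
{
    expire_if_due(fl, now);
    fl->count++;
    fl->expires = now + LOCK_LIFETIME;
    return 0;
}

int extend_lock(struct fid_lock *fl, long now)
{
    expire_if_due(fl, now);
    if (fl->count == 0)
        return -1;                      /* nothing to extend */
    fl->expires = now + LOCK_LIFETIME;
    return 0;                           /* succeeds for any caller */
}

int release_lock(struct fid_lock *fl, long now)
{
    expire_if_due(fl, now);
    if (fl->count == 0)
        return -1;                      /* nothing to release */
    fl->count--;
    return 0;                           /* succeeds for any caller */
}

/* Client A locks, is suspended past expiry, and wakes after client B
 * has taken its own lock. Returns the server's final lock count. */
int stale_client_scenario(void)
{
    struct fid_lock fl = {0, 0};
    int rc;

    rc = set_lock(&fl, 0);       assert(rc == 0);   /* A locks at t=0 */
    rc = extend_lock(&fl, 400);  assert(rc == -1);  /* A's lock lapsed at t=300 */
    rc = set_lock(&fl, 500);     assert(rc == 0);   /* B locks at t=500 */
    rc = extend_lock(&fl, 510);  assert(rc == 0);   /* A extends a lock it does not hold */
    rc = release_lock(&fl, 520); assert(rc == 0);   /* A releases what is really B's lock */
    (void)rc;
    return fl.count;             /* 0, although B never released */
}
```

In this model, stale_client_scenario() ends with a lock count of zero while client B still believes it holds a lock, which is the "counts out of sync" condition described above. Tracking client UUIDs on the server side would let expire_if_due and the other calls reject some of these requests, but only for clients that actually supply a UUID.
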
The proper way to address this is for SetLock to return some identifier for the lock that can be used to ensure that when an ExtendLock or ReleaseLock is sent, it applies only to the one instance of a lock that was issued and not to any others.

The RXAFS_OSD_StartFetchData/RXAFS_OSD_ExtendFetchData/RXAFS_OSD_EndFetchData and RXAFS_OSD_StartStoreData/RXAFS_OSD_ExtendStoreData/RXAFS_OSD_EndStoreData RPCs are going to have exactly the same issue as the SetLock/ExtendLock/ReleaseLock RPCs. Jeff's point is that we must not repeat the same mistakes from our past.

Jeffrey Altman

_______________________________________________
OpenAFS-devel mailing list
https://lists.openafs.org/mailman/listinfo/openafs-devel
