Hartmut:

You are welcome for the advice. I would be happy to provide much more of it once I am able to read a protocol specification.
Thank you.

Jeffrey Altman

Hartmut Reuter wrote:
> Jeffrey:
>
> thank you very much for your long advice. I will follow it. The only
> change I would propose is to name the new RPCs differently, because we
> will have asynchronous I/O not only with OSDs but also with direct
> access to visible fileserver partitions (what I called embedded
> filesystems). So I think we need six new RPCs:
>
> RXAFS_StartAsyncFetch(...)
> RXAFS_ExtendAsyncFetch(...)
> RXAFS_EndAsyncFetch(...)
> RXAFS_StartAsyncStore(...)
> RXAFS_ExtendAsyncStore(...)
> RXAFS_EndAsyncStore(...)
>
> These RPCs would replace RXAFS_GetOSDlocation and RXAFS_Serverpath and,
> in some cases, storeMini.
>
> Hartmut
>
> Jeffrey Altman wrote:
>> Hartmut:
>>
>> The issue to which Jeff Hutzelman is referring is RXAFS_SetLock,
>> RXAFS_ReleaseLock, and RXAFS_ExtendLock. As you know, these RPCs are
>> used to manage the CM-FS transactions for file locks. A CM requests a
>> lock with SetLock, extends the lifetime of the lock every five minutes
>> with ExtendLock, and releases the lock with ReleaseLock.
>>
>> The problem is that SetLock returns no magic cookie, lockId, or
>> transactionId. Therefore, when the FS receives an ExtendLock or
>> ReleaseLock call, it cannot tell whether the request came from the CM
>> that issued the original SetLock.
>>
>> An ExtendLock can be issued and will succeed as long as the lock count
>> is non-zero. If a client is issuing ExtendLock calls on a FID on which
>> it holds no lock, those calls will fail until another client obtains a
>> read lock, at which point the lock will be successfully extended even
>> though it was never issued to that client.
>>
>> In the same regard, a ReleaseLock can be issued and will succeed on a
>> FID even when no lock is outstanding for the CM performing the
>> release.
>>
>> We have seen these problems in practice. A CM is issued a lock and
>> then gets disconnected from the network for longer than five minutes
>> (perhaps due to a suspend). The lock for that CM should have been
>> dropped, but the CM is unaware of this; when it wakes, it attempts an
>> ExtendLock and eventually a ReleaseLock, causing the lock counts to get
>> out of sync. We have also seen buggy clients that issue ExtendLocks and
>> never stop, even after the client has issued a ReleaseLock.
>>
>> Now that most clients have UUIDs (UUIDs are not required), we can
>> mitigate the problem by tracking which clients have actively been
>> issued locks and when those locks will expire. However, it cannot be
>> fixed entirely.
>>
>> The proper way to address this is for SetLock to return an identifier
>> for the lock, so that an ExtendLock or ReleaseLock applies only to the
>> one instance of the lock that was issued and not to any others.
>>
>> The
>> RXAFS_OSD_StartFetchData/RXAFS_OSD_ExtendFetchData/RXAFS_OSD_EndFetchData
>> and
>> RXAFS_OSD_StartStoreData/RXAFS_OSD_ExtendStoreData/RXAFS_OSD_EndStoreData
>> RPCs are going to have exactly the same issue as the
>> SetLock/ExtendLock/ReleaseLock RPCs. Jeff's point is that we must not
>> repeat the mistakes of our past.
>>
>> Jeffrey Altman
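To make the failure mode concrete, here is a minimal Python sketch of the count-only lock model described above and of the proposed fix. It is not OpenAFS code: CountingLockTable, IdentifiedLockTable, their methods, the integer lock IDs, and the 300-second lifetime are all hypothetical stand-ins chosen for illustration. The first table tracks only a per-FID read-lock count; the second has SetLock return an identifier that ExtendLock and ReleaseLock must quote, and forgets identifiers once they expire.

```python
# Hypothetical sketch, not OpenAFS code: all names and the lifetime are
# illustrative assumptions.
import itertools
from dataclasses import dataclass

LOCK_LIFETIME = 300  # seconds (assumed; CMs in the thread extend every 5 min)


class CountingLockTable:
    """The flawed model: the server knows only how many read locks a FID has."""

    def __init__(self):
        self.readers = {}  # fid -> outstanding read-lock count

    def set_lock(self, fid):
        self.readers[fid] = self.readers.get(fid, 0) + 1

    def extend_lock(self, fid):
        # Succeeds whenever any lock exists on the FID, whoever the caller is.
        return self.readers.get(fid, 0) > 0

    def release_lock(self, fid):
        # Decrements the count even if the caller holds no lock.
        if self.readers.get(fid, 0) > 0:
            self.readers[fid] -= 1
            return True
        return False

    def expire_one(self, fid):
        # The server times out a lock without knowing which CM owned it.
        self.release_lock(fid)


@dataclass
class Lock:
    fid: str
    expires: float


class IdentifiedLockTable:
    """The proposed fix: SetLock returns an ID that later calls must quote."""

    def __init__(self):
        self.locks = {}  # lock_id -> Lock
        self._next_id = itertools.count(1)  # IDs are never reused

    def _expire(self, now):
        # Drop every lock whose lifetime has run out.
        for lock_id in [i for i, lk in self.locks.items() if lk.expires <= now]:
            del self.locks[lock_id]

    def set_lock(self, fid, now):
        self._expire(now)
        lock_id = next(self._next_id)
        self.locks[lock_id] = Lock(fid, now + LOCK_LIFETIME)
        return lock_id

    def extend_lock(self, lock_id, now):
        self._expire(now)
        lock = self.locks.get(lock_id)
        if lock is None:
            return False  # unknown or already expired: nothing to extend
        lock.expires = now + LOCK_LIFETIME
        return True

    def release_lock(self, lock_id, now):
        self._expire(now)
        return self.locks.pop(lock_id, None) is not None

    def readers(self, fid):
        return sum(1 for lk in self.locks.values() if lk.fid == fid)


# Scenario from the thread: CM A is suspended past its lock lifetime,
# CM B takes a read lock, then A wakes and sends ExtendLock and ReleaseLock.
old = CountingLockTable()
old.set_lock("fid1")                      # CM A
old.expire_one("fid1")                    # A missed its ExtendLock
old.set_lock("fid1")                      # CM B
a_extend_old = old.extend_lock("fid1")    # stale extend succeeds
a_release_old = old.release_lock("fid1")  # ...and releases B's lock
print(a_extend_old, a_release_old, old.readers["fid1"])  # True True 0

new = IdentifiedLockTable()
a_id = new.set_lock("fid1", now=0)        # CM A; expires at t=300
b_id = new.set_lock("fid1", now=400)      # CM B; A's lock is gone by now
a_extend_new = new.extend_lock(a_id, now=410)
a_release_new = new.release_lock(a_id, now=420)
b_extend_new = new.extend_lock(b_id, now=430)
print(a_extend_new, a_release_new, b_extend_new, new.readers("fid1"))  # False False True 1
```

In the count-only model, the suspended CM's stale ExtendLock succeeds and its ReleaseLock silently consumes the other CM's lock; in the identifier model both stale calls are rejected and the other CM's lock survives. The same handle-returning shape would presumably apply to the proposed RXAFS_StartAsyncFetch/ExtendAsyncFetch/EndAsyncFetch and RXAFS_StartAsyncStore/ExtendAsyncStore/EndAsyncStore RPCs, with the Start call returning the handle.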
