Jeffrey:

thank you very much for your detailed advice. I will follow it. The only change I would propose is to name the new RPCs differently, because we will have asynchronous I/O not only with OSDs but also with direct access to visible fileserver partitions (what I called embedded filesystems). So I think we need six new RPCs:

RXAFS_StartAsyncFetch(...)
RXAFS_ExtendAsyncFetch(...)
RXAFS_EndAsyncFetch(...)
RXAFS_StartAsyncStore(...)
RXAFS_ExtendAsyncStore(...)
RXAFS_EndAsyncStore(...)

These RPCs would replace RXAFS_GetOSDlocation and RXAFS_Serverpath and, in some cases, storeMini.

Hartmut

Jeffrey Altman wrote:
>
> Hartmut:
>
> The issue to which Jeff Hutzelman is referring is RXAFS_SetLock,
> RXAFS_ReleaseLock, and RXAFS_ExtendLock. As you know, these RPCs are
> used to manage the CM-FS transactions for file locks. A CM requests a
> lock with SetLock and then proceeds to extend the lifetime of the lock
> every five minutes with ExtendLock and releases the lock with ReleaseLock.
>
> The problem is that there is no magic cookie or lockId or transactionId
> returned as part of the SetLock call. Therefore, when the FS receives an
> ExtendLock or ReleaseLock call it does not know if the request came from
> the CM that issued the original SetLock or not.
>
> An ExtendLock can be issued and will succeed as long as the lock count
> is non-zero. If there is a client that is issuing ExtendLock calls on a
> FID, those will fail until such time as another client obtains a read
> lock at which point the lock will be successfully extended even though
> it was never issued.
>
> In the same regards, a ReleaseLock can be issued and will succeed on a
> FID even when there is no outstanding lock issued to the CM performing
> the release.
>
> We have seen these problems in practice. A CM was issued a lock and
> then gets disconnected from the network for longer than five minutes
> (perhaps due to a suspend). The lock for that CM should have been
> dropped but the CM is unaware and when it wakes attempts to ExtendLock
> and eventually ReleaseLock causing the lock counts to get out of sync.
> We have also seen buggy clients that issue ExtendLocks and never stop
> even after the client has issued a ReleaseLock.
>
> Now that we have UUIDs for most clients (UUIDs are not required) we can
> mitigate the problem by tracking the clients that are actively issued
> locks and when they will expire. However, it cannot be fixed entirely.
>
> The proper way to address this is for SetLock to return some identifier
> for the lock that can be used to ensure that when an ExtendLock or
> ReleaseLock is sent, it applies only to the one instance of a lock that
> was issued and not to any others.
>
> The
> RXAFS_OSD_StartFetchData/RXAFS_OSD_ExtendFetchData/RXAFS_OSD_EndFetchData
> and
> RXAFS_OSD_StartStoreData/RXAFS_OSD_ExtendStoreData/RXAFS_OSD_EndStoreData
> RPCs are going to have exactly the same issue as
> SetLock/ExtendLock/ReleaseLock RPCs. Jeff's point is that we must not
> repeat the same mistakes from our past.
>
> Jeffrey Altman
>

-- 
-----------------------------------------------------------------
Hartmut Reuter
e-mail [email protected]
phone +49-89-3299-1328
fax +49-89-3299-1301
RZG (Rechenzentrum Garching)
web http://www.rzg.mpg.de/~hwr
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
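
To make the identifier idea from the quoted message concrete, here is a minimal sketch in Python. It is not OpenAFS code and not a proposed wire format: LockTable, set_lock, extend_lock, release_lock and lock_count are hypothetical names, the 300-second lifetime is only taken from the "every five minutes" remark, and the random token simply stands in for whatever cookie a real Start/SetLock RPC might return. The point it tries to show is that Extend and Release are matched against one specific issued lock instead of a per-FID lock count.

```python
import secrets
import time

# Assumption: lifetime derived from the "every five minutes" extension interval.
LOCK_LIFETIME = 300  # seconds


class LockTable:
    """Toy model of per-FID read locks keyed by an opaque lock id (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # fid -> {lock_id: expiry time}

    def _purge(self, fid):
        # Drop expired locks for this FID and return the ones still held.
        now = self._clock()
        held = self._locks.get(fid, {})
        for lock_id in [k for k, expiry in held.items() if expiry <= now]:
            del held[lock_id]
        return held

    def set_lock(self, fid):
        # Issue a new lock and hand back the identifier the client must present later.
        held = self._locks.setdefault(fid, {})
        self._purge(fid)
        lock_id = secrets.token_hex(8)
        held[lock_id] = self._clock() + LOCK_LIFETIME
        return lock_id

    def extend_lock(self, fid, lock_id):
        # Succeeds only for a lock that was actually issued and has not expired.
        held = self._purge(fid)
        if lock_id not in held:
            return False
        held[lock_id] = self._clock() + LOCK_LIFETIME
        return True

    def release_lock(self, fid, lock_id):
        # Releasing an unknown or expired id leaves other clients' locks untouched.
        held = self._purge(fid)
        return held.pop(lock_id, None) is not None

    def lock_count(self, fid):
        return len(self._purge(fid))


if __name__ == "__main__":
    now = [0.0]
    table = LockTable(clock=lambda: now[0])
    a = table.set_lock("fid-1")       # client A obtains a lock
    now[0] += 600                     # A is suspended for ten minutes
    b = table.set_lock("fid-1")       # client B obtains a read lock
    print(table.extend_lock("fid-1", a), table.release_lock("fid-1", a))
    print(table.lock_count("fid-1"))
```

In this sketch the suspended client's late ExtendLock and ReleaseLock are both rejected, so client B's lock and the count for the FID stay consistent. The same pattern would presumably apply to the Start/Extend/End calls for asynchronous fetch and store.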
