This was beaten to death a while back with no resolution at the time: the volserver would give a volume back to the fileserver, which would promptly break volume callbacks. However, if the fileserver was slow to do so, the volserver sat blocked waiting for a reply, and other things it was doing (including talking to clients) could lose.
We tried changing the fileserver to ack the giveback and then do the work. The problem was that the volserver would then try another fssync call while the fssync thread was still busy breaking callbacks, and the effect was similar.

I'm led to believe IBM dealt with this another way (possibly by changing the fssync listener to a "hot thread" and acking before going into break callback land), but the potential for using up all the threads just breaking fssync callbacks was disturbing, so I pursued another route.

There's a patch on the head. It is not directly portable to 1.2.x, but I have a version for 1.2.x also; more on that shortly. It adds a thread to the fileserver just for breaking callbacks for fssync callers.

The way this works is as follows:

-fileserver fssync handler calls BreakVolumeCallbacksLater
--file entries whose callbacks need to be broken are marked FE_LATER
--callbacks to be broken are marked CB_DELAYED
--hosts with callbacks we just marked CB_DELAYED are marked HFE_LATER
--fssync lwp gets wakeup from fssync handler
--fileserver acks volserver
-fileserver fssync thread wakes up from this or every 5 minutes, in case a wakeup is missed.
--the fssync thread calls BreakLaterCallBacks until it finds that no callbacks needed to be broken
--BreakLaterCallBacks finds file entries set FE_LATER, unchains them, and breaks all callbacks they represent. It works just like BreakVolumeCallbacks, which means that if the host is VENUSDOWN we end up forcing them to InitCallBackState later, and toss the callbacks. If callbacks were available to break it returns 1, suggesting the caller call it again.

If a caller with a callback to be broken calls in before we break it:
-CallPreamble notices HFE_LATER and calls BreakDelayedCallbacks.

edge cases:
-HFE_LATER is not unset, so you call BreakDelayedCallbacks unnecessarily one time for each host you had a "Later" callback to break for. The overhead on this is negligible.
-if a new caller gets a callback on a file where we set the file entry FE_LATER and haven't dealt with it yet, it will have its callback broken also, despite no changes to the file. Overhead: one FetchStatus per client this happens to, but it's not very likely to happen anyway.

http://www.dementia.org/~shadow/fssync.diff
a.k.a. /afs/andrew.cmu.edu/usr16/shadow/www/fssync.diff
is applicable to 1.2.x if anyone wishes to try it there. If you do try it, please report your findings.
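
For anyone who wants the mark-now, break-later flow in one place, here is a rough single-threaded model of it. This is not the patch and not fileserver code: only the flag names FE_LATER, CB_DELAYED and HFE_LATER come from the description above. The structs, the flag values, the lowercase function names, and the choice to handle one file entry per call are all made up for illustration, and a counter stands in for the actual callback-break RPC. Having the preamble path clear HFE_LATER is also just my reading of the "one time" edge case.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names are from the post; the values are hypothetical. */
#define FE_LATER   0x1  /* file entry has callbacks waiting to be broken */
#define CB_DELAYED 0x1  /* this callback is waiting to be broken */
#define HFE_LATER  0x1  /* host holds at least one delayed callback */
#define MAX_CBS    4

/* Hypothetical, heavily simplified stand-ins for the fileserver structures. */
struct host { int flags; int breaks_seen; };
struct callback { struct host *host; int flags; };
struct file_entry { int volume; int flags; int ncbs; struct callback cbs[MAX_CBS]; };

/* Fast path (fssync handler): only mark things, so the ack can go out quickly. */
static void break_volume_callbacks_later(struct file_entry *fes, size_t n, int volume)
{
    for (size_t i = 0; i < n; i++) {
        if (fes[i].volume != volume)
            continue;
        assert(fes[i].ncbs <= MAX_CBS);
        fes[i].flags |= FE_LATER;
        for (int j = 0; j < fes[i].ncbs; j++) {
            fes[i].cbs[j].flags |= CB_DELAYED;
            fes[i].cbs[j].host->flags |= HFE_LATER;
        }
    }
    /* The real handler would wake the breaker thread here, then ack. */
}

/* Slow path (breaker thread): handle one marked entry per call (an assumption).
 * Returns 1 if it found work, so the caller should call again; 0 otherwise.
 * HFE_LATER on the hosts is deliberately left set, as in the first edge case. */
static int break_later_callbacks(struct file_entry *fes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!(fes[i].flags & FE_LATER))
            continue;
        /* Every callback on the entry is broken, delayed or not, which is
         * why a late-arriving caller loses its callback too. */
        for (int j = 0; j < fes[i].ncbs; j++)
            fes[i].cbs[j].host->breaks_seen++;  /* stands in for the break RPC */
        fes[i].ncbs = 0;                        /* toss the callbacks */
        fes[i].flags &= ~FE_LATER;
        return 1;
    }
    return 0;
}

/* Breaker thread body after a wakeup: loop until nothing is left to break. */
static int drain_later_callbacks(struct file_entry *fes, size_t n)
{
    int passes = 0;
    while (break_later_callbacks(fes, n))
        passes++;
    return passes;
}

/* A host calls in before the breaker got to it: break its delayed callbacks
 * right away. Returns how many were broken (0 means the call was unnecessary). */
static int call_preamble(struct host *h, struct file_entry *fes, size_t n)
{
    int broken = 0;
    if (!(h->flags & HFE_LATER))
        return 0;
    for (size_t i = 0; i < n; i++) {
        for (int j = 0; j < fes[i].ncbs; ) {
            struct callback *cb = &fes[i].cbs[j];
            if (cb->host == h && (cb->flags & CB_DELAYED)) {
                h->breaks_seen++;
                *cb = fes[i].cbs[--fes[i].ncbs]; /* drop it, re-check slot j */
                broken++;
            } else {
                j++;
            }
        }
    }
    h->flags &= ~HFE_LATER;
    return broken;
}
```

To try it, mark a volume with break_volume_callbacks_later, optionally let one host go through call_preamble first, then run drain_later_callbacks. A host that was handled only by the breaker still carries HFE_LATER afterwards, so its next call_preamble does a scan that breaks nothing, which is the negligible overhead mentioned in the first edge case.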
