Re: [OpenAFS] Re: [OpenAFS-devel] 1.6 and post-1.6 OpenAFS branch management and schedule

Rainer Toebbicke Fri, 18 Jun 2010 02:49:55 -0700

Jeffrey Hutzelman wrote:

> Really, I consider enable-fast-restart to be extremely dangerous.
> It should have gone away long ago.
> I realize some people believe that speed is more important than not losing data, but I don't agree, and I don't think it's an appropriate position for a filesystem to take. Not losing your data is pretty much the defining difference between filesystems you can use and filesystems from which you should run away screaming as fast as you can. I do not want people to run away screaming from OpenAFS, at any speed.

I beg to disagree: the Volume/Vnode back end does not have the same problems that a file system might have. Damage there will never wildly destroy random items on disk, as you would have to fear with a file system. At least in namei, damage to a volume is entirely contained within that volume. At worst, files are entirely replaced by others; they are never partially corrupted, other than by being half-written or the like. Of course, files on disk can become unfindable, or directories can have bogus entries.

My experience is that damage to the vnode files usually results in directories containing inaccessible entries and, on very rare occasions, in cross-linked files. The link table is surprisingly robust (even with its header overwritten).

I reckon that in over 15 years of AFS service we've probably had more bit errors in files due to uncaught memory errors and uncaught transmission errors, not to mention the major culprit, "programming errors", than nasty inconsistencies after crashes that a complete and immediate salvage would have caught.

We salvage volumes in the background at a low rate, and on file servers that never crash the logs show the same odd issues as on those that did crash; hence the added risk of running with potential damage is within the error bars. Even between salvages, volumes drop out every now and then. The practical approach is to detect this quickly and re-salvage, and when the rate exceeds the pain threshold, find the bug and fix it.

For us, the delta does not justify keeping the service down for several hours after a crash. Make that delta proportionally bigger by fixing the other issues, and I will revise my statement.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
