On Sep 13, 2006, at 9:59 AM, Hartmut Reuter wrote:
Juha Jäykkä wrote:
Better to do a "vos convertROtoRW" on the RO site as soon as
possible to regain a valid RW volume in this case.
Except that I'm unlikely to notice the corruption before it's
released,
which happens automatically. Sounds like we need to change our backup
policy...
The best way to prevent the salvager from corrupting volumes is not
to run it automatically. If you configure your OpenAFS with "--
enable-fast-restart", the fileserver will not salvage
automatically after a crash. So if, after a crash, you find volumes
which couldn't be attached, you salvage them with "bos salvage server
partition volume" and examine the SalvageLog. I suppose that in the
case where it throws the root directory away, you will see something
in the log.
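For concreteness, the commands mentioned above look roughly like this
(hypothetical server, partition, and volume names; check the flag
spellings against the man pages for your release):

  ./configure --enable-fast-restart [plus whatever other options you use]

  vos convertROtoRW ro-fileserver.example.com /vicepb user.jdoe
  bos salvage fileserver.example.com /vicepa user.jdoe

After a manual salvage, read the SalvageLog (usually
/usr/afs/logs/SalvageLog with Transarc-style paths) on that fileserver.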
In a former life I had some Transarc AFS servers which had persistent
problems starting after a major crash due to a multi-day power outage.
Some were along the same lines as what's reported here. The only
cleanup process that worked went something like this (a rough command
sketch follows the list):
For every volume on the old server:

  - vos dump the volume to a file.
  - Restore it from the file, under a different name, onto an otherwise
    empty server.
  - Salvage that copy with orphans being attached. Since you're
    salvaging a copy, you have no risk of hosing the production volume.
  - If no problems: move the original volume to a new server.
  - If there were recoverable problems in the salvage:
      - delete the copy you'd made,
      - move the original volume to the empty server,
      - salvage it, and clean up the orphans with the end user,
      - move the volume to a new server,
      - thoroughly clean (mkfs) the empty server.
  - If there were unrecoverable problems in the volume salvage:
      - tar up the existing volume as best you can,
      - apologize profusely to the user.
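A rough sketch of that loop, assuming made-up server names old-fs,
scratch-fs, and new-fs, a /vicepa partition on each, and a volume
called user.jdoe (double-check the flags against your vos/bos man
pages before trying anything like this):

  # dump the volume and restore the copy under another name on the
  # otherwise empty scratch server
  vos dump -id user.jdoe -file /var/tmp/user.jdoe.dump
  vos restore -server scratch-fs -partition /vicepa -name user.jdoe.copy \
      -file /var/tmp/user.jdoe.dump

  # salvage the copy, attaching orphans; the production volume is untouched
  bos salvage -server scratch-fs -partition /vicepa -volume user.jdoe.copy \
      -orphans attach

  # clean salvage: move the original straight to its new home
  vos move -id user.jdoe -fromserver old-fs -frompartition /vicepa \
      -toserver new-fs -topartition /vicepa

  # recoverable problems: drop the copy, move the original to the scratch
  # server, salvage and clean up orphans there, then move it on to new-fs
  # and mkfs the scratch partition afterwards
  vos remove -server scratch-fs -partition /vicepa -id user.jdoe.copy
  vos move -id user.jdoe -fromserver old-fs -frompartition /vicepa \
      -toserver scratch-fs -topartition /vicepa
  bos salvage -server scratch-fs -partition /vicepa -volume user.jdoe \
      -orphans attach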
Fortunately, only about 1% of the volumes had problems, and most were
easily remedied. But until we emptied the original servers and
rebuilt them from scratch with modern OpenAFS, we had lingering
oddball problems.
In my current position we have about 10.5 TB of user files in almost
250,000 volumes spread across 22 servers of various sizes, running
1.4. We see various minor problems which the OpenAFS developers seem
to be addressing quite well, but as a defensive measure we're just
starting a policy of periodically emptying a file server and
ruthlessly salvaging it. Ask me in a year and I'll let you know how
it goes.
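In case it's useful to anyone, the emptying itself is just ordinary
vos moves followed by a salvage of the vacated server, roughly
(made-up server and volume names again):

  vos listvol -server fs07.example.com -partition /vicepa
  vos move -id user.jdoe -fromserver fs07.example.com -frompartition /vicepa \
      -toserver fs12.example.com -topartition /vicepd
  # ...repeat for each volume, then salvage the now-empty partition
  bos salvage -server fs07.example.com -partition /vicepa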
Steve