On Sep 13, 2006, at 9:59 AM, Hartmut Reuter wrote:

Juha Jäykkä wrote:

You had better do a "vos convertROtoRW" on the RO site as soon as possible to regain a valid RW volume in this case.
Except that I'm unlikely to notice the corruption before it's released,
which happens automatically. Sounds like we need to change our backup
policy...

The best way to prevent the salvager from corrupting volumes is not to run it automatically. If you configure your OpenAFS build with "--enable-fast-restart", the fileserver will not salvage automatically after a crash. So if, after a crash, you find volumes which couldn't be attached, you salvage them with "bos salvage server partition volume" and examine the SalvageLog. I suppose that in the case where the salvager throws the root directory away, you will see something in the log.
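A minimal sketch of that workflow as a CLI fragment (server, partition, and volume names here are hypothetical examples, not taken from the thread):

```shell
# Build OpenAFS so the fileserver does NOT salvage automatically after a crash
./configure --enable-fast-restart
make && make install

# After a crash, salvage only the volumes that failed to attach,
# then read the salvager's log before trusting the result:
bos salvage -server fs1.example.com -partition /vicepa -volume user.juha
less /usr/afs/logs/SalvageLog
```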

In a former life I had some Transarc AFS servers which had persistent problems starting after a major crash caused by a multi-day power outage. Some failed along the same lines as reported here. The only cleanup process that worked went something like this:

For every volume on the old server:
   vos dump the volume to a file
   Restore it from the file under a different name on an otherwise empty server
   Salvage that copy with orphans being attached. Since you're salvaging
      a copy, you have no risk of hosing the production volume
   If no problems, move the original volume to a new server
   If recoverable problems in the salvage:
      delete the copy you'd made
      move the original volume to the empty server
      salvage it, and clean up the orphans with the end user
      move the volume to a new server
      thoroughly clean (mkfs) the empty server
   If unrecoverable problems in the volume salvage:
      tar up the existing volume as best you can
      apologize profusely to the user.
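The loop above can be sketched in shell. This is a dry-run sketch only: the `run` helper echoes each command instead of executing it, and the server, partition, and volume names are hypothetical placeholders (in practice the volume list would come from `vos listvol`):

```shell
#!/bin/sh
# Dry-run sketch of the per-volume recovery loop described above.
# OLDSRV/SCRATCH/NEWSRV, /vicepa, and the volume names are hypothetical.
OLDSRV=oldserver
SCRATCH=scratchserver
NEWSRV=newserver
PART=/vicepa

# Echo instead of executing; drop the 'echo' when running for real.
run() { echo "WOULD RUN: $*"; }

for vol in user.alice user.bob; do
    run vos dump -id "$vol" -file "/tmp/$vol.dump"
    run vos restore -server "$SCRATCH" -partition "$PART" \
        -name "$vol.chk" -file "/tmp/$vol.dump"
    # Salvage only the copy, so the production volume is never at risk
    run bos salvage -server "$SCRATCH" -partition "$PART" \
        -volume "$vol.chk" -orphans attach
    # ...inspect SalvageLog on $SCRATCH here; if it came back clean:
    run vos move -id "$vol" -fromserver "$OLDSRV" -frompartition "$PART" \
        -toserver "$NEWSRV" -topartition "$PART"
done
```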

Fortunately only about 1% of the volumes had problems, and most were easily remedied. But until we emptied the original servers and rebuilt them from scratch with modern OpenAFS, we had lingering oddball problems.

In my current position we have about 10.5TB of user files in almost 250,000 volumes spread across 22 servers of various sizes, running 1.4. We see various minor problems which the openafs developers seem to be addressing quite well, but as a defensive measure we're just starting a policy of periodically emptying a file server and ruthlessly salvaging it. Ask me in a year and I'll let you know how it goes.

Steve

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
