On Sep 13, 2006, at 9:59 AM, Hartmut Reuter wrote:

Juha Jäykkä wrote:

You had better do a "vos convertROtoRW" on the RO site as soon as possible to regain a valid RW volume in this case.
Except that I'm unlikely to notice the corruption before it's released,
which happens automatically. Sounds like we need to change our backup
policy...

The best way to prevent the salvager from corrupting volumes is not to run it automatically. If you configure your OpenAFS build with "--enable-fast-restart", the fileserver will not salvage automatically after a crash. So if, after a crash, you find volumes which couldn't be attached, you salvage them with "bos salvage server partition volume" and examine the SalvageLog. I suppose that in the case where the salvager throws the root directory away, you will see something in the log.
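A minimal sketch of that workflow as a CLI fragment (server, partition, and volume names here are hypothetical examples, not taken from the thread):

```shell
# Build OpenAFS so the fileserver does NOT salvage automatically after a crash
./configure --enable-fast-restart
make && make install

# After a crash, salvage only the volumes that failed to attach,
# then read the salvager's log before trusting the result:
bos salvage -server fs1.example.com -partition /vicepa -volume user.juha
less /usr/afs/logs/SalvageLog
```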

In a former life I had some Transarc AFS servers which had persistent problems starting after a major crash caused by a multi-day power outage. Some failed along the same lines as reported here. The only cleanup process that worked went something like this:

For every volume on the old server:
   vos dump the volume to a file
   Restore it from the file under a different name on an otherwise empty server
   Salvage that copy with orphans being attached. Since you're salvaging
      a copy, you have no risk of hosing the production volume
   If no problems, move the original volume to a new server
   If recoverable problems in the salvage:
      delete the copy you'd made
      move the original volume to the empty server
      salvage it, and clean up the orphans with the end user
      move the volume to a new server
      thoroughly clean (mkfs) the empty server
   If unrecoverable problems in the volume salvage:
      tar up the existing volume as best you can
      apologize profusely to the user.
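The loop above can be sketched in shell. This is a dry-run sketch only: the `run` helper echoes each command instead of executing it, and the server, partition, and volume names are hypothetical placeholders (in practice the volume list would come from `vos listvol`):

```shell
#!/bin/sh
# Dry-run sketch of the per-volume recovery loop described above.
# OLDSRV/SCRATCH/NEWSRV, /vicepa, and the volume names are hypothetical.
OLDSRV=oldserver
SCRATCH=scratchserver
NEWSRV=newserver
PART=/vicepa

# Echo instead of executing; drop the 'echo' when running for real.
run() { echo "WOULD RUN: $*"; }

for vol in user.alice user.bob; do
    run vos dump -id "$vol" -file "/tmp/$vol.dump"
    run vos restore -server "$SCRATCH" -partition "$PART" \
        -name "$vol.chk" -file "/tmp/$vol.dump"
    # Salvage only the copy, so the production volume is never at risk
    run bos salvage -server "$SCRATCH" -partition "$PART" \
        -volume "$vol.chk" -orphans attach
    # ...inspect SalvageLog on $SCRATCH here; if it came back clean:
    run vos move -id "$vol" -fromserver "$OLDSRV" -frompartition "$PART" \
        -toserver "$NEWSRV" -topartition "$PART"
done
```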

Fortunately only about 1% of the volumes had problems, and most were easily remedied. But until we emptied the original servers and rebuilt them from scratch with modern OpenAFS, we had lingering oddball problems.

In my current position we have about 10.5TB of user files in almost 250,000 volumes spread across 22 servers of various sizes, running 1.4. We see various minor problems which the openafs developers seem to be addressing quite well, but as a defensive measure we're just starting a policy of periodically emptying a file server and ruthlessly salvaging it. Ask me in a year and I'll let you know how it goes.

Steve

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
