On Tue, Oct 30, 2012 at 7:33 AM, Kim Kimball <[email protected]> wrote:
> If you have access to a recent RO the quickest fix may be to vos dump it
> and restore the RW from it. NB that if there is only one RO currently
> available dumping it makes it busy and with no alternate the RO will be
> unavailable to all clients.
>

Thanks for that tip. However, in my efforts to get the RW site functioning, I removed the RO replica.

In other news, the latest salvage has been running for 12 hours. I straced the busiest pid and it is happily verifying all the links and contents (open(), close(), pread() ad infinitum), so it's not wedged. This volume has slightly fewer than 32k directory entries in various places (yes, I made SURE the limits were observed ;-) ), so I imagine it will take a very long time to traverse the entire thing.

Interestingly, this is the fourth salvage, and it actually seems to be working at it this time. The last three times it stopped after a bit over an hour. I suspect the resources given to the AFS server were too limited to get the salvage done properly.

One thing I did this time was increase the server's memory to 8GB, and free shows it tooling merrily along with plenty of buffers and cache now. I did THAT because I noticed that the kernel had killed the salvage operation the first two times due to out-of-memory conditions, something I had not checked for or expected. So it may be that this is the second "true" salvage, and it may succeed. I'll keep you all posted.

Note that nothing in the AFS logs indicated that salvager processes had been killed due to OOM. It showed up only in the kernel logs.

-- Timothy Balcer
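
P.S. Since the OOM kills showed up only in the kernel logs, here is a minimal sketch of how one might scan kernel log text for them. It assumes the kernel writes lines of the form "Killed process <pid> (<name>) ..."; the exact wording differs between kernel versions, so treat the pattern as an assumption. The function name find_oom_kills and the "salvage" name fragment are my own illustrative choices, not anything provided by AFS.

```python
import re

# Assumed shape of a kernel OOM-killer line, for example:
#   "Killed process 4242 (salvager) total-vm:..., anon-rss:..."
# Wording varies by kernel version; adjust the pattern to match your logs.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines, name_fragment="salvage"):
    """Return (pid, process_name) pairs for OOM kills whose
    process name contains name_fragment (hypothetical default)."""
    hits = []
    for line in log_lines:
        match = OOM_KILL.search(line)
        if match and name_fragment in match.group(2):
            hits.append((int(match.group(1)), match.group(2)))
    return hits
```

To use it, pass in the lines of the kernel log (for instance the saved output of dmesg, split into lines). An empty result only means the assumed pattern found nothing; it does not prove that no process was killed.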
