Since this is a VM, I just took a snapshot and tried to repro, by "powering 
off" the VM myself.  Here's the terminal log after booting back up from that:
<<
login as: karl
[EMAIL PROTECTED]'s password:
Linux coronado 2.6.20-16-server #2 SMP Thu Jun 7 20:26:23 UTC 2007 i686

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
Last login: Wed Aug 15 15:25:47 2007
[EMAIL PROTECTED]:~$ ls /afs
[EMAIL PROTECTED]:~$ sudo bos status -server localhost
bos: no such entry (getting tickets)
bos: running unauthenticated
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
Instance fs, currently running normally.
    Auxiliary status is: salvaging file system.
[EMAIL PROTECTED]:~$ sudo bos status -server localhost
bos: no such entry (getting tickets)
bos: running unauthenticated
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
Instance fs, currently running normally.
    Auxiliary status is: file server running.
[EMAIL PROTECTED]:~$ ls /afs
[EMAIL PROTECTED]:~$
>>

After noting the status changed, I rebooted (the right way) and this time, it 
came back up just fine.  So I can't reproduce this by just killing the VM.  I'm 
not really interested in powering off the VM's host, either (for one, that 
would have to be done after-hours).
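
For anyone who wants to script the wait instead of re-running the command by hand, here is a minimal sketch in Python that parses "bos status" output like the log above and reports whether the fs instance is still salvaging. The function names are made up for illustration, and the line formats are assumed from this one log only; other OpenAFS versions may print something different.

```python
import re


def parse_bos_status(text):
    """Parse `bos status` output into {instance: {"status": ..., "aux": ...}}.

    Assumes the "Instance NAME, STATUS." / "Auxiliary status is: ..." layout
    seen in the log above; anything else (e.g. "bos: ..." notices) is ignored.
    """
    instances = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Instance (\S+), (.*?)\.?$", line)
        if m:
            current = m.group(1)
            instances[current] = {"status": m.group(2), "aux": None}
            continue
        m = re.match(r"Auxiliary status is: (.*?)\.?$", line)
        if m and current is not None:
            # The auxiliary line belongs to the most recent instance.
            instances[current]["aux"] = m.group(1)
    return instances


def is_salvaging(text):
    """True if the 'fs' instance reports a salvage in its auxiliary status."""
    fs = parse_bos_status(text).get("fs")
    return bool(fs and fs["aux"] and "salvaging" in fs["aux"])
```

Feeding it the two outputs from the log gives True for the first and False for the second, so a wrapper could poll until it turns False before touching /afs.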

Anyways, if it happens again, I'll catch the logs and file a report on the 
segfault.  In the meantime, I'll go learn how to set dynroot up.

Thanks,
Karl


-----Original Message-----
From: Jeffrey Altman [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 15, 2007 1:28 PM
To: Karl M. Davis
Cc: [email protected]
Subject: Re: [OpenAFS] Problems with power outages

Karl M. Davis wrote:
> Hey there all,
> 
>  
> 
> I just recently set up the Debian openafs 1.4.4 packages on an Ubuntu
> server box, running in a virtual machine.  It’s monsoon season here in
> Tucson and we’ve had a couple of long power outages and problems with
> the UPS.  Both times the server has gone down unexpectedly, AFS didn’t
> come back up correctly.  The symptoms I note are that “ls /afs” returns
> empty on the server and the Windows client can’t connect.
>
> For whatever reason, the thing that has fixed it both times is running
> “fs checkvolumes”.  Of course, “fs checkvolumes” segfaults when I run
> it, but if I reboot after that, everything comes back up fine, clients
> can connect, and further “fs checkvolumes” don’t segfault.  Rebooting
> before running that specific command (with the segfault) does
> nothing—“ls /afs” still returns empty.
>
> 
> So… a couple of questions:
> 
> How do I ensure AFS can survive a power outage/unexpected poweroff
> without getting borked?
> 
> If it does get borked, why would a segfaulting “fs checkvolumes” fix things?

fs checkvolumes doesn't really check anything.  It instructs the AFS
cache manager to invalidate its knowledge of all of the volume location
information thereby forcing the data to be reloaded from the volume
database servers.
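
As a rough illustration of that behaviour, here is a toy model in Python. It is not the cache manager's actual code; the class, its methods, and the lookup callback are all hypothetical. It only shows the idea that cached locations are dropped and then re-fetched on the next access.

```python
class VolumeLocationCache:
    """Toy stand-in for a client-side cache of volume locations."""

    def __init__(self, vldb_lookup):
        # vldb_lookup: hypothetical callable mapping a volume name to a
        # location, standing in for a query to the volume database servers.
        self._vldb_lookup = vldb_lookup
        self._entries = {}
        self.vldb_queries = 0

    def locate(self, volume):
        """Return the cached location, querying the database only on a miss."""
        if volume not in self._entries:
            self.vldb_queries += 1
            self._entries[volume] = self._vldb_lookup(volume)
        return self._entries[volume]

    def check_volumes(self):
        """Forget every cached location; nothing is verified here."""
        self._entries.clear()
```

In this model a stale entry survives until check_volumes() is called, after which the next locate() goes back to the database.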

If you are not using dynroot on UNIX or freelance on Windows, and the
file servers are all down or all of the copies of the 'root.afs'
volume are offline when the client starts, the client will be unable to
mount the volume.  Windows clients will then stop with a panic
condition that is logged to %WinDir%\temp\afsd_init.log.

If you file a bug report to [EMAIL PROTECTED] with a stack trace
for the segfault on Linux, someone can attempt to fix it.  My guess is
that it is failing because the volume list is empty or some boundary
condition like that.

I have no idea how/why "fs checkvolumes" segfaulting would be a
requirement for subsequent access.

Jeffrey Altman




_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
