Thanks for the quick response. Yeah, sorry for disappearing there on IRC, but I needed to restart the computer I was connected with.

> Is the above a correct assumption about your Realm? I would expect you
> to be using ridgetop-group.com.

Yes, it is correct: our realm is ridgetop-group.local.

> Check the /etc/hosts file on all machines and all CellServDB files for
> incorrect entries.

The CellServDB file is correct:

/etc/openafs/CellServDB:
<<
>ridgetop-group.local
192.168.2.5 # coronado.ridgetop-group.local
192.168.2.6 # picacho.ridgetop-group.local
...
>>

/etc/openafs/server/CellServDB:
<<
>ridgetop-group.local #Cell name
192.168.2.5 #coronado.ridgetop-group.local
192.168.2.6 #picacho.ridgetop-group.local
>>

The /etc/hosts file is correct, but I did add the second line to it somewhere before things went to pot:
<<
127.0.0.1 localhost
192.168.2.6 picacho.ridgetop-group.local picacho
127.0.1.1 picacho.ridgetop-group.local picacho
...
>>

> What is in VLLog?

Not much.

/var/log/openafs/VLLog:
<<
Wed Oct 3 23:45:05 2007 Using 192.168.2.6 as my primary address
Wed Oct 3 23:45:05 2007 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver)
>>

I believe that moving volumes went well enough. I started having trouble, though, when I went to recreate the RO copies of root.cell and root.afs. Unfortunately, I'm unclear on the exact order of all this now, but here's a list of the things I did:

1. Set up picacho as a dbserver and fileserver.
2. "vos move"'d all of the RW volumes from coronado to picacho.
3. Ran "vos addsite" for root.afs and root.cell, but could not get "vos release" to work. It gave me some errors: "Failed to start a transaction on the RO volume ... volume is busy".
4. Tried "vos syncvldb" and "vos syncserv" on both servers, but those didn't seem to help. Running syncvldb on picacho gave me errors: "Warning: Orphaned RW volume ... exists on ...".
5. Further googling turned up some hits that suggested I should try "vos changeaddr 127.0.0.1 192.168.2.6". This is also around when I added the above-mentioned line to /etc/hosts.
I can't recall exactly, but I may have tried playing around with "bos addhost" and "bos removehost" here as well.
6. Tried running "bos salvage". I'm pretty sure this is when things got ugly and fs stopped starting. Running "fs checkvolumes" now segfaults: very fun.

I only have the two openafs servers: coronado (old VM) and picacho (new box). Both of them are dbservers and volservers, neither is multi-homed.

That's the saga so far. I greatly appreciate any help you can offer!

--
Karl

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christopher D. Clausen
Sent: Wednesday, October 03, 2007 8:19 PM
To: Karl M. Davis
Cc: [email protected]
Subject: Re: [OpenAFS] AFS Fileserver Won't Start

Karl M. Davis <[EMAIL PROTECTED]> wrote:

Hi Karl. I'm going to assume it was you in the #openafs IRC channel. I'd suggest staying logged in if you really want help. You have to wait for people to have time to respond. And more than the 15 minutes that you waited. We do need to do things like eat and sleep.

> Somewhere towards the end of moving the volumes from the old server
> to the new server, things got badly goofed. The fs process will no
> longer start on the new server and I find the following entry in the
> /var/log/openafs/FileLog file:
>
> Wed Oct 3 19:26:59 2007 afs_krb_get_lrealm failed, using
> ridgetop-group.local.

Is the above a correct assumption about your Realm? I would expect you to be using ridgetop-group.com.

> Wed Oct 3 19:26:59 2007 VL_RegisterAddrs rpc failed; The IP address
> exists on a different server; repair it

Check the /etc/hosts file on all machines and all CellServDB files for incorrect entries.

> Wed Oct 3 19:26:59 2007 VL_RegisterAddrs rpc failed; See VLLog for
> details

What is in VLLog?

> Unfortunately, there's nothing helpful in VLLog. Interestingly, "vos
> listaddrs" returns nothing on the new server, either.

vos listaddrs might not be working b/c of the above errors.

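The /etc/hosts check suggested above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name find_loopback_conflicts is hypothetical (it is not part of OpenAFS), the input is assumed to follow the usual "address name [alias ...]" layout, and the sample text is the /etc/hosts excerpt quoted in this thread. It flags any name that resolves to both a loopback and a non-loopback address. Whether such an entry is actually behind the VL_RegisterAddrs failure is not established here.

```python
import ipaddress


def find_loopback_conflicts(hosts_text):
    """Map each host name that has both loopback and non-loopback
    addresses to the sorted list of its addresses (hypothetical helper)."""
    by_name = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        for name in fields[1:]:
            by_name.setdefault(name, set()).add(addr)

    conflicts = {}
    for name, addrs in by_name.items():
        has_loopback = any(a.is_loopback for a in addrs)
        has_other = any(not a.is_loopback for a in addrs)
        if has_loopback and has_other:
            conflicts[name] = sorted(str(a) for a in addrs)
    return conflicts


# Sample taken from the /etc/hosts excerpt quoted in this thread.
SAMPLE = """\
127.0.0.1 localhost
192.168.2.6 picacho.ridgetop-group.local picacho
127.0.1.1 picacho.ridgetop-group.local picacho
"""

if __name__ == "__main__":
    for name, addrs in find_loopback_conflicts(SAMPLE).items():
        print(name, "->", ", ".join(addrs))
```

To run it against a real machine, pass the contents of /etc/hosts (for example open("/etc/hosts").read()) instead of SAMPLE.
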
> Running "vos listvldb" returns the following:
> VLDB entries for all servers
> root.afs
> RWrite: 536870915 ROnly: 536870916
> number of sites -> 3
> server picacho.ridgetop-group.local partition /vicepa RW Site
> server picacho.ridgetop-group.local partition /vicepa RO Site
> server picacho.ridgetop-group.local partition /vicepa RO Site
>
> root.cell
> RWrite: 536870918 ROnly: 536870919
> number of sites -> 3
> server picacho.ridgetop-group.local partition /vicepa RW Site
> server picacho.ridgetop-group.local partition /vicepa RO Site
> server picacho.ridgetop-group.local partition /vicepa RO Site
>
> I'm unsure why there are duplicate RO entries, but the last thing I
> was working on was recreating RO volumes for root.cell and root.afs
> on the new server.

Well, it looks like something did not work out right.

> I'm panicking because all of the volumes are now on the new server and
> non-accessible. Anyone have some clue what I did wrong and how I can
> fix things?

Probably going to need more information about what happened, what you did to try and fix it, and other infrastructure questions, like how many AFS DB servers you actually have, and if any of them are multi-homed.

<<CDC

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
