We had two AFS servers and things were running great: we had a working quorum
and all was well.
We added an additional AFS server over the weekend, and now none of the machines
will establish a quorum. All FileLogs show error code 5376.
Tue Apr  6 08:59:59 2010 File server starting
Tue Apr  6 08:59:59 2010 /var/openafs/sysid: doesn't exist
Tue Apr  6 08:59:59 2010 Creating new SysID file
Tue Apr  6 08:59:59 2010 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
Tue Apr  6 09:00:00 2010 Set thread id 133 for FSYNC_sync
Tue Apr  6 09:00:00 2010 FSYNC_sync: bind failed with (98), removed bogus /var/openafs/fssync.sock

udebug output for port 7002 on all three servers:
http://pastebin.com/SZyM4BC7


They all show the sync host as 0.0.0.0 (which is what it gets set to when a
quorum cannot be established, right?)

vos listaddrs shows the two original AFS servers, but not the new one.

I upped the debug level on the vlserver and get:
Tue Apr  6 09:09:44 2010 beacon: amSyncSite is 0
Tue Apr  6 09:09:44 2010 Received beacon type 0 from host 10.130.8.160
Tue Apr  6 09:09:46 2010 Received beacon from unknown host 172.20.1.26
Tue Apr  6 09:09:48 2010 recovery running in state 0
Tue Apr  6 09:09:48 2010 beacon: amSyncSite is 0
Tue Apr  6 09:09:52 2010 recovery running in state 0
Tue Apr  6 09:09:52 2010 beacon: amSyncSite is 0

repeatedly. 
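
For anyone wanting to check which addresses the vlserver is rejecting, here is a
minimal Python sketch that counts the "unknown host" beacon lines in a log. The
function name unknown_beacon_hosts and the sample list are illustrative
assumptions, not part of OpenAFS; the sample lines are copied from the excerpt
above, and the regular expression assumes the log wording stays exactly as shown
there.

```python
import re
from collections import Counter

# Matches the "unknown host" beacon lines as they appear in the excerpt above.
# Assumption: the vlserver log wording is exactly this; adjust if it differs.
UNKNOWN_BEACON = re.compile(
    r"Received beacon from unknown host (\d+\.\d+\.\d+\.\d+)"
)


def unknown_beacon_hosts(log_lines):
    """Count how often each unrecognised address appears in the given lines."""
    counts = Counter()
    for line in log_lines:
        match = UNKNOWN_BEACON.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines taken verbatim from the log excerpt in this post.
sample = [
    "Tue Apr  6 09:09:44 2010 beacon: amSyncSite is 0",
    "Tue Apr  6 09:09:44 2010 Received beacon type 0 from host 10.130.8.160",
    "Tue Apr  6 09:09:46 2010 Received beacon from unknown host 172.20.1.26",
]

print(dict(unknown_beacon_hosts(sample)))  # prints {'172.20.1.26': 1}
```

To run it against a real log, pass an open file object instead of the sample
list, since the function only needs an iterable of lines.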
We added the server to the CellServDB file on all AFS servers and restarted
them, and this is the result. Also, the sysid file is not being created on the
new server (which IIRC is because no quorum can be established).
I checked the time, and they are all synced within ~1 second of each other.

What else could I be missing or need to check? I am sure it is something very 
simple.

Thank you.




_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
