We had two AFS servers and things were running great: we had a nice quorum and all was happy. We added an additional AFS server over the weekend, and now none of the machines will establish a quorum. All FileLogs show the 5376 error code:

  Tue Apr 6 08:59:59 2010 File server starting
  Tue Apr 6 08:59:59 2010 /var/openafs/sysid: doesn't exist
  Tue Apr 6 08:59:59 2010 Creating new SysID file
  Tue Apr 6 08:59:59 2010 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
  Tue Apr 6 09:00:00 2010 Set thread id 133 for FSYNC_sync
  Tue Apr 6 09:00:00 2010 FSYNC_sync: bind failed with (98), removed bogus /var/openafs/fssync.sock

udebug output for port 7002 on all three servers: http://pastebin.com/SZyM4BC7

They all show the sync host as 0.0.0.0 (which is what it gets set to when a quorum cannot be established, right?). vos listaddrs shows the two original AFS servers, but not the new one. I upped the debug level on the vlserver and get the following, repeatedly:

  Tue Apr 6 09:09:44 2010 beacon: amSyncSite is 0
  Tue Apr 6 09:09:44 2010 Received beacon type 0 from host 10.130.8.160
  Tue Apr 6 09:09:46 2010 Received beacon from unknown host 172.20.1.26
  Tue Apr 6 09:09:48 2010 recovery running in state 0
  Tue Apr 6 09:09:48 2010 beacon: amSyncSite is 0
  Tue Apr 6 09:09:52 2010 recovery running in state 0
  Tue Apr 6 09:09:52 2010 beacon: amSyncSite is 0

We added the server to the CellServDB file on all AFS servers and restarted them, and this is the result. Also, the sysid file is not being created on the new server (which IIRC is because no quorum can be established). I checked the time, and the servers are all synced within ~1 second of each other.

What else could I be missing or need to check? I am sure it is something very simple.

Thank you.
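
In case it is useful for going through a longer log, here is a rough sketch (plain Python, not part of OpenAFS) that tallies which addresses the vlserver reports beacons from, split into ones it recognised and ones it called "unknown host". The regular expression only covers the two line shapes quoted in this message, and the function name tally_beacons and the SAMPLE text are made up for illustration. The point is just to get a list of sender addresses that can be compared by hand against the CellServDB entries on each server.

```python
import re
from collections import Counter

# Matches the two beacon line shapes seen in the vlserver log excerpt:
#   "Received beacon type 0 from host 10.130.8.160"
#   "Received beacon from unknown host 172.20.1.26"
BEACON_RE = re.compile(
    r"Received beacon (?:type \d+ from host|from (?P<unknown>unknown) host) "
    r"(?P<addr>\d{1,3}(?:\.\d{1,3}){3})"
)


def tally_beacons(lines):
    """Count beacon senders, split into recognised and unknown hosts."""
    known, unknown = Counter(), Counter()
    for line in lines:
        match = BEACON_RE.search(line)
        if not match:
            continue  # not a beacon-received line
        bucket = unknown if match.group("unknown") else known
        bucket[match.group("addr")] += 1
    return known, unknown


# Sample input copied from the log excerpt in this message.
SAMPLE = """\
Tue Apr 6 09:09:44 2010 beacon: amSyncSite is 0
Tue Apr 6 09:09:44 2010 Received beacon type 0 from host 10.130.8.160
Tue Apr 6 09:09:46 2010 Received beacon from unknown host 172.20.1.26
Tue Apr 6 09:09:48 2010 recovery running in state 0
"""

known, unknown = tally_beacons(SAMPLE.splitlines())
print("recognised senders:", dict(known))
print("unknown senders:   ", dict(unknown))
```

To run it against a real log, the lines of the vlserver log file could be passed in place of SAMPLE.splitlines().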
