> Andrew talks a bit about "errors that appear after the server's been
> running for a while". If this is a memory corruption problem, then
> there is a good likelihood of random seg faults, possible core dumps,
> and server restarts.

There are no coredumps. (Fileserver and volserver have dumped core
previously, and I've got them saved away, so I figure that if there were
going to be any, at least I am not doing anything to prevent them.) I only
just restarted all servers deliberately after changing the faulty
NetRestrict, but my previous AuthLog on the sunx86_510 extends from
Wed Apr 14 15:15 to Fri Apr 16 23:40, which is when I did so. I don't
think kaserver is restarting spontaneously.

> paths and data matter here. Just knowing that the software is restarting
> spontaneously (cat /var/log/openafs/BosLog ?) would help a lot.

sunx86_510 # less BosLog
Sun Apr 11 04:00:58 2010: Server directory access is okay
Mon Apr 12 15:09:23 2010: kaserver exited on signal 15
Mon Apr 12 15:11:08 2010: kaserver exited on signal 15
Wed Apr 14 13:07:52 2010: kaserver exited on signal 15
Wed Apr 14 15:14:57 2010: kaserver exited on signal 15
Fri Apr 16 23:44:22 2010: upserverS10x86 exited on signal 15
Fri Apr 16 23:44:22 2010: vlserver exited on signal 15
Fri Apr 16 23:44:22 2010: kaserver exited on signal 15
Fri Apr 16 23:44:22 2010: ptserver exited on signal 15
Fri Apr 16 23:44:22 2010: fs:vol exited on signal 15
Fri Apr 16 23:44:22 2010: upclientetc exited on signal 15
Fri Apr 16 23:45:02 2010: fs:file exited with code 0

> Some other problems that could cause intermittent behavior include:
>
> /1/ flapping network routes. We already know there are multiple addresses...

And a static route.

> /2/ DNS. Unlikely, but ubik likely depends on dns. if "host `hostname`"
> lists more than one ip address, round robin behavior in dns
> might result in oddness.

It doesn't. From DNS, the hostname returns exactly one address. Even if
host name resolution were somehow involved, which seems unlikely to my
untrained mind, /etc/hosts takes precedence, and since it's Solaris, you
*have* to have a separate name for each IP address you want to configure
on a network interface.

Like this:

# ls /etc/hostname.nge*
hostname.nge0 hostname.nge1 hostname.nge2
# cat /etc/hostname.nge*
replicon-dev
replicon-rfc1918
replicon
# cat /etc/hosts
# grep replicon /etc/hosts
128.214.209.84 replicon-dev
128.214.58.174 replicon
10.0.0.20 replicon-rfc1918

nge0 is down and unplumbed now that the "development" server is no more,
nge1 is the RFC1918 address, and nge2 is the real McCoy.

> But since we know the key files aren't consistent,

You "know" that? That's a misassumption at best.

sun4x_58 # cksum /usr/afs/etc/KeyFile
2143645127 100 /usr/afs/etc/KeyFile

sunx86_510 # cksum /usr/afs/etc/KeyFile
2143645127 100 /usr/afs/etc/KeyFile

-- 
Atro Tossavainen (Mr.)               / The Institute of Biotechnology at
Systems Analyst, Techno-Amish &      / the University of Helsinki, Finland,
+358-9-19158939  UNIX Dinosaur       / employs me, but my opinions are my own.
< URL : http : / / www . helsinki . fi / %7E atossava / > NO FILE ATTACHMENTS
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
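
For anyone wanting to repeat the BosLog check on their own servers, here is a minimal sketch that tallies the "exited on signal N" entries per process. The function name summarize_boslog is made up for this example and is not an OpenAFS tool, and the sketch assumes the log lines look like the ones pasted above. Signal 15 is SIGTERM, which is what a requested shutdown or restart produces; a crash from memory corruption would more likely show up as signal 11 (SIGSEGV) or a bus error.

```shell
# Hypothetical helper, not part of OpenAFS: count "exited on signal N"
# entries per process in a BosLog-style file given as the first argument.
summarize_boslog() {
    awk '/ exited on signal / {
             # The process name is the field just before "exited";
             # the signal number is three fields after it.
             for (i = 1; i <= NF; i++)
                 if ($i == "exited") { name = $(i - 1); sig = $(i + 3) }
             count[name " signal " sig]++
         }
         END { for (k in count) print count[k], k }' "$1" | sort -k2
}
```

Calling it as `summarize_boslog BosLog` from the directory that holds the log prints one line per process and signal, with the number of occurrences first. Entries such as "exited with code 0" are deliberately ignored.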
