Hi all,
Today we had an outage of a fileserver (i386_linux22) running Red Hat 6.2
and OpenAFS-1.2.2.
Suddenly the load rose to 15 or more, and clients stopped working.
I sent "kill -TSTP" to raise the fileserver's log level and "kill -XCPU"
to get the dumps.
/usr/afs/logs/FileLog:
...
Mon Oct 29 11:48:53 2001 Host 853c6d8 used to support WhoAreYou, deleting.
Mon Oct 29 11:53:39 2001 Set Debug On level = 1
Mon Oct 29 11:53:40 2001 Set Debug On level = 5
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
...
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 6ec86d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
...
These lines were repeated until I restarted the fileserver
(about 30000 times within 1 minute).
Only two IP addresses appear in them (0x866d3c5b and 0x866dc86e).
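
For anyone who wants to repeat the count on their own FileLog, here is a
minimal Python sketch. It assumes the host id in the "GSS:" lines is the
IPv4 address printed with its bytes reversed, which is how 5b3c6d86 maps to
0x866d3c5b above. The helper names host_to_dotted and tally_gss are made up
for this example and are not part of OpenAFS.

```python
import re
import socket
from collections import Counter

# Matches the repeated FileLog message and captures the hex host id.
GSS_LINE = re.compile(r"GSS: Delete longest inactive host ([0-9a-fA-F]+)")


def host_to_dotted(hex_id):
    """Turn a logged host id such as '5b3c6d86' into dotted-quad form.

    Assumption: the log shows the four address bytes in reversed order,
    so they are flipped before formatting.
    """
    raw = bytes.fromhex(hex_id.zfill(8))  # restore a dropped leading zero
    return socket.inet_ntoa(raw[::-1])


def tally_gss(lines):
    """Count how often each host id appears in 'GSS: Delete ...' lines."""
    counts = Counter()
    for line in lines:
        match = GSS_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    with open("/usr/afs/logs/FileLog") as log:
        for host, n in tally_gss(log).most_common():
            print(host, host_to_dotted(host), n)
```

Under that byte-order assumption, 5b3c6d86 is 134.109.60.91 and 6ec86d86 is
134.109.200.110.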
/usr/afs/local/hosts.dump:
...
ip:5b3c6d86 port:22811 hidx:1321 cbid:63416 lock:ffffffff last:1004352554
active:1004352554 down:0 del:0 cons:2 cldel:0
hpfailed:0 hcpsCall:1004351695 hcps [ -656 -212] [] holds:
3ef65b1000000000000 slot/bit: 0/1
...
ip:6ec86d86 port:22811 hidx:62 cbid:56533 lock:ffffffff last:1004352554
active:1004352554 down:0 del:0 cons:2 cldel:0
hpfailed:0 hcpsCall:1004351271 hcps [ -656 -431 -212] [ 6ec86d86] holds:
1000101000000000000 slot/bit: 0/1
...
I noticed that all other 1176 entries in this file have the value "cbid:0".
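
The check for non-zero cbid values can be sketched in Python as follows.
The field layout (ip, port, hidx, cbid, lock, last) is taken only from the
two entries quoted above, and treating "last:" as seconds since the Unix
epoch is an assumption. The name nonzero_cbid is made up for this example.

```python
import re
from datetime import datetime, timezone

# Field order copied from the quoted hosts.dump entries; \s+ also tolerates
# entries that have been wrapped across lines.
ENTRY = re.compile(
    r"ip:(?P<ip>[0-9a-f]+)\s+port:(?P<port>\d+)\s+hidx:(?P<hidx>\d+)\s+"
    r"cbid:(?P<cbid>\d+)\s+lock:(?P<lock>[0-9a-f]+)\s+last:(?P<last>\d+)"
)


def nonzero_cbid(dump_text):
    """Return (ip, cbid, last-as-UTC-datetime) for entries with cbid != 0."""
    rows = []
    for m in ENTRY.finditer(dump_text):
        cbid = int(m.group("cbid"))
        if cbid != 0:
            # Assumption: 'last' is a Unix timestamp in seconds.
            last = datetime.fromtimestamp(int(m.group("last")), tz=timezone.utc)
            rows.append((m.group("ip"), cbid, last))
    return rows


if __name__ == "__main__":
    with open("/usr/afs/local/hosts.dump") as dump:
        for ip, cbid, last in nonzero_cbid(dump.read()):
            print(ip, cbid, last.isoformat())
```

Read this way, last:1004352554 is 2001-10-29 10:49:14 UTC, i.e. 11:49:14
Central European Time, which lines up with the LastCall value in
clients.dump below.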
/usr/afs/local/clients.dump:
...
Host 5b3c6d86.22811 down = 0, LastCall Mon Oct 29 11:49:14 2001
user id=42124, name=nfu, sl=Authenticated till Tue Oct 30 09:50:03 2001
CPS-5 is []
...
Host 6ec86d86.22811 down = 0, LastCall Mon Oct 29 11:49:14 2001
user id=32766, name=anonymous, sl=Not authenticated till No Limit
CPS-2 is []
user id=4799, name=erm, sl=Authenticated till Tue Oct 30 12:55:58 2001
CPS-8 is []
user=anonymous, no current server connection
CPS-2 is []
user=afs_cron, no current server connection
CPS-3 is []
user=afs_cron, no current server connection
CPS-3 is []
/usr/afs/local/callback.dump was not written
We have seen this several times during the last few weeks
(I think ever since we installed OpenAFS-1.1.1 on this server),
but this time I was able to gather some logs.
Do you have any hints?
Thanks,
Thomas.
--
-----------------------------------------------------------------------
Thomas Müller, TU Chemnitz, Universitätsrechenzentrum, D-09107 Chemnitz
mail: [EMAIL PROTECTED]
-----------------------------------------------------------------------