Hi all,

Today one of our fileservers (i386_linux22, running Red Hat 6.2 and
OpenAFS-1.2.2) had an outage.

The load suddenly rose to 15 or more, and clients stopped working.

I sent the fileserver "kill -TSTP" to raise its log level and
"kill -XCPU" to get the dumps.

/usr/afs/logs/FileLog:
...
Mon Oct 29 11:48:53 2001 Host 853c6d8 used to support WhoAreYou, deleting.
Mon Oct 29 11:53:39 2001 Set Debug On level = 1
Mon Oct 29 11:53:40 2001 Set Debug On level = 5
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
...
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 6ec86d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
Mon Oct 29 11:53:40 2001 GSS: Delete longest inactive host 5b3c6d86
...

Such lines were repeated until I restarted the fileserver
(about 30000 times within one minute).
Only two IP addresses appear in them (0x866d3c5b and 0x866dc86e).
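
For anyone comparing the log with the addresses above: the host IDs in
FileLog appear to be the same addresses with the bytes in reverse order.
A minimal Python sketch, assuming that byte-swapped layout holds for every
entry (host_id_to_ip is just an illustrative helper name, not part of
OpenAFS):

```python
import socket

def host_id_to_ip(hex_id: str) -> str:
    """Convert a host ID as printed in FileLog to dotted-quad notation.

    Assumption: the log prints the address byte-swapped, so the four
    bytes are reversed before formatting.
    """
    raw = bytes.fromhex(hex_id.zfill(8))  # pad in case a leading zero was dropped
    return socket.inet_ntoa(raw[::-1])

for host_id in ("5b3c6d86", "6ec86d86"):
    print(host_id, "->", host_id_to_ip(host_id))
```

Under that assumption the two hosts are 134.109.60.91 and 134.109.200.110.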

/usr/afs/local/hosts.dump:
...
ip:5b3c6d86 port:22811 hidx:1321 cbid:63416 lock:ffffffff last:1004352554 
active:1004352554 down:0 del:0 cons:2 cldel:0
         hpfailed:0 hcpsCall:1004351695 hcps [ -656 -212] [] holds: 
3ef65b1000000000000 slot/bit: 0/1
...
ip:6ec86d86 port:22811 hidx:62 cbid:56533 lock:ffffffff last:1004352554 
active:1004352554 down:0 del:0 cons:2 cldel:0
         hpfailed:0 hcpsCall:1004351271 hcps [ -656 -431 -212] [ 6ec86d86] holds: 
1000101000000000000 slot/bit: 0/1
...

I noticed that all 1176 other entries in this file have the value "cbid:0".
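
The "last", "active" and "hcpsCall" values look like Unix epoch seconds.
A small sketch for pulling the key:value fields out of one hosts.dump
entry and converting a timestamp to UTC; the regular expression and the
helper names (parse_fields, to_utc) are my own assumptions about the
format, based only on the excerpt above:

```python
import re
from datetime import datetime, timezone

# Assumed format: whitespace-separated "key:value" pairs on the first line.
FIELD = re.compile(r"(\w+):(\S+)")

def parse_fields(line: str) -> dict:
    """Return the key:value pairs of one hosts.dump line as a dict of strings."""
    return dict(FIELD.findall(line))

def to_utc(epoch: str) -> str:
    """Format epoch seconds as a UTC timestamp string."""
    stamp = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")

entry = ("ip:5b3c6d86 port:22811 hidx:1321 cbid:63416 lock:ffffffff "
         "last:1004352554 active:1004352554 down:0 del:0 cons:2 cldel:0")
fields = parse_fields(entry)
print(fields["ip"], "cbid", fields["cbid"], "last", to_utc(fields["last"]))
```

The "last" value comes out as 2001-10-29 10:49:14 UTC, which would match
the LastCall time in clients.dump if the server logs in UTC+1.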

/usr/afs/local/clients.dump:
...
Host 5b3c6d86.22811 down = 0, LastCall Mon Oct 29 11:49:14 2001
    user id=42124,  name=nfu, sl=Authenticated till Tue Oct 30 09:50:03 2001
      CPS-5 is []
...
Host 6ec86d86.22811 down = 0, LastCall Mon Oct 29 11:49:14 2001
    user id=32766,  name=anonymous, sl=Not authenticated till No Limit
      CPS-2 is []
    user id=4799,  name=erm, sl=Authenticated till Tue Oct 30 12:55:58 2001
      CPS-8 is []
    user=anonymous, no current server connection
      CPS-2 is []
    user=afs_cron, no current server connection
      CPS-3 is []
    user=afs_cron, no current server connection
      CPS-3 is []

/usr/afs/local/callback.dump was not written.

We have seen this several times over the last few weeks
(I think ever since we installed OpenAFS-1.1.1 on this server),
but this time I was able to gather some logs.

Do you have any hints?

Thanks,
Thomas.
-- 
-----------------------------------------------------------------------
Thomas Müller, TU Chemnitz, Universitätsrechenzentrum, D-09107 Chemnitz
mail: [EMAIL PROTECTED]
-----------------------------------------------------------------------


_______________________________________________
OpenAFS-devel mailing list
[EMAIL PROTECTED]
https://lists.openafs.org/mailman/listinfo/openafs-devel
