Le 10/09/2013 17:43, Andrew Deason a écrit :
On Tue, 10 Sep 2013 07:57:03 +0200
Jean-Marc Choulet <[email protected]> wrote:

We have a question about the fileserver process. On our server openafs
(Debian squeeze), the fileserver process eats 80% CPU every 5 seconds.
Are you running the openafs version from squeeze
(1.4.12.1+dfsg-4+squeeze2)?
Yes, it is

This is somewhat a guess, but: that version sill sync() every 10
seconds; that's a bug that is fixed in later versions. That's every 10
seconds, though, not every 5, and that should only really do anything
noticeable if you have a lot of disk activity on the machine (whether or
not it's caused by openafs). If you 'strace' the fileserver process, it
should be pretty obvious if that's happening; you'll see sync() calls
executing and taking a long time.

For how long does it use 80% cpu?
Every 10 seconds
How long : about 10 seconds

No AFS clientis connected to the server
We restarted our server but it is always the same.
If the above is the problem, you can patch the server to not do that
(it's a very small patch), or you may be able to alter the underlying
filesystem to make sync() calls less noticeable.
... it's easy for you, but not for us :)

However, if that's not the problem, or just more generally to see
"what is going on", there are a few different things you can do:

  - Check FileLog and BosLog (or syslog if you log to syslog), or just
    all of the *Log files. Just see if anything abnormal looks like it's
    being logged. And of course, if there's anything getting logged every
    5 seconds, it's probably relevant.
root@afs1:~# tail -f /var/log/openafs/*
==> /var/log/openafs/BackupLog <==
09/08/2013 04:01:14 Will allocate 400 ubik buffers
09/08/2013 04:01:14 Waiting for quorum election
09/08/2013 04:01:18 Have established quorum
09/08/2013 04:01:18 Ready to process requests at Sun Sep  8 04:01:18 2013

09/10/2013 07:38:38 Will allocate 400 ubik buffers
09/10/2013 07:38:38 Waiting for quorum election
09/10/2013 07:38:42 Have established quorum
09/10/2013 07:38:42 Ready to process requests at Tue Sep 10 07:38:42 2013


==> /var/log/openafs/BosLog <==
Tue Sep 10 07:38:38 2013: Server directory access is okay

==> /var/log/openafs/BosLog.old <==
Sun Sep  8 04:01:14 2013: Server directory access is okay
Sun Sep  8 22:00:03 2013: backupcell exited with code 0
Mon Sep  9 22:00:03 2013: backupcell exited with code 0
Tue Sep 10 07:38:21 2013: buserver exited on signal 15
Tue Sep 10 07:38:21 2013: vlserver exited on signal 15
Tue Sep 10 07:38:21 2013: ptserver exited on signal 15
Tue Sep 10 07:38:27 2013: fs:vol exited on signal 15
Tue Sep 10 07:38:38 2013: fs:file exited with code 0

==> /var/log/openafs/FileLog <==
Tue Sep 10 19:35:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.130:7001) failed -01 Tue Sep 10 19:37:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01 Tue Sep 10 19:39:30 2013 CB: ProbeUuid for 0x7f57bb0ef318 (172.20.128.130:7001) failed -01 Tue Sep 10 19:41:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01 Tue Sep 10 19:43:36 2013 CB: ProbeUuid for 0x7f57bb0ef318 (172.20.128.130:7001) failed -01 Tue Sep 10 19:45:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01 Tue Sep 10 19:47:42 2013 CB: ProbeUuid for 0x7f57bb0ef318 (172.20.128.130:7001) failed -01 Tue Sep 10 19:49:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01 Tue Sep 10 19:51:48 2013 CB: ProbeUuid for 0x7f57bb0ef318 (172.20.128.130:7001) failed -01 Tue Sep 10 19:53:10 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01

==> /var/log/openafs/FileLog.old <==
Tue Sep 10 07:38:38 2013 File server has terminated normally at Tue Sep 10 07:38:38 2013

ll 0 challenge 47 response 94 debug 0 params 0 unused 0 unused 0 unused 0 version 0 other send counters: ack 1523714, data 2017892 (not resends), resends 241, pushed 0, acked&ignored 486192
       (these should be small) sendFailed 0, fatalErrors 0
   Average rtt is 0.005, with 5138462 samples
   Minimum rtt is 0.000, maximum is 0.566
0 server connections, 16 client connections, 4 peer structs, 32 call structs, 32 free call structs 146883 add CB, 90604 break CB, 91195 del CB, 21718 del FE, 5007 CB's timed out, 0 space reclaim, 72 del host
1 CBs, 1 FEs, (2 of total of 60000 16-byte blocks)

==> /var/log/openafs/PtLog <==
Tue Sep 10 07:38:38 2013 Using 172.20.128.247 as my primary address

==> /var/log/openafs/PtLog.old <==
Sun Sep  8 04:01:14 2013 Using 172.20.128.247 as my primary address

==> /var/log/openafs/SalvageLog <==
@(#) OpenAFS 1.4.12.1 built  2011-02-09
06/01/2013 22:00:24 STARTING AFS SALVAGER 2.4 (/usr/lib/openafs/salvager /vicepa 171082816 -tmpdir /tmp)
06/01/2013 22:00:24 1 nVolumesInInodeFile 32
06/01/2013 22:00:25 SALVAGING VOLUME 171082816.
06/01/2013 22:00:25 user.aabdulha_tmp (171082816) updated 10/11/2012 12:21
06/01/2013 22:00:25 totalInodes 746
06/01/2013 22:00:30 Salvaged user.aabdulha_tmp (171082816): 742 files, 78092 blocks

==> /var/log/openafs/SalvageLog.old <==
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afs179A is deleted (vnode 86) 06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afsC629 is deleted (vnode 1106) 06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afsCE72 is deleted (vnode 902) 06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afs01D1 is deleted (vnode 908) 06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afsAFF7 is deleted (vnode 890) 06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afsD4CC is deleted (vnode 700) 06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afs096D is deleted (vnode 946) 06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced file .__afsCC1D is deleted (vnode 1016)
06/01/2013 14:56:24 Volume uniquifier is too low; fixed
06/01/2013 14:56:24 Salvaged user.aabdulha_tmp (171082816): 742 files, 78092 blocks

==> /var/log/openafs/VLLog <==
Tue Sep 10 07:38:38 2013 Using 172.20.128.247 as my primary address
Tue Sep 10 07:38:38 2013 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver)

==> /var/log/openafs/VLLog.old <==
Sun Sep  8 04:01:14 2013 Using 172.20.128.247 as my primary address
Sun Sep  8 04:01:14 2013 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver)

==> /var/log/openafs/VolserLog <==
Tue Sep 10 10:45:25 2013 1 Volser: CreateVolume: volume 144602000 (user.jmprudha) created Tue Sep 10 10:45:27 2013 1 Volser: CreateVolume: volume 154567376 (user.dyang2) created Tue Sep 10 10:45:29 2013 1 Volser: CreateVolume: volume 144122616 (user.adilou) created Tue Sep 10 12:52:13 2013 1 Volser: CreateVolume: volume 159986776 (user.airauld_tmp) created Tue Sep 10 12:52:45 2013 1 Volser: CreateVolume: volume 144663544 (user.poncot_tmp) created Tue Sep 10 16:33:56 2013 1 Volser: CreateVolume: volume 142021512 (user.wjacques_tmp) created Tue Sep 10 16:35:18 2013 1 Volser: CreateVolume: volume 154334216 (user.cbrenckle_tmp) created Tue Sep 10 16:35:30 2013 1 Volser: CreateVolume: volume 154165960 (user.rhasan_tmp) created Tue Sep 10 16:35:57 2013 1 Volser: CreateVolume: volume 143848040 (user.lnan_tmp) created Tue Sep 10 16:36:07 2013 1 Volser: CreateVolume: volume 146104976 (user.ihamed_tmp) created

==> /var/log/openafs/VolserLog.old <==
Mon Sep 9 16:58:54 2013 1 Volser: CreateVolume: volume 163242808 (user.asalrajh) created Mon Sep 9 16:58:55 2013 1 Volser: CreateVolume: volume 173085448 (user.rhassan) created Mon Sep 9 16:58:57 2013 1 Volser: CreateVolume: volume 158343912 (user.asaif) created Mon Sep 9 16:58:58 2013 1 Volser: CreateVolume: volume 159806320 (user.mtourni3) created Mon Sep 9 22:00:01 2013 1 Volser: Clone: Recloning volume 536870918 to volume 536870920 Mon Sep 9 22:00:01 2013 1 Volser: Clone: Recloning volume 536870945 to volume 536870947 Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 151039480 to volume 151039482 Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 536870975 to volume 536870977 Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 536871149 to volume 536871151 Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 536871152 to volume 536871154

==> /var/log/openafs/FileLog <==
Tue Sep 10 19:55:55 2013 CB: ProbeUuid for 0x7f57bb0ef318 (172.20.128.130:7001) failed -01 Tue Sep 10 19:57:18 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01 Tue Sep 10 20:00:01 2013 CB: ProbeUuid for 0x7f57bb0ef318 (172.20.128.130:7001) failed -01 Tue Sep 10 20:00:38 2013 CB: WhoAreYou failed for host 0x7f57bb0ecfc0 (172.20.128.103:7001), error 1 Tue Sep 10 20:00:38 2013 CB: WhoAreYou failed for host 0x7f57bb0ed710 (172.20.128.188:7001), error 1 Tue Sep 10 20:00:40 2013 CB: WhoAreYou failed for host 0x7f57bb0ed5d8 (172.20.128.190:7001), error 1 Tue Sep 10 20:00:42 2013 CB: WhoAreYou failed for host 0x7f57bb0ec9a8 (172.20.128.192:7001), error 1 Tue Sep 10 20:01:07 2013 CB: RCallBackConnectBack failed for host 0x7f57bb0eb148 (172.20.128.130:7001) Tue Sep 10 20:01:07 2013 CB: Call back connect back failed (in break delayed) for Host 172.20.128.130:7001 Tue Sep 10 20:01:07 2013 BreakDelayedCallbacks FAILED for host 172.20.128.130:7001 which IS UP. Connection from 172.20.128.130:7001. Possible network or routing failure. Tue Sep 10 20:01:07 2013 MultiProbe failed to find new address for host 172.20.128.130:7001 Tue Sep 10 20:01:09 2013 CB: WhoAreYou failed for host 0x7f57bb0ef318 (172.20.128.133:7001), error 1 Tue Sep 10 20:01:26 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01


_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info

Reply via email to