On 10/09/2013 17:43, Andrew Deason wrote:
On Tue, 10 Sep 2013 07:57:03 +0200
Jean-Marc Choulet wrote:
We have a question about the fileserver process. On our OpenAFS server
(Debian squeeze), the fileserver process uses 80% CPU every 5 seconds.
Are you running the openafs version from squeeze
(1.4.12.1+dfsg-4+squeeze2)?
Yes, it is
This is somewhat a guess, but: that version will sync() every 10
seconds; that's a bug that is fixed in later versions. That's every 10
seconds, though, not every 5, and that should only really do anything
noticeable if you have a lot of disk activity on the machine (whether or
not it's caused by openafs). If you 'strace' the fileserver process, it
should be pretty obvious if that's happening; you'll see sync() calls
executing and taking a long time.
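
As a rough sketch of how to pick slow sync() calls out of a saved strace
log: this assumes strace was attached with something like
"strace -f -T -e trace=sync -p <fileserver pid> -o <file>", so that each
completed call ends with its duration in angle brackets. The function name,
the 1-second threshold, and the sample lines are made up for illustration;
they are not output from the server discussed here.

```python
import re

# Matches a completed sync() call in `strace -T` output, for example:
#   [pid  1234] sync()                      = 0 <9.812345>
# The trailing <...> is the time spent inside the call, in seconds.
SYNC_RE = re.compile(r"\bsync\(\)\s*=\s*-?\d+.*<(\d+\.\d+)>")


def slow_syncs(lines, threshold=1.0):
    """Return the durations (seconds) of sync() calls at or above threshold."""
    durations = []
    for line in lines:
        match = SYNC_RE.search(line)
        if match:
            duration = float(match.group(1))
            if duration >= threshold:
                durations.append(duration)
    return durations


# Hypothetical strace lines, only to show the expected input shape.
sample = [
    "[pid  1234] sync()                      = 0 <9.812345>",
    "[pid  1234] select(0, NULL, NULL, NULL, {5, 0}) = 0 (Timeout) <5.005012>",
    "[pid  1234] sync()                      = 0 <0.000213>",
]

print(slow_syncs(sample))
```

If long sync() calls show up at roughly 10-second intervals, that would be
consistent with the guess above; if none show up, the cause is likely
something else.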
For how long does it use 80% cpu?
It happens every 10 seconds.
Each burst lasts about 10 seconds.
No AFS client is connected to the server.
We restarted our server, but the behavior is still the same.
If the above is the problem, you can patch the server to not do that
(it's a very small patch), or you may be able to alter the underlying
filesystem to make sync() calls less noticeable.
... it's easy for you, but not for us :)
However, if that's not the problem, or just more generally to see
"what is going on", there are a few different things you can do:
- Check FileLog and BosLog (or syslog if you log to syslog), or just
all of the *Log files. Just see if anything abnormal looks like it's
being logged. And of course, if there's anything getting logged every
5 seconds, it's probably relevant.
root@afs1:~# tail -f /var/log/openafs/*
==> /var/log/openafs/BackupLog <==
09/08/2013 04:01:14 Will allocate 400 ubik buffers
09/08/2013 04:01:14 Waiting for quorum election
09/08/2013 04:01:18 Have established quorum
09/08/2013 04:01:18 Ready to process requests at Sun Sep 8 04:01:18 2013
09/10/2013 07:38:38 Will allocate 400 ubik buffers
09/10/2013 07:38:38 Waiting for quorum election
09/10/2013 07:38:42 Have established quorum
09/10/2013 07:38:42 Ready to process requests at Tue Sep 10 07:38:42 2013
==> /var/log/openafs/BosLog <==
Tue Sep 10 07:38:38 2013: Server directory access is okay
==> /var/log/openafs/BosLog.old <==
Sun Sep 8 04:01:14 2013: Server directory access is okay
Sun Sep 8 22:00:03 2013: backupcell exited with code 0
Mon Sep 9 22:00:03 2013: backupcell exited with code 0
Tue Sep 10 07:38:21 2013: buserver exited on signal 15
Tue Sep 10 07:38:21 2013: vlserver exited on signal 15
Tue Sep 10 07:38:21 2013: ptserver exited on signal 15
Tue Sep 10 07:38:27 2013: fs:vol exited on signal 15
Tue Sep 10 07:38:38 2013: fs:file exited with code 0
==> /var/log/openafs/FileLog <==
Tue Sep 10 19:35:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.130:7001) failed -01
Tue Sep 10 19:37:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.133:7001) failed -01
Tue Sep 10 19:39:30 2013 CB: ProbeUuid for 0x7f57bb0ef318
(172.20.128.130:7001) failed -01
Tue Sep 10 19:41:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.133:7001) failed -01
Tue Sep 10 19:43:36 2013 CB: ProbeUuid for 0x7f57bb0ef318
(172.20.128.130:7001) failed -01
Tue Sep 10 19:45:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.133:7001) failed -01
Tue Sep 10 19:47:42 2013 CB: ProbeUuid for 0x7f57bb0ef318
(172.20.128.130:7001) failed -01
Tue Sep 10 19:49:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.133:7001) failed -01
Tue Sep 10 19:51:48 2013 CB: ProbeUuid for 0x7f57bb0ef318
(172.20.128.130:7001) failed -01
Tue Sep 10 19:53:10 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.133:7001) failed -01
==> /var/log/openafs/FileLog.old <==
Tue Sep 10 07:38:38 2013 File server has terminated normally at Tue Sep
10 07:38:38 2013
ll 0 challenge 47 response 94 debug 0 params 0 unused 0 unused 0 unused
0 version 0
other send counters: ack 1523714, data 2017892 (not resends),
resends 241, pushed 0, acked&ignored 486192
(these should be small) sendFailed 0, fatalErrors 0
Average rtt is 0.005, with 5138462 samples
Minimum rtt is 0.000, maximum is 0.566
0 server connections, 16 client connections, 4 peer structs, 32 call
structs, 32 free call structs
146883 add CB, 90604 break CB, 91195 del CB, 21718 del FE, 5007 CB's
timed out, 0 space reclaim, 72 del host
1 CBs, 1 FEs, (2 of total of 60000 16-byte blocks)
==> /var/log/openafs/PtLog <==
Tue Sep 10 07:38:38 2013 Using 172.20.128.247 as my primary address
==> /var/log/openafs/PtLog.old <==
Sun Sep 8 04:01:14 2013 Using 172.20.128.247 as my primary address
==> /var/log/openafs/SalvageLog <==
@(#) OpenAFS 1.4.12.1 built 2011-02-09
06/01/2013 22:00:24 STARTING AFS SALVAGER 2.4 (/usr/lib/openafs/salvager
/vicepa 171082816 -tmpdir /tmp)
06/01/2013 22:00:24 1 nVolumesInInodeFile 32
06/01/2013 22:00:25 SALVAGING VOLUME 171082816.
06/01/2013 22:00:25 user.aabdulha_tmp (171082816) updated 10/11/2012 12:21
06/01/2013 22:00:25 totalInodes 746
06/01/2013 22:00:30 Salvaged user.aabdulha_tmp (171082816): 742 files,
78092 blocks
==> /var/log/openafs/SalvageLog.old <==
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afs179A is deleted (vnode 86)
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afsC629 is deleted (vnode 1106)
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afsCE72 is deleted (vnode 902)
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afs01D1 is deleted (vnode 908)
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afsAFF7 is deleted (vnode 890)
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afsD4CC is deleted (vnode 700)
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afs096D is deleted (vnode 946)
06/01/2013 14:56:17 dir vnode 119: special old unlink-while-referenced
file .__afsCC1D is deleted (vnode 1016)
06/01/2013 14:56:24 Volume uniquifier is too low; fixed
06/01/2013 14:56:24 Salvaged user.aabdulha_tmp (171082816): 742 files,
78092 blocks
==> /var/log/openafs/VLLog <==
Tue Sep 10 07:38:38 2013 Using 172.20.128.247 as my primary address
Tue Sep 10 07:38:38 2013 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver)
==> /var/log/openafs/VLLog.old <==
Sun Sep 8 04:01:14 2013 Using 172.20.128.247 as my primary address
Sun Sep 8 04:01:14 2013 Starting AFS vlserver 4 (/usr/lib/openafs/vlserver)
==> /var/log/openafs/VolserLog <==
Tue Sep 10 10:45:25 2013 1 Volser: CreateVolume: volume 144602000
(user.jmprudha) created
Tue Sep 10 10:45:27 2013 1 Volser: CreateVolume: volume 154567376
(user.dyang2) created
Tue Sep 10 10:45:29 2013 1 Volser: CreateVolume: volume 144122616
(user.adilou) created
Tue Sep 10 12:52:13 2013 1 Volser: CreateVolume: volume 159986776
(user.airauld_tmp) created
Tue Sep 10 12:52:45 2013 1 Volser: CreateVolume: volume 144663544
(user.poncot_tmp) created
Tue Sep 10 16:33:56 2013 1 Volser: CreateVolume: volume 142021512
(user.wjacques_tmp) created
Tue Sep 10 16:35:18 2013 1 Volser: CreateVolume: volume 154334216
(user.cbrenckle_tmp) created
Tue Sep 10 16:35:30 2013 1 Volser: CreateVolume: volume 154165960
(user.rhasan_tmp) created
Tue Sep 10 16:35:57 2013 1 Volser: CreateVolume: volume 143848040
(user.lnan_tmp) created
Tue Sep 10 16:36:07 2013 1 Volser: CreateVolume: volume 146104976
(user.ihamed_tmp) created
==> /var/log/openafs/VolserLog.old <==
Mon Sep 9 16:58:54 2013 1 Volser: CreateVolume: volume 163242808
(user.asalrajh) created
Mon Sep 9 16:58:55 2013 1 Volser: CreateVolume: volume 173085448
(user.rhassan) created
Mon Sep 9 16:58:57 2013 1 Volser: CreateVolume: volume 158343912
(user.asaif) created
Mon Sep 9 16:58:58 2013 1 Volser: CreateVolume: volume 159806320
(user.mtourni3) created
Mon Sep 9 22:00:01 2013 1 Volser: Clone: Recloning volume 536870918 to
volume 536870920
Mon Sep 9 22:00:01 2013 1 Volser: Clone: Recloning volume 536870945 to
volume 536870947
Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 151039480 to
volume 151039482
Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 536870975 to
volume 536870977
Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 536871149 to
volume 536871151
Mon Sep 9 22:00:03 2013 1 Volser: Clone: Recloning volume 536871152 to
volume 536871154
==> /var/log/openafs/FileLog <==
Tue Sep 10 19:55:55 2013 CB: ProbeUuid for 0x7f57bb0ef318
(172.20.128.130:7001) failed -01
Tue Sep 10 19:57:18 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.133:7001) failed -01
Tue Sep 10 20:00:01 2013 CB: ProbeUuid for 0x7f57bb0ef318
(172.20.128.130:7001) failed -01
Tue Sep 10 20:00:38 2013 CB: WhoAreYou failed for host 0x7f57bb0ecfc0
(172.20.128.103:7001), error 1
Tue Sep 10 20:00:38 2013 CB: WhoAreYou failed for host 0x7f57bb0ed710
(172.20.128.188:7001), error 1
Tue Sep 10 20:00:40 2013 CB: WhoAreYou failed for host 0x7f57bb0ed5d8
(172.20.128.190:7001), error 1
Tue Sep 10 20:00:42 2013 CB: WhoAreYou failed for host 0x7f57bb0ec9a8
(172.20.128.192:7001), error 1
Tue Sep 10 20:01:07 2013 CB: RCallBackConnectBack failed for host
0x7f57bb0eb148 (172.20.128.130:7001)
Tue Sep 10 20:01:07 2013 CB: Call back connect back failed (in break
delayed) for Host 172.20.128.130:7001
Tue Sep 10 20:01:07 2013 BreakDelayedCallbacks FAILED for host
172.20.128.130:7001 which IS UP. Connection from 172.20.128.130:7001.
Possible network or routing failure.
Tue Sep 10 20:01:07 2013 MultiProbe failed to find new address for host
172.20.128.130:7001
Tue Sep 10 20:01:09 2013 CB: WhoAreYou failed for host 0x7f57bb0ef318
(172.20.128.133:7001), error 1
Tue Sep 10 20:01:26 2013 CB: ProbeUuid for host 0x7f57bb0ef318
(172.20.128.133:7001) failed -01
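
To check whether anything in FileLog repeats on a 5 or 10 second period, one
option is to group the entries by client address and look at the gaps
between them. The sketch below is only an illustration: the function name
and regular expression are made up, it assumes each entry sits on a single
line as it would in the real file (the sample lines are entries from the
excerpt above with the mail wrapping undone), and it assumes English day and
month names when parsing the timestamps.

```python
import re
from collections import defaultdict
from datetime import datetime

# Timestamp at the start of the line, then the first "(a.b.c.d:port)" address.
# Entries without a parenthesized address are ignored.
ENTRY_RE = re.compile(
    r"^(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r".*?\((\d+\.\d+\.\d+\.\d+):\d+\)"
)


def intervals_by_host(lines):
    """Map each client address to the gaps (seconds) between its entries."""
    seen = defaultdict(list)
    for line in lines:
        match = ENTRY_RE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            seen[match.group(2)].append(when)
    return {
        host: [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
        for host, times in seen.items()
    }


sample = [
    "Tue Sep 10 19:35:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.130:7001) failed -01",
    "Tue Sep 10 19:37:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01",
    "Tue Sep 10 19:39:30 2013 CB: ProbeUuid for 0x7f57bb0ef318 (172.20.128.130:7001) failed -01",
    "Tue Sep 10 19:41:32 2013 CB: ProbeUuid for host 0x7f57bb0ef318 (172.20.128.133:7001) failed -01",
]

for host, gaps in intervals_by_host(sample).items():
    print(host, gaps)
```

For these four entries the gaps come out at about four minutes per host,
which does not match a 5 or 10 second period.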