Re: "swiN: clock sio" process taking 75% CPU

2006-07-21 Thread Gareth McCaughan
I wrote:

> About 6 minutes after booting (on two occasions; I don't
> guarantee that this doesn't vary), a process that appears
> in the output of "ps" as "[swi4: clock sio]" begins to
> use about 3/4 of the machine's CPU. I think it does so
> more or less instantaneously. It continues to do so
> indefinitely, so far as I can tell.

So, here's the answer. Whether it's the same thing that's
afflicted the other people who've reported similar problems,
I don't know. (Thanks to John Baldwin on -hackers for
pointing me in a useful direction.)

Executive summary: If you see symptoms like the one above,
are you running a syscons screen saver? (To check: run
"kldstat | grep _saver".) If so, turn it off and the problem
may go away.

1. The machine in question runs largely unattended.

2. I'd enabled the syscons screen saver and chosen one
   of the ones that puts the screen into a graphics mode.
   ("warp", as it happens; "fire" behaves similarly;
   the character-mode ones don't; I haven't looked at
   all of them.)

3. The screen saver kicks in 5 minutes after it gets
   turned on in /etc/rc.d/syscons, provided nothing's
   happening on the console. Which it isn't: see #1.

4. Now, how do those graphics-mode screen savers work?
   They write to the video card's frame buffer directly,
   but there's only a 64k block of RAM they can do this
   through. So, to cope with larger screens, there's a
   bank switching facility accessed by a BIOS call.

5. This BIOS call, on my machine, takes about 0.1ms; you
   need to do two of them for a bank switch, so the time
   actually taken is about 0.2ms.

6. The screen savers are written in a less than optimal way,
   and do that bank switching thing many times. For instance,
   the "fire" screen saver does it at least once for every
   screen line. Even when the entire screen actually fits
   into a single bank so that no switching at all should be
   needed.

7. So the screensaver eats up something on the order of half
   my CPU time; the exact figure depends on which screensaver
   and on more exact timings than I've given above, which is
   how it ends up actually being 75% for the "warp" screensaver.

8. The screensaver gets run in callouts from a kernel
   interrupt thread that happens to have a silly name
   like "swi4: clock sio".

This is eminently fixable, in several different ways. I've
offered to prepare a patch, or perhaps someone else will
do so, so there's a reasonable prospect of later versions
of FreeBSD not having this problem. For the time being,
there's a simple workaround for anyone facing the same
problem I did: *turn off the screensaver*, or replace it
with one that doesn't use a graphics mode.

For clarity: this is a problem with (some) FreeBSD syscons
screen savers, the ones you might enable in /etc/rc.conf;
not with the ones like xscreensaver that you might run in
user mode under X.

-- 
g

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: "swiN: clock sio" process taking 75% CPU

2006-07-17 Thread Gareth McCaughan
> Can I suggest that if you get no replies on questions@ that you try 
> hackers@ (you may need to subscribe to post, not sure).
> 
> What's going on here is not normal so even someone with tons of FreeBSD 
> experience may never have seen something like this.  I know I never 
> have.  Most posters here are not kernel hackers.

That's a fair comment. I tried -stable too; I'm fairly sure some
people there *are* kernel hackers; but -hackers is probably not a
bad place to look for deeper wizardry.

> > It would be a shame if the only conclusion to be drawn from this were 
> > "sometimes a machine running FreeBSD is just 4x slower than it should 
> > be, and no one knows why". 
> 
> Well, if you could only find one somewhat relevant topic with google, 
> then "sometimes" would be "almost never".  I appreciate that you are 
> frustrated because it *is* happening to you, but your problem is not a 
> common one so may take more than a post on questions to resolve.

I found quite a lot of relevant messages. In every case what happened
was that someone posted saying "I've got this weird thing where X% of
my CPU time is being taken up by this clock sio thing" and they got no
responses other than from other people saying "yup, that's strange".
(Sometimes there was something else they were also concerned about,
and they got responses to that bit.) A few examples found by googling
for "clock sio site:lists.freebsd.org" follow; most appear to have
been asked only on -stable, and it was because this seems to have been
an unsuccessful strategy that I tried -questions too. :-)

http://lists.freebsd.org/pipermail/freebsd-stable/2006-April/024793.html

(16%; old system with modems attached, which were also
giving problems; the latter was resolved, kinda, but it's
not clear that the former was)

http://lists.freebsd.org/pipermail/freebsd-stable/2006-July/026873.html

(not clear how much CPU; his concern was something else that
was triggered by this, and he had another kernel thread being
silly too; no resolution)

http://lists.freebsd.org/pipermail/freebsd-stable/2004-December/010673.html

(not certain that this is the same problem, but it sure looks
like it; no reply to his message)

http://lists.freebsd.org/pipermail/freebsd-stable/2004-November/009489.html

(24% on one machine, 8% on another; no reply)

http://lists.freebsd.org/pipermail/freebsd-current/2005-March/047218.html

(18% but drops when a real task begins; may not be the same problem;
this is under MS Virtual Server; only suggestion was to reduce HZ,
which didn't solve the problem; no further replies)

However, I've not seen any sign that other people have had *as much*
CPU wasted this way as I have.

-- 
g

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: "swiN: clock sio" process taking 75% CPU

2006-07-17 Thread Alex Zbyslaw
Can I suggest that if you get no replies on questions@ that you try 
hackers@ (you may need to subscribe to post, not sure).


What's going on here is not normal so even someone with tons of FreeBSD 
experience may never have seen something like this.  I know I never 
have.  Most posters here are not kernel hackers.


It would be a shame if the only conclusion to be drawn from this were 
"sometimes a machine running FreeBSD is just 4x slower than it should 
be, and no one knows why". 


Well, if you could only find one somewhat relevant topic with google, 
then "sometimes" would be "almost never".  I appreciate that you are 
frustrated because it *is* happening to you, but your problem is not a 
common one so may take more than a post on questions to resolve.


--Alex


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: "swiN: clock sio" process taking 75% CPU

2006-07-17 Thread Gareth McCaughan
I wrote:
> About 6 minutes after booting (on two occasions; I don't
> guarantee that this doesn't vary), a process that appears
> in the output of "ps" as "[swi4: clock sio]" begins to
> use about 3/4 of the machine's CPU. I think it does so
> more or less instantaneously. It continues to do so
> indefinitely, so far as I can tell.
[etc]

No ideas? I'm willing to help track this down, and the machine
in question is sufficiently little used that I can do so without
gross inconvenience; but I don't have enough FreeBSD kernel
expertise to feel like diving in blind.

 *

A little more information, in case it's useful to anyone:

  | $ echo; sysctl debug | egrep to_
  | debug.to_avg_mpcalls: 2890
  | debug.to_avg_mtxcalls: 0
  | debug.to_avg_gcalls: 768
  | debug.to_avg_depth: 3815

That's with HZ = 100. Here are some numbers from a message
in freebsd-ia64, from Marcel Moolenaar, in 2004-07, to someone
seeing symptoms like mine. They're meant to be typical healthy
numbers. Mine above look somewhat worse, but not insanely so;
surely not enough to explain the difference between using
0.3% cpu and using 75%. Marcel also had HZ=100.

  | % sysctl debug | grep to_avg
  | debug.to_avg_depth: 2500
  | debug.to_avg_gcalls: 1003
  | debug.to_avg_mpcalls: 1255

 *

It would be a shame if the only conclusion to be drawn from this
were "sometimes a machine running FreeBSD is just 4x slower than
it should be, and no one knows why".

-- 
g

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: "swiN: clock sio" process taking 75% CPU

2006-07-14 Thread Gareth McCaughan
I wrote, inter alia,

> About 6 minutes after booting (on two occasions; I don't
> guarantee that this doesn't vary), a process that appears
> in the output of "ps" as "[swi4: clock sio]" begins to
> use about 3/4 of the machine's CPU. I think it does so
> more or less instantaneously. It continues to do so
> indefinitely, so far as I can tell.

David Wolfskill e-mailed me off-list to suggest looking at
the output of "vmstat -i". Answer: the interrupt rates all
appear to be normal, or at least similar to those he observes
on his machines which don't exhibit my problem. More specifically ...

-- excerpt from my reply to David begins --
I get this:

  | interrupt  total   rate
  | irq1: atkbd0   3  0
  | irq6: fdc010  0
  | irq14: ata0 2913  1
  | irq15: ata1   47  0
  | irq17: xl0  7342  4
  | cpu0: timer   302649199
  | Total 312964206

(so the rate of timer interrupts doesn't appear to be
insane)

and

  |  7:56PM  up 26 mins, 1 user, load averages: 1.87, 1.45, 1.08

(so the cost in CPU cycles of servicing them -- if that's what
the rogue process is doing, which seems somewhat plausible --
*does* appear to be insane).
-- excerpt from my reply to David ends --

-- 
g

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


"swiN: clock sio" process taking 75% CPU

2006-07-13 Thread Gareth McCaughan
(Approximately the same question as I'm about to ask seems
to have come up many times in -questions and -stable, but
I haven't found a useful answer to it.)

I have a box running 6-STABLE, cvsupped yesterday.
Until recently it was running 5.something and showed
exactly the same peculiar behaviour as I'm about to
describe.

About 6 minutes after booting (on two occasions; I don't
guarantee that this doesn't vary), a process that appears
in the output of "ps" as "[swi4: clock sio]" begins to
use about 3/4 of the machine's CPU. I think it does so
more or less instantaneously. It continues to do so
indefinitely, so far as I can tell.

I'm not aware of anything specific that triggers this,
though I suppose there must *be* something. It happens
apparently spontaneously, on a lightly loaded machine.

Those cycles are genuinely being consumed; other processes
run much more slowly, and take much more wall time than
CPU time.

I've tried diddling my kernel's HZ value; the behaviour
with HZ=100 and with HZ=1000 is the same, so far as I'm
able to tell. I've no idea whether it might be relevant,
but I have option DEVICE_POLLING turned on; toggling
sysctl kern.polling.enable doesn't seem to make any
difference.

The machine is a very uninteresting single-CPU Athlon box,
clocked at 1.6GHz, several years old. It used to run 4.x
once upon a time, and I'm fairly sure it was fine then.
As I mentioned above, it's running a recent 6-STABLE but
showed the same pathology under 5.x. Here's its dmesg output,
with a few uninteresting bits of information leakage elided.

-- dmesg output begins --
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #3: Thu Jul 13 16:46:07 BST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP 1900+ (1599.73-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x662  Stepping = 2
  
Features=0x383fbff
  AMD Features=0xc0480800
real memory  = 536805376 (511 MB)
avail memory = 51598 (492 MB)
ioapic0  irqs 0-23 on motherboard
netsmb_dev: loaded
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0:  on acpi0
acpi_throttle0:  on cpu0
acpi_button0:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
agp0:  mem 
0xe000-0xe3ff at device 0.0 on pci0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
pci1:  at device 0.0 (no driver attached)
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem 
0xdf80-0xdfff irq 17 at device 6.0 on pci0
miibus0:  on xl0
ukphy0:  on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:75:7e:54:fd
isab0:  at device 17.0 on pci0
isa0:  on isab0
atapci0:  port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 17.1 on pci0
ata0:  on atapci0
ata1:  on atapci0
uhci0:  port 0xdc00-0xdc1f irq 5 at device 17.2 on 
pci0
uhci0: [GIANT-LOCKED]
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xe000-0xe01f irq 5 at device 17.3 on 
pci0
uhci1: [GIANT-LOCKED]
usb1:  on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2:  port 0xe400-0xe41f irq 5 at device 17.4 on 
pci0
uhci2: [GIANT-LOCKED]
usb2:  on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0:  at device 17.5 (no driver attached)
acpi_button1:  on acpi0
fdc0:  port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 
on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0:  port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on 
acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0:  on ppc0
plip0:  on ppbus0
lpt0:  on ppbus0
lpt0: Interrupt-driven port
ppi0:  on ppbus0
atkbdc0:  port 0x60,0x64 irq 1 on acpi0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0:  irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model Generic PS/2 mouse, device ID 0
pmtimer0 on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
Timecounter "TSC" frequency 1599730245 Hz quality 800
Timecounters tick every 10.000 msec
ad0: 38166MB  at ata0-master UDMA100
acd0: DVDROM  at ata1-master UDMA33
Trying to mount root from ufs:/dev/ad0s1a
-- dmesg output ends --