Re: "swiN: clock sio" process taking 75% CPU
I wrote: > About 6 minutes after booting (on two occasions; I don't > guarantee that this doesn't vary), a process that appears > in the output of "ps" as "[swi4: clock sio]" begins to > use about 3/4 of the machine's CPU. I think it does so > more or less instantaneously. It continues to do so > indefinitely, so far as I can tell. So, here's the answer. Whether it's the same thing that's afflicted the other people who've reported similar problems, I don't know. (Thanks to John Baldwin on -hackers for pointing me in a useful direction.) Executive summary: If you see symptoms like the one above, are you running a syscons screen saver? (To check: run "kldstat | grep _saver".) If so, turn it off and the problem may go away. 1. The machine in question runs largely unattended. 2. I'd enabled the syscons screen saver and chosen one of the ones that puts the screen into a graphics mode. ("warp", as it happens; "fire" behaves similarly; the character-mode ones don't; I haven't looked at all of them.) 3. The screen saver kicks in 5 minutes after it gets turned on in /etc/rc.d/syscons, provided nothing's happening on the console. Which it isn't: see #1. 4. Now, how do those graphics-mode screen savers work? They write to the video card's frame buffer directly, but there's only a 64k block of RAM they can do this through. So, to cope with larger screens, there's a bank switching facility accessed by a BIOS call. 5. This BIOS call, on my machine, takes about 0.1ms; you need to do two of them for a bank switch, so the time actually taken is about 0.2ms. 6. The screen savers are written in a less than optimal way, and do that bank switching thing many times. For instance, the "fire" screen saver does it at least once for every screen line. Even when the entire screen actually fits into a single bank so that no switching at all should be needed. 7. So the screensaver eats up something on the order of half my CPU time; the exact figure depends on which screensaver and on more exact timings than I've given above, which is how it ends up actually being 75% for the "warp" screensaver. 8. The screensaver gets run in callouts from a kernel interrupt thread that happens to have a silly name like "swi4: clock sio". This is eminently fixable, in several different ways. I've offered to prepare a patch, or perhaps someone else will do so, so there's a reasonable prospect of later versions of FreeBSD not having this problem. For the time being, there's a simple workaround for anyone facing the same problem I did: *turn off the screensaver*, or replace it with one that doesn't use a graphics mode. For clarity: this is a problem with (some) FreeBSD syscons screen savers, the ones you might enable in /etc/rc.conf; not with the ones like xscreensaver that you might run in user mode under X. -- g ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: "swiN: clock sio" process taking 75% CPU
> Can I suggest that if you get no replies on questions@ that you try > hackers@ (you may need to subscribe to post, not sure). > > What's going on here is not normal so even someone with tons of FreeBSD > experience may never have seen something like this. I know I never > have. Most posters here are not kernel hackers. That's a fair comment. I tried -stable too; I'm fairly sure some people there *are* kernel hackers; but -hackers is probably not a bad place to look for deeper wizardry. > > It would be a shame if the only conclusion to be drawn from this were > > "sometimes a machine running FreeBSD is just 4x slower than it should > > be, and no one knows why". > > Well, if you could only find one somewhat relevant topic with google, > then "sometimes" would be "almost never". I appreciate that you are > frustrated because it *is* happening to you, but your problem is not a > common one so may take more than a post on questions to resolve. I found quite a lot of relevant messages. In every case what happened was that someone posted saying "I've got this weird thing where X% of my CPU time is being taken up by this clock sio thing" and they got no responses other than from other people saying "yup, that's strange". (Sometimes there was something else they were also concerned about, and they got responses to that bit.) A few examples found by googling for "clock sio site:lists.freebsd.org" follow; most appear to have been asked only on -stable, and it was because this seems to have been an unsuccessful strategy that I tried -questions too. :-) http://lists.freebsd.org/pipermail/freebsd-stable/2006-April/024793.html (16%; old system with modems attached, which were also giving problems; the latter was resolved, kinda, but it's not clear that the former was) http://lists.freebsd.org/pipermail/freebsd-stable/2006-July/026873.html (not clear how much CPU; his concern was something else that was triggered by this, and he had another kernel thread being silly too; no resolution) http://lists.freebsd.org/pipermail/freebsd-stable/2004-December/010673.html (not certain that this is the same problem, but it sure looks like it; no reply to his message) http://lists.freebsd.org/pipermail/freebsd-stable/2004-November/009489.html (24% on one machine, 8% on another; no reply) http://lists.freebsd.org/pipermail/freebsd-current/2005-March/047218.html (18% but drops when a real task begins; may not be the same problem; this is under MS Virtual Server; only suggestion was to reduce HZ, which didn't solve the problem; no further replies) However, I've not seen any sign that other people have had *as much* CPU wasted this way as I have. -- g ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: "swiN: clock sio" process taking 75% CPU
Can I suggest that if you get no replies on questions@ that you try hackers@ (you may need to subscribe to post, not sure). What's going on here is not normal so even someone with tons of FreeBSD experience may never have seen something like this. I know I never have. Most posters here are not kernel hackers. It would be a shame if the only conclusion to be drawn from this were "sometimes a machine running FreeBSD is just 4x slower than it should be, and no one knows why". Well, if you could only find one somewhat relevant topic with google, then "sometimes" would be "almost never". I appreciate that you are frustrated because it *is* happening to you, but your problem is not a common one so may take more than a post on questions to resolve. --Alex ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: "swiN: clock sio" process taking 75% CPU
I wrote: > About 6 minutes after booting (on two occasions; I don't > guarantee that this doesn't vary), a process that appears > in the output of "ps" as "[swi4: clock sio]" begins to > use about 3/4 of the machine's CPU. I think it does so > more or less instantaneously. It continues to do so > indefinitely, so far as I can tell. [etc] No ideas? I'm willing to help track this down, and the machine in question is sufficiently little used that I can do so without gross inconvenience; but I don't have enough FreeBSD kernel expertise to feel like diving in blind. * A little more information, in case it's useful to anyone: | $ echo; sysctl debug | egrep to_ | debug.to_avg_mpcalls: 2890 | debug.to_avg_mtxcalls: 0 | debug.to_avg_gcalls: 768 | debug.to_avg_depth: 3815 That's with HZ = 100. Here are some numbers from a message in freebsd-ia64, from Marcel Moolenaar, in 2004-07, to someone seeing symptoms like mine. They're meant to be typical healthy numbers. Mine above look somewhat worse, but not insanely so; surely not enough to explain the difference between using 0.3% cpu and using 75%. Marcel also had HZ=100. | % sysctl debug | grep to_avg | debug.to_avg_depth: 2500 | debug.to_avg_gcalls: 1003 | debug.to_avg_mpcalls: 1255 * It would be a shame if the only conclusion to be drawn from this were "sometimes a machine running FreeBSD is just 4x slower than it should be, and no one knows why". -- g ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: "swiN: clock sio" process taking 75% CPU
I wrote, inter alia, > About 6 minutes after booting (on two occasions; I don't > guarantee that this doesn't vary), a process that appears > in the output of "ps" as "[swi4: clock sio]" begins to > use about 3/4 of the machine's CPU. I think it does so > more or less instantaneously. It continues to do so > indefinitely, so far as I can tell. David Wolfskill e-mailed me off-list to suggest looking at the output of "vmstat -i". Answer: the interrupt rates all appear to be normal, or at least similar to those he observes on his machines which don't exhibit my problem. More specifically ... -- excerpt from my reply to David begins -- I get this: | interrupt total rate | irq1: atkbd0 3 0 | irq6: fdc010 0 | irq14: ata0 2913 1 | irq15: ata1 47 0 | irq17: xl0 7342 4 | cpu0: timer 302649199 | Total 312964206 (so the rate of timer interrupts doesn't appear to be insane) and | 7:56PM up 26 mins, 1 user, load averages: 1.87, 1.45, 1.08 (so the cost in CPU cycles of servicing them -- if that's what the rogue process is doing, which seems somewhat plausible -- *does* appear to be insane). -- excerpt from my reply to David ends -- -- g ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "[EMAIL PROTECTED]"
"swiN: clock sio" process taking 75% CPU
(Approximately the same question as I'm about to ask seems to have come up many times in -questions and -stable, but I haven't found a useful answer to it.) I have a box running 6-STABLE, cvsupped yesterday. Until recently it was running 5.something and showed exactly the same peculiar behaviour as I'm about to describe. About 6 minutes after booting (on two occasions; I don't guarantee that this doesn't vary), a process that appears in the output of "ps" as "[swi4: clock sio]" begins to use about 3/4 of the machine's CPU. I think it does so more or less instantaneously. It continues to do so indefinitely, so far as I can tell. I'm not aware of anything specific that triggers this, though I suppose there must *be* something. It happens apparently spontaneously, on a lightly loaded machine. Those cycles are genuinely being consumed; other processes run much more slowly, and take much more wall time than CPU time. I've tried diddling my kernel's HZ value; the behaviour with HZ=100 and with HZ=1000 is the same, so far as I'm able to tell. I've no idea whether it might be relevant, but I have option DEVICE_POLLING turned on; toggling sysctl kern.polling.enable doesn't seem to make any difference. The machine is a very uninteresting single-CPU Athlon box, clocked at 1.6GHz, several years old. It used to run 4.x once upon a time, and I'm fairly sure it was fine then. As I mentioned above, it's running a recent 6-STABLE but showed the same pathology under 5.x. Here's its dmesg output, with a few uninteresting bits of information leakage elided. -- dmesg output begins -- Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 6.1-STABLE #3: Thu Jul 13 16:46:07 BST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ ACPI APIC Table: Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Athlon(tm) XP 1900+ (1599.73-MHz 686-class CPU) Origin = "AuthenticAMD" Id = 0x662 Stepping = 2 Features=0x383fbff AMD Features=0xc0480800 real memory = 536805376 (511 MB) avail memory = 51598 (492 MB) ioapic0 irqs 0-23 on motherboard netsmb_dev: loaded acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 cpu0: on acpi0 acpi_throttle0: on cpu0 acpi_button0: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 agp0: mem 0xe000-0xe3ff at device 0.0 on pci0 pcib1: at device 1.0 on pci0 pci1: on pcib1 pci1: at device 0.0 (no driver attached) xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xdf80-0xdfff irq 17 at device 6.0 on pci0 miibus0: on xl0 ukphy0: on miibus0 ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto xl0: Ethernet address: 00:04:75:7e:54:fd isab0: at device 17.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 17.1 on pci0 ata0: on atapci0 ata1: on atapci0 uhci0: port 0xdc00-0xdc1f irq 5 at device 17.2 on pci0 uhci0: [GIANT-LOCKED] usb0: on uhci0 usb0: USB revision 1.0 uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xe000-0xe01f irq 5 at device 17.3 on pci0 uhci1: [GIANT-LOCKED] usb1: on uhci1 usb1: USB revision 1.0 uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 2 ports with 2 removable, self powered uhci2: port 0xe400-0xe41f irq 5 at device 17.4 on pci0 uhci2: [GIANT-LOCKED] usb2: on uhci2 usb2: USB revision 1.0 uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub2: 2 ports with 2 removable, self powered pci0: at device 17.5 (no driver attached) acpi_button1: on acpi0 fdc0: port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FAST] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0 sio1: type 16550A ppc0: port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/9 bytes threshold ppbus0: on ppc0 plip0: on ppbus0 lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: model Generic PS/2 mouse, device ID 0 pmtimer0 on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa-0xb on isa0 Timecounter "TSC" frequency 1599730245 Hz quality 800 Timecounters tick every 10.000 msec ad0: 38166MB at ata0-master UDMA100 acd0: DVDROM at ata1-master UDMA33 Trying to mount root from ufs:/dev/ad0s1a -- dmesg output ends --