cpu timer issues
Hello List We have been having issues with some firewall machines of ours using pfSense. FreeBSD smash01.ish.com.au 7.2-RELEASE-p5 FreeBSD 7.2-RELEASE-p5 #0: Sun Dec 6 23:20:31 EST 2009 sullr...@freebsd_7.2_pfsense_1.2.3_snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7 i386 MotherBoard: http://www.supermicro.com/products/motherboard/Xeon3000/3200/X7SBi-LN4.cfm Originally the systems started out by showing a lot of packet loss, the system time would fall behind, and the value of #vmstat -i | grep timer was dropping below 2000. I was lead to believe by the guys at pfSense that this is where the value should sit. I would also receive errors in messages that looked like kernel: calcru: runtime went backwards from 244314 usec to 236341. We tried a variety of things, disabling USB, turning off the Intel Speed Step in the BIOS, disabling ACPI, etc, etc. All having little to no effect. The only thing that would right it is restarting the box but over time it would degrade again. I talked to the SuperMicro and they said that this is a FreeBSD issue and pretty much washed their hands of it. After a couple of months of dealing with this and just rebooting the systems reguarly, the symptoms slowly but surely disappeared. eg. The kernel messages went away, the system time was not falling behind and I was experiencing no packet loss but the #vmstat -i | grep timer value would continue to decrease over time. Eventually I think, when it finally got the 0 the machine restarted (I am only guessing here). After this restart it worked again for a couple of hours and then it restarted again. After the second time the system has not missed a beat, it has been fine and the #vmstat -i | grep timer value remained near the 2000 mark... We setup some zabbix monitoring to watch it. As mentioned it was fine for about a month. Until today. Today the value has dropped to 0, but the system has not restarted and over the last couple of hours the value has increased to 47. This machine is mission critical, we have two in a fail over scenario (using pfSense's CARP features) and it seems unfortunate that we have an issue with two brand new SuperMicro boxes that affect both machines. While at the moment everything seems fine I want to ensure that I have no further issues. Does anyone have any suggestions? Lastly I have double check both of the below: http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#CALCRU-NEGATIVE-RUNTIME We disabled EIST. http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#COMPUTER-CLOCK-SKEW # dmesg | grep Timecounter Timecounter i8254 frequency 1193182 Hz quality 0 Timecounters tick every 1.000 msec # sysctl kern.timecounter.hardware kern.timecounter.hardware: i8254 Only have one timer to choose from. Thanks Jurgen ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: cpu timer issues
On Tue, Sep 28, 2010 at 05:54:15PM +1000, Jurgen Weber wrote: Hello List We have been having issues with some firewall machines of ours using pfSense. FreeBSD smash01.ish.com.au 7.2-RELEASE-p5 FreeBSD 7.2-RELEASE-p5 #0: Sun Dec 6 23:20:31 EST 2009 sullr...@freebsd_7.2_pfsense_1.2.3_snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7 i386 MotherBoard: http://www.supermicro.com/products/motherboard/Xeon3000/3200/X7SBi-LN4.cfm Originally the systems started out by showing a lot of packet loss, the system time would fall behind, and the value of #vmstat -i | grep timer was dropping below 2000. I was lead to believe by the guys at pfSense that this is where the value should sit. I would also receive errors in messages that looked like kernel: calcru: runtime went backwards from 244314 usec to 236341. We tried a variety of things, disabling USB, turning off the Intel Speed Step in the BIOS, disabling ACPI, etc, etc. All having little to no effect. The only thing that would right it is restarting the box but over time it would degrade again. I talked to the SuperMicro and they said that this is a FreeBSD issue and pretty much washed their hands of it. After a couple of months of dealing with this and just rebooting the systems reguarly, the symptoms slowly but surely disappeared. eg. The kernel messages went away, the system time was not falling behind and I was experiencing no packet loss but the #vmstat -i | grep timer value would continue to decrease over time. Eventually I think, when it finally got the 0 the machine restarted (I am only guessing here). After this restart it worked again for a couple of hours and then it restarted again. After the second time the system has not missed a beat, it has been fine and the #vmstat -i | grep timer value remained near the 2000 mark... We setup some zabbix monitoring to watch it. As mentioned it was fine for about a month. Until today. Today the value has dropped to 0, but the system has not restarted and over the last couple of hours the value has increased to 47. This machine is mission critical, we have two in a fail over scenario (using pfSense's CARP features) and it seems unfortunate that we have an issue with two brand new SuperMicro boxes that affect both machines. While at the moment everything seems fine I want to ensure that I have no further issues. Does anyone have any suggestions? Lastly I have double check both of the below: http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#CALCRU-NEGATIVE-RUNTIME We disabled EIST. http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#COMPUTER-CLOCK-SKEW # dmesg | grep Timecounter Timecounter i8254 frequency 1193182 Hz quality 0 Timecounters tick every 1.000 msec # sysctl kern.timecounter.hardware kern.timecounter.hardware: i8254 Only have one timer to choose from. I have a subrevision of this motherboard in use in production, which ran RELENG_7 and now runs RELENG_8, without any of the problems you describe. I don't have any experience with the -LN4 submodel though, although I do have experience with the X7SBA-LN4. Our hardware in question: http://www.supermicro.com/products/system/1U/5015/SYS-5015B-MT.cfm The machine in question consists of 4 disks (1 OS, 3 ZFS raidz1), uses both NICs (two separate networks) at gigE rates, handles nightly backups for all other servers, acts as an NFS server, a time source (ntpd) for other servers on the network, and a serial console head. Oh, it also has EIST enabled, and runs powerd with some minor (well-known) tunings in loader.conf for it. Secondly, here's our sysctl kern.timecounter tree on our system, in addition to our SMBIOS details (proving the system is what I say it is). Note that we have multiple timecounter choices, and APCI-fast is chosen. I would expect problems if i8254 was chosen, but the question is why this is being chosen on your systems and why alternate timecounter choices aren't available. You said you tried booting with ACPI disabled, which might explain why ACPI-fast or ACPI-safe are missing. $ sysctl kern.timecounter kern.timecounter.tick: 1 kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-100) kern.timecounter.hardware: ACPI-fast kern.timecounter.stepwarnings: 0 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.i8254.counter: 47135 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.ACPI-fast.mask: 16777215 kern.timecounter.tc.ACPI-fast.counter: 188736 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.quality: 1000 kern.timecounter.tc.TSC.mask: 4294967295 kern.timecounter.tc.TSC.counter: 2830682562 kern.timecounter.tc.TSC.frequency: 2333508681 kern.timecounter.tc.TSC.quality: -100 kern.timecounter.smp_tsc: 0 kern.timecounter.invariant_tsc: 1 $ kenv | grep smbios smbios.bios.reldate=07/24/2009
Re: cpu timer issues
on 28/09/2010 10:54 Jurgen Weber said the following: # dmesg | grep Timecounter Timecounter i8254 frequency 1193182 Hz quality 0 Timecounters tick every 1.000 msec # sysctl kern.timecounter.hardware kern.timecounter.hardware: i8254 Only have one timer to choose from. Can you provide a little bit more of hard data than the above? Specifically, the following sysctls: kern.timecounter dev.cpu Output of vmstat -i. _Verbose_ boot dmesg. Please do not disable ACPI when taking this data. Preferably, upload it somewhere and post a link to it. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: cpu timer issues
On 28.09.2010, at 10:54, Jurgen Weber jur...@ish.com.au wrote: Hello List We have been having issues with some firewall machines of ours using pfSense. FreeBSD smash01.ish.com.au 7.2-RELEASE-p5 FreeBSD 7.2-RELEASE-p5 #0: Sun Dec 6 23:20:31 EST 2009 sullr...@freebsd_7.2_pfsense_1.2.3_snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.7 i386 MotherBoard: http://www.supermicro.com/products/motherboard/Xeon3000/3200/X7SBi-LN4.cfm Originally the systems started out by showing a lot of packet loss, the system time would fall behind, and the value of #vmstat -i | grep timer was dropping below 2000. I was lead to believe by the guys at pfSense that this is where the value should sit. I would also receive errors in messages that looked like kernel: calcru: runtime went backwards from 244314 usec to 236341. We tried a variety of things, disabling USB, turning off the Intel Speed Step in the BIOS, disabling ACPI, etc, etc. All having little to no effect. The only thing that would right it is restarting the box but over time it would degrade again. I talked to the SuperMicro and they said that this is a FreeBSD issue and pretty much washed their hands of it. After a couple of months of dealing with this and just rebooting the systems reguarly, the symptoms slowly but surely disappeared. eg. The kernel messages went away, the system time was not falling behind and I was experiencing no packet loss but the #vmstat -i | grep timer value would continue to decrease over time. Eventually I think, when it finally got the 0 the machine restarted (I am only guessing here). After this restart it worked again for a couple of hours and then it restarted again. After the second time the system has not missed a beat, it has been fine and the #vmstat -i | grep timer value remained near the 2000 mark... We setup some zabbix monitoring to watch it. As mentioned it was fine for about a month. Until today. Today the value has dropped to 0, but the system has not restarted and over the last couple of hours the value has increased to 47. This machine is mission critical, we have two in a fail over scenario (using pfSense's CARP features) and it seems unfortunate that we have an issue with two brand new SuperMicro boxes that affect both machines. While at the moment everything seems fine I want to ensure that I have no further issues. Does anyone have any suggestions? Lastly I have double check both of the below: http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#CALCRU-NEGATIVE-RUNTIME We disabled EIST. http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html#COMPUTER-CLOCK-SKEW # dmesg | grep Timecounter Timecounter i8254 frequency 1193182 Hz quality 0 Timecounters tick every 1.000 msec # sysctl kern.timecounter.hardware kern.timecounter.hardware: i8254 Only have one timer to choose from. Thanks Jurgen ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org Hello, vmsat -i calculates interrupt rate based on interrupt count/uptime, and the interrupt count is 32 bit integer. With high values of kern.hz it will overflow in few days (with kern.hz=4000 it will happen every 12 days or so). If that is the case, use systat -vmstat 1 to get accurate interrupt rate. That is just fyi, because i was confused once and it scared me abit, and i started changing counters untill i noticed this. p.s. please forgive my poor english___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Still getting kmem exhausted panic
Hi, This is with stable as of yesterday,but with an un-tunned ZFS box I was still able to generate a kmem exhausted panic. Hard panic, just 3 lines. The box contains 12Gb memory, runs on a 6 core (with HT) xeon. 6* 2T WD black caviar in raidz2 with 2*512Mb mirrored log. The box died while rsyncing 5.8T from its partnering system. (that was the only activity on the box) So the obvious would to conclude that auto-tuning voor ZFS on 8.1-Stable is not yet quite there. So I guess that we still need tuning advice even for 8.1. And thus prevent a hard panic. At the moment trying to 'zfs send | rsh zfs receive' the stuff. Which seems to run at about 40Mb/sec, and is a lot faster than the rsync stuff. --WjW ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
On Tue, Sep 28, 2010 at 01:24:28PM +0200, Willem Jan Withagen wrote: This is with stable as of yesterday,but with an un-tunned ZFS box I was still able to generate a kmem exhausted panic. Hard panic, just 3 lines. The box contains 12Gb memory, runs on a 6 core (with HT) xeon. 6* 2T WD black caviar in raidz2 with 2*512Mb mirrored log. The box died while rsyncing 5.8T from its partnering system. (that was the only activity on the box) It would help if you could provide output from the following commands (even after the box has rebooted): $ sysctl -a | egrep ^vm.kmem $ sysctl -a | egrep ^vfs.zfs.arc $ sysctl kstat.zfs.misc.arcstats So the obvious would to conclude that auto-tuning voor ZFS on 8.1-Stable is not yet quite there. So I guess that we still need tuning advice even for 8.1. And thus prevent a hard panic. Andriy Gapon provides this general recommendation: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059114.html The advice I've given for RELENG_8 (as of the time of this writing), 8.1-STABLE, and 8.1-RELEASE, is that for amd64 you'll need to tune: vm.kmem_size vfs.zfs.arc_max An example machine: amd64, with 4GB physical RAM installed (3916MB available for use (verified via dmesg)) uses values: vm.kmem_size=4096M vfs.zfs.arc_max=3584M Another example machine: amd64, with 8GB physical RAM installed (7875MB available for use) uses values: vm.kmem_size=8192M vfs.zfs.arc_max=6144M I believe the trick -- Andriy, please correct me if I'm wrong -- is the tuning of vfs.zfs.arc_max, which is now a hard limit rather than a high watermark. However, I believe there have been occasional reports of exhaustion panics despite both of these being set[1]. Those reports are being investigated on an individual basis. I set some other ZFS-related parameters as well (disabling prefetch, adjusting txg.timeout, etc.), but those shouldn't be necessary to gain stability at this point in time. I can't provide tuning advice for i386. [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
[releng_8_1 tinderbox] failure on powerpc/powerpc
TB --- 2010-09-28 12:06:50 - tinderbox 2.6 running on freebsd-current.sentex.ca TB --- 2010-09-28 12:06:50 - starting RELENG_8_1 tinderbox run for powerpc/powerpc TB --- 2010-09-28 12:06:50 - cleaning the object tree TB --- 2010-09-28 12:09:09 - cvsupping the source tree TB --- 2010-09-28 12:09:10 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup5.freebsd.org /tinderbox/RELENG_8_1/powerpc/powerpc/supfile TB --- 2010-09-28 12:16:20 - WARNING: /usr/bin/csup returned exit code 1 TB --- 2010-09-28 12:16:20 - ERROR: unable to cvsup the source tree TB --- 2010-09-28 12:16:20 - 2.59 user 202.72 system 570.10 real http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8_1-powerpc-powerpc.full ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 14:50 Jeremy Chadwick said the following: I believe the trick -- Andriy, please correct me if I'm wrong -- is the Wouldn't hurt to CC me, so that I could do it :-) tuning of vfs.zfs.arc_max, which is now a hard limit rather than a high watermark. Not sure what you mean here. What is hard limit, what is high watermark, what is the difference and when is now? :-) I believe that the trick is to set vm.kmem_size high enough, eitehr using this tunable or vm.kmem_size_scale. However, I believe there have been occasional reports of exhaustion panics despite both of these being set[1]. Those reports are being investigated on an individual basis. I don't believe that the report that you quote actually demonstrates what you say it does. Two quotes from it: During these panics no tuning or /boot/loader.conf values where present. Only after hitting this behaviour yesterday i created boot/loader.conf [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote: on 28/09/2010 14:50 Jeremy Chadwick said the following: I believe the trick -- Andriy, please correct me if I'm wrong -- is the Wouldn't hurt to CC me, so that I could do it :-) tuning of vfs.zfs.arc_max, which is now a hard limit rather than a high watermark. Not sure what you mean here. What is hard limit, what is high watermark, what is the difference and when is now? :-) There was some speculation on the part of users a while back which lead to this understanding. Folks were seeing actual ARC usage higher than what vfs.zfs.arc_max was set to (automatically or administratively). I believe it started here: http://www.mailinglistarchive.com/freebsd-curr...@freebsd.org/msg28884.html With the high-water mark statements being here: http://www.mailinglistarchive.com/freebsd-curr...@freebsd.org/msg28887.html http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-04/msg00129.html The term implies that there is not an explicitly hard limit on the ARC utilisation/growth. As stated in the unix.derkeiler.com URL above, this behaviour was in fact changed. Why/when/how? I had to go digging up the commits -- this took me some time. Here they are, labelled r197816, for RELENG_8 and RELENG_7 respectively. These were both committed on 2010/01/08 UTC: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.2 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.15.2.6 In HEAD/CURRENT (yet to be MFC'd), it looks like above code got removed on 2010/09/17 UTC, citing they should be enforced by actual calculations of delta: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.46 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.45 So what's this delta code piece that's mentioned? That appears to be have been committed to RELENG_8 on 2010/05/24 UTC (thus, between the above two dates): http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.4 (Side note: the delta stuff was never committed to RELENG_7 -- and that's fine. I'm pointing this out not out of retaliation or insult, but because people will almost certainly Google, find this post, and wonder if their 7.x machines might be affected.) This situation with the ARC, and all its changes over time, is one of the reasons why I rant aggressively about the need for more communication transparency (re: what the changes actually affect). Most SAs and users don't follow commits. I believe that the trick is to set vm.kmem_size high enough, eitehr using this tunable or vm.kmem_size_scale. Thanks for the clarification. I just wish I knew how vm.kmem_size_scale fit into the picture (meaning what it does, etc.). The sysctl description isn't very helpful. Again, my lack of VM knowledge... However, I believe there have been occasional reports of exhaustion panics despite both of these being set[1]. Those reports are being investigated on an individual basis. I don't believe that the report that you quote actually demonstrates what you say it does. Two quotes from it: During these panics no tuning or /boot/loader.conf values where present. Only after hitting this behaviour yesterday i created boot/loader.conf [1]: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059109.html You're right -- the report I'm quoting is not the one I thought it was. I'll see if I can dig up the correct mail/report. It could be that I'm thinking of something quite old (pre-ARC-changes (see above paragraphs)). I can barely keep track of all the changes going on. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 28-9-2010 13:50, Jeremy Chadwick wrote: On Tue, Sep 28, 2010 at 01:24:28PM +0200, Willem Jan Withagen wrote: This is with stable as of yesterday,but with an un-tunned ZFS box I was still able to generate a kmem exhausted panic. Hard panic, just 3 lines. The box contains 12Gb memory, runs on a 6 core (with HT) xeon. 6* 2T WD black caviar in raidz2 with 2*512Mb mirrored log. The box died while rsyncing 5.8T from its partnering system. (that was the only activity on the box) It would help if you could provide output from the following commands (even after the box has rebooted): It is currently in the proces of zfs receive of that same 5.8T. $ sysctl -a | egrep ^vm.kmem $ sysctl -a | egrep ^vfs.zfs.arc $ sysctl kstat.zfs.misc.arcstats sysctl -a | egrep ^vm.kmem vm.kmem_size_scale: 3 vm.kmem_size_max: 329853485875 vm.kmem_size_min: 0 vm.kmem_size: 4156850176 sysctl -a | egrep ^vfs.zfs.arc vfs.zfs.arc_meta_limit: 770777088 vfs.zfs.arc_meta_used: 33449648 vfs.zfs.arc_min: 385388544 vfs.zfs.arc_max: 3083108352 sysctl kstat.zfs.misc.arcstats kstat.zfs.misc.arcstats.hits: 3119873 kstat.zfs.misc.arcstats.misses: 98710 kstat.zfs.misc.arcstats.demand_data_hits: 3043947 kstat.zfs.misc.arcstats.demand_data_misses: 3699 kstat.zfs.misc.arcstats.demand_metadata_hits: 67981 kstat.zfs.misc.arcstats.demand_metadata_misses: 90005 kstat.zfs.misc.arcstats.prefetch_data_hits: 121 kstat.zfs.misc.arcstats.prefetch_data_misses: 48 kstat.zfs.misc.arcstats.prefetch_metadata_hits: 7824 kstat.zfs.misc.arcstats.prefetch_metadata_misses: 4958 kstat.zfs.misc.arcstats.mru_hits: 34828 kstat.zfs.misc.arcstats.mru_ghost_hits: 21736 kstat.zfs.misc.arcstats.mfu_hits: 3077133 kstat.zfs.misc.arcstats.mfu_ghost_hits: 47605 kstat.zfs.misc.arcstats.allocated: 5507025 kstat.zfs.misc.arcstats.deleted: 5349715 kstat.zfs.misc.arcstats.stolen: 4468221 kstat.zfs.misc.arcstats.recycle_miss: 83995 kstat.zfs.misc.arcstats.mutex_miss: 231 kstat.zfs.misc.arcstats.evict_skip: 130461 kstat.zfs.misc.arcstats.evict_l2_cached: 0 kstat.zfs.misc.arcstats.evict_l2_eligible: 592200836608 kstat.zfs.misc.arcstats.evict_l2_ineligible: 1192160 kstat.zfs.misc.arcstats.hash_elements: 20585 kstat.zfs.misc.arcstats.hash_elements_max: 150543 kstat.zfs.misc.arcstats.hash_collisions: 761847 kstat.zfs.misc.arcstats.hash_chains: 780 kstat.zfs.misc.arcstats.hash_chain_max: 6 kstat.zfs.misc.arcstats.p: 2266075295 kstat.zfs.misc.arcstats.c: 2410082200 kstat.zfs.misc.arcstats.c_min: 385388544 kstat.zfs.misc.arcstats.c_max: 3083108352 kstat.zfs.misc.arcstats.size: 2410286720 kstat.zfs.misc.arcstats.hdr_size: 7565040 kstat.zfs.misc.arcstats.data_size: 2394099200 kstat.zfs.misc.arcstats.other_size: 8622480 kstat.zfs.misc.arcstats.l2_hits: 0 kstat.zfs.misc.arcstats.l2_misses: 0 kstat.zfs.misc.arcstats.l2_feeds: 0 kstat.zfs.misc.arcstats.l2_rw_clash: 0 kstat.zfs.misc.arcstats.l2_read_bytes: 0 kstat.zfs.misc.arcstats.l2_write_bytes: 0 kstat.zfs.misc.arcstats.l2_writes_sent: 0 kstat.zfs.misc.arcstats.l2_writes_done: 0 kstat.zfs.misc.arcstats.l2_writes_error: 0 kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0 kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0 kstat.zfs.misc.arcstats.l2_evict_reading: 0 kstat.zfs.misc.arcstats.l2_free_on_write: 0 kstat.zfs.misc.arcstats.l2_abort_lowmem: 0 kstat.zfs.misc.arcstats.l2_cksum_bad: 0 kstat.zfs.misc.arcstats.l2_io_error: 0 kstat.zfs.misc.arcstats.l2_size: 0 kstat.zfs.misc.arcstats.l2_hdr_size: 0 kstat.zfs.misc.arcstats.memory_throttle_count: 0 kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0 kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0 kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0 kstat.zfs.misc.arcstats.l2_write_in_l2: 0 kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0 kstat.zfs.misc.arcstats.l2_write_not_cacheable: 85908 kstat.zfs.misc.arcstats.l2_write_full: 0 kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0 kstat.zfs.misc.arcstats.l2_write_pios: 0 kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0 kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0 kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0 So the obvious would to conclude that auto-tuning voor ZFS on 8.1-Stable is not yet quite there. So I guess that we still need tuning advice even for 8.1. And thus prevent a hard panic. Andriy Gapon provides this general recommendation: http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059114.html The advice I've given for RELENG_8 (as of the time of this writing), 8.1-STABLE, and 8.1-RELEASE, is that for amd64 you'll need to tune: Well advises seem to vary, and the latest I understood was that 8.1-stable did not need any tuning. (The other system with a much older kernel is tuned as to what most here are suggesting) And I was shure led to believe that even since 8.0 panics were no longer among us.. vm.kmem_size vfs.zfs.arc_max real memory = 12889096192 (12292 MB) avail memory = 12408684544 (11833 MB) So that
Re: Still getting kmem exhausted panic
on 28/09/2010 16:23 Jeremy Chadwick said the following: On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote: on 28/09/2010 14:50 Jeremy Chadwick said the following: I believe the trick -- Andriy, please correct me if I'm wrong -- is the Wouldn't hurt to CC me, so that I could do it :-) tuning of vfs.zfs.arc_max, which is now a hard limit rather than a high watermark. Not sure what you mean here. What is hard limit, what is high watermark, what is the difference and when is now? :-) There was some speculation on the part of users a while back which lead to this understanding. Folks were seeing actual ARC usage higher than what vfs.zfs.arc_max was set to (automatically or administratively). I believe it started here: http://www.mailinglistarchive.com/freebsd-curr...@freebsd.org/msg28884.html With the high-water mark statements being here: http://www.mailinglistarchive.com/freebsd-curr...@freebsd.org/msg28887.html http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-04/msg00129.html The term implies that there is not an explicitly hard limit on the ARC utilisation/growth. As stated in the unix.derkeiler.com URL above, this behaviour was in fact changed. Why/when/how? I had to go digging up the commits -- this took me some time. Here they are, labelled r197816, for RELENG_8 and RELENG_7 respectively. These were both committed on 2010/01/08 UTC: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.2 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.15.2.6 In HEAD/CURRENT (yet to be MFC'd), it looks like above code got removed on 2010/09/17 UTC, citing they should be enforced by actual calculations of delta: http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.46 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.45 So what's this delta code piece that's mentioned? That appears to be have been committed to RELENG_8 on 2010/05/24 UTC (thus, between the above two dates): http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c#rev1.22.2.4 (Side note: the delta stuff was never committed to RELENG_7 -- and that's fine. I'm pointing this out not out of retaliation or insult, but because people will almost certainly Google, find this post, and wonder if their 7.x machines might be affected.) This situation with the ARC, and all its changes over time, is one of the reasons why I rant aggressively about the need for more communication transparency (re: what the changes actually affect). Most SAs and users don't follow commits. Well, no time for me to dig through all that history. arc_max should be a hard limit and it is now. If it ever wasn't then it was a bug. Besides, high watermark is still an ambiguous term, for you it implies that it is not a hard limit, but for me it implies exactly a hard limit. Additionally, going from non-hard limit to a hard limit on ARC size should improve things memory-wise, not vice versa, right? :) P.S. All that I said above is a hint that this is a pointless branch of the thread :) -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 16:23 Jeremy Chadwick said the following: On Tue, Sep 28, 2010 at 03:22:01PM +0300, Andriy Gapon wrote: I believe that the trick is to set vm.kmem_size high enough, eitehr using this tunable or vm.kmem_size_scale. Thanks for the clarification. I just wish I knew how vm.kmem_size_scale fit into the picture (meaning what it does, etc.). The sysctl description isn't very helpful. Again, my lack of VM knowledge... Roughly, vm.kmem_size would get set to available memory divided by vm.kmem_size_scale. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 16:25 Willem Jan Withagen said the following: Well advises seem to vary, and the latest I understood was that 8.1-stable did not need any tuning. (The other system with a much older kernel is tuned as to what most here are suggesting) And I was shure led to believe that even since 8.0 panics were no longer among us.. Well, now you have demonstrated yourself that it is not always so. vm.kmem_size vfs.zfs.arc_max real memory = 12889096192 (12292 MB) avail memory = 12408684544 (11833 MB) So that prompts vm.kmem_size=18G. Form the other post: As to arc_max/arc_min, set them based your needs according to general ZFS recommendations. I'm seriously at a loss what general recommendations would be. Have you asked Mr. Google? :) - http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide Search for Memory and Dynamic Reconfiguration Recommendation - http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache Short version - decide how much memory you need for everything else but ZFS ARC. If autotuned value suits you, then you don't need to change anything. The other box has 8G loader.conf: vm.kmem_size=14G # 2* phys RAM size for ZFS perf. vm.kmem_size_scale=1 No need to set both of the above. vm.kmem_size overrides vm.kmem_size_scale. vfs.zfs.arc_min=1G vfs.zfs.arc_max=6G So I'd select something like 11G for arc_max on a box with 12G mem. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 28-9-2010 15:46, Andriy Gapon wrote: on 28/09/2010 16:25 Willem Jan Withagen said the following: Well advises seem to vary, and the latest I understood was that 8.1-stable did not need any tuning. (The other system with a much older kernel is tuned as to what most here are suggesting) And I was shure led to believe that even since 8.0 panics were no longer among us.. Well, now you have demonstrated yourself that it is not always so. I thought I should share the knowledge. ;) Which is not a bad thing ofr those (starting to) use ZFS. I do not read commits, but do read a lot of FreeBSD groups. And for me there is still a shroud of black art over ZFS. Just glad that my main fileserver doesn't crash. (knock on wood). vm.kmem_size vfs.zfs.arc_max real memory = 12889096192 (12292 MB) avail memory = 12408684544 (11833 MB) So that prompts vm.kmem_size=18G. Form the other post: As to arc_max/arc_min, set them based your needs according to general ZFS recommendations. I'm seriously at a loss what general recommendations would be. Have you asked Mr. Google? :) - http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide Search for Memory and Dynamic Reconfiguration Recommendation - http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache Short version - decide how much memory you need for everything else but ZFS ARC. If autotuned value suits you, then you don't need to change anything. I do have (read) this document, but still that doesn't really give you guidelines for tuning on FreeBSD. It is a fileserver without any serious other apps. I was using auto-tuned, and that crashed my box. That is what started this whole thread. - --WjW -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMofVwAAoJEP4k4K6R6rBhFaUH/3wahrGWO71+xBhHi/ayNoaf DfbOWMD262XfualJudPRgoji7xb9lGaRmd4emv7QBcDjqzmcsiyIeXskT5IYKj7P DvJDULIH66iKQrRZeIBouMXMhLfiLjjT85Lj1hE8fuGg8NAOv97dnUwvVIwC0/Ai yzeeEHYivCYbRmzBhISlAWjdpSXk7xVs6gZnaLUUp953+Uv/8KmNLeG+laoWn+Hn wdKHUG3kR0g/XwJIMc5dZzYvs2kdDPh47uLythoYGC0yaLCwtxLHqEGIPtb/Gypy nIIWxOGtueJo2HjpS0+HlX/pTRW8tfYzXTzKgFKDd90t9fDt2p18BPSexuJSLVc= =hSAg -END PGP SIGNATURE- ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 17:02 Willem Jan Withagen said the following: I do have (read) this document, but still that doesn't really give you guidelines for tuning on FreeBSD. It is a fileserver without any serious other apps. I was using auto-tuned, and that crashed my box. That is what started this whole thread. Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 28-9-2010 16:07, Andriy Gapon wrote: on 28/09/2010 17:02 Willem Jan Withagen said the following: I do have (read) this document, but still that doesn't really give you guidelines for tuning on FreeBSD. It is a fileserver without any serious other apps. I was using auto-tuned, and that crashed my box. That is what started this whole thread. Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. I consider that a useful statement. - --WjW -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMofb9AAoJEP4k4K6R6rBhqaUH/iFd1GG/pGLEKY+savwCRQDA iitWtiBnUVfscP3Cfy81Mrg0m3SNik+lgRD2ywC03jsE+6sJbExuw52G46RjpExc EleJZTW74KvbLHBnVQd+gWUoULKfGx4sZSBuYlkFpANhbrucpYmyPftbpFzmpD7N IOeeY6H7iOa4vnb03DLYY0iErL+ak8NtiSKqYTLYqDA/UWqVfOsvdcRbywrMIOoV JoaoD+65ZQpFYkugiFr7/BtcxXA9GJNpsUI+vIADbDgr77XmhKfu0ky4/Ci5f/L9 8YbEzhobOtRBTjX4/JAl60ZC2ToPwyZ8F4Al7Kj8r7FJnpnhddw7XlVXqEouJxQ= =X2gD -END PGP SIGNATURE- ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 17:09 Willem Jan Withagen said the following: On 28-9-2010 16:07, Andriy Gapon wrote: on 28/09/2010 17:02 Willem Jan Withagen said the following: I do have (read) this document, but still that doesn't really give you guidelines for tuning on FreeBSD. It is a fileserver without any serious other apps. I was using auto-tuned, and that crashed my box. That is what started this whole thread. Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. I consider that a useful statement. Hm, looks like I've just given a bad advice. It seems that auto-tuned arc_max is based on kmem size. So if you use kmem size that is larger than available physical memory, then you better limit arc_max to the available memory minus 1GB or so, if the autotuned value is larger than that. I think this needs to be fixed in the code. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 28-9-2010 16:25, Andriy Gapon wrote: on 28/09/2010 17:09 Willem Jan Withagen said the following: On 28-9-2010 16:07, Andriy Gapon wrote: on 28/09/2010 17:02 Willem Jan Withagen said the following: I do have (read) this document, but still that doesn't really give you guidelines for tuning on FreeBSD. It is a fileserver without any serious other apps. I was using auto-tuned, and that crashed my box. That is what started this whole thread. Well, as I've said, in my opinion FreeBSD-specific tuning ends at setting kmem size. I consider that a useful statement. Hm, looks like I've just given a bad advice. It seems that auto-tuned arc_max is based on kmem size. So if you use kmem size that is larger than available physical memory, then you better limit arc_max to the available memory minus 1GB or so, if the autotuned value is larger than that. I think this needs to be fixed in the code. So in my case (no other serious apps) with 12G phys mem: vm.kmem_size=17G vfs.zfs.arc_max=11G - --WjW -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (MingW32) iQEcBAEBAgAGBQJMofv9AAoJEP4k4K6R6rBhrksH/0L7EP9oSi4hhITZTB0uIk8q 0IEKnc2ltnPUSFJXS9wP1r9iLzNFJJXGqrO1ZvZUFcJeXXwSzSjhD+zbd237yf/r f5nQ7yBNPd7MxZlZjDkIXB9ZJYuE1u0KMfuQSxptzOWB7oin8MpXHa1YdX6CVE7A 3+hSykteHFFqs8qwUSzoUs47r0dW2WxXE2qAEurelL6VFn++K86d32F5WNv/SX4u aN43r+/CgrjiJVNrxG+gchoicEnIaI90jepkjzpEMp8M85VF4skIZbflZrSSNheY Wzi4LD2h8dFf/La+9EB5AYkMgRcTvXcgNkppIsZ94nf7oSyYNZFuxLYC3ilQetY= =WYzV -END PGP SIGNATURE- ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 17:30 Willem Jan Withagen said the following: So in my case (no other serious apps) with 12G phys mem: vm.kmem_size=17G vfs.zfs.arc_max=11G Should be good. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
Quoth Don Lewis on Monday, 27 September 2010: CPU time accounting is broken on one of my machines running 8-STABLE. I ran a test with a simple program that just loops and consumes CPU time: % time ./a.out 94.544u 0.000s 19:14.10 8.1% 62+2054k 0+0io 0pf+0w The display in top shows the process with WCPU at 100%, but TIME increments very slowly. Several hours after booting, I got a bunch of calcru: runtime went backwards messages, but they stopped right away and never appeared again. Aug 23 13:40:07 scratch ntpd[1159]: ntpd 4.2.4p5-a (1) Aug 23 13:43:18 scratch ntpd[1160]: kernel time sync status change 2001 Aug 23 18:05:57 scratch dbus-daemon: [system] Reloaded configuration Aug 23 18:06:16 scratch dbus-daemon: [system] Reloaded configuration Aug 23 18:12:40 scratch ntpd[1160]: time reset +18.059948 s [snip] Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6836685136 usec to 5425839798 usec for pid 1526 (csh) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 4747 usec to 2403 usec for pid 1519 (csh) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 5265 usec to 2594 usec for pid 1494 (hald-addon-mouse-sy) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 7818 usec to 3734 usec for pid 1488 (console-kit-daemon) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 977 usec to 459 usec for pid 1480 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 958 usec to 450 usec for pid 1479 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 957 usec to 449 usec for pid 1478 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 952 usec to 447 usec for pid 1477 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 959 usec to 450 usec for pid 1476 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 975 usec to 458 usec for pid 1475 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1026 usec to 482 usec for pid 1474 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1333 usec to 626 usec for pid 1473 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 2469 usec to 1160 usec for pid 1440 (inetd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 719 usec to 690 usec for pid 1402 (sshd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 120486 usec to 56770 usec for pid 1360 (cupsd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6204 usec to 2914 usec for pid 1289 (dbus-daemon) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 179 usec to 84 usec for pid 1265 (moused) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 22156 usec to 10407 usec for pid 1041 (nfsd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1292 usec to 607 usec for pid 1032 (mountd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 8801 usec to 4134 usec for pid 664 (devd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 19 usec to 9 usec for pid 9 (sctp_iterator) If I reboot and run the test again, the CPU time accounting seems to be working correctly. % time ./a.out 1144.226u 0.000s 19:06.62 99.7% 5+168k 0+0io 0pf+0w snip I notice that before the calcru messages, ntpd reset the clock by 18 seconds -- that probably accounts for that. I don't know if that has any connection to time(1) running slower -- but perhaps ntpd is aggressively adjusting your clock? -- Sterling (Chip) Camden| sterl...@camdensoftware.com | 2048D/3A978E4F http://camdensoftware.com | http://chipstips.com| http://chipsquips.com pgpTvB0c4vm5i.pgp Description: PGP signature
Re: Still getting kmem exhausted panic
On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote: Well, no time for me to dig through all that history. arc_max should be a hard limit and it is now. If it ever wasn't then it was a bug. I believe the size of the arc could exceed the limit if your working set was larger than arc_max. The arc can't (couldn't then, anyway) evict data that is still referenced. A contributing factor at the time was that the page daemon did not take into account back pressure from the arc when deciding which pages to move from active to inactive, etc. So data was more likely to be referenced and therefore forced to remain in the arc. I'm not sure if this is still the current state. I seem to remember some changesets mentioning arc back pressure at some point, but I don't know the details. - Ben___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 18:50 Ben Kelly said the following: On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote: Well, no time for me to dig through all that history. arc_max should be a hard limit and it is now. If it ever wasn't then it was a bug. I believe the size of the arc could exceed the limit if your working set was larger than arc_max. The arc can't (couldn't then, anyway) evict data that is still referenced. I think that you are correct and I was wrong. ARC would still allocate a new buffer even if it's at or above arc_max and can not re-use any exisiting buffer. But I think that this is more likely to happen with tiny ARC size. I have hard time imagining a workload at which gigabytes of data would be simultaneously and continuously used (see below for definition of used). A contributing factor at the time was that the page daemon did not take into account back pressure from the arc when deciding which pages to move from active to inactive, etc. So data was more likely to be referenced and therefore forced to remain in the arc. I don't think that this is what happened and I don't think that pagedaemon has anything to do with the discussed issue. I think that ARC buffers exist independently of pagedaemon and page cache. I think that they are held only during time when I/O is happening to or from them. I'm not sure if this is still the current state. I seem to remember some changesets mentioning arc back pressure at some point, but I don't know the details. I think that backpressure has nothing to do with it. If ZFS truly does I/O with all existing buffers and it needs a new buffer, then the choices are limited: either block and wait, or go over the limit. Apparently ZFS designers went with the latter option. But as I've said, for non-tiny ARC sizes it's hard to imagine such amount of parallel I/O that would tie all ARC buffers. Given the adaptive nature of ARC I still see it happening, but only when ARC size is near its minimum, not when it is at maximum. It seems that kstat.zfs.misc.arcstats.recycle_miss is a counter of allocations when ARC refused to grow and no existing buffer could be recycled, but this is not the same as going above ARC maximum size. BTW, such allocation over the limit could be considered as a form of memory pressure from ARC on the rest of the system. P.S. The code is in arc_get_data_buf(). -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
On Sep 28, 2010, at 12:30 PM, Andriy Gapon wrote: on 28/09/2010 18:50 Ben Kelly said the following: On Sep 28, 2010, at 9:36 AM, Andriy Gapon wrote: Well, no time for me to dig through all that history. arc_max should be a hard limit and it is now. If it ever wasn't then it was a bug. I believe the size of the arc could exceed the limit if your working set was larger than arc_max. The arc can't (couldn't then, anyway) evict data that is still referenced. I think that you are correct and I was wrong. ARC would still allocate a new buffer even if it's at or above arc_max and can not re-use any exisiting buffer. But I think that this is more likely to happen with tiny ARC size. I have hard time imagining a workload at which gigabytes of data would be simultaneously and continuously used (see below for definition of used). A contributing factor at the time was that the page daemon did not take into account back pressure from the arc when deciding which pages to move from active to inactive, etc. So data was more likely to be referenced and therefore forced to remain in the arc. I don't think that this is what happened and I don't think that pagedaemon has anything to do with the discussed issue. I think that ARC buffers exist independently of pagedaemon and page cache. I think that they are held only during time when I/O is happening to or from them. Hmm. My server is currently idle with no I/O happening: kstat.zfs.misc.arcstats.c: 25165824 kstat.zfs.misc.arcstats.c_max: 46137344 kstat.zfs.misc.arcstats.size: 91863156 If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did. In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs. So I'm more than willing to believe I'm confused. Any insights are greatly appreciated. Thanks! - Ben___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 28 Sep, Chip Camden wrote: Quoth Don Lewis on Monday, 27 September 2010: CPU time accounting is broken on one of my machines running 8-STABLE. I ran a test with a simple program that just loops and consumes CPU time: % time ./a.out 94.544u 0.000s 19:14.10 8.1% 62+2054k 0+0io 0pf+0w The display in top shows the process with WCPU at 100%, but TIME increments very slowly. Several hours after booting, I got a bunch of calcru: runtime went backwards messages, but they stopped right away and never appeared again. Aug 23 13:40:07 scratch ntpd[1159]: ntpd 4.2.4p5-a (1) Aug 23 13:43:18 scratch ntpd[1160]: kernel time sync status change 2001 Aug 23 18:05:57 scratch dbus-daemon: [system] Reloaded configuration Aug 23 18:06:16 scratch dbus-daemon: [system] Reloaded configuration Aug 23 18:12:40 scratch ntpd[1160]: time reset +18.059948 s [snip] Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6836685136 usec to 5425839798 usec for pid 1526 (csh) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 4747 usec to 2403 usec for pid 1519 (csh) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 5265 usec to 2594 usec for pid 1494 (hald-addon-mouse-sy) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 7818 usec to 3734 usec for pid 1488 (console-kit-daemon) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 977 usec to 459 usec for pid 1480 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 958 usec to 450 usec for pid 1479 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 957 usec to 449 usec for pid 1478 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 952 usec to 447 usec for pid 1477 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 959 usec to 450 usec for pid 1476 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 975 usec to 458 usec for pid 1475 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1026 usec to 482 usec for pid 1474 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1333 usec to 626 usec for pid 1473 (getty) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 2469 usec to 1160 usec for pid 1440 (inetd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 719 usec to 690 usec for pid 1402 (sshd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 120486 usec to 56770 usec for pid 1360 (cupsd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6204 usec to 2914 usec for pid 1289 (dbus-daemon) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 179 usec to 84 usec for pid 1265 (moused) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 22156 usec to 10407 usec for pid 1041 (nfsd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1292 usec to 607 usec for pid 1032 (mountd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 8801 usec to 4134 usec for pid 664 (devd) Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 19 usec to 9 usec for pid 9 (sctp_iterator) If I reboot and run the test again, the CPU time accounting seems to be working correctly. % time ./a.out 1144.226u 0.000s 19:06.62 99.7% 5+168k 0+0io 0pf+0w snip I notice that before the calcru messages, ntpd reset the clock by 18 seconds -- that probably accounts for that. Interesting observation. Since this happened so early in the log, I thought that this time change was the initial time change after boot, but taking a closer look, the time change occurred about 4 1/2 hours after boot. The calcru messages occured another 5 1/2 hours after that. I also just noticed that this log info was from the August 23rd kernel, before I noticed the CPU time accounting problem, and not the latest occurance. Here's the latest log info: Sep 23 16:33:50 scratch ntpd[1144]: ntpd 4.2.4p5-a (1) Sep 23 16:37:03 scratch ntpd[1145]: kernel time sync status change 2001 Sep 23 17:43:47 scratch ntpd[1145]: time reset +276.133928 s Sep 23 17:43:47 scratch ntpd[1145]: kernel time sync status change 6001 Sep 23 17:47:15 scratch ntpd[1145]: kernel time sync status change 2001 Sep 23 19:02:48 scratch ntpd[1145]: time reset +291.507262 s Sep 23 19:02:48 scratch ntpd[1145]: kernel time sync status change 6001 Sep 23 19:06:37 scratch ntpd[1145]: kernel time sync status change 2001 Sep 24 00:03:36 scratch kernel: calcru: runtime went backwards from 1120690857 u sec to 367348485 usec for pid 1518 (csh) Sep 24 00:03:36 scratch kernel: calcru: runtime went backwards from 5403 usec to 466 usec for pid 1477 (hald-addon-mouse-sy) Sep 24 00:03:36 scratch kernel: calcru: runtime went backwards from 7511 usec to 1502 usec for pid 1472 (hald-runner) Sep 24 00:03:36
Re: Still getting kmem exhausted panic
on 28/09/2010 19:46 Ben Kelly said the following: Hmm. My server is currently idle with no I/O happening: kstat.zfs.misc.arcstats.c: 25165824 kstat.zfs.misc.arcstats.c_max: 46137344 kstat.zfs.misc.arcstats.size: 91863156 If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. Well, your system is a bit old indeed. And the branch is unknown, so I can't really see what sources you have. And I am not sure if I'll be able to say anything about those sources. As to the numbers - yes, with current code I'd expect arcstats.size to go down to arcstats.c when there is no I/O. arc_reclaim_thread should do that. At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did. In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs. So I'm more than willing to believe I'm confused. Any insights are greatly appreciated. ARC is a ZFS private cache. ZFS doesn't use unified buffer/page cache. So ARC is not directly affected by pagedaemon. But this is not exactly VFS layer thing. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 20:17 Andriy Gapon said the following: on 28/09/2010 19:46 Ben Kelly said the following: If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. Well, your system is a bit old indeed. And the branch is unknown, so I can't really see what sources you have. Apologies, missed head in your description of the system. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On Tue, Sep 28, 2010 at 10:15:34AM -0700, Don Lewis wrote: My time source is another FreeBSD box with a GPS receiver on my LAN. My other client machine isn't seeing these time jumps. The only messages from ntp in its log from this period are these: Sep 23 04:12:23 mousie ntpd[]: kernel time sync status change 6001 Sep 23 04:29:29 mousie ntpd[]: kernel time sync status change 2001 Sep 24 03:55:24 mousie ntpd[]: kernel time sync status change 6001 Sep 24 04:12:28 mousie ntpd[]: kernel time sync status change 2001 I'm speaking purely about ntpd below this point -- almost certainly a separate problem/issue, but I'll explain it anyway. I'm not under the impression that the calcru messages indicate RTC clock drift, but I'd need someone like John Baldwin to validate my statement. Back to ntpd: you can addressing the above messages by adding maxpoll 9 to your server lines in ntp.conf. The comment we use in our ntp.conf that documents the well-known problem: # maxpoll 9 is used to work around PLL/FLL flipping, which happens at # exactly 1024 seconds (the default maxpoll value). Another FreeBSD # user recommended using 9 instead: # http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031512.html I don't know if that has any connection to time(1) running slower -- but perhaps ntpd is aggressively adjusting your clock? It seems to be pretty stable when the machine is idle: % ntpq -c pe remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u8 64 3770.168 -0.081 0.007 Not too much degradation under CPU load: % ntpq -c pe remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u 40 64 3770.166 -0.156 0.026 I/O (dd if=/dev/ad6 of=/dev/null bs=512) doesn't appear to bother it much, either. % ntpq -c pe remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u 35 64 3770.169 -0.106 0.009 Still speaking purely about ntpd: The above doesn't indicate a single problem. The deltas shown in both delay, offset, and jitter are all 100% legitimate. A dd (to induce more interrupt use) isn't going to exacerbate the problem (depending on your system configuration, IRQ setup, local APIC, etc.). How about writing a small shell script that runs every minute in a cronjob that does vmstat -i /some/file.log? Then when you see calcru messages, look around the time frame where vmstat -i was run. Look for high interrupt rates, aside from those associated with cpuX devices. Next, you need to let ntpd run for quite a bit longer than what you did above. Your poll maximum is only 64, indicating ntpd had recently been restarted, or that your offset deviates greatly (my guess is ntpd being restarted). poll will increase over time (64, 128, 256, 512, and usually max out at 1024), depending on how stable the clock is. when is a counter that increments, and does clock syncing (if needed) once it reaches poll. You'd see unstable system clock indications in your syslog as well (indicated by actual +/- clock drift lines occurring regularly. These aren't the same as 2001/6001 PLL/FLL mode flipping). Sorry if this is a bit paragraph/much to take in. You might also try stopping ntpd, removing /var/db/ntpd.drift, and restarting ntpd -- then check back in about 48 hours (no I'm not kidding). This is especially necessary if you've replaced the motherboard or taken the disks from System A and stuck them in System B. All that said: I'm not convinced ntpd has anything to do with your problem. EIST or EIST-like capabilities (such as Cool'n'Quiet) are often the source of the problem. device cpufreq might solve your issue entirely, hard to say. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
fetch: Non-recoverable resolver failure
Hi, we are using fetch command from cron to run PHP scripts periodically and sometimes cron sends error e-mails like this: fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable resolver failure The exact lines from crontab are: */5 * * * * fetch -qo /dev/null https://hiden.example.com/cron/fiveminutes; */5 * * * * fetch -qo /dev/null http://another.example.com/wd.php?hash=cslhakjs87LJ3rysalj79; Network is working without problems, resolvers are working fine too. I also tried to use local instance of named at 127.0.0.1 but it did not fix the issue so it seems there is some problem with fetch in phase of resolving address. Note: target domains are hosted on the server it-self and named too. The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC Can somebody help me to diagnose this random fetch+resolver issue? Miroslav Lachman ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote: on 28/09/2010 19:46 Ben Kelly said the following: Hmm. My server is currently idle with no I/O happening: kstat.zfs.misc.arcstats.c: 25165824 kstat.zfs.misc.arcstats.c_max: 46137344 kstat.zfs.misc.arcstats.size: 91863156 If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. Well, your system is a bit old indeed. And the branch is unknown, so I can't really see what sources you have. And I am not sure if I'll be able to say anything about those sources. Quite old. I've been intending to update, but haven't found the time lately. I'll try to do the upgrade this weekend and see if it changes anything. As to the numbers - yes, with current code I'd expect arcstats.size to go down to arcstats.c when there is no I/O. arc_reclaim_thread should do that. Thats what I thought as well, but when I debugged it a year or two ago I found that the buffers were still referenced and thus could not be reclaimed. As far as I can remember they needed a vfs/vnops like zfs_vnops_inactive or zfs_vnops_reclaim to be executed in order to free the reference. What is responsible for making those calls? At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did. In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs. So I'm more than willing to believe I'm confused. Any insights are greatly appreciated. ARC is a ZFS private cache. ZFS doesn't use unified buffer/page cache. So ARC is not directly affected by pagedaemon. But this is not exactly VFS layer thing. Can you explain the difference in how the vfs/vnode operations are called or used for those two situations? I thought that the buffer cache was used by filesystems to implement these operations. So that the buffer cache was below the vfs/vnops layer. So while zfs implemented its operations in terms of the arc, things like UFS implemented vfs/vnops in terms of the buffer cache. I thought the layers further up the chain like the page daemon did not distinguish that much between these two implementation due to the VFS interface layer. (Although there seems to be a layering violation in that the buffer cache signals directly to the upper page daemon layer to trigger page reclamation.) The old (ancient) patch I tried previously to help reduce the arc working set and allow it to shrink is here: http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff Unfortunately, there are a couple ideas on fighting fragmentation mixed into that patch. See the part about arc_reclaim_pages(). This patch did seem to allow my arc to stay under the target maximum even when under load that previously caused the system to exceed the maximum. When I update this weekend I'll try a stripped down version of the patch to see if it helps or not with the latest zfs. Thanks for your help in understanding this stuff! - Ben___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: fetch: Non-recoverable resolver failure
On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote: Hi, we are using fetch command from cron to run PHP scripts periodically and sometimes cron sends error e-mails like this: fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable resolver failure The exact lines from crontab are: */5 * * * * fetch -qo /dev/null https://hiden.example.com/cron/fiveminutes; */5 * * * * fetch -qo /dev/null http://another.example.com/wd.php?hash=cslhakjs87LJ3rysalj79; Network is working without problems, resolvers are working fine too. I also tried to use local instance of named at 127.0.0.1 but it did not fix the issue so it seems there is some problem with fetch in phase of resolving address. Note: target domains are hosted on the server it-self and named too. The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC Can somebody help me to diagnose this random fetch+resolver issue? The error in question comes from the resolver library returning EAI_FAIL. This return code can be returned to all sorts of applications (not just fetch), although how each app handles it may differ. So, chances are you really do have something going on upstream from you (one of the nameservers you use might not be available at all times), and it probably clears very quickly (before you have a chance to manually/interactively investigate it). You're probably going to have to set up a combination of scripts that do tcpdump logging, and ktrace -t+ -i (and probably -a) logging (ex. ktrace -t+ -i -a -f /var/log/ktrace.fetch.out fetch -qo ...) to find out what's going on behind the scenes. The irregularity of the problem (re: sometimes) warrants such. I'd recommend using something other than 127.0.0.1 as your resolver if you need to do tcpdump. Providing contents of your /etc/resolv.conf, as well as details about your network configuration on the machine (specifically if any firewall stacks (pf or ipfw) are in place) would help too. Some folks might want netstat -m output as well. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
[releng_8 tinderbox] failure on i386/pc98
TB --- 2010-09-28 18:55:35 - tinderbox 2.6 running on freebsd-current.sentex.ca TB --- 2010-09-28 18:55:35 - starting RELENG_8 tinderbox run for i386/pc98 TB --- 2010-09-28 18:55:35 - cleaning the object tree TB --- 2010-09-28 18:58:07 - cvsupping the source tree TB --- 2010-09-28 18:58:07 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup5.freebsd.org /tinderbox/RELENG_8/i386/pc98/supfile TB --- 2010-09-28 19:04:25 - WARNING: /usr/bin/csup returned exit code 1 TB --- 2010-09-28 19:04:25 - ERROR: unable to cvsup the source tree TB --- 2010-09-28 19:04:25 - 2.27 user 163.56 system 530.08 real http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-i386-pc98.full ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
Jung-uk Kim wrote: - the mouse doesn't work until I restart moused manually I always use hint.psm.0.flags=0x6000 in /boot/loader.conf, i.e., turn on both HOOKRESUME and INITAFTERSUSPEND, to work around similar problem on different laptop. Yes, that helps (after the stall period). Can you please report other problems in the appropriate ML? em - freebsd-net@ usb -freebsd-usb@ acpi_ec -freebsd-acpi@ I will try to do so. I'm not sure about acpi_ec issue though; it's only a warning, and it doesn't cause me any troubles. I also have this kernel message once in a few hours (seemingly random) if I used sleep/resume before: MCA: Bank 1, Status 0xe20001f5 MCA: Global Cap 0x0005, Status 0x MCA: Vendor GenuineIntel, ID 0x695, APIC ID 0 MCA: CPU 0 UNCOR PCC OVER DCACHE L1 ??? error But once again, it doesn't really cause any problems. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
On Sep 28, 2010, at 12:57 PM, Vitaly Magerya wrote: I also have this kernel message once in a few hours (seemingly random) if I used sleep/resume before: MCA: Bank 1, Status 0xe20001f5 MCA: Global Cap 0x0005, Status 0x MCA: Vendor GenuineIntel, ID 0x695, APIC ID 0 MCA: CPU 0 UNCOR PCC OVER DCACHE L1 ??? error But once again, it doesn't really cause any problems. That is very likely to be a matter of luck. If I translate this MCA right, it looks to be an uncorrected error in L1 data cache on the CPU. Try to run something like prime95's torture test mode and see whether it fails overnight Regards, -- -Chuck ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
Chuck Swiger wrote: MCA: Bank 1, Status 0xe20001f5 MCA: Global Cap 0x0005, Status 0x MCA: Vendor GenuineIntel, ID 0x695, APIC ID 0 MCA: CPU 0 UNCOR PCC OVER DCACHE L1 ??? error That is very likely to be a matter of luck. If I translate this MCA right, it looks to be an uncorrected error in L1 data cache on the CPU. Try to run something like prime95's torture test mode and see whether it fails overnight OK, started the test (it's math/mprime, for those who wonder). ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: fetch: Non-recoverable resolver failure
Jeremy Chadwick wrote: On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote: Hi, we are using fetch command from cron to run PHP scripts periodically and sometimes cron sends error e-mails like this: fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable resolver failure [...] Note: target domains are hosted on the server it-self and named too. The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC Can somebody help me to diagnose this random fetch+resolver issue? The error in question comes from the resolver library returning EAI_FAIL. This return code can be returned to all sorts of applications (not just fetch), although how each app handles it may differ. So, chances are you really do have something going on upstream from you (one of the nameservers you use might not be available at all times), and it probably clears very quickly (before you have a chance to manually/interactively investigate it). The strange thing is that I have only one nameserver listed in resolv.conf and it is the local one! (127.0.0.1) (there were two remote nameservers, but I tried to switch to local one to rule out remote nameservers / network problems) You're probably going to have to set up a combination of scripts that do tcpdump logging, and ktrace -t+ -i (and probably -a) logging (ex. ktrace -t+ -i -a -f /var/log/ktrace.fetch.out fetch -qo ...) to find out what's going on behind the scenes. The irregularity of the problem (re: sometimes) warrants such. I'd recommend using something other than 127.0.0.1 as your resolver if you need to do tcpdump. I will try it... there will be a lot of output as there are many cronjobs and relativelly high traffic on the webserver. But fetch resolver failure occurred only few times a day. Providing contents of your /etc/resolv.conf, as well as details about your network configuration on the machine (specifically if any firewall stacks (pf or ipfw) are in place) would help too. Some folks might want netstat -m output as well. There is nothing special in the network, the machine is Sun Fire X2100 M2 with bge1 NIC connected to Cisco Linksys switch (100Mbps port) with uplink (1Gbps port) connected to Cisco router with dual 10Gbps connectivity. No firewalls in the path. There are more than 10 other servers in the rack and we have no problems / error messages in logs from other services / daemons related to DNS. # cat /etc/resolv.conf nameserver 127.0.0.1 /# netstat -m 279/861/1140 mbufs in use (current/cache/total) 257/553/810/25600 mbuf clusters in use (current/cache/total/max) 257/313 mbuf+clusters out of packet secondary zone in use (current/cache) 5/306/311/12800 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) 603K/2545K/3149K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 13/470/6656 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 3351782 requests for I/O initiated by sendfile 0 calls to protocol drain routines (real IPs were replaced) # ifconfig bge1 bge1: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST metric 0 mtu 1500 options=9bRXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM ether 00:1e:68:2f:71:ab inet 1.2.3.40 netmask 0xff80 broadcast 1.2.3.127 inet 1.2.3.41 netmask 0x broadcast 1.2.3.41 inet 1.2.3.42 netmask 0x broadcast 1.2.3.42 media: Ethernet autoselect (100baseTX full-duplex) status: active NIC is: b...@pci0:6:4:1:class=0x02 card=0x534c108e chip=0x167814e4 rev=0xa3 hdr=0x00 vendor = 'Broadcom Corporation' device = 'BCM5715C 10/100/100 PCIe Ethernet Controller' class = network subclass = ethernet There is PF with some basic rules, mostly blocking incomming packets, allowing all outgoing and scrubbing: scrub in on bge1 all fragment reassemble scrub out on bge1 all no-df random-id min-ttl 24 max-mss 1492 fragment reassemble pass out on bge1 inet proto udp all keep state pass out on bge1 inet proto tcp from 1.2.3.40 to any flags S/SA modulate state pass out on bge1 inet proto tcp from 1.2.3.41 to any flags S/SA modulate state pass out on bge1 inet proto tcp from 1.2.3.42 to any flags S/SA modulate state modified PF options: set timeout { frag 15, interval 5 } set limit { frags 2500, states 5000 } set optimization aggressive set block-policy drop set loginterface bge1 # Let loopback and internal interface traffic flow without restrictions set skip on lo0 Thank you for your suggestions Miroslav Lachman ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 28 Sep, Jeremy Chadwick wrote: On Tue, Sep 28, 2010 at 10:15:34AM -0700, Don Lewis wrote: My time source is another FreeBSD box with a GPS receiver on my LAN. My other client machine isn't seeing these time jumps. The only messages from ntp in its log from this period are these: Sep 23 04:12:23 mousie ntpd[]: kernel time sync status change 6001 Sep 23 04:29:29 mousie ntpd[]: kernel time sync status change 2001 Sep 24 03:55:24 mousie ntpd[]: kernel time sync status change 6001 Sep 24 04:12:28 mousie ntpd[]: kernel time sync status change 2001 I'm speaking purely about ntpd below this point -- almost certainly a separate problem/issue, but I'll explain it anyway. I'm not under the impression that the calcru messages indicate RTC clock drift, but I'd need someone like John Baldwin to validate my statement. I don't think the problems are directly related. I think the calcru messages get triggered by clcok frequency changes that get detected and change the tick to usec conversion ratio. Back to ntpd: you can addressing the above messages by adding maxpoll 9 to your server lines in ntp.conf. The comment we use in our ntp.conf that documents the well-known problem: Thanks I'll try that. # maxpoll 9 is used to work around PLL/FLL flipping, which happens at # exactly 1024 seconds (the default maxpoll value). Another FreeBSD # user recommended using 9 instead: # http://lists.freebsd.org/pipermail/freebsd-stable/2006-December/031512.html I don't know if that has any connection to time(1) running slower -- but perhaps ntpd is aggressively adjusting your clock? It seems to be pretty stable when the machine is idle: % ntpq -c pe remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u8 64 3770.168 -0.081 0.007 Not too much degradation under CPU load: % ntpq -c pe remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u 40 64 3770.166 -0.156 0.026 I/O (dd if=/dev/ad6 of=/dev/null bs=512) doesn't appear to bother it much, either. % ntpq -c pe remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u 35 64 3770.169 -0.106 0.009 Still speaking purely about ntpd: The above doesn't indicate a single problem. The deltas shown in both delay, offset, and jitter are all 100% legitimate. A dd (to induce more interrupt use) isn't going to exacerbate the problem (depending on your system configuration, IRQ setup, local APIC, etc.). I was hoping to do something to provoke clock interrupt loss. I don't see any problems when this machine is idle. The last two times that the calcru messages have occured where when I booted this machine to build a bunch of ports. I don't see any problems when this machine is idle. Offset and jitter always look really good whenever I've looked. How about writing a small shell script that runs every minute in a cronjob that does vmstat -i /some/file.log? Then when you see calcru messages, look around the time frame where vmstat -i was run. Look for high interrupt rates, aside from those associated with cpuX devices. Ok, I'll give this a try. Just for reference, this is what is currently reported: % vmstat -i interrupt total rate irq0: clk 60683442 1000 irq1: atkbd0 6 0 irq8: rtc7765537127 irq9: acpi0 13 0 irq10: ohci0 ehci1+ 10275064169 irq11: fwohci0 ahc+ 132133 2 irq12: psm0 21 0 irq14: ata090982 1 irq15: nfe0 ata1 18363 0 I'm not sure why I'm getting USB interrupts. There aren't any USB devices plugged into this machine. # usbconfig dump_info ugen0.1: OHCI root HUB nVidia at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON ugen1.1: EHCI root HUB nVidia at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON ugen2.1: OHCI root HUB nVidia at usbus2, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON ugen3.1: EHCI root HUB nVidia at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON Next, you need to let ntpd run for quite a bit longer than what you did above. Your poll maximum is only 64, indicating ntpd had recently been restarted, or that your offset deviates greatly (my guess is ntpd being restarted). poll will increase over time (64, 128, 256, 512, and usually max out at 1024), depending on how stable the clock is. when is a counter
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 28 Sep, Don Lewis wrote: % vmstat -i interrupt total rate irq0: clk 60683442 1000 irq1: atkbd0 6 0 irq8: rtc7765537127 irq9: acpi0 13 0 irq10: ohci0 ehci1+ 10275064169 irq11: fwohci0 ahc+ 132133 2 irq12: psm0 21 0 irq14: ata090982 1 irq15: nfe0 ata1 18363 0 I'm not sure why I'm getting USB interrupts. There aren't any USB devices plugged into this machine. Answer: irq 10 is also shared by vgapci0 and atapci1. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 28/09/2010 21:40 Ben Kelly said the following: On Sep 28, 2010, at 1:17 PM, Andriy Gapon wrote: on 28/09/2010 19:46 Ben Kelly said the following: Hmm. My server is currently idle with no I/O happening: kstat.zfs.misc.arcstats.c: 25165824 kstat.zfs.misc.arcstats.c_max: 46137344 kstat.zfs.misc.arcstats.size: 91863156 If what you say is true, this shouldn't happen, should it? This system is an i386 machine with kmem max at 800M and arc set to 40M. This is running head from April 6, 2010, so it is a bit old, though. Well, your system is a bit old indeed. And the branch is unknown, so I can't really see what sources you have. And I am not sure if I'll be able to say anything about those sources. Quite old. I've been intending to update, but haven't found the time lately. I'll try to do the upgrade this weekend and see if it changes anything. As to the numbers - yes, with current code I'd expect arcstats.size to go down to arcstats.c when there is no I/O. arc_reclaim_thread should do that. Thats what I thought as well, but when I debugged it a year or two ago I found that the buffers were still referenced and thus could not be reclaimed. As far as I can remember they needed a vfs/vnops like zfs_vnops_inactive or zfs_vnops_reclaim to be executed in order to free the reference. What is responsible for making those calls? It's time that we should start showing each other places in code :) Because I don't think that that's how the code work. E.g. I look at how zfs_read() calls dmu_read_uio() which calls dmu_buf_hold_array() and dmu_buf_rele_array() around uimove() call. From what I see, dmu_buf_hold_array() calls dmu_buf_hold_array_by_dnode() calls dbuf_hold() calls arc_buf_add_ref() or arc_buf_alloc(). And conversely, dmu_buf_rele_array() calls dbuf_rele() calls arc_buf_remove_ref(). So, I am quite sure that ARC buffers are held/referenced only during ongoing I/O to or from them. Perhaps, on the other hand, you had in mind life-cycle of other things (not ARC buffers) that are accounted against ARC size (with type ARC_SPACE_OTHER)? Such as e.g. dmu_buf_impl_t-s allocated in dbuf_create(). I have to admit that I haven't investigated behavior of that part of ARC-assigned memory. It's only a small proportion (~10%) of the whole ARC size on my systems. At one point I had patches running on my system that triggered the pagedaemon based on arc load and it did allow me to keep my arc below the max. Or at least I thought it did. In any case, I've never really been able to wrap my head around the VFS layer and how it interacts with zfs. So I'm more than willing to believe I'm confused. Any insights are greatly appreciated. ARC is a ZFS private cache. ZFS doesn't use unified buffer/page cache. So ARC is not directly affected by pagedaemon. But this is not exactly VFS layer thing. Can you explain the difference in how the vfs/vnode operations are called or used for those two situations? They are called exactly the same. VFS layer and code above it are not aware of FS implementation details. I thought that the buffer cache was used by filesystems to implement these operations. So that the buffer cache was below the vfs/vnops layer. So Buffer cache works as part of unified VM and its buffers use the same pages as page cache does. while zfs implemented its operations in terms of the arc, things like UFS implemented vfs/vnops in terms of the buffer cache. I thought the layers Yes. Filesystems like UFS are sandwiched between buffer cache and page cache, which work in concert. Also, they don't (have to) implement their own buffer/page caching policies, because it's all managed by unified VM system. On the contrary, ZFS has its own private cache. So, first of all, its data may be cached in two places at once - page cache and ARC. And, because of that, some assumptions of the higher level code get violated, so ZFS has to jump through the hoops to meet those assumptions (e.g. see UIO_NOCOPY). further up the chain like the page daemon did not distinguish that much between these two implementation due to the VFS interface layer. (Although Right, but see above. there seems to be a layering violation in that the buffer cache signals directly to the upper page daemon layer to trigger page reclamation.) Umm, not sure if that is a fact. The old (ancient) patch I tried previously to help reduce the arc working set and allow it to shrink is here: http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff Unfortunately, there are a couple ideas on fighting fragmentation mixed into that patch. See the part about arc_reclaim_pages(). This patch did seem to allow my arc to stay under the target maximum even when under load that previously caused the system to exceed the maximum. When I update this weekend I'll try a stripped down version of the patch to see if it helps or not with the latest zfs. Thanks for your help in
Re: Still getting kmem exhausted panic
On Sep 28, 2010, at 5:30 PM, Andriy Gapon wrote: snipped lots of good info here... probably won't have time to look at it in detail until the weekend there seems to be a layering violation in that the buffer cache signals directly to the upper page daemon layer to trigger page reclamation.) Umm, not sure if that is a fact. I was referring to the code in vfs_bio.c that used to twiddle vm_pageout_deficit directly. That seems to have been replaced with a call to vm_page_grab(). The old (ancient) patch I tried previously to help reduce the arc working set and allow it to shrink is here: http://www.wanderview.com/svn/public/misc/zfs/zfs_kmem_limit.diff Unfortunately, there are a couple ideas on fighting fragmentation mixed into that patch. See the part about arc_reclaim_pages(). This patch did seem to allow my arc to stay under the target maximum even when under load that previously caused the system to exceed the maximum. When I update this weekend I'll try a stripped down version of the patch to see if it helps or not with the latest zfs. Thanks for your help in understanding this stuff! The patch seems good, especially the part about taking into account the kmem fragmentation. But it also seems to be heavily tuned towards tiny ARC systems like yours, so I am not sure yet how suitable it is for mainstream systems. Thanks. Yea, there is a lot of aggressive tuning there. In particular, the slow growth algorithm is somewhat dubious. What I found, though, was that the fragmentation jumped whenever the arc was reduced in size, so it was an attempt to make the size slowly approach peak load without overshooting. A better long term solution would probably be to enhance UMA to support custom slab sizes on a zone-by-zone basis. That way all zfs/arc allocations can use slabs of 128k (at a memory efficiency penalty of course). I prototyped this with a dumbed down block pool allocator at one point and was able to avoid most, if not all, of the fragmentation. Adding the support to UMA seemed non-trivial, though. Thanks again for the information. I hope to get a chance to look at the code this weekend. - Ben___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Still getting kmem exhausted panic
on 29/09/2010 01:01 Ben Kelly said the following: Thanks. Yea, there is a lot of aggressive tuning there. In particular, the slow growth algorithm is somewhat dubious. What I found, though, was that the fragmentation jumped whenever the arc was reduced in size, so it was an attempt to make the size slowly approach peak load without overshooting. A better long term solution would probably be to enhance UMA to support custom slab sizes on a zone-by-zone basis. That way all zfs/arc allocations can use slabs of 128k (at a memory efficiency penalty of course). I prototyped this with a dumbed down block pool allocator at one point and was able to avoid most, if not all, of the fragmentation. Adding the support to UMA seemed non-trivial, though. BTW, have you seen my posts about UMA and ZFS on hackers@ ? I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing size of per-CPU caches for the zones with large-sized items. I further modified the code in my local tree to completely disable per-CPU caches for items 32KB. -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Still getting kmem exhausted panic
Thanks for the clarification. I just wish I knew how vm.kmem_size_scale fit into the picture (meaning what it does, etc.). The sysctl description isn't very helpful. Again, my lack of VM knowledge... Roughly, vm.kmem_size would get set to available memory divided by vm.kmem_size_scale. http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/059114.html Thanks again for the explication, i was amiss after the post above. So increasing kmem_size_scale will reduce the resulting kmem_size. /*correct me if i'm wrong - divided by triggered this post*/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
[releng_8 tinderbox] failure on mips/mips
TB --- 2010-09-28 19:44:25 - tinderbox 2.6 running on freebsd-current.sentex.ca TB --- 2010-09-28 19:44:25 - starting RELENG_8 tinderbox run for mips/mips TB --- 2010-09-28 19:44:25 - cleaning the object tree TB --- 2010-09-28 19:45:51 - cvsupping the source tree TB --- 2010-09-28 19:45:51 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup5.freebsd.org /tinderbox/RELENG_8/mips/mips/supfile TB --- 2010-09-28 19:50:26 - building world TB --- 2010-09-28 19:50:26 - MAKEOBJDIRPREFIX=/obj TB --- 2010-09-28 19:50:26 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2010-09-28 19:50:26 - TARGET=mips TB --- 2010-09-28 19:50:26 - TARGET_ARCH=mips TB --- 2010-09-28 19:50:26 - TZ=UTC TB --- 2010-09-28 19:50:26 - __MAKE_CONF=/dev/null TB --- 2010-09-28 19:50:26 - cd /src TB --- 2010-09-28 19:50:26 - /usr/bin/make -B buildworld World build started on Tue Sep 28 19:50:28 UTC 2010 Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies stage 4.4: building everything [...] /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1899 /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1902 /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1899 /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1902 /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1899 /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1902 /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1899 /obj/mips/src/tmp/usr/bin/ld: BFD 2.15 [FreeBSD] 2004-05-23 assertion fail /src/gnu/usr.bin/binutils/libbfd/../../../../contrib/binutils/bfd/elfxx-mips.c:1902 *** Error code 1 Stop in /src/usr.bin/tftp. *** Error code 1 Stop in /src/usr.bin. *** Error code 1 Stop in /src. *** Error code 1 Stop in /src. *** Error code 1 Stop in /src. TB --- 2010-09-28 23:36:45 - WARNING: /usr/bin/make returned exit code 1 TB --- 2010-09-28 23:36:45 - ERROR: failed to build world TB --- 2010-09-28 23:36:45 - 2064.56 user 7803.58 system 13940.05 real http://tinderbox.freebsd.org/tinderbox-releng_8-RELENG_8-mips-mips.full ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 28 Sep, Jeremy Chadwick wrote: Still speaking purely about ntpd: The above doesn't indicate a single problem. The deltas shown in both delay, offset, and jitter are all 100% legitimate. A dd (to induce more interrupt use) isn't going to exacerbate the problem (depending on your system configuration, IRQ setup, local APIC, etc.). How about writing a small shell script that runs every minute in a cronjob that does vmstat -i /some/file.log? Then when you see calcru messages, look around the time frame where vmstat -i was run. Look for high interrupt rates, aside from those associated with cpuX devices. Looking at the timestamps of things and comparing to my logs, I discovered that the last instance of ntp instability happened when I was running make index in /usr/ports. I tried it again with entertaining results. After a while, the machine became unresponsive. I was logged in over ssh and it stopped echoing keystrokes. In parallel I was running a script that echoed the date, the results of vmstat -i, and the results of ntpq -c pe. The latter showed jitter and offset going insane. Eventually make index finished and the machine was responsive again, but the time was way off and ntpd croaked because the necessary time correction was too large. Nothing else anomalous showed up in the logs. Hmn, about half an hour after ntpd died I started my CPU time accounting test and two minutes into that test I got a spew of calcru messages ... Tue Sep 28 14:52:27 PDT 2010 interrupt total rate irq0: clk 64077827999 irq1: atkbd0 26 0 irq8: rtc8199966127 irq9: acpi0 19 0 irq10: ohci0 ehci1+ 10356112161 irq11: fwohci0 ahc+ 132133 2 irq12: psm0 27 0 irq14: ata096064 1 irq15: nfe0 ata1 23350 0 Total 82885524 1293 remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u 137 128 3770.1950.111 0.030 Tue Sep 28 14:53:27 PDT 2010 interrupt total rate irq0: clk 64137854999 irq1: atkbd0 26 0 irq8: rtc8207648127 irq9: acpi0 19 0 irq10: ohci0 ehci1+ 10360184161 irq11: fwohci0 ahc+ 132133 2 irq12: psm0 27 0 irq14: ata096154 1 irq15: nfe0 ata1 23379 0 Total 82957424 1293 remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u 56 128 3770.1950.111 853895. Tue Sep 28 14:54:27 PDT 2010 interrupt total rate irq0: clk 64197881999 irq1: atkbd0 26 0 irq8: rtc8215329127 irq9: acpi0 21 0 irq10: ohci0 ehci1+ 10360777161 irq11: fwohci0 ahc+ 132133 2 irq12: psm0 27 0 irq14: ata096244 1 irq15: nfe0 ata1 23405 0 Total 83025843 1293 remote refid st t when poll reach delay offset jitter == *gw.catspoiler.o .GPS.1 u 116 128 3770.1950.111 853895. Tue Sep 28 14:55:27 PDT 2010 interrupt total rate irq0: clk 64257907999 irq1: atkbd0 26 0 irq8: rtc8223011127 irq9: acpi0 21 0 irq10: ohci0 ehci1+ 10360836161 irq11: fwohci0 ahc+ 132133 2 irq12: psm0 27 0 irq14: ata096334 1 irq15: nfe0 ata1 23424 0 Total 83093719 1292 remote refid st t when poll reach delay offset jitter == gw.catspoiler.o .GPS.1 u 48 128 3770.197 2259195 2091608 Tue Sep 28 14:56:27 PDT 2010 interrupt total rate irq0: clk 64317933999 irq1: atkbd0
Re: Still getting kmem exhausted panic
On Tue, Sep 28, 2010 at 3:22 PM, Andriy Gapon a...@icyb.net.ua wrote: BTW, have you seen my posts about UMA and ZFS on hackers@ ? I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing size of per-CPU caches for the zones with large-sized items. I further modified the code in my local tree to completely disable per-CPU caches for items 32KB. Do you have updated patch disabling per-cpu caches for large items? I've just rebuilt FreeBSD-8 with your uma-2.diff (it needed r209050 from -head to compile) and so far things look good. I'll re-enable UMA for ZFS and see how it flies in a couple of days. --Artem ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: cpu timer issues
Jeremy Thanks for having a look. Nothing in loader.conf # cat /etc/sysctl.conf # Do not send RSTs for packets to closed ports net.inet.tcp.blackhole=2 # Do not send ICMP port unreach messages for closed ports net.inet.udp.blackhole=1 # Generate random IP_ID's net.inet.ip.random_id=1 # Breaks RFC1379, but nobody uses it anyway net.inet.tcp.drop_synfin=1 net.inet.ip.redirect=1 net.inet.tcp.syncookies=1 net.inet.tcp.recvspace=65228 net.inet.tcp.sendspace=65228 # fastforwarding - see http://lists.freebsd.org/pipermail/freebsd-net/2004-January/002534.html net.inet.ip.fastforwarding=1 net.inet.tcp.delayed_ack=0 net.inet.udp.maxdgram=57344 kern.rndtest.verbose=0 net.link.bridge.pfil_onlyip=0 net.link.tap.user_open=1 # The system will attempt to calculate the bandwidth delay product for each connection and limit the amount of data queued to the network to just the amount required to maintain optimum throughput. net.inet.tcp.inflight.enable=1 net.inet.ip.portrange.first=1024 net.inet.ip.intr_queue_maxlen=1000 net.link.bridge.pfil_bridge=0 # Disable TCP extended debugging net.inet.tcp.log_debug=0 # Set a reasonable ICMPLimit net.inet.icmp.icmplim=500 # TSO causes problems with em(4) and reply-to, and isn't of much benefit in a firewall, disable. net.inet.tcp.tso=0 # kenv | grep smbios smbios.bios.reldate=12/19/2008 smbios.bios.vendor=Phoenix Technologies LTD smbios.bios.version=1.2a smbios.chassis.maker=Supermicro smbios.chassis.serial=0123456789 smbios.chassis.tag= smbios.chassis.version=0123456789 smbios.planar.maker=Supermicro smbios.planar.product=X7SBi-LN4 smbios.planar.serial=0123456789 smbios.planar.version=PCB Version smbios.socket.enabled=1 smbios.socket.populated=1 smbios.system.maker=Supermicro smbios.system.product=X7SBi-LN4 smbios.system.serial=0123456789 smbios.system.uuid=53d1a494-d663-a0e7-890b-8a0f00f08a0f smbios.system.version=0123456789 # sysctl kern.timecounter kern.timecounter.tick: 1 kern.timecounter.choice: TSC(-100) i8254(0) dummy(-100) kern.timecounter.hardware: i8254 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.i8254.counter: 27546 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.TSC.mask: 4294967295 kern.timecounter.tc.TSC.counter: 1322201372 kern.timecounter.tc.TSC.frequency: 2926018304 kern.timecounter.tc.TSC.quality: -100 kern.timecounter.smp_tsc: 0 kern.timecounter.invariant_tsc: 0 Thanks Jurgen On 28/09/10 7:30 PM, Jeremy Chadwick wrote: Can you provide any tuning you do in loader.conf or sysctl.conf, as well as your kernel configuration? -- ish http://www.ish.com.au Level 1, 30 Wilson Street Newtown 2042 Australia phone +61 2 9550 5001 fax +61 2 9550 4001 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: cpu timer issues
Andriy You can find everything you are after here: http://pastebin.com/WH4V2W0F Thanks Jurgen On 28/09/10 8:07 PM, Andriy Gapon wrote: on 28/09/2010 10:54 Jurgen Weber said the following: # dmesg | grep Timecounter Timecounter i8254 frequency 1193182 Hz quality 0 Timecounters tick every 1.000 msec # sysctl kern.timecounter.hardware kern.timecounter.hardware: i8254 Only have one timer to choose from. Can you provide a little bit more of hard data than the above? Specifically, the following sysctls: kern.timecounter dev.cpu Output of vmstat -i. _Verbose_ boot dmesg. Please do not disable ACPI when taking this data. Preferably, upload it somewhere and post a link to it. -- -- ish http://www.ish.com.au Level 1, 30 Wilson Street Newtown 2042 Australia phone +61 2 9550 5001 fax +61 2 9550 4001 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: fetch: Non-recoverable resolver failure
On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote: The exact lines from crontab are: */5 * * * * fetch -qo /dev/null https://hiden.example.com/cron/fiveminutes; */5 * * * * fetch -qo /dev/null http://another.example.com/wd.php?hash=cslhakjs87LJ3rysalj79; In addition to anything else, I suspect the question mark in double-quotes might cause some shell-related interpretation; perhaps single quotes will be safer... -- Brian Reichert reich...@numachi.com 55 Crystal Ave. #286 Derry NH 03038-1725 USA BSD admin/developer at large ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: cpu timer issues
Interesting, using systat everything looks fine. The interrupts hang around 2000. Thanks Jurgen On 28/09/10 8:33 PM, borislav nikolov wrote: Hello, vmsat -i calculates interrupt rate based on interrupt count/uptime, and the interrupt count is 32 bit integer. With high values of kern.hz it will overflow in few days (with kern.hz=4000 it will happen every 12 days or so). If that is the case, use systat -vmstat 1 to get accurate interrupt rate. That is just fyi, because i was confused once and it scared me abit, and i started changing counters untill i noticed this. p.s. please forgive my poor english -- -- ish http://www.ish.com.au Level 1, 30 Wilson Street Newtown 2042 Australia phone +61 2 9550 5001 fax +61 2 9550 4001 ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 28 Sep, Don Lewis wrote: Looking at the timestamps of things and comparing to my logs, I discovered that the last instance of ntp instability happened when I was running make index in /usr/ports. I tried it again with entertaining results. After a while, the machine became unresponsive. I was logged in over ssh and it stopped echoing keystrokes. In parallel I was running a script that echoed the date, the results of vmstat -i, and the results of ntpq -c pe. The latter showed jitter and offset going insane. Eventually make index finished and the machine was responsive again, but the time was way off and ntpd croaked because the necessary time correction was too large. Nothing else anomalous showed up in the logs. Hmn, about half an hour after ntpd died I started my CPU time accounting test and two minutes into that test I got a spew of calcru messages ... I tried this experiment again using a kernel with WITNESS and DEBUG_VFS_LOCKS compiled in, and pinging this machine from another. Things look normal for a while, then the ping times get huge for a while and then recover. 64 bytes from 192.168.101.3: icmp_seq=1169 ttl=64 time=0.135 ms 64 bytes from 192.168.101.3: icmp_seq=1170 ttl=64 time=0.141 ms 64 bytes from 192.168.101.3: icmp_seq=1171 ttl=64 time=0.130 ms 64 bytes from 192.168.101.3: icmp_seq=1172 ttl=64 time=0.131 ms 64 bytes from 192.168.101.3: icmp_seq=1173 ttl=64 time=0.128 ms 64 bytes from 192.168.101.3: icmp_seq=1174 ttl=64 time=38232.140 ms 64 bytes from 192.168.101.3: icmp_seq=1175 ttl=64 time=37231.309 ms 64 bytes from 192.168.101.3: icmp_seq=1176 ttl=64 time=36230.470 ms 64 bytes from 192.168.101.3: icmp_seq=1177 ttl=64 time=35229.632 ms 64 bytes from 192.168.101.3: icmp_seq=1178 ttl=64 time=34228.791 ms 64 bytes from 192.168.101.3: icmp_seq=1179 ttl=64 time=33227.953 ms 64 bytes from 192.168.101.3: icmp_seq=1180 ttl=64 time=32227.091 ms 64 bytes from 192.168.101.3: icmp_seq=1181 ttl=64 time=31226.262 ms 64 bytes from 192.168.101.3: icmp_seq=1182 ttl=64 time=30225.425 ms 64 bytes from 192.168.101.3: icmp_seq=1183 ttl=64 time=29224.597 ms 64 bytes from 192.168.101.3: icmp_seq=1184 ttl=64 time=28223.757 ms 64 bytes from 192.168.101.3: icmp_seq=1185 ttl=64 time=27222.918 ms 64 bytes from 192.168.101.3: icmp_seq=1186 ttl=64 time=26222.086 ms 64 bytes from 192.168.101.3: icmp_seq=1187 ttl=64 time=25221.164 ms 64 bytes from 192.168.101.3: icmp_seq=1188 ttl=64 time=24220.407 ms 64 bytes from 192.168.101.3: icmp_seq=1189 ttl=64 time=23219.575 ms 64 bytes from 192.168.101.3: icmp_seq=1190 ttl=64 time=22218.737 ms 64 bytes from 192.168.101.3: icmp_seq=1191 ttl=64 time=21217.905 ms 64 bytes from 192.168.101.3: icmp_seq=1192 ttl=64 time=20217.066 ms 64 bytes from 192.168.101.3: icmp_seq=1193 ttl=64 time=19216.228 ms 64 bytes from 192.168.101.3: icmp_seq=1194 ttl=64 time=18215.333 ms 64 bytes from 192.168.101.3: icmp_seq=1195 ttl=64 time=17214.503 ms 64 bytes from 192.168.101.3: icmp_seq=1196 ttl=64 time=16213.720 ms 64 bytes from 192.168.101.3: icmp_seq=1197 ttl=64 time=15210.912 ms 64 bytes from 192.168.101.3: icmp_seq=1198 ttl=64 time=14210.044 ms 64 bytes from 192.168.101.3: icmp_seq=1199 ttl=64 time=13209.194 ms 64 bytes from 192.168.101.3: icmp_seq=1200 ttl=64 time=12208.376 ms 64 bytes from 192.168.101.3: icmp_seq=1201 ttl=64 time=11207.536 ms 64 bytes from 192.168.101.3: icmp_seq=1202 ttl=64 time=10206.694 ms 64 bytes from 192.168.101.3: icmp_seq=1203 ttl=64 time=9205.816 ms 64 bytes from 192.168.101.3: icmp_seq=1204 ttl=64 time=8205.014 ms 64 bytes from 192.168.101.3: icmp_seq=1205 ttl=64 time=7204.186 ms 64 bytes from 192.168.101.3: icmp_seq=1206 ttl=64 time=6203.294 ms 64 bytes from 192.168.101.3: icmp_seq=1207 ttl=64 time=5202.510 ms 64 bytes from 192.168.101.3: icmp_seq=1208 ttl=64 time=4201.677 ms 64 bytes from 192.168.101.3: icmp_seq=1209 ttl=64 time=3200.851 ms 64 bytes from 192.168.101.3: icmp_seq=1210 ttl=64 time=2200.013 ms 64 bytes from 192.168.101.3: icmp_seq=1211 ttl=64 time=1199.100 ms 64 bytes from 192.168.101.3: icmp_seq=1212 ttl=64 time=198.331 ms 64 bytes from 192.168.101.3: icmp_seq=1213 ttl=64 time=0.129 ms 64 bytes from 192.168.101.3: icmp_seq=1214 ttl=64 time=58223.470 ms 64 bytes from 192.168.101.3: icmp_seq=1215 ttl=64 time=57222.637 ms 64 bytes from 192.168.101.3: icmp_seq=1216 ttl=64 time=56221.800 ms 64 bytes from 192.168.101.3: icmp_seq=1217 ttl=64 time=55220.960 ms 64 bytes from 192.168.101.3: icmp_seq=1218 ttl=64 time=54220.116 ms 64 bytes from 192.168.101.3: icmp_seq=1219 ttl=64 time=53219.282 ms 64 bytes from 192.168.101.3: icmp_seq=1220 ttl=64 time=52218.444 ms 64 bytes from 192.168.101.3: icmp_seq=1221 ttl=64 time=51217.618 ms 64 bytes from 192.168.101.3: icmp_seq=1222 ttl=64 time=50216.778 ms 64 bytes from 192.168.101.3: icmp_seq=1223 ttl=64 time=49215.932 ms 64 bytes from 192.168.101.3: icmp_seq=1224 ttl=64 time=48215.095 ms 64 bytes from 192.168.101.3: icmp_seq=1225 ttl=64 time=47214.262 ms 64 bytes from 192.168.101.3: icmp_seq=1226