Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On Tue, Sep 28, 2010 at 10:31:27PM -0700, Don Lewis wrote: On 28 Sep, Don Lewis wrote: Looking at the timestamps of things and comparing to my logs, I discovered that the last instance of ntp instability happened when I was running make index in /usr/ports. I tried it again with entertaining results. After a while, the machine became unresponsive. I was logged in over ssh and it stopped echoing keystrokes. In parallel I was running a script that echoed the date, the results of vmstat -i, and the results of ntpq -c pe. The latter showed jitter and offset going insane. Eventually make index finished and the machine was responsive again, but the time was way off and ntpd croaked because the necessary time correction was too large. Nothing else anomalous showed up in the logs. Hmn, about half an hour after ntpd died I started my CPU time accounting test and two minutes into that test I got a spew of calcru messages ... I tried this experiment again using a kernel with WITNESS and DEBUG_VFS_LOCKS compiled in, and pinging this machine from another. Things look normal for a while, then the ping times get huge for a while and then recover. 
64 bytes from 192.168.101.3: icmp_seq=1169 ttl=64 time=0.135 ms
64 bytes from 192.168.101.3: icmp_seq=1170 ttl=64 time=0.141 ms
64 bytes from 192.168.101.3: icmp_seq=1171 ttl=64 time=0.130 ms
64 bytes from 192.168.101.3: icmp_seq=1172 ttl=64 time=0.131 ms
64 bytes from 192.168.101.3: icmp_seq=1173 ttl=64 time=0.128 ms
64 bytes from 192.168.101.3: icmp_seq=1174 ttl=64 time=38232.140 ms
64 bytes from 192.168.101.3: icmp_seq=1175 ttl=64 time=37231.309 ms
64 bytes from 192.168.101.3: icmp_seq=1176 ttl=64 time=36230.470 ms
64 bytes from 192.168.101.3: icmp_seq=1177 ttl=64 time=35229.632 ms
64 bytes from 192.168.101.3: icmp_seq=1178 ttl=64 time=34228.791 ms
64 bytes from 192.168.101.3: icmp_seq=1179 ttl=64 time=33227.953 ms
64 bytes from 192.168.101.3: icmp_seq=1180 ttl=64 time=32227.091 ms
64 bytes from 192.168.101.3: icmp_seq=1181 ttl=64 time=31226.262 ms
64 bytes from 192.168.101.3: icmp_seq=1182 ttl=64 time=30225.425 ms
64 bytes from 192.168.101.3: icmp_seq=1183 ttl=64 time=29224.597 ms
64 bytes from 192.168.101.3: icmp_seq=1184 ttl=64 time=28223.757 ms
64 bytes from 192.168.101.3: icmp_seq=1185 ttl=64 time=27222.918 ms
64 bytes from 192.168.101.3: icmp_seq=1186 ttl=64 time=26222.086 ms
64 bytes from 192.168.101.3: icmp_seq=1187 ttl=64 time=25221.164 ms
64 bytes from 192.168.101.3: icmp_seq=1188 ttl=64 time=24220.407 ms
64 bytes from 192.168.101.3: icmp_seq=1189 ttl=64 time=23219.575 ms
64 bytes from 192.168.101.3: icmp_seq=1190 ttl=64 time=22218.737 ms
64 bytes from 192.168.101.3: icmp_seq=1191 ttl=64 time=21217.905 ms
64 bytes from 192.168.101.3: icmp_seq=1192 ttl=64 time=20217.066 ms
64 bytes from 192.168.101.3: icmp_seq=1193 ttl=64 time=19216.228 ms
64 bytes from 192.168.101.3: icmp_seq=1194 ttl=64 time=18215.333 ms
64 bytes from 192.168.101.3: icmp_seq=1195 ttl=64 time=17214.503 ms
64 bytes from 192.168.101.3: icmp_seq=1196 ttl=64 time=16213.720 ms
64 bytes from 192.168.101.3: icmp_seq=1197 ttl=64 time=15210.912 ms
64 bytes from 192.168.101.3: icmp_seq=1198 ttl=64 time=14210.044 ms
64 bytes from 192.168.101.3: icmp_seq=1199 ttl=64 time=13209.194 ms
64 bytes from 192.168.101.3: icmp_seq=1200 ttl=64 time=12208.376 ms
64 bytes from 192.168.101.3: icmp_seq=1201 ttl=64 time=11207.536 ms
64 bytes from 192.168.101.3: icmp_seq=1202 ttl=64 time=10206.694 ms
64 bytes from 192.168.101.3: icmp_seq=1203 ttl=64 time=9205.816 ms
64 bytes from 192.168.101.3: icmp_seq=1204 ttl=64 time=8205.014 ms
64 bytes from 192.168.101.3: icmp_seq=1205 ttl=64 time=7204.186 ms
64 bytes from 192.168.101.3: icmp_seq=1206 ttl=64 time=6203.294 ms
64 bytes from 192.168.101.3: icmp_seq=1207 ttl=64 time=5202.510 ms
64 bytes from 192.168.101.3: icmp_seq=1208 ttl=64 time=4201.677 ms
64 bytes from 192.168.101.3: icmp_seq=1209 ttl=64 time=3200.851 ms
64 bytes from 192.168.101.3: icmp_seq=1210 ttl=64 time=2200.013 ms
64 bytes from 192.168.101.3: icmp_seq=1211 ttl=64 time=1199.100 ms
64 bytes from 192.168.101.3: icmp_seq=1212 ttl=64 time=198.331 ms
64 bytes from 192.168.101.3: icmp_seq=1213 ttl=64 time=0.129 ms
64 bytes from 192.168.101.3: icmp_seq=1214 ttl=64 time=58223.470 ms
64 bytes from 192.168.101.3: icmp_seq=1215 ttl=64 time=57222.637 ms
64 bytes from 192.168.101.3: icmp_seq=1216 ttl=64 time=56221.800 ms
64 bytes from 192.168.101.3: icmp_seq=1217 ttl=64 time=55220.960 ms
64 bytes from 192.168.101.3: icmp_seq=1218 ttl=64 time=54220.116 ms
64 bytes from 192.168.101.3: icmp_seq=1219 ttl=64 time=53219.282 ms
64 bytes from 192.168.101.3: icmp_seq=1220 ttl=64 time=52218.444 ms
64 bytes from 192.168.101.3: icmp_seq=1221 ttl=64 time=51217.618 ms
64 bytes from 192.168.101.3: icmp_seq=1222 ttl=64 time=50216.778 ms
64 bytes from 192.168.101.3: icmp_seq=1223 ttl=64 time=49215.932 ms
64 bytes from 192.168.101.3: icmp_seq=1224
Re: fetch: Non-recoverable resolver failure
On Tue, Sep 28, 2010 at 10:59:04PM +0200, Miroslav Lachman wrote: Jeremy Chadwick wrote: On Tue, Sep 28, 2010 at 08:12:00PM +0200, Miroslav Lachman wrote: Hi, we are using the fetch command from cron to run PHP scripts periodically, and sometimes cron sends error e-mails like this: fetch: https://hiden.example.com/cron/fiveminutes: Non-recoverable resolver failure [...] Note: the target domains are hosted on the server itself, and so is named. The system is FreeBSD 7.3-RELEASE-p2 i386 GENERIC. Can somebody help me diagnose this random fetch+resolver issue? The error in question comes from the resolver library returning EAI_FAIL. This return code can be returned to all sorts of applications (not just fetch), although how each app handles it may differ. So, chances are you really do have something going on upstream from you (one of the nameservers you use might not be available at all times), and it probably clears very quickly (before you have a chance to manually/interactively investigate it). The strange thing is that I have only one nameserver listed in resolv.conf, and it is the local one (127.0.0.1)! (There were two remote nameservers, but I switched to the local one to rule out remote nameserver or network problems.) You're probably going to have to set up a combination of scripts that do tcpdump logging and ktrace -t+ -i (and probably -a) logging (ex. ktrace -t+ -i -a -f /var/log/ktrace.fetch.out fetch -qo ...) to find out what's going on behind the scenes. The irregularity of the problem (re: "sometimes") warrants such. I'd recommend using something other than 127.0.0.1 as your resolver if you need to do tcpdump. I will try it... there will be a lot of output, as there are many cronjobs and relatively high traffic on the webserver, but the fetch resolver failure occurs only a few times a day.
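The failure shows up only a few times a day, so it can help to capture exact timestamps of failing lookups. Those timestamps can then be matched against tcpdump or ktrace logs. The program below is a hypothetical probe built on the standard getaddrinfo(3) and gai_strerror(3) calls. It is not fetch's own code path, and fetch may issue its lookups differently. It resolves a host COUNT times, one second apart, and logs each failure with its return code. The names `is_transient` and `probe_once` are illustrative. Only EAI_AGAIN is treated as "transient"; everything else, including EAI_FAIL, counts as a hard failure.

```c
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* EAI_AGAIN means "try again later"; anything else is reported as hard. */
static int is_transient(int rc)
{
    return rc == EAI_AGAIN;
}

/* Resolve host once; log a timestamped line on failure. Returns rc. */
static int probe_once(const char *host)
{
    struct addrinfo hints, *res = NULL;
    char stamp[32];
    time_t now;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0) {
        freeaddrinfo(res);
        return 0;
    }

    now = time(NULL);
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", localtime(&now));
    fprintf(stderr, "%s %s: rc=%d %s (%s)\n", stamp, host, rc,
            gai_strerror(rc), is_transient(rc) ? "transient" : "hard");
    return rc;
}

int main(int argc, char **argv)
{
    /* Usage: probe HOST [COUNT]; COUNT lookups, one second apart. */
    const char *host = argc > 1 ? argv[1] : "localhost";
    int count = argc > 2 ? atoi(argv[2]) : 1;
    int failures = 0;

    for (int i = 0; i < count; i++) {
        if (probe_once(host) != 0)
            failures++;
        if (i + 1 < count)
            sleep(1);
    }
    printf("%d of %d lookups failed\n", failures, count);
    return failures != 0;
}
```

Running it from the same cron schedule as the fetch jobs keeps the conditions comparable. That is a judgement call, not something the thread prescribes.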
Providing contents of your /etc/resolv.conf, as well as details about your network configuration on the machine (specifically if any firewall stacks (pf or ipfw) are in place) would help too. Some folks might want netstat -m output as well. There is nothing special in the network: the machine is a Sun Fire X2100 M2 with its bge1 NIC connected to a Cisco Linksys switch (100Mbps port), with the uplink (1Gbps port) connected to a Cisco router with dual 10Gbps connectivity. No firewalls in the path. There are more than 10 other servers in the rack, and we have no problems or error messages in the logs from other services/daemons related to DNS.

# cat /etc/resolv.conf
nameserver 127.0.0.1

# netstat -m
279/861/1140 mbufs in use (current/cache/total)
257/553/810/25600 mbuf clusters in use (current/cache/total/max)
257/313 mbuf+clusters out of packet secondary zone in use (current/cache)
5/306/311/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
603K/2545K/3149K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
13/470/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
3351782 requests for I/O initiated by sendfile
0 calls to protocol drain routines

(real IPs were replaced)

# ifconfig bge1
bge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
	ether 00:1e:68:2f:71:ab
	inet 1.2.3.40 netmask 0xffffff80 broadcast 1.2.3.127
	inet 1.2.3.41 netmask 0xffffffff broadcast 1.2.3.41
	inet 1.2.3.42 netmask 0xffffffff broadcast 1.2.3.42
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active

NIC is:
b...@pci0:6:4:1:	class=0x02 card=0x534c108e chip=0x167814e4 rev=0xa3 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'BCM5715C 10/100/100 PCIe Ethernet Controller'
    class      = network
    subclass   = ethernet

There is PF with some basic rules, mostly blocking incoming packets, allowing all outgoing, and scrubbing:

scrub in on bge1 all fragment reassemble
scrub out on bge1 all no-df random-id min-ttl 24 max-mss 1492 fragment reassemble
pass out on bge1 inet proto udp all keep state
pass out on bge1 inet proto tcp from 1.2.3.40 to any flags S/SA modulate state
pass out on bge1 inet proto tcp from 1.2.3.41 to any flags S/SA modulate state
pass out on bge1 inet proto tcp from 1.2.3.42 to any flags S/SA modulate state

Modified PF options:

set timeout { frag 15, interval 5 }
set limit { frags 2500, states 5000 }
set optimization aggressive
set block-policy drop
set loginterface bge1
# Let loopback and internal interface traffic flow without restrictions
set skip on lo0

Please also provide pfctl -s info output, in addition to uname -a output (you
Re: Still getting kmem exhausted panic
on 29/09/2010 03:38 Artem Belevich said the following: On Tue, Sep 28, 2010 at 3:22 PM, Andriy Gapon a...@icyb.net.ua wrote: BTW, have you seen my posts about UMA and ZFS on hackers@? I found it advantageous to use UMA for ZFS I/O buffers, but only after reducing the size of per-CPU caches for the zones with large-sized items. I further modified the code in my local tree to completely disable per-CPU caches for items larger than 32KB. Do you have an updated patch disabling per-CPU caches for large items? I've just rebuilt FreeBSD-8 with your uma-2.diff (it needed r209050 from -head to compile) and so far things look good. I'll re-enable UMA for ZFS and see how it flies in a couple of days. I've just uploaded uma-3.diff. It implements what uma-1.diff did, plus totally skips per-CPU caches for items larger than 32KB, and also has code from uma-2.diff for flushing per-CPU caches on significant memory shortage. Will appreciate your feedback. Thank you for testing! -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: cpu timer issues
on 29/09/2010 06:49 Jurgen Weber said the following: Andriy, you can find everything you are after here: http://pastebin.com/WH4V2W0F Looks like this was with ACPI disabled? Can you try to re-enable it? Also, it doesn't look like the dmesg is verbose. On 28/09/10 8:07 PM, Andriy Gapon wrote: on 28/09/2010 10:54 Jurgen Weber said the following: # dmesg | grep Timecounter Timecounter i8254 frequency 1193182 Hz quality 0 Timecounters tick every 1.000 msec # sysctl kern.timecounter.hardware kern.timecounter.hardware: i8254 Only have one timer to choose from. Can you provide a little more hard data than the above? Specifically, the following sysctls: kern.timecounter, dev.cpu. Output of vmstat -i. _Verbose_ boot dmesg. Please do not disable ACPI when taking this data. Preferably, upload it somewhere and post a link to it. -- Andriy Gapon
Re: cpu timer issues
On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote: Andriy You can find everything you are after here: http://pastebin.com/WH4V2W0F The information provided here shows ACPI is disabled in addition to the boot not being verbose. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 29 Sep, Jeremy Chadwick wrote: Given all the information here, in addition to the other portion of the thread (indicating ntpd reports extreme offset between the system clock and its stratum 1 source), I would say the motherboard is faulty or there is a system device which is behaving badly (possibly something pertaining to interrupts, but I don't know how to debug this on a low level). Possible, but I haven't run into any problems running -CURRENT on this box with an SMP kernel. Can you boot verbosely and provide all of the output here or somewhere on the web? http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt If possible, I would start by replacing the mainboard. The board looks to be a consumer-level board (I see an nfe(4) controller, for example). It's an Abit AN-M2 HD. The RAM is ECC. I haven't seen any machine check errors in the logs. I'll run prime95 as soon as I have a chance.
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On Wed, Sep 29, 2010 at 12:39:49AM -0700, Don Lewis wrote: On 29 Sep, Jeremy Chadwick wrote: Given all the information here, in addition to the other portion of the thread (indicating ntpd reports extreme offset between the system clock and its stratum 1 source), I would say the motherboard is faulty or there is a system device which is behaving badly (possibly something pertaining to interrupts, but I don't know how to debug this on a low level). Possible, but I haven't run into any problems running -CURRENT on this box with an SMP kernel. Can you boot verbosely and provide all of the output here or somewhere on the web? http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt If possible, I would start by replacing the mainboard. The board looks to be a consumer-level board (I see an nfe(4) controller, for example). It's an Abit AN-M2 HD. The RAM is ECC. I haven't seen any machine check errors in the logs. I'll run prime95 as soon as I have a chance. Thanks for the verbose boot. Since it works on -CURRENT, can you provide a verbose boot from that as well? Possibly someone made some changes between RELENG_8 and HEAD which fixed an issue, which could be MFC'd. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 29 Sep, Jeremy Chadwick wrote: On Wed, Sep 29, 2010 at 12:39:49AM -0700, Don Lewis wrote: On 29 Sep, Jeremy Chadwick wrote: Given all the information here, in addition to the other portion of the thread (indicating ntpd reports extreme offset between the system clock and its stratum 1 source), I would say the motherboard is faulty or there is a system device which is behaving badly (possibly something pertaining to interrupts, but I don't know how to debug this on a low level). Possible, but I haven't run into any problems running -CURRENT on this box with an SMP kernel. Can you boot verbosely and provide all of the output here or somewhere on the web? http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-verbose.txt If possible, I would start by replacing the mainboard. The board looks to be a consumer-level board (I see an nfe(4) controller, for example). It's an Abit AN-M2 HD. The RAM is ECC. I haven't seen any machine check errors in the logs. I'll run prime95 as soon as I have a chance. Thanks for the verbose boot. Since it works on -CURRENT, can you provide a verbose boot from that as well? Possibly someone made some changes between RELENG_8 and HEAD which fixed an issue, which could be MFC'd. Even when I saw the weird ntp stepping problem and the calcru messages, the system was still stable enough to build hundreds of ports. In the most recent case, I built 800+ ports over several days without any other hiccups. It could also be a difference between SMP and !SMP. I just found a bug that causes an immediate panic if lock profiling is enabled on a !SMP kernel. This bug also exists in -CURRENT.
Here's the patch:

Index: sys/sys/mutex.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/mutex.h,v
retrieving revision 1.105.2.1
diff -u -r1.105.2.1 mutex.h
--- sys/sys/mutex.h	3 Aug 2009 08:13:06 -0000	1.105.2.1
+++ sys/sys/mutex.h	29 Sep 2010 06:58:52 -0000
@@ -251,8 +251,11 @@
 #define _rel_spin_lock(mp) do {						\
 	if (mtx_recursed((mp)))						\
 		(mp)->mtx_recurse--;					\
-	else								\
+	else {								\
 		(mp)->mtx_lock = MTX_UNOWNED;				\
+		LOCKSTAT_PROFILE_RELEASE_LOCK(LS_MTX_SPIN_UNLOCK_RELEASE, \
+		    mp);						\
+	}								\
 	spinlock_exit();						\
 } while (0)
 #endif /* SMP */

After applying the above patch, I enabled lock profiling and got the following results when I ran make index: http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile.txt I didn't see anything strange happening this time. I don't know if I got lucky, or the change in kernel options fixed the bug.
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
on 29/09/2010 00:11 Don Lewis said the following: On 28 Sep, Don Lewis wrote:

% vmstat -i
interrupt                          total       rate
irq0: clk                       60683442       1000
irq1: atkbd0                           6          0
irq8: rtc                        7765537        127
irq9: acpi0                           13          0
irq10: ohci0 ehci1+             10275064        169
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           21          0
irq14: ata0                        90982          1
irq15: nfe0 ata1                   18363          0

I'm not sure why I'm getting USB interrupts. There aren't any USB devices plugged into this machine. Answer: irq 10 is also shared by vgapci0 and atapci1.

Just curious why the Local APIC timer isn't being used for hardclock on your system. -- Andriy Gapon
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 29 Sep, Andriy Gapon wrote: on 29/09/2010 00:11 Don Lewis said the following: On 28 Sep, Don Lewis wrote:

% vmstat -i
interrupt                          total       rate
irq0: clk                       60683442       1000
irq1: atkbd0                           6          0
irq8: rtc                        7765537        127
irq9: acpi0                           13          0
irq10: ohci0 ehci1+             10275064        169
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           21          0
irq14: ata0                        90982          1
irq15: nfe0 ata1                   18363          0

I'm not sure why I'm getting USB interrupts. There aren't any USB devices plugged into this machine. Answer: irq 10 is also shared by vgapci0 and atapci1. Just curious why the Local APIC timer isn't being used for hardclock on your system.

I'm using the same kernel config as the one on a slower !SMP box which I'm trying to squeeze as much performance out of as possible. My kernel config file contains these statements:

nooptions	SMP
nodevice	apic

Testing with an SMP kernel is on my TODO list.
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
on 29/09/2010 11:56 Don Lewis said the following: I'm using the same kernel config as the one on a slower !SMP box which I'm trying to squeeze as much performance out of as possible. My kernel config file contains these statements: nooptions SMP, nodevice apic. Testing with an SMP kernel is on my TODO list. SMP or not, it's really weird to see apic disabled nowadays. -- Andriy Gapon
Diskless/readonly root booting issues
Hi all,

I've been working on updating my semi-embedded images to 7.3-STABLE of late (I generally wait for .3+ releases). It's been a few years since the last time I did one of these, and I'm having some issues getting my netboot test environment to behave itself. I'm sure it's something simple, but I've spent quite a bit of time looking for answers and poking the system with no joy yet.

Basically I use a PXE-booted NFS root to test my reduced-footprint image builds. The boot works, but init attempts to remount / read-write (in spite of it being marked ro in fstab). That of course fails because the directory is exported read-only from the NFS server, at which point the system dumps me to single-user mode:

=== OUTPUT ===
Starting file system checks:
udp: Netconfig database not found
Mounting root filesystem rw failed, startup aborted
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
Sep 30 09:60:02 init: /bin/sh on /etc/rc terminated abnormally, going to single user mode
Enter full pathname of shell or RETURN for /bin/sh:

Relevant configs from the diskless root:

== rc.conf ==
ifconfig_le0=DHCP
diskless_mount=/etc/rc.initdiskless
varsize=8192
varmfs=YES
tmpsize=8192
tmpmfs=YES
nfs_client_enable=YES
dumpdev=NO

rc.initdiskless is the version from /usr/share/examples/rc.initdiskless

== fstab ==
192.168.2.2:/usr/fbtest  /      nfs     ro  0 0
proc                     /proc  procfs  rw  0 0

== loader.conf ==
verbose_loading=YES
autoboot_delay=2

The kernel is (obviously) built with NFS_ROOT and NFSCLIENT and is relatively minimalist otherwise; I have also tested with GENERIC, with the same result. I must be forgetting something simple in all of this. I don't recall it being terribly difficult to get this stuff working when I was doing my original work with 6.3, though I don't recall the use of the initdiskless script. IIRC I was using rc.diskless2, which (again IIRC) was later replaced by /etc/rc.d/diskless, but I've not been able to find this script anywhere. Any suggestions would be greatly appreciated at this point.
Thanks, Morgan Reed
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
Chuck Swiger wrote:

MCA: Bank 1, Status 0xe20001f5
MCA: Global Cap 0x0005, Status 0x
MCA: Vendor GenuineIntel, ID 0x695, APIC ID 0
MCA: CPU 0 UNCOR PCC OVER DCACHE L1 ??? error

That is very likely to be a matter of luck. If I translate this MCA right, it looks to be an uncorrected error in the L1 data cache on the CPU. Try to run something like prime95's torture test mode and see whether it fails overnight.

The test ran for 17 hours without any problems (or MCA messages). Then I put the laptop to sleep for 5 minutes and resumed the test, and after 30 minutes of it there are now two MCA messages in dmesg. Are they somehow related, or is this a coincidence?
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
Hi-- On Sep 29, 2010, at 8:42 AM, Vitaly Magerya wrote: The test ran for 17 hours without any problems (or MCA messages). That part is good. At least starting from normal operation, your laptop is running stably under load. Then I put the laptop to sleep for 5 minutes, resumed the test, and after 30 minutes of it there are now two MCA messages in dmesg. Are they somehow related, or is this a coincidence? I doubt repeated coincidences. :-) Is the prime95 test running stable after waking from sleep? Regards, -- -Chuck
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote: Hi-- On Sep 29, 2010, at 8:42 AM, Vitaly Magerya wrote: The test ran for 17 hours without any problems (or MCA messages). That part is good. At least starting from normal operation, your laptop is running stably under load. Then I put the laptop to sleep for 5 minutes, resumed the test, and after 30 minutes of it there are now two MCA messages in dmesg. Are they somehow related, or is this a coincidence? I doubt repeated coincidences. :-) Is the prime95 test running stable after waking from sleep? He's not running Prime95 (a native Win32 app); he's running ports/math/mprime natively under FreeBSD. I don't know if this application stresses hardware to the same degree Prime95 does; I've used Prime95 many times to burn in new workstations. The Thinkpad hardware he's on is "old" (note the quotes), so I wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a strange/odd MCA event as a result of going in/out of sleep state. It could be a general system bug of some sort as well (one which has no repercussions). Look at it this way: if his L1 cache were going bad, his system would be freaking out doing literally anything (booting the kernel, for example); I'm under the impression Pentium M CPUs do not have ECC L1 cache. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
Jeremy Chadwick wrote: On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote: Is the prime95 test running stable after waking from sleep? Yes, 0 errors, 0 warnings. The Thinkpad hardware he's on is "old" (note the quotes), so I wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a strange/odd MCA event as a result of going in/out of sleep state. It could be a general system bug of some sort as well (one which has no repercussions). Well, since it causes no other visible problems, it might just as well be a false alarm.
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
On Wed, Sep 29, 2010 at 10:16:13AM -0700, Chuck Swiger wrote: On Sep 29, 2010, at 10:07 AM, Jeremy Chadwick wrote: On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote: I doubt repeated coincidences. :-) Is the prime95 test running stable after waking from sleep? He's not running Prime95 (a native Win32 app); he's running ports/math/mprime natively under FreeBSD. I don't know if this application stresses hardware to the same degree Prime95 does; I've used Prime95 many times to burn in new workstations. It's doing the same math operations; something like mprime -t is the same as the Win32 test mode, per the docs: "-t Run the torture test. Same as Options/Torture Test." The Thinkpad hardware he's on is "old" (note the quotes), so I wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a strange/odd MCA event as a result of going in/out of sleep state. It could be a general system bug of some sort as well (one which has no repercussions). That sounds reasonable to me, but I'm wary of uncorrected errors which seem to be reproducible under specific circumstances. Look at it this way: if his L1 cache were going bad, his system would be freaking out doing literally anything (booting the kernel, for example); I'm under the impression Pentium M CPUs do not have ECC L1 cache. Sure, if the MCA report is reflecting a legitimate problem, and it was happening more often than every few minutes, and it happened after a cold reboot rather than after wakeup from sleep. :-) I place more faith in ~17 hours of Prime95/mprime working OK to validate that the hardware is not obviously broken. Oh, absolutely. If anything my statement was indirectly agreeing with your recommended test (apart from being unsure how mprime behaved). :-) I wonder if there's a CPU erratum or something along those lines which might explain the behaviour.
-- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
Re: resume slow on Thinkpad T42 FreeBSD 8-STABLE
On Wed, Sep 29, 2010 at 08:24:21PM +0300, Vitaly Magerya wrote: Jeremy Chadwick wrote: On Wed, Sep 29, 2010 at 09:57:53AM -0700, Chuck Swiger wrote: Is the prime95 test running stable after waking from sleep? Yes, 0 errors, 0 warnings. The Thinkpad hardware he's on is "old" (note the quotes), so I wouldn't be surprised if the CPU (Intel Pentium M) happens to induce a strange/odd MCA event as a result of going in/out of sleep state. It could be a general system bug of some sort as well (one which has no repercussions). Well, since it causes no other visible problems, it might just as well be a false alarm. Highly possible. If it bothers you to the point where you'd rather not see it, you can disable MCA events by setting hw.mca.enabled=0 in /boot/loader.conf. I don't know of a way to conditionally ignore certain MCAs. -- | Jeremy Chadwick j...@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
zfs send/receive: is this slow?
It's taken about 15 hours to copy 800GB. I'm sure there's some tuning I can do.

The system is now running:

# zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula

All the drives are:

Hitachi HDS722020ALA330 JKAOA28A ATA-8 SATA 2.x device

From systat:

[systat -vmstat screen, partially recovered]
1 users   Load 0.36 0.58 0.57   Sep 29 13:47
CPU: 13.6% Sys  0.8% Intr  0.2% User  0.0% Nice  85.5% Idle
Memory (KB): 1355484 wire, 25936 act, 2452756 inact, 76664 cache, 427728 buf

Disks   ada0   ada1   ada2   ada3   ada4   ada5   ada6
KB/t   35.95  37.00  36.75  41.44  40.05  40.86  41.11
tps      306    299    301    267    276    271    269
MB/s   10.75  10.82  10.79  10.79  10.80  10.80  10.81

$ zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.67T  5.02T    358     38  43.1M  1.96M
storage     7.67T  5.02T    317    475  39.4M  30.9M
storage     7.67T  5.02T    357    533  44.3M  34.4M
storage     7.67T  5.02T    371    556  46.0M  35.8M
storage     7.67T  5.02T    313    521  38.9M  28.7M
storage     7.67T  5.02T    309    457  38.4M  30.4M
storage     7.67T  5.02T    388    589  48.2M  37.8M
storage     7.67T  5.02T    377    581  46.8M  36.5M
storage     7.67T  5.02T    310    559  38.4M  30.4M
storage     7.67T  5.02T    430    611  53.4M  41.3M

$ zfs get all storage/compressed
NAME                PROPERTY         VALUE                  SOURCE
storage/compressed  type             filesystem             -
storage/compressed  creation         Tue Sep 28 20:35 2010  -
storage/compressed  used             856G                   -
storage/compressed  available        3.38T                  -
storage/compressed  referenced       44.8K                  -
storage/compressed  compressratio    1.60x                  -
storage/compressed  mounted          yes                    -
storage/compressed  quota            none                   default
storage/compressed  reservation      none                   default
storage/compressed  recordsize       128K                   default
storage/compressed  mountpoint       /storage/compressed    default
storage/compressed  sharenfs         off                    default
storage/compressed  checksum         on                     default
storage/compressed  compression      on                     local
storage/compressed  atime            on                     default
storage/compressed  devices          on                     default
storage/compressed  exec             on                     default
storage/compressed  setuid           on                     default
storage/compressed  readonly         off                    default
storage/compressed  jailed           off                    default
storage/compressed  snapdir          hidden                 default
storage/compressed  aclmode          groupmask              default
storage/compressed  aclinherit       restricted             default
storage/compressed  canmount         on                     default
storage/compressed  shareiscsi       off                    default
storage/compressed  xattr            off                    temporary
storage/compressed  copies           1                      default
storage/compressed  version          4                      -
storage/compressed  utf8only         off                    -
storage/compressed  normalization    none                   -
storage/compressed  casesensitivity  sensitive              -
storage/compressed  vscan
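For a rough sense of scale, the overall copy rate can be compared with the zpool iostat samples. A minimal sketch, assuming "800GB" means 800 GiB and reading the iostat bandwidth columns as MiB/s; the first iostat line is skipped because, as far as I know, zpool iostat's first report is an average since boot rather than the current 10-second interval.

```python
# Rough throughput arithmetic for the zfs send | zfs receive run above.
# Assumptions: "800GB" is taken as 800 GiB; zpool iostat "M" values as MiB/s.

def average_rate_mib_s(size_gib: float, hours: float) -> float:
    """Average transfer rate in MiB/s for size_gib copied over `hours`."""
    return size_gib * 1024 / (hours * 3600)

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Bandwidth columns from the `zpool iostat 10` samples, first (since-boot) line dropped.
read_mib_s = [39.4, 44.3, 46.0, 38.9, 38.4, 48.2, 46.8, 38.4, 53.4]
write_mib_s = [30.9, 34.4, 35.8, 28.7, 30.4, 37.8, 36.5, 30.4, 41.3]

overall = average_rate_mib_s(800, 15)
print(f"whole-run average: {overall:.1f} MiB/s")
print(f"sampled read:      {mean(read_mib_s):.1f} MiB/s")
print(f"sampled write:     {mean(write_mib_s):.1f} MiB/s")
```

That works out to about 15.2 MiB/s over the whole run against roughly 43.8 MiB/s read and 34.0 MiB/s write in the samples, which suggests the transfer has not been running at a steady rate; the arithmetic alone says nothing about why.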
Re: zfs send/receive: is this slow?
On Wed, Sep 29, 2010 at 11:04 AM, Dan Langille d...@langille.org wrote:
> It's taken about 15 hours to copy 800GB. I'm sure there's some tuning I can do.
> The system is now running:
> # zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula

Try piping zfs data through mbuffer (misc/mbuffer in ports). I've found that it does help a lot to smooth out data flow and increase send/receive throughput even when send/receive happens on the same host. Run it with a buffer large enough to accommodate a few seconds' worth of write throughput for your target disks.

Here's an example:
http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

--Artem
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
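The "few seconds of write throughput" rule of thumb is simple arithmetic. A minimal sketch, assuming 5 seconds as "a few" and taking the peak write bandwidth from the zpool iostat samples Dan posted (41.3M, read as MiB/s); rounding up to a power of two is only a tidiness choice, not something mbuffer requires. The result would be handed to mbuffer's -m option (total buffer memory).

```python
import math

def buffer_size_mib(write_mib_s: float, seconds: float) -> int:
    """Buffer size in MiB that absorbs `seconds` of writes at write_mib_s,
    rounded up to the next power of two for a tidy mbuffer -m value."""
    needed = write_mib_s * seconds
    if needed <= 0:
        raise ValueError("throughput and seconds must be positive")
    return 2 ** math.ceil(math.log2(needed))

# Assumed inputs: peak sampled write rate and 5 seconds of headroom.
size = buffer_size_mib(41.3, 5)
print(f"mbuffer -m {size}M")
```

With these inputs it needs about 206 MiB and rounds to 256, i.e. inserting mbuffer -m 256M between zfs send and zfs receive in the pipeline. A larger buffer mostly costs RAM, so erring on the high side seems reasonable on a machine with memory to spare.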
Re: wifi issues under -stable
Warren Block wrote:
> On Wed, 22 Sep 2010, Jim Bryant wrote:
>> I have two laptops, both Compaq (HP) C300 series, using motherboards with the 945GM chipset and T7200 and T7600 Core 2 Duos. One (this one) has an Intel PRO/Wireless 3945ABG installed, which returns:
>>
>> wpi0: Intel(R) PRO/Wireless 3945ABG irq 18 at device 0.0 on pci6
>> wpi0: Driver Revision 20071127
>> wpi0: 0x1000 bytes of rid 0x10 res 3 failed (0, 0x).
>> wpi0: could not allocate memory resource
>> device_attach: wpi0 attach returned 6
>>
>> and the Broadcom used in the other does the exact same thing. I'm thinking that this isn't really a problem with the wifi, but may be a Mini PCIe issue.
>
> Don't know about the Intel, but some Broadcoms work. Please show what you're doing in /boot/loader.conf and /etc/rc.conf.

The other machine has configs pretty much the same as this one. The problem seems to be at the kernel level, though, at probe time.

4:19:52pm argus(14): cat /boot/loader.conf
beastie_disable="YES"   # Turn the beastie boot menu on and off
# Beginning of the block added by the VMware software
vmxnet_load="YES"
# End of the block added by the VMware software

4:20:02pm argus(15): cat /etc/rc.conf
# -- sysinstall generated deltas -- # Tue Mar 23 21:07:50 2010
# Created: Tue Mar 23 21:07:50 2010
# Enable network daemons for user convenience.
# Please make all changes to this file, not to /etc/defaults/rc.conf.
# This file now contains just the overrides from /etc/defaults/rc.conf.
accounting_enable="YES"
gateway_enable="YES"
hostname="argus.root.com"
ifconfig_rl0="inet 192.168.0.2 192.168.0.1 netmask 255.255.255.0"
inetd_enable="YES"
ipv6_enable="NO"
keyrate="fast"
lpd_enable="YES"
moused_enable="YES"
moused_port="/dev/psm0"
moused_type="auto"
named_enable="YES"
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
router="/sbin/routed"
router_enable="YES"
router_flags="-q"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"
rwhod_enable="NO"
saver="NO"
scrnmap="NO"
sshd_enable="YES"
blanktime="300"
font8x14="cp437-8x14"
font8x16="cp437-8x16"
font8x8="cp437-8x8"
allscreens_flags="-c blink MODE_280"   # Set this vidcontrol mode for all virtual screens
allscreens_kbdflags="-r fast"   # Set this kbdcontrol mode for all virtual screens
fusefs_enable="YES"
mysql_enable="YES"
lircd_enable="YES"

4:23:36pm argus(17): pciconf -lv
hos...@pci0:0:0:0: class=0x06 card=0x30a5103c chip=0x27a08086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '955XM/945GM/PM/GMS/940GML Express Processor to DRAM Controller'
    class      = bridge
    subclass   = HOST-PCI
vgap...@pci0:0:2:0: class=0x03 card=0x30a5103c chip=0x27a28086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 945GM/GU Express Integrated Graphics Controller'
    class      = display
    subclass   = VGA
vgap...@pci0:0:2:1: class=0x038000 card=0x30a5103c chip=0x27a68086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 945GM/GU Express Integrated Graphics Controller'
    class      = display
hd...@pci0:0:27:0: class=0x040300 card=0x30a5103c chip=0x27d88086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'IDT High Definition Audio Driver (BA101897)'
    class      = multimedia
    subclass   = HDA
pc...@pci0:0:28:0: class=0x060400 card=0x30a5103c chip=0x27d08086 rev=0x01 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) PCIe Root Port'
    class      = bridge
    subclass   = PCI-PCI
pc...@pci0:0:28:2: class=0x060400 card=0x30a5103c chip=0x27d48086 rev=0x01 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) PCIe Root Port'
    class      = bridge
    subclass   = PCI-PCI
uh...@pci0:0:29:0: class=0x0c0300 card=0x30a5103c chip=0x27c88086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
uh...@pci0:0:29:1: class=0x0c0300 card=0x30a5103c chip=0x27c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
uh...@pci0:0:29:2: class=0x0c0300 card=0x30a5103c chip=0x27ca8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB Universal Host Controller'
    class      = serial bus
    subclass   = USB
eh...@pci0:0:29:7: class=0x0c0320 card=0x30a5103c chip=0x27cc8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801G (ICH7 Family) USB 2.0 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
pc...@pci0:0:30:0: class=0x060401 card=0x30a5103c chip=0x24488086 rev=0xe1 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 Family (ICH2/3/4/5/6/7/8/9-M) Hub Interface to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
Re: cpu timer issues
I do not understand what you mean by a verbose dmesg. Looking at the man page, there is no verbose option for dmesg except what I completed (dmesg -a). Once that is clarified I can reboot the backup machine and turn on ACPI for you.

On 29/09/10 5:29 PM, Jeremy Chadwick wrote:
> On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote:
>> Andriy
>>
>> You can find everything you are after here: http://pastebin.com/WH4V2W0F
>
> The information provided here shows ACPI is disabled in addition to the boot not being verbose.

--
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001 fax +61 2 9550 4001
Re: cpu timer issues
Hi

I do not understand what you mean by a verbose dmesg. Looking at the man page, there is no verbose option for dmesg except what I completed (dmesg -a). Once that is clarified I can reboot the backup machine and turn on ACPI for you.

Thanks

On 29/09/10 5:26 PM, Andriy Gapon wrote:
> on 29/09/2010 06:49 Jurgen Weber said the following:
>> Andriy
>>
>> You can find everything you are after here: http://pastebin.com/WH4V2W0F
>
> Looks like this was with ACPI disabled? Can you try to re-enable it?
> Also, it doesn't look like the dmesg is verbose.
>
>> On 28/09/10 8:07 PM, Andriy Gapon wrote:
>>> on 28/09/2010 10:54 Jurgen Weber said the following:
>>>> # dmesg | grep Timecounter
>>>> Timecounter "i8254" frequency 1193182 Hz quality 0
>>>> Timecounters tick every 1.000 msec
>>>> # sysctl kern.timecounter.hardware
>>>> kern.timecounter.hardware: i8254
>>>>
>>>> Only have one timer to choose from.
>>>
>>> Can you provide a little bit more of hard data than the above?
>>> Specifically, the following sysctls:
>>> kern.timecounter
>>> dev.cpu
>>> Output of vmstat -i.
>>> _Verbose_ boot dmesg.
>>> Please do not disable ACPI when taking this data.
>>> Preferably, upload it somewhere and post a link to it.
Re: cpu timer issues
On Thu, Sep 30, 2010 at 07:51:49AM +1000, Jurgen Weber wrote:
> I do not understand what you mean by a verbose dmesg. Looking at the man page, there is no verbose option for dmesg except what I completed (dmesg -a). Once that is clarified I can reboot the backup machine and turn on ACPI for you.
>
> On 29/09/10 5:29 PM, Jeremy Chadwick wrote:
>> On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote:
>>> Andriy
>>>
>>> You can find everything you are after here: http://pastebin.com/WH4V2W0F
>>
>> The information provided here shows ACPI is disabled in addition to the boot not being verbose.

When the machine boots (when loader starts), you'll see the FreeBSD logo with a menu of choices ("boot", "boot with ACPI disabled", "single user mode", etc.). One of them is "boot verbosely"; I think it's #5, labelled "Boot with verbose logging" or something like that. Choose that.

That will cause your machine to boot with ACPI enabled, in addition to booting verbosely. There will be a LOT more information printed on the screen during the boot process, and it should be visible in /var/log/messages after the machine is started. This is the information we're looking for.

HTH!

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
Re: cpu timer issues
on 30/09/2010 00:51 Jurgen Weber said the following:
> Hi
>
> I do not understand what you mean by a verbose dmesg. Looking at the man page, there is no verbose option for dmesg except what I completed (dmesg -a). Once that is clarified I can reboot the backup machine and turn on ACPI for you.

Verbose dmesg is produced when the kernel is booted with verbose logging. Either:
boot -v at the loader prompt,
or '5' (IIRC) in the loader menu,
or nextboot -k kernel -o -v before reboot,
or boot_verbose="YES" in loader.conf.

--
Andriy Gapon
Re: cpu timer issues
Gentlemen

Ah, OK. Learn something new every day. Fantastic.

The first time, the machine stopped during the boot process, but that is OK; the second time we had success:
http://pastebin.com/r4UWdN7U

I am not sure if ACPI is on. Jeremy, you mention below that it should be on just by booting with this option, so let me know if there are any problems there.

Thanks

Jurgen

On 30/09/10 7:56 AM, Jeremy Chadwick wrote:
> On Thu, Sep 30, 2010 at 07:51:49AM +1000, Jurgen Weber wrote:
>> I do not understand what you mean by a verbose dmesg. Looking at the man page, there is no verbose option for dmesg except what I completed (dmesg -a). Once that is clarified I can reboot the backup machine and turn on ACPI for you.
>>
>> On 29/09/10 5:29 PM, Jeremy Chadwick wrote:
>>> On Wed, Sep 29, 2010 at 01:49:39PM +1000, Jurgen Weber wrote:
>>>> Andriy
>>>>
>>>> You can find everything you are after here: http://pastebin.com/WH4V2W0F
>>>
>>> The information provided here shows ACPI is disabled in addition to the boot not being verbose.
>
> When the machine boots (when loader starts), you'll see the FreeBSD logo with a menu of choices ("boot", "boot with ACPI disabled", "single user mode", etc.). One of them is "boot verbosely"; I think it's #5, labelled "Boot with verbose logging" or something like that. Choose that.
>
> That will cause your machine to boot with ACPI enabled, in addition to booting verbosely. There will be a LOT more information printed on the screen during the boot process, and it should be visible in /var/log/messages after the machine is started. This is the information we're looking for.
>
> HTH!
Re: CPU time accounting broken on 8-STABLE machine after a few hours of uptime
On 29 Sep, Andriy Gapon wrote:
> on 29/09/2010 11:56 Don Lewis said the following:
>> I'm using the same kernel config as the one on a slower !SMP box which I'm trying to squeeze as much performance out of as possible. My kernel config file contains these statements:
>>
>> nooptions SMP
>> nodevice apic
>>
>> Testing with an SMP kernel is on my TODO list.
>
> SMP or not, it's really weird to see apic disabled nowadays.

I tried enabling apic and got worse results. I saw ping RTTs as high as 67 seconds. Here's the timer info with apic enabled:

# sysctl kern.timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-100)
kern.timecounter.hardware: ACPI-fast
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 53633
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.ACPI-fast.counter: 7988816
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.quality: 1000
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 1341917999
kern.timecounter.tc.TSC.frequency: 2500014018
kern.timecounter.tc.TSC.quality: 800
kern.timecounter.invariant_tsc: 0

Here's the verbose boot info with apic:
http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-apic-verbose.txt

I've also experimented with SMP as well as SCHED_4BSD (all previous testing was with !SMP and SCHED_ULE). I still see occasional problems with SCHED_4BSD and !SMP, but so far I have not seen any problems with SCHED_ULE and SMP.

I did manage to catch the problem with lock profiling enabled:
http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE_lock_profile_freeze.txt

I'm currently testing SMP some more to verify whether it really avoids this problem.
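One way to read the sysctl output above: each timecounter is a free-running counter of mask + 1 ticks at the listed frequency, so it wraps every (mask + 1) / frequency seconds, and the kernel has to sample it more often than that or elapsed time is lost. A minimal sketch that computes those intervals from Don's values and picks the highest-quality entry from kern.timecounter.choice; treating "highest quality wins" as the selection rule is a simplification of what the kernel does, and kern.timecounter.hardware can also be set by hand.

```python
import re

# Values copied from the `sysctl kern.timecounter` output above.
COUNTERS = {
    # name: (mask, frequency in Hz)
    "i8254": (65535, 1193182),
    "ACPI-fast": (16777215, 3579545),
    "TSC": (4294967295, 2500014018),
}
CHOICE = "TSC(800) ACPI-fast(1000) i8254(0) dummy(-100)"

def wrap_seconds(mask: int, freq_hz: int) -> float:
    """Seconds for a counter of (mask + 1) ticks to roll over at freq_hz."""
    return (mask + 1) / freq_hz

def parse_choice(choice: str) -> dict:
    """Turn 'NAME(quality) NAME(quality) ...' into {name: quality}."""
    return {name: int(q) for name, q in re.findall(r"([\w-]+)\((-?\d+)\)", choice)}

def best_by_quality(choice: str) -> str:
    """Name with the highest quality value (simplified selection rule)."""
    qualities = parse_choice(choice)
    return max(qualities, key=qualities.get)

for name, (mask, freq) in COUNTERS.items():
    print(f"{name:10s} wraps every {wrap_seconds(mask, freq):.4f} s")
print("highest quality:", best_by_quality(CHOICE))
```

The 16-bit i8254 wraps in about 55 ms, the 24-bit ACPI-fast counter in about 4.7 s, and the 32-bit TSC at 2.5 GHz in about 1.7 s; the highest-quality entry, ACPI-fast, matches kern.timecounter.hardware. This only shows how much slack each counter leaves if the kernel is delayed in updating its clock; it does not by itself explain the stalls being chased here.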
Re: zfs send/receive: is this slow?
On 9/29/2010 3:57 PM, Artem Belevich wrote:
> On Wed, Sep 29, 2010 at 11:04 AM, Dan Langille d...@langille.org wrote:
>> It's taken about 15 hours to copy 800GB. I'm sure there's some tuning I can do.
>> The system is now running:
>> # zfs send storage/bac...@transfer | zfs receive storage/compressed/bacula
>
> Try piping zfs data through mbuffer (misc/mbuffer in ports). I've found that it does help a lot to smooth out data flow and increase send/receive throughput even when send/receive happens on the same host. Run it with a buffer large enough to accommodate a few seconds' worth of write throughput for your target disks.

Thanks. I just installed it. I'll use it next time. I don't want to interrupt this one; I'd like to see how long it takes, then compare.

> Here's an example:
> http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

That looks really good. Thank you.

--
Dan Langille - http://langille.org/