Re: Strange reboot since 9.1
Hello,

I have enabled dumpdev=AUTO and run kgdb after a reboot. Here is the backtrace:

root@freebsd-server kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd...
#0 sched_switch (td=0x812228a0, newtd=0xfe00051c8000, flags=Variable flags is not available.
) at /usr/src/sys/kern/sched_ule.c:1927
1927 cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0 sched_switch (td=0x812228a0, newtd=0xfe00051c8000, flags=Variable flags is not available.
) at /usr/src/sys/kern/sched_ule.c:1927
#1 0x808f2d46 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
#2 0x8092ba72 in sleepq_timedwait (wchan=0x81222400, pri=84) at /usr/src/sys/kern/subr_sleepqueue.c:658
#3 0x808f332f in _sleep (ident=0x81222400, lock=0x0, priority=Variable priority is not available.
) at /usr/src/sys/kern/kern_synch.c:246
#4 0x80b429db in scheduler (dummy=Variable dummy is not available.
) at /usr/src/sys/vm/vm_glue.c:788
#5 0x8089c047 in mi_startup () at /usr/src/sys/kern/init_main.c:277
#6 0x802b526c in btext () at /usr/src/sys/amd64/amd64/locore.S:81
#7 0x0001 in ?? ()
#8 0x81240f80 in tdq_cpu ()
#9 0x812228a0 in proc0 ()
#10 0x in ?? ()
#11 0x81529b90 in ?? ()
#12 0x81529b38 in ?? ()
#13 0xfe00051c8000 in ?? ()
#14 0x8091352e in sched_switch (td=0x0, newtd=0x0, flags=Variable flags is not available.
) at /usr/src/sys/kern/sched_ule.c:1921
Previous frame inner to this frame (corrupt stack?)
(kgdb) bt f
#0 sched_switch (td=0x812228a0, newtd=0xfe00051c8000, flags=Variable flags is not available.
) at /usr/src/sys/kern/sched_ule.c:1927
__res = 2
__s = Variable __s is not available.
--
Best regards,
Loïc BLOT, Engineering UNIX Systems, Security and Networks
http://www.unix-experience.fr

On Wednesday 06 March 2013 at 11:18 +0200, Marin Atanasov Nikolov wrote:
> On Wed, Mar 6, 2013 at 10:55 AM, Loïc Blot loic.b...@unix-experience.fr wrote:
>> Hello,
>
> Hi,
>
>> Since FreeBSD 9.1 I have strange problems with the distribution. Some servers are rebooting without any kernel panic, instantly. First I thought it was a problem with my KVM system, but one of my FreeBSD installs on a Dell R210 has the same problem. The servers concerned are now:
>> - Monitoring server
>> - LDAP test server
>> - Some other servers, randomly (not in production).
>> First I thought it was a problem with my FreeBSD install, so I downloaded the ISO again, but the problem was still there. Then I tried another thing, installing 9.0 and upgrading to 9.1, but I got the same problem. How can I get information about this problem?
>
> I've had similar issues with one of my FreeBSD systems. My system had spontaneous reboots without any kernel panic, without any clear evidence of why it happened. After a lot of trials and tests the root cause appeared to be the amount of ZFS snapshots I had, which were more than 1K on an 8G system. Upgrading from 9.0 to 9.1 didn't solve the issue, as clearly I had to do some cleanup of the ZFS snapshots, and since then it's more than a month without any reboots.
>
> A few pointers that you could use -- get these systems monitored and keep an eye on the monitoring system -- CPU usage, memory, processes, network traffic, etc. I noticed that my system was running low on free memory, and that later led me to the ZFS snapshots clue. So my advice is to first get these systems monitored and watch for anything unusual happening. Then investigate further.
>
> Good luck.
>
> Regards,
> Marin
>
>> Thanks in advance.
>
> --
> Marin Atanasov Nikolov
> dnaeon AT gmail DOT com
> http://www.unix-heaven.org/

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
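Marin's advice to keep an eye on the number of ZFS snapshots could be automated along these lines. This is only a minimal sketch, not something posted in the thread: `check_snapshot_count` is a hypothetical helper name, and the default limit of 1000 is an assumption taken from his "more than 1K" remark.

```sh
#!/bin/sh
# Hypothetical helper: reads snapshot names on stdin (one per line) and
# warns when there are more of them than the given limit.
# The default limit of 1000 is an assumed value, not a ZFS constant.
check_snapshot_count() {
    limit="${1:-1000}"
    # Count non-empty input lines; "|| true" keeps an empty list from
    # being treated as an error by grep.
    count=$(grep -c . || true)
    if [ "$count" -gt "$limit" ]; then
        echo "WARNING: $count snapshots (limit $limit)"
        return 1
    fi
    echo "OK: $count snapshots (limit $limit)"
}

# Intended use on a ZFS host (not run here):
#   zfs list -H -t snapshot -o name | check_snapshot_count 1000
```

Run from cron, a non-zero exit status could feed whatever monitoring system is already in place.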
Re: Musings on ZFS Backup strategies
I have found that using mbuffer really speeds up the differential transfer process:

#!/bin/sh

export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:

pool="zroot"
destination="tank"
host="1.2.3.4"

# $type is not set in the script as posted; when unset, the names start with a dash
today=`date +"$type-%Y-%m-%d"`
yesterday=`date -v -1d +"$type-%Y-%m-%d"`

# create today snapshot
snapshot_today="$pool@$today"
# look for a snapshot with this name
if zfs list -H -o name -t snapshot | sort | grep "$snapshot_today$" > /dev/null; then
    echo "snapshot, $snapshot_today, already exists"
    exit 1
else
    echo "taking todays snapshot, $snapshot_today" | sendmail root
    zfs snapshot -r $snapshot_today
fi

# look for yesterday snapshot
snapshot_yesterday="$pool@$yesterday"
if zfs list -H -o name -t snapshot | sort | grep "$snapshot_yesterday$" > /dev/null; then
    echo "yesterday snapshot, $snapshot_yesterday, exists lets proceed with backup"
    # send the incremental stream through mbuffer on both ends of the ssh link
    zfs send -R -i $snapshot_yesterday $snapshot_today | mbuffer -q -v 0 -s 128k -m 1G | ssh root@$host "mbuffer -s 128k -m 1G | zfs receive -Fd $destination" > /dev/null
    echo "backup complete destroying yesterday snapshot" | sendmail root
    zfs destroy -r $snapshot_yesterday
    echo "Backup done" | sendmail root
    exit 0
else
    echo "missing yesterday snapshot aborting, $snapshot_yesterday"
    exit 1
fi

--
George Kontostanos
---
http://www.aisecure.net
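One detail of the script above that may be worth a second look: the snapshot test uses grep with the name as a regular expression anchored only at the end, so a longer pool name ending in the same string would also match. A minimal sketch of a stricter check, assuming a hypothetical helper called `snapshot_exists` (not part of the posted script), could compare whole lines as fixed strings instead:

```sh
#!/bin/sh
# Hypothetical helper: reads snapshot names on stdin and succeeds only if
# one line is exactly equal to the name given as $1.
#   -F  treat the name as a fixed string, not a regular expression
#   -x  require the whole line to match
#   -q  print nothing, report the result through the exit status
snapshot_exists() {
    grep -Fxq -- "$1"
}

# Intended use inside the backup script (not run here):
#   if zfs list -H -o name -t snapshot | snapshot_exists "$pool@$today"; then ...
```

This keeps the structure of the original if/else blocks unchanged; only the matching rule differs.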
Re: Strange reboot since 9.1
on 07/03/2013 12:27 Loïc Blot said the following:
> Hello, I have enabled dumpdev=AUTO and run kgdb after a reboot. Here is the backtrace:
> root@freebsd-server kgdb

It's a stack trace of the first thread in your live running system. You need to read kgdb(1), inspect your /var/crash directory and pass a proper vmcore file, if any, to kgdb.

--
Andriy Gapon
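Andriy's advice (look in /var/crash and hand kgdb a real vmcore file) could be sketched as below. This is an illustration only: `latest_vmcore` is a hypothetical helper name, and the sketch assumes the dumps are saved as numbered vmcore.N files in the crash directory.

```sh
#!/bin/sh
# Hypothetical helper: print the path of the highest-numbered vmcore.N
# file in a crash directory (default /var/crash). Returns non-zero when
# the directory holds no numbered vmcore file.
latest_vmcore() {
    dir="${1:-/var/crash}"
    # Keep only names of the form vmcore.<digits>, extract the number,
    # and pick the largest one numerically.
    n=$(ls "$dir" 2>/dev/null |
        sed -n 's/^vmcore\.\([0-9][0-9]*\)$/\1/p' |
        sort -n | tail -n 1)
    [ -n "$n" ] || return 1
    echo "$dir/vmcore.$n"
}

# Intended use on the crashed host (not run here):
#   core=$(latest_vmcore /var/crash) && kgdb /boot/kernel/kernel "$core"
```

Passing the kernel and the core file as the two positional arguments tells kgdb to open the saved dump instead of the live system.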
Re: Strange reboot since 9.1
Hi Andriy, thanks for your help. Here is the stack backtrace (I have 11 core.txt files, and each has this crash).

(cat /var/crash/core.txt.11)
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x809208a6 at kdb_backtrace+0x66
#1 0x808ea8be at panic+0x1ce
#2 0x80bd8240 at trap_fatal+0x290
#3 0x80bd857d at trap_pfault+0x1ed
#4 0x80bd8b9e at trap+0x3ce
#5 0x80bc315f at calltrap+0x8
#6 0x80a861d5 at udp_input+0x475
#7 0x80a043dc at ip_input+0xac
#8 0x809adafb at netisr_dispatch_src+0x20b
#9 0x809a35cd at ether_demux+0x14d
#10 0x809a38a4 at ether_nh_input+0x1f4
#11 0x809adafb at netisr_dispatch_src+0x20b
#12 0x80438fd7 at bce_intr+0x487
#13 0x808be8d4 at intr_event_execute_handlers+0x104
#14 0x808c0076 at ithread_loop+0xa6
#15 0x808bb9ef at fork_exit+0x11f
#16 0x80bc368e at fork_trampoline+0xe
Uptime: 2h6m59s
Dumping 1177 out of 8162 MB:..2%..11%..21%..32%..41%..51%..62%..71%..81%..92%

I can't read vmcore.11 with only this option: kgdb -d /var/crash/vmcore.11
I read the man page and thought I must use kgdb -c /var/crash/vmcore.11, but kgdb rejects it (kgdb: couldn't find a suitable kernel image).

This server uses UDP packets for SNMP requests ( 1/h), NTP (a little) and syslog (that's all I remember).

--
Best regards,
Loïc BLOT, Engineering UNIX Systems, Security and Networks
http://www.unix-experience.fr

On Thursday 07 March 2013 at 14:55 +0200, Andriy Gapon wrote:
> on 07/03/2013 12:27 Loïc Blot said the following:
>> Hello, I have enabled dumpdev=AUTO and run kgdb after a reboot. Here is the backtrace:
>> root@freebsd-server kgdb
>
> It's a stack trace of the first thread in your live running system. You need to read kgdb(1), inspect your /var/crash directory and pass a proper vmcore file, if any, to kgdb.
Re: Strange reboot since 9.1
On 07/03/13 10:12, Loïc Blot wrote:
> I can't read vmcore.11 only with this option: kgdb -d /var/crash/vmcore.11
> I read man and thought i must use kgdb -c /var/crash/vmcore.11 but it's not a suitable image. (kgdb: couldn't find a suitable kernel image)

Hi,

Take a look at this:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

[]'s
Gondim
Re: ZFS stalls -- and maybe we should be talking about defaults?
On 3/7/2013 1:21 AM, Peter Jeremy wrote:
> On 2013-Mar-04 16:48:18 -0600, Karl Denninger k...@denninger.net wrote:
>> The subject machine in question has 12GB of RAM and dual Xeon 5500-series processors. It also has an ARECA 1680ix in it with 2GB of local cache and the BBU for it. The ZFS spindles are all exported as JBOD drives. I set up four disks under GPT, have a single freebsd-zfs partition added to them, are labeled and the providers are then geli-encrypted and added to the pool.
>
> What sort of disks? SAS or SATA?

SATA. They're clean; they report no errors, no retries, no corrected data (ECC) etc. They also have been running for a couple of years under UFS+SU without problems. This isn't new hardware; it's an in-service system.

>> also known good. I began to get EXTENDED stalls with zero I/O going on, some lasting for 30 seconds or so. The system was not frozen but anything that touched I/O would lock until it cleared. Dedup is off, incidentally.
>
> When the system has stalled:
> - Do you see very low free memory?

Yes. Effectively zero.

> - What happens to all the different CPU utilisation figures? Do they all go to zero? Do you get high system or interrupt CPU (including going to 1 core's worth)?

No, they start to fall. This is a bad piece of data to trust though because I am geli-encrypting the spindles, so falling CPU doesn't mean the CPU is actually idle (since with no I/O there is nothing going through geli.) I'm working on instrumenting things sufficiently to try to peel that off -- I suspect the kernel is spinning on something, but the trick is finding out what it is.

> - What happens to interrupt load? Do you see any disk controller interrupts?

None.

> Would you be able to build a kernel with WITNESS (and WITNESS_SKIPSPIN) and see if you get any errors when stalls happen.

If I have to. That's easy to do on the test box -- on the production one, not so much.

> On 2013-Mar-05 14:09:36 -0800, Jeremy Chadwick j...@koitsu.org wrote:
>> On Tue, Mar 05, 2013 at 01:09:41PM +0200, Andriy Gapon wrote:
>>> Completely unrelated to the main thread:
>>>
>>> on 05/03/2013 07:32 Jeremy Chadwick said the following:
>>>> That said, I still do not recommend ZFS for a root filesystem
>>>
>>> Why?
>>
>> Too long a history of problems with it and weird edge cases (keep reading); the last thing an administrator wants to deal with is a system where the root filesystem won't mount/can't be used. It makes recovery or problem-solving (i.e. the server is not physically accessible given geographic distances) very difficult.
>
> I've had lots of problems with a gmirrored UFS root as well. The biggest issue is that gmirror has no audit functionality so you can't verify that both sides of a mirror really do have the same data.

I have root on a 2-drive RAID mirror (done in the controller) and that has been fine. The controller does scrubs on a regular basis internally. The problem is that if it gets a clean read that is different (e.g. no ECC indications, etc) it doesn't know which is the correct copy. The good news is that hasn't happened yet :-) The risk of this happening as my data store continues to expand is one of the reasons I want to move toward ZFS, but not necessarily for the boot drives. For the data store, however

>> My point/opinion: UFS for a root filesystem is guaranteed to work without any fiddling about and, barring drive failures or controller issues, is (again, my opinion) a lot more risk-free than ZFS-on-root.
>
> AFAIK, you can't boot from anything other than a single disk (ie no graid).

Where I am right now is this:

1. I *CANNOT* reproduce the spins on the test machine with Postgres stopped in any way. Even with multiple ZFS send/recv copies going on and the load average north of 20 (due to all the geli threads), the system doesn't stall or produce any notable pauses in throughput. Nor does the system RAM allocation get driven hard enough to force paging. This is with NO tuning hacks in /boot/loader.conf. I/O performance is both stable and solid.

2. WITH Postgres running as a connected hot spare (identical to the production machine), allocating ~1.5G of shared, wired memory, running the same synthetic workload in (1) above I am getting SMALL versions of the misbehavior. However, while system RAM allocation gets driven pretty hard and reaches down toward 100MB in some instances it doesn't get driven hard enough to allocate swap. The burstiness is very evident in the iostat figures with spates getting into the single digit MB/sec range from time to time but it's not enough to drive the system to a full-on stall.

There's pretty-clearly a bad interaction here between Postgres wiring memory and the ARC, when the latter is left alone and allowed to do what it wants. I'm continuing to work on replicating this on the test machine... just not completely there yet.

--
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
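The free-memory observation above (RAM reaching down toward 100MB under load) is the kind of figure that could be logged continuously while trying to reproduce a stall. The following is a minimal sketch under stated assumptions: `free_mb` and `report_free` are hypothetical helper names, the 100 MB threshold is only an example taken from the figure in the message, and the sysctl names in the usage comment are the standard FreeBSD ones for free pages and page size.

```sh
#!/bin/sh
# Hypothetical helpers for logging free memory during a stall test.

# Convert a free-page count ($1) and a page size in bytes ($2) to whole MB.
free_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

# Print one status line; mark it LOW when free memory is below the
# threshold in MB given as $3.
report_free() {
    mb=$(free_mb "$1" "$2")
    if [ "$mb" -lt "$3" ]; then
        echo "LOW free=${mb}MB"
    else
        echo "ok free=${mb}MB"
    fi
}

# Intended use on FreeBSD (not run here), sampling once per second:
#   while :; do
#       report_free "$(sysctl -n vm.stats.vm.v_free_count)" \
#                   "$(sysctl -n hw.pagesize)" 100
#       sleep 1
#   done
```

Lining this output up against iostat samples taken at the same interval would show whether the throughput dips coincide with the low-memory marks.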
Re: Sanity Check on Mac Mini
On 03/07/13 01:59, Doug Hardie wrote:
> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>
> 1. Downloaded from current today via svnweb.freebsd.org:
>    sys/dev/bge/if_bgereg.h
>    sys/dev/bge/if_bge.c
>    sys/dev/mii/brgphy.c
>    I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
> 2. Put those on a flash drive.
> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
> 4. Copy the files from 1 above from flash over the files on the disk.
> 5. Rebuild the kernel and install it.
>
> Thanks,
> -- Doug

That's worked for me 3 times now.

--
Richard Kuhns r...@wintek.com
Wintek Corporation
427 N 6th Street
Lafayette, IN 47901-2211
My Desk: 765-269-8541
Internet Support: 765-269-8503
Consulting: 765-269-8504
Accounting: 765-269-8502
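Steps 4 and 5 of Doug's list could look roughly like the sketch below. It is an illustration, not what either poster ran: `install_driver_sources` is a hypothetical helper name, and the /mnt/flash mount point and the GENERIC kernel configuration in the usage comment are assumptions. The three file paths are the ones listed in step 1.

```sh
#!/bin/sh
# Hypothetical helper: copy the three updated driver files from a staging
# directory ($1, e.g. the mounted flash drive) into a source tree ($2),
# keeping their paths relative to the tree root.
install_driver_sources() {
    src="$1"
    dst="$2"
    files="sys/dev/bge/if_bgereg.h sys/dev/bge/if_bge.c sys/dev/mii/brgphy.c"

    # Check that every file is present before touching the source tree.
    for f in $files; do
        [ -f "$src/$f" ] || { echo "missing: $src/$f" >&2; return 1; }
    done

    # Copy each file over its counterpart in the tree.
    for f in $files; do
        mkdir -p "$dst/$(dirname "$f")"
        cp "$src/$f" "$dst/$f" || return 1
    done
}

# Intended use (not run here); /mnt/flash and GENERIC are assumptions:
#   install_driver_sources /mnt/flash /usr/src &&
#   cd /usr/src &&
#   make buildkernel KERNCONF=GENERIC &&
#   make installkernel KERNCONF=GENERIC
```

Checking for all three files first avoids ending up with a half-updated driver if one of them was left off the flash drive.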
Re: Strange reboot since 9.1
Hi Marcelo, thanks. Here is a better trace:

kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x80a84414
stack pointer = 0x28:0xff822fc267a0
frame pointer = 0x28:0xff822fc26830
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq265: bce0)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x809208a6 at kdb_backtrace+0x66
#1 0x808ea8be at panic+0x1ce
#2 0x80bd8240 at trap_fatal+0x290
#3 0x80bd857d at trap_pfault+0x1ed
#4 0x80bd8b9e at trap+0x3ce
#5 0x80bc315f at calltrap+0x8
#6 0x80a861d5 at udp_input+0x475
#7 0x80a043dc at ip_input+0xac
#8 0x809adafb at netisr_dispatch_src+0x20b
#9 0x809a35cd at ether_demux+0x14d
#10 0x809a38a4 at ether_nh_input+0x1f4
#11 0x809adafb at netisr_dispatch_src+0x20b
#12 0x80438fd7 at bce_intr+0x487
#13 0x808be8d4 at intr_event_execute_handlers+0x104
#14 0x808c0076 at ithread_loop+0xa6
#15 0x808bb9ef at fork_exit+0x11f
#16 0x80bc368e at fork_trampoline+0xe
Uptime: 27m20s
Dumping 1265 out of 8162 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92%

#0 doadump (textdump=Variable textdump is not available.
) at pcpu.h:224
224 pcpu.h: No such file or directory.
 in pcpu.h
(kgdb) bt f
#0 doadump (textdump=Variable textdump is not available.
) at pcpu.h:224
No locals.
#1 0x808ea3a1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
_ep = Variable _ep is not available.
(kgdb) bt
#0 doadump (textdump=Variable textdump is not available.
) at pcpu.h:224
#1 0x808ea3a1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
#2 0x808ea897 in panic (fmt=0x1 Address 0x1 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:636
#3 0x80bd8240 in trap_fatal (frame=0xc, eva=Variable eva is not available.
) at /usr/src/sys/amd64/amd64/trap.c:857
#4 0x80bd857d in trap_pfault (frame=0xff822fc266f0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773
#5 0x80bd8b9e in trap (frame=0xff822fc266f0) at /usr/src/sys/amd64/amd64/trap.c:456
#6 0x80bc315f in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#7 0x80a84414 in udp_append (inp=0xfe019e2a1000, ip=0xfe00444b6c80, n=0xfe00444b6c00, off=20, udp_in=0xff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252
#8 0x80a861d5 in udp_input (m=0xfe00444b6c00, off=Variable off is not available.
) at /usr/src/sys/netinet/udp_usrreq.c:618
#9 0x80a043dc in ip_input (m=0xfe00444b6c00) at /usr/src/sys/netinet/ip_input.c:760
#10 0x809adafb in netisr_dispatch_src (proto=1, source=Variable source is not available.
) at /usr/src/sys/net/netisr.c:1013
#11 0x809a35cd in ether_demux (ifp=0xfe00053fa000, m=0xfe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940
#12 0x809a38a4 in ether_nh_input (m=Variable m is not available.
) at /usr/src/sys/net/if_ethersubr.c:759
#13 0x809adafb in netisr_dispatch_src (proto=9, source=Variable source is not available.
) at /usr/src/sys/net/netisr.c:1013
#14 0x80438fd7 in bce_intr (xsc=Variable xsc is not available.
) at /usr/src/sys/dev/bce/if_bce.c:6903
#15 0x808be8d4 in intr_event_execute_handlers (p=Variable p is not available.
) at /usr/src/sys/kern/kern_intr.c:1262
#16 0x808c0076 in ithread_loop (arg=0xfe00057424e0) at /usr/src/sys/kern/kern_intr.c:1275
#17 0x808bb9ef in fork_exit (callout=0x808bffd0 ithread_loop, arg=0xfe00057424e0, frame=0xff822fc26c40) at /usr/src/sys/kern/kern_fork.c:992
#18 0x80bc368e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602
#19 0x in ?? ()
#20 0x in ?? ()
#21 0x0001 in ?? ()
#22 0x in ?? ()
#23 0x in ?? ()
#24 0x in ?? ()
#25 0x in ?? ()
#26 0x in ?? ()
#27 0x in ?? ()
#28 0x in ?? ()
#29 0x in ?? ()
#30
Re: Strange reboot since 9.1
On Thu, Mar 07, 2013 at 04:38:54PM +0100, Loïc Blot wrote:
> Hi Marcelo, thanks. Here is a better trace:
> kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
Re: Strange reboot since 9.1
Here is pciconf -lbcv

hostb0@pci0:0:0:0: class=0x06 card=0x02a51028 chip=0xd1308086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Core Processor DMI'
    class = bridge
    subclass = HOST-PCI
    cap 05[60] = MSI supports 2 messages, vector masks
    cap 10[90] = PCI-Express 1 root port max data 128(128) link x4(x4)
    cap 01[e0] = powerspec 3 supports D0 D3 current D0
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 000d[150] = unknown 1
    ecap 000b[160] = unknown 0
pcib1@pci0:0:3:0: class=0x060400 card=0x02a51028 chip=0xd1388086 rev=0x11 hdr=0x01
    vendor = 'Intel Corporation'
    device = 'Core Processor PCI Express Root Port 1'
    class = bridge
    subclass = PCI-PCI
    cap 0d[40] = PCI Bridge card=0x02a51028
    cap 05[60] = MSI supports 2 messages, vector masks
    cap 10[90] = PCI-Express 2 root port max data 256(256) link x8(x16)
    cap 01[e0] = powerspec 3 supports D0 D3 current D0
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 000d[150] = unknown 1
    ecap 000b[160] = unknown 0
none0@pci0:0:8:0: class=0x088000 card=0x chip=0xd1558086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Core Processor System Management Registers'
    class = base peripheral
    cap 10[40] = PCI-Express 2 root endpoint max data 128(128) link x0(x0)
    ecap 000b[100] = unknown 0
none1@pci0:0:8:1: class=0x088000 card=0x chip=0xd1568086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Core Processor Semaphore and Scratchpad Registers'
    class = base peripheral
    cap 10[40] = PCI-Express 2 root endpoint max data 128(128) link x0(x0)
    ecap 000b[100] = unknown 0
none2@pci0:0:8:2: class=0x088000 card=0x chip=0xd1578086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Core Processor System Control and Status Registers'
    class = base peripheral
    cap 10[40] = PCI-Express 2 root endpoint max data 128(128) link x0(x0)
    ecap 000b[100] = unknown 0
none3@pci0:0:8:3: class=0x088000 card=0x chip=0xd1588086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Core Processor Miscellaneous Registers'
    class = base peripheral
none4@pci0:0:16:0: class=0x088000 card=0x chip=0xd1508086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Core Processor QPI Link'
    class = base peripheral
none5@pci0:0:16:1: class=0x088000 card=0x chip=0xd1518086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Core Processor QPI Routing and Protocol Registers'
    class = base peripheral
ehci0@pci0:0:26:0: class=0x0c0320 card=0x02a51028 chip=0x3b3c8086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = '5 Series/3400 Series Chipset USB2 Enhanced Host Controller'
    class = serial bus
    subclass = USB
    bar [10] = type Memory, range 32, base 0xdf0fa000, size 1024, enabled
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
    cap 13[98] = PCI Advanced Features: FLR TP
pcib2@pci0:0:28:0: class=0x060400 card=0x02a51028 chip=0x3b428086 rev=0x05 hdr=0x01
    vendor = 'Intel Corporation'
    device = '5 Series/3400 Series Chipset PCI Express Root Port 1'
    class = bridge
    subclass = PCI-PCI
    cap 10[40] = PCI-Express 2 root port max data 128(128) link x4(x4)
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0x02a51028
    cap 01[a0] = powerspec 2 supports D0 D3 current D0
ehci1@pci0:0:29:0: class=0x0c0320 card=0x02a51028 chip=0x3b348086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = '5 Series/3400 Series Chipset USB2 Enhanced Host Controller'
    class = serial bus
    subclass = USB
    bar [10] = type Memory, range 32, base 0xdf0fc000, size 1024, enabled
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
    cap 13[98] = PCI Advanced Features: FLR TP
pcib3@pci0:0:30:0: class=0x060401 card=0x02a51028 chip=0x244e8086 rev=0xa5 hdr=0x01
    vendor = 'Intel Corporation'
    device = '82801 PCI Bridge'
    class = bridge
    subclass = PCI-PCI
    cap 0d[50] = PCI Bridge card=0x02a51028
isab0@pci0:0:31:0: class=0x060100 card=0x02a51028 chip=0x3b148086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = '3400 Series Chipset LPC Interface Controller'
    class = bridge
    subclass = PCI-ISA
    cap 09[e0] = vendor (length 16) Intel cap 1 version 1
ahci0@pci0:0:31:2: class=0x010601 card=0x02a51028 chip=0x3b228086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = '5 Series/3400 Series Chipset 6 port SATA AHCI Controller'
    class = mass storage
    subclass = SATA
    bar [10]
Re: gdb broken on 9.1/amd64?
(Please Cc: me on reply.)

On Wed, Mar 06, 2013 at 10:50:59PM +0200, Konstantin Belousov wrote:
> On Wed, Mar 06, 2013 at 07:02:22PM +0100, Jeremie Le Hen wrote:
>> root@ingwe:~ # gdb -p 521
>
> Try to specify the executable binary on the command line.

It works better indeed! Now I can get a backtrace for sleep(1), but I am having difficulty debugging OpenSMTPD:

    # gdb /usr/local/sbin/smtpd 25442
    GNU gdb 6.1.1 [FreeBSD]
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type show copying to see the conditions.
    There is absolutely no warranty for GDB. Type show warranty for details.
    This GDB was configured as amd64-marcel-freebsd...
    Attaching to program: /usr/local/sbin/smtpd, process 25442
    Reading symbols from /usr/local/lib/libsqlite3.so.8...done.
    Loaded symbols for /usr/local/lib/libsqlite3.so.8
    Reading symbols from /usr/local/lib/libevent-1.4.so.4...done.
    Loaded symbols for /usr/local/lib/libevent-1.4.so.4
    Reading symbols from /lib/libcrypto.so.6...done.
    Loaded symbols for /lib/libcrypto.so.6
    Reading symbols from /usr/lib/libssl.so.6...done.
    Loaded symbols for /usr/lib/libssl.so.6
    Reading symbols from /lib/libz.so.6...done.
    Loaded symbols for /lib/libz.so.6
    Reading symbols from /lib/libutil.so.9...done.
    Loaded symbols for /lib/libutil.so.9
    Reading symbols from /lib/libcrypt.so.5...done.
    Loaded symbols for /lib/libcrypt.so.5
    Reading symbols from /usr/lib/libpam.so.5...done.
    Loaded symbols for /usr/lib/libpam.so.5
    Reading symbols from /lib/libc.so.7...done.
    Loaded symbols for /lib/libc.so.7
    Reading symbols from /lib/libthr.so.3...done.
    Error while reading shared library symbols:
    Cannot get thread info: invalid key
    Reading symbols from /libexec/ld-elf.so.1...done.
    Loaded symbols for /libexec/ld-elf.so.1
    0x00080ce4281c in kevent () from /lib/libc.so.7
    (gdb) bt
    #0 0x00080ce4281c in kevent () from /lib/libc.so.7
    #1 0x000803104070 in kq_dispatch () from /usr/local/lib/libevent-1.4.so.4
    #2 0x0008030f802a in event_base_loop () from /usr/local/lib/libevent-1.4.so.4
    #3 0x0042fd7f in smtp () at smtp.c:295
    #4 0x00436c0f in fork_peers () at smtpd.c:983
    #5 0x0043686d in main (argc=0, argv=0x7fff77e8) at smtpd.c:904
    (gdb) c
    Continuing.
    no thread to satisfy query
    0x00080ce4281c in kevent () from /lib/libc.so.7

The problem is that the process seems hung here, despite the continue. When I connect to it and say HELO, I don't get any reply as long as gdb(1) is attached.

Also, the following might be relevant?

    (gdb) thread apply all c
    Cannot get thread info: invalid key

Any idea? Thanks.

Cheers,
--
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons. They forgot to mention Morons.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS stalls -- and maybe we should be talking about defaults?
----- Original Message -----
From: Karl Denninger k...@denninger.net

> Where I am right now is this:
>
> 1. I *CANNOT* reproduce the spins on the test machine with Postgres stopped in any way. Even with multiple ZFS send/recv copies going on and the load average north of 20 (due to all the geli threads), the system doesn't stall or produce any notable pauses in throughput. Nor does the system RAM allocation get driven hard enough to force paging. This is with NO tuning hacks in /boot/loader.conf. I/O performance is both stable and solid.
>
> 2. WITH Postgres running as a connected hot spare (identical to the production machine), allocating ~1.5G of shared, wired memory, running the same synthetic workload in (1) above I am getting SMALL versions of the misbehavior. However, while system RAM allocation gets driven pretty hard and reaches down toward 100MB in some instances it doesn't get driven hard enough to allocate swap. The burstiness is very evident in the iostat figures with spates getting into the single digit MB/sec range from time to time but it's not enough to drive the system to a full-on stall.
>
> There's pretty-clearly a bad interaction here between Postgres wiring memory and the ARC, when the latter is left alone and allowed to do what it wants. I'm continuing to work on replicating this on the test machine... just not completely there yet.

Another possibility to consider is how Postgres uses the FS. For example, does it request sync IO in ways not present in the system without it, causing the FS and possibly the underlying disk system to behave differently?

One other option to test, just to rule it out: what happens if you use the BSD scheduler instead of ULE?

Regards
Steve

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
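For anyone wanting to experiment with the Postgres/ARC interaction described above, here is a minimal sketch of one possible starting point. It assumes (the thread does not confirm this) that capping the ARC through the `vfs.zfs.arc_max` tunable in /boot/loader.conf is worth testing. Only the ~1.5G of wired Postgres memory comes from the report; the 8 GiB of RAM, the 2 GiB of headroom, and both function names are hypothetical.

```python
GIB = 1024 ** 3


def suggest_arc_max(ram_bytes, wired_app_bytes, headroom_bytes):
    """Return a candidate ARC ceiling in bytes.

    The rule used here (RAM minus wired application memory minus headroom)
    is an assumption for illustration, not a documented ZFS default.
    """
    cap = ram_bytes - wired_app_bytes - headroom_bytes
    if cap <= 0:
        raise ValueError("no memory left for the ARC with these inputs")
    return cap


def loader_conf_line(cap_bytes):
    """Format the ceiling as a /boot/loader.conf entry (value in bytes)."""
    return 'vfs.zfs.arc_max="%d"' % cap_bytes


if __name__ == "__main__":
    # Hypothetical machine: 8 GiB RAM, ~1.5 GiB wired by Postgres, 2 GiB headroom.
    cap = suggest_arc_max(8 * GIB, 3 * GIB // 2, 2 * GIB)
    print(loader_conf_line(cap))
```

Whether a cap like this changes the stall behaviour is exactly what the test machine would have to show; the numbers are only a place to start.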
Re: ZFS stalls -- and maybe we should be talking about defaults?
On 3/7/2013 12:57 PM, Steven Hartland wrote:
> ----- Original Message -----
> From: Karl Denninger k...@denninger.net
>
>> Where I am right now is this:
>>
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres stopped in any way. Even with multiple ZFS send/recv copies going on and the load average north of 20 (due to all the geli threads), the system doesn't stall or produce any notable pauses in throughput. Nor does the system RAM allocation get driven hard enough to force paging. This is with NO tuning hacks in /boot/loader.conf. I/O performance is both stable and solid.
>>
>> 2. WITH Postgres running as a connected hot spare (identical to the production machine), allocating ~1.5G of shared, wired memory, running the same synthetic workload in (1) above I am getting SMALL versions of the misbehavior. However, while system RAM allocation gets driven pretty hard and reaches down toward 100MB in some instances it doesn't get driven hard enough to allocate swap. The burstiness is very evident in the iostat figures with spates getting into the single digit MB/sec range from time to time but it's not enough to drive the system to a full-on stall.
>>
>> There's pretty-clearly a bad interaction here between Postgres wiring memory and the ARC, when the latter is left alone and allowed to do what it wants. I'm continuing to work on replicating this on the test machine... just not completely there yet.
>
> Another possibility to consider is how Postgres uses the FS. For example, does it request sync IO in ways not present in the system without it, causing the FS and possibly the underlying disk system to behave differently?

That's possible but not terribly likely in this particular instance. The reason is that I ran into this with the Postgres data store on a UFS volume BEFORE I converted it. Now it's on the ZFS pool (with recordsize=8k as recommended for that filesystem) but when I first ran into this it was on a separate UFS filesystem (which is where it had resided for 2+ years without incident), so unless the Postgres filesystem use on a UFS volume would give ZFS fits it's unlikely to be involved.

> One other option to test, just to rule it out: what happens if you use the BSD scheduler instead of ULE?
>
> Regards
> Steve

I will test that but first I have to get the test machine to reliably stall so I know I'm not chasing my tail.

--
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
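Since the testing hinges on knowing when the machine "reliably stalls", a minimal sketch of flagging stalls in logged throughput samples might look like this. The 10 MB/s threshold mirrors the "single digit MB/sec" figure in the report; the function name, the minimum run length, and the sample data are assumptions made for illustration, and collecting the samples (for example from iostat output) is left out.

```python
def find_stalls(samples_mb_s, threshold_mb_s=10.0, min_run=3):
    """Return (start_index, length) for every run of at least `min_run`
    consecutive samples that fall below `threshold_mb_s`."""
    stalls = []
    run_start = None
    for i, value in enumerate(samples_mb_s):
        if value < threshold_mb_s:
            if run_start is None:
                run_start = i  # a low-throughput run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                stalls.append((run_start, i - run_start))
            run_start = None
    # A run may still be open when the samples end.
    if run_start is not None and len(samples_mb_s) - run_start >= min_run:
        stalls.append((run_start, len(samples_mb_s) - run_start))
    return stalls


if __name__ == "__main__":
    # Hypothetical one-second samples in MB/s.
    samples = [120, 95, 4, 3, 2, 110, 8, 105, 1, 1, 1, 1]
    print(find_stalls(samples))
```

Requiring several consecutive low samples keeps a single slow second from being counted as a stall, which matters when the goal is to tell burstiness apart from a full-on stall.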
Re: Sanity Check on Mac Mini
On 7 March 2013, at 06:42, Richard Kuhns r...@wintek.com wrote:
> On 03/07/13 01:59, Doug Hardie wrote:
>> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>>
>> 1. Downloaded from current today via svnweb.freebsd.org:
>>      sys/dev/bge/if_bgereg.h
>>      sys/dev/bge/if_bge.c
>>      sys/dev/mii/brgphy.c
>>    I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
>> 2. Put those on a flash drive.
>> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
>> 4. Copy the files from 1 above from flash over the files on the disk.
>> 5. Rebuild the kernel and install it.
>>
>> Thanks,
>> -- Doug
>
> That's worked for me 3 times now.

Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable. I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release?

-- Doug

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
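Steps 4 and 5 of the quoted plan can be sketched as a dry-run planner that only lists the commands instead of running them. This is a sketch under assumptions: the flash drive is taken to mirror the source-tree layout and to be mounted at the hypothetical path /mnt/flash, the source tree is assumed to live in /usr/src, and the GENERIC kernel configuration is assumed; none of these details are stated in the thread.

```python
import posixpath

# The three files named in step 1 of the plan.
PATCHED_FILES = [
    "sys/dev/bge/if_bgereg.h",
    "sys/dev/bge/if_bge.c",
    "sys/dev/mii/brgphy.c",
]


def plan_rebuild(flash_root, src_root="/usr/src", kernconf="GENERIC"):
    """Return the commands for steps 4 and 5 as argv lists, without running them."""
    commands = []
    for rel in PATCHED_FILES:
        # Step 4: copy each updated file over the installed source tree.
        commands.append(
            ["cp", posixpath.join(flash_root, rel), posixpath.join(src_root, rel)]
        )
    # Step 5: rebuild the kernel, then install it.
    commands.append(["make", "-C", src_root, "buildkernel", "KERNCONF=" + kernconf])
    commands.append(["make", "-C", src_root, "installkernel", "KERNCONF=" + kernconf])
    return commands


if __name__ == "__main__":
    for argv in plan_rebuild("/mnt/flash"):
        print(" ".join(argv))
```

Printing the plan first makes it easy to check the paths by eye before anything on the installed system is overwritten.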
Re: ZFS stalls -- and maybe we should be talking about defaults?
----- Original Message -----
From: Karl Denninger k...@denninger.net
To: freebsd-stable@freebsd.org
Sent: Thursday, March 07, 2013 7:07 PM
Subject: Re: ZFS stalls -- and maybe we should be talking about defaults?

> On 3/7/2013 12:57 PM, Steven Hartland wrote:
>> ----- Original Message -----
>> From: Karl Denninger k...@denninger.net
>>
>>> Where I am right now is this:
>>>
>>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres stopped in any way. Even with multiple ZFS send/recv copies going on and the load average north of 20 (due to all the geli threads), the system doesn't stall or produce any notable pauses in throughput. Nor does the system RAM allocation get driven hard enough to force paging. This is with NO tuning hacks in /boot/loader.conf. I/O performance is both stable and solid.
>>>
>>> 2. WITH Postgres running as a connected hot spare (identical to the production machine), allocating ~1.5G of shared, wired memory, running the same synthetic workload in (1) above I am getting SMALL versions of the misbehavior. However, while system RAM allocation gets driven pretty hard and reaches down toward 100MB in some instances it doesn't get driven hard enough to allocate swap. The burstiness is very evident in the iostat figures with spates getting into the single digit MB/sec range from time to time but it's not enough to drive the system to a full-on stall.
>>>
>>> There's pretty-clearly a bad interaction here between Postgres wiring memory and the ARC, when the latter is left alone and allowed to do what it wants. I'm continuing to work on replicating this on the test machine... just not completely there yet.
>>
>> Another possibility to consider is how Postgres uses the FS. For example, does it request sync IO in ways not present in the system without it, causing the FS and possibly the underlying disk system to behave differently?
>
> That's possible but not terribly likely in this particular instance. The reason is that I ran into this with the Postgres data store on a UFS volume BEFORE I converted it. Now it's on the ZFS pool (with recordsize=8k as recommended for that filesystem) but when I first ran into this it was on a separate UFS filesystem (which is where it had resided for 2+ years without incident), so unless the Postgres filesystem use on a UFS volume would give ZFS fits it's unlikely to be involved.

I hate to say it, but that sounds very similar to something we experienced with a machine here which was running high numbers of rrd updates. Again we had the issue on UFS and saw the same thing when we moved to ZFS. I'll leave that there so as not to derail the investigation with what could be totally irrelevant info, but it may prove an interesting data point later.

There are obvious common low-level points between UFS and ZFS which may be the cause. One area which springs to mind is device bio ordering and barriers, which could well be impacted by sync IO requests independent of the FS in use.

>> One other option to test, just to rule it out: what happens if you use the BSD scheduler instead of ULE?
>
> I will test that but first I have to get the test machine to reliably stall so I know I'm not chasing my tail.

Very sensible. Assuming you can reproduce it, one thing that might be interesting to try is to eliminate all sync IO. I'm not sure if there are options in Postgres to do this via configuration or if it would require editing the code, but this could reduce the problem space.

If disabling sync IO eliminated the problem it would go a long way to proving it isn't the IO volume or pattern per se but instead related to the sync nature of said IO.

Regards
Steve
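To make the distinction between IO volume and the sync nature of IO concrete, here is a minimal sketch that times the same writes with and without a flush after each block. It is not Postgres code and says nothing about how Postgres issues its writes; the function name and block counts are hypothetical, and the 8192-byte block size simply echoes the recordsize=8k mentioned in the thread.

```python
import os
import tempfile
import time


def time_writes(block_count, block_size, sync_each):
    """Write `block_count` blocks to a temporary file and return elapsed seconds.

    When `sync_each` is true, os.fsync() is called after every block so each
    write must reach stable storage before the next one starts.
    """
    payload = b"\0" * block_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(block_count):
            os.write(fd, payload)
            if sync_each:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed


if __name__ == "__main__":
    # Same data volume both times; only the sync behaviour differs.
    print("buffered:", time_writes(256, 8192, sync_each=False))
    print("synced:  ", time_writes(256, 8192, sync_each=True))
```

If the two timings differ widely on a given pool while the byte count is identical, that points at the sync behaviour rather than the volume, which is the separation the suggestion above is after.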
Re: ZFS stalls -- and maybe we should be talking about defaults?
On 3/7/2013 1:27 PM, Steven Hartland wrote:
> ----- Original Message -----
> From: Karl Denninger k...@denninger.net
> To: freebsd-stable@freebsd.org
> Sent: Thursday, March 07, 2013 7:07 PM
> Subject: Re: ZFS stalls -- and maybe we should be talking about defaults?
>
>> On 3/7/2013 12:57 PM, Steven Hartland wrote:
>>> ----- Original Message -----
>>> From: Karl Denninger k...@denninger.net
>>>
>>>> Where I am right now is this:
>>>>
>>>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres stopped in any way. Even with multiple ZFS send/recv copies going on and the load average north of 20 (due to all the geli threads), the system doesn't stall or produce any notable pauses in throughput. Nor does the system RAM allocation get driven hard enough to force paging. This is with NO tuning hacks in /boot/loader.conf. I/O performance is both stable and solid.
>>>>
>>>> 2. WITH Postgres running as a connected hot spare (identical to the production machine), allocating ~1.5G of shared, wired memory, running the same synthetic workload in (1) above I am getting SMALL versions of the misbehavior. However, while system RAM allocation gets driven pretty hard and reaches down toward 100MB in some instances it doesn't get driven hard enough to allocate swap. The burstiness is very evident in the iostat figures with spates getting into the single digit MB/sec range from time to time but it's not enough to drive the system to a full-on stall.
>>>>
>>>> There's pretty-clearly a bad interaction here between Postgres wiring memory and the ARC, when the latter is left alone and allowed to do what it wants. I'm continuing to work on replicating this on the test machine... just not completely there yet.
>>>
>>> Another possibility to consider is how Postgres uses the FS. For example, does it request sync IO in ways not present in the system without it, causing the FS and possibly the underlying disk system to behave differently?
>>
>> That's possible but not terribly likely in this particular instance. The reason is that I ran into this with the Postgres data store on a UFS volume BEFORE I converted it. Now it's on the ZFS pool (with recordsize=8k as recommended for that filesystem) but when I first ran into this it was on a separate UFS filesystem (which is where it had resided for 2+ years without incident), so unless the Postgres filesystem use on a UFS volume would give ZFS fits it's unlikely to be involved.
>
> I hate to say it, but that sounds very similar to something we experienced with a machine here which was running high numbers of rrd updates. Again we had the issue on UFS and saw the same thing when we moved to ZFS. I'll leave that there so as not to derail the investigation with what could be totally irrelevant info, but it may prove an interesting data point later.
>
> There are obvious common low-level points between UFS and ZFS which may be the cause. One area which springs to mind is device bio ordering and barriers, which could well be impacted by sync IO requests independent of the FS in use.
>
>>> One other option to test, just to rule it out: what happens if you use the BSD scheduler instead of ULE?
>>
>> I will test that but first I have to get the test machine to reliably stall so I know I'm not chasing my tail.
>
> Very sensible. Assuming you can reproduce it, one thing that might be interesting to try is to eliminate all sync IO. I'm not sure if there are options in Postgres to do this via configuration or if it would require editing the code, but this could reduce the problem space.
>
> If disabling sync IO eliminated the problem it would go a long way to proving it isn't the IO volume or pattern per se but instead related to the sync nature of said IO.

That can be turned off in the Postgres configuration. For obvious reasons it's a very bad idea, but it can be disabled without actually changing the code itself. I don't know if it shuts off ALL sync requests, but the documentation says it does.

It's interesting that you ran into this with RRD going; the machine in question does pull RRD data for Cacti, but it's such a small piece of the total load profile that I considered it immaterial. It might not be.

--
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
Re: Sanity Check on Mac Mini
On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie bc...@lafn.org wrote:
> On 7 March 2013, at 06:42, Richard Kuhns r...@wintek.com wrote:
>> On 03/07/13 01:59, Doug Hardie wrote:
>>> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>>>
>>> 1. Downloaded from current today via svnweb.freebsd.org:
>>>      sys/dev/bge/if_bgereg.h
>>>      sys/dev/bge/if_bge.c
>>>      sys/dev/mii/brgphy.c
>>>    I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
>>> 2. Put those on a flash drive.
>>> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
>>> 4. Copy the files from 1 above from flash over the files on the disk.
>>> 5. Rebuild the kernel and install it.
>>>
>>> Thanks,
>>> -- Doug
>>
>> That's worked for me 3 times now.
>
> Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable. I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release?

I would generally recommend using the amd64 release, but it may not get your system to boot. How is your disk partitioned? GPT? Some BIOSes are broken: they assume that a GPT-formatted disk is UEFI and will not recognize it if it lacks the UEFI boot partition. UEFI boot is a current project that seems likely to reach head in the fairly near future, but it's not possible now.

You may be able to tweak your BIOS to get it to work or you may have to install using the traditional partitioning system. The installer defaults to GPT, but can create either.

I have such a system (ThinkPad T520) and I have two disks... one that came with the system and contains Windows, and my GPT-formatted FreeBSD disk. I wrote FreeBSD's BootEasy boot code into the MBR of the Windows disk and it CAN boot the GPT disk just fine. Not ideal for most, but it works well for me.

--
R. Kevin Oberman, Network Engineer
E-mail: rkober...@gmail.com
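To answer the "How is your disk partitioned?" question without guessing, one can look at the first sector of the disk: a GPT disk carries a protective MBR whose partition entry has type 0xEE, while a traditionally partitioned disk has ordinary type bytes there. The sketch below classifies a 512-byte sector on that basis; the function name and return strings are hypothetical, and it assumes 512-byte sectors.

```python
def classify_sector0(sector):
    """Classify a disk's first 512-byte sector.

    Returns 'gpt-protective' if any of the four MBR partition entries has
    type 0xEE, 'mbr' if the boot signature is present and at least one entry
    has a non-zero type, and 'none' otherwise.
    """
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    # The MBR boot signature occupies the last two bytes of the sector.
    if sector[510:512] != b"\x55\xaa":
        return "none"
    # Four 16-byte partition entries start at offset 446; the type byte
    # is at offset 4 within each entry.
    types = [sector[446 + 16 * i + 4] for i in range(4)]
    if 0xEE in types:
        return "gpt-protective"
    if any(t != 0 for t in types):
        return "mbr"
    return "none"


if __name__ == "__main__":
    # Synthetic example: a sector with one FreeBSD (type 0xA5) MBR slice.
    demo = bytearray(512)
    demo[510], demo[511] = 0x55, 0xAA
    demo[446 + 4] = 0xA5
    print(classify_sector0(bytes(demo)))
```

On a real system the 512 bytes would be read from the raw disk device as root, for example from a device node such as /dev/ada0 (a hypothetical name here; the actual device depends on the machine).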
Re: ZFS stalls -- and maybe we should be talking about defaults?
----- Original Message -----
From: Karl Denninger k...@denninger.net

>>> I will test that but first I have to get the test machine to reliably stall so I know I'm not chasing my tail.
>>
>> Very sensible. Assuming you can reproduce it, one thing that might be interesting to try is to eliminate all sync IO. I'm not sure if there are options in Postgres to do this via configuration or if it would require editing the code, but this could reduce the problem space.
>>
>> If disabling sync IO eliminated the problem it would go a long way to proving it isn't the IO volume or pattern per se but instead related to the sync nature of said IO.
>
> That can be turned off in the Postgres configuration. For obvious reasons it's a very bad idea, but it can be disabled without actually changing the code itself. I don't know if it shuts off ALL sync requests, but the documentation says it does.
>
> It's interesting that you ran into this with RRD going; the machine in question does pull RRD data for Cacti, but it's such a small piece of the total load profile that I considered it immaterial. It might not be.

We never did get to the bottom of it but did come up with a fix. Instead of using straight RRD interaction we switched all our code to use rrdcached and put the files on an SSD-based pool; we have never had an issue since.

Regards
Steve
Re: Sanity Check on Mac Mini
On 7 March 2013, at 11:57, Kevin Oberman rkober...@gmail.com wrote:
> On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie bc...@lafn.org wrote:
>> On 7 March 2013, at 06:42, Richard Kuhns r...@wintek.com wrote:
>>> On 03/07/13 01:59, Doug Hardie wrote:
>>>> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>>>>
>>>> 1. Downloaded from current today via svnweb.freebsd.org:
>>>>      sys/dev/bge/if_bgereg.h
>>>>      sys/dev/bge/if_bge.c
>>>>      sys/dev/mii/brgphy.c
>>>>    I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
>>>> 2. Put those on a flash drive.
>>>> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
>>>> 4. Copy the files from 1 above from flash over the files on the disk.
>>>> 5. Rebuild the kernel and install it.
>>>>
>>>> Thanks,
>>>> -- Doug
>>>
>>> That's worked for me 3 times now.
>>
>> Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable. I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release?
>
> I would generally recommend using the amd64 release, but it may not get your system to boot. How is your disk partitioned? GPT? Some BIOSes are broken: they assume that a GPT-formatted disk is UEFI and will not recognize it if it lacks the UEFI boot partition. UEFI boot is a current project that seems likely to reach head in the fairly near future, but it's not possible now.

No idea what the default partitioning is for BSDInstall. However, the Mini is only EFI or UEFI with some fallbacks, although the comments I find on the web indicate that different models have different fallbacks. One comment indicates that an older unit will boot if it uses MBR partitioning. I don't know if the new installer supports that or not.

> You may be able to tweak your BIOS to get it to work or you may have to install using the traditional partitioning system. The installer defaults to GPT, but can create either.
>
> I have such a system (ThinkPad T520) and I have two disks... one that came with the system and contains Windows, and my GPT-formatted FreeBSD disk. I wrote FreeBSD's BootEasy boot code into the MBR of the Windows disk and it CAN boot the GPT disk just fine. Not ideal for most, but it works well for me.

Based on a comment I saw, waiting till the empty folder icon appears and then plugging in the install memstick causes the Mini to boot from disk. That's just downright weird, but it works. I could live with that, but this is an unattended server and would experience some down time if I am not there when there is a power failure. I just found some instructions for using MBR with bsdinstall, but given there is an effort to create a UEFI boot, which I suspect would expect to find the GPT boot partition, perhaps I should just go with the memstick approach?

-- Doug
Re: Sanity Check on Mac Mini
On Thu, 7 Mar 2013 14:18:23 -0800 Doug Hardie bc...@lafn.org wrote:
> On 7 March 2013, at 11:57, Kevin Oberman rkober...@gmail.com wrote:
>> On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie bc...@lafn.org wrote:
>>> On 7 March 2013, at 06:42, Richard Kuhns r...@wintek.com wrote:
>>>> On 03/07/13 01:59, Doug Hardie wrote:
>>>>> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>>>>>
>>>>> 1. Downloaded from current today via svnweb.freebsd.org:
>>>>>      sys/dev/bge/if_bgereg.h
>>>>>      sys/dev/bge/if_bge.c
>>>>>      sys/dev/mii/brgphy.c
>>>>>    I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
>>>>> 2. Put those on a flash drive.
>>>>> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
>>>>> 4. Copy the files from 1 above from flash over the files on the disk.
>>>>> 5. Rebuild the kernel and install it.
>>>>>
>>>>> Thanks,
>>>>> -- Doug
>>>>
>>>> That's worked for me 3 times now.
>>>
>>> Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable. I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release?
>>
>> I would generally recommend using the amd64 release, but it may not get your system to boot. How is your disk partitioned? GPT? Some BIOSes are broken: they assume that a GPT-formatted disk is UEFI and will not recognize it if it lacks the UEFI boot partition. UEFI boot is a current project that seems likely to reach head in the fairly near future, but it's not possible now.
>
> No idea what the default partitioning is for BSDInstall. However, the Mini is only EFI or UEFI with some fallbacks, although the comments I find on the web indicate that different models have different fallbacks. One comment indicates that an older unit will boot if it uses MBR partitioning. I don't know if the new installer supports that or not.
>
>> You may be able to tweak your BIOS to get it to work or you may have to install using the traditional partitioning system. The installer defaults to GPT, but can create either.
>>
>> I have such a system (ThinkPad T520) and I have two disks... one that came with the system and contains Windows, and my GPT-formatted FreeBSD disk. I wrote FreeBSD's BootEasy boot code into the MBR of the Windows disk and it CAN boot the GPT disk just fine. Not ideal for most, but it works well for me.
>
> Based on a comment I saw, waiting till the empty folder icon appears and then plugging in the install memstick causes the Mini to boot from disk. That's just downright weird, but it works. I could live with that, but this is an unattended server and would experience some down time if I am not there when there is a power failure. I just found some instructions for using MBR with bsdinstall, but given there is an effort to create a UEFI boot, which I suspect would expect to find the GPT boot partition, perhaps I should just go with the memstick approach?

Hello,

If you still have a drive with OS X on it, you may have some luck with OS X's bless command:

https://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man8/bless.8.html

I got a late 2012 Mac Mini to boot FreeBSD 9.1 (amd64) from a hard drive using 'bless' (unfortunately I don't remember the exact command line parameters I used).

If you're looking to dual boot, the only luck I had (without resorting to third-party software like rEFIt) was to put the OSes on different drives and install FreeBSD using MBR on the second drive.
Re: Sanity Check on Mac Mini
On Thu, Mar 7, 2013 at 2:18 PM, Doug Hardie bc...@lafn.org wrote: On 7 March 2013, at 11:57, Kevin Oberman rkober...@gmail.com wrote: On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie bc...@lafn.org wrote: On 7 March 2013, at 06:42, Richard Kuhns r...@wintek.com wrote: On 03/07/13 01:59, Doug Hardie wrote: I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection its a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up. 1. Downloaded from current today via svnweb.freebsd.org: sys/dev/bge/if_bgereg.h sys/dev/bge/if_bge.c sys/dev/mii/brgphy.c I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch. 2. Put those on a flash drive. 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source. 4. Copy the files from 1 above from flash over the files on the disk. 5. Rebuild the kernel and install it. Thanks, -- Doug That's worked for me 3 times now. Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable. I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release? I would generally recommend using the amd64 release, but it may not get your system to boot. How is your disk partitioned? GPT? Some BIOSes are broken and assume that a GPT formatted disk is UEFI and will not recognize them if they lack the UEFI boot partition. UEFI boot is a current project that seems likely to reach head in the fairly near future, but it's not possible now. No idea what the default partitioning is for BSDInstall. 
However, the Mini is only EFI or UEFI with some fallbacks, although the comments I find on the web indicate that different models have different fallbacks. One comment indicates that an older unit will boot if it uses MBR partitioning. I don't know if the new installer supports that or not.

You may be able to tweak your BIOS to get it to work, or you may have to install using the traditional partitioning system. The installer defaults to GPT, but can create either. I have such a system (ThinkPad T520) and I have two disks: one that came with the system and contains Windows, and my GPT formatted FreeBSD disk. I wrote a FreeBSD BootEasy boot into the MBR of the Windows disk and it CAN boot the GPT disk just fine. Not ideal for most, but it works well for me.

Based on a comment I saw, waiting till the empty folder icon appears and then plugging in the install memstick causes the Mini to boot from disk. That's just downright weird, but it works. I could live with that, but this is an unattended server and would experience some down time if I am not there when there is a power failure. I just found some instructions for using MBR with bsdinstall, but given there is an effort to create a UEFI boot, which I suspect would expect to find the GPT boot partition, perhaps I should just go with the memstick approach.

To be clear, you just insert the thumb drive and the hard drive boots? That IS weird! Or do you get the BootEasy prompt for the partition/disk you want to boot? If the latter, the system is processing the MBR from the thumb drive and using that to boot the GPT disk.

I am not an expert on EFI or UEFI. I know EFI is older and UEFI replaced it about five years ago. I am not entirely clear on the differences, but I assume a newer Mac Mini would be UEFI. My experience with boot loaders is, to put it politely, ancient. I mean pre-BIOS.
I have, at best, a limited understanding of BIOS booting and not much on UEFI, but I know that UEFI can boot devices using the old PC partitioning system as well as GUID (GPT) partitioned ones. The Wikipedia article on UEFI is enlightening.

--
R. Kevin Oberman, Network Engineer
E-mail: rkober...@gmail.com
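The GPT-versus-MBR question raised in this thread can be checked directly from the first two sectors of a disk. The following is a minimal sketch, assuming 512-byte sectors; `partition_scheme` and `make_image` are hypothetical helper names, not part of any FreeBSD tool. It relies only on the standard on-disk layout: the 0x55 0xAA boot signature at offset 510, the four 16-byte MBR partition entries starting at offset 446 (type byte at offset 4 of each entry, 0xEE marking a protective MBR), and the "EFI PART" signature at the start of LBA 1.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def partition_scheme(first_sectors: bytes) -> str:
    """Classify the first two sectors of a disk as 'gpt', 'mbr', or 'unknown'."""
    if len(first_sectors) < 2 * SECTOR_SIZE:
        raise ValueError("need at least the first two sectors")
    # Boot signature that closes a valid MBR sector.
    has_mbr_signature = first_sectors[510:512] == b"\x55\xaa"
    # Partition type byte of each of the four MBR entries.
    types = [first_sectors[446 + 16 * i + 4] for i in range(4)]
    # A GPT header begins LBA 1 with the ASCII signature "EFI PART".
    has_gpt_header = first_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART"
    if has_gpt_header and has_mbr_signature and 0xEE in types:
        return "gpt"
    if has_mbr_signature and any(t != 0 for t in types):
        return "mbr"
    return "unknown"


def make_image(ptype: int, gpt: bool) -> bytes:
    """Build a synthetic two-sector image for demonstration (hypothetical helper)."""
    img = bytearray(2 * SECTOR_SIZE)
    img[446 + 4] = ptype            # type byte of the first partition entry
    img[510:512] = b"\x55\xaa"      # MBR boot signature
    if gpt:
        img[SECTOR_SIZE:SECTOR_SIZE + 8] = b"EFI PART"
    return bytes(img)


if __name__ == "__main__":
    print(partition_scheme(make_image(0xEE, gpt=True)))   # protective MBR plus GPT header
    print(partition_scheme(make_image(0xA5, gpt=False)))  # 0xA5 is the FreeBSD slice type
```

On a live FreeBSD system, `gpart show` reports the same information without reading raw sectors; the sketch is only meant to make the layout being discussed concrete.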
Re: Strange reboot since 9.1
On Thu, Mar 07, 2013 at 08:38:27AM -0800, Jeremy Chadwick wrote:
On Thu, Mar 07, 2013 at 04:38:54PM +0100, Loïc Blot wrote:

Hi Marcelo, thanks. Here is a better trace:

kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x80a84414
stack pointer = 0x28:0xff822fc267a0
frame pointer = 0x28:0xff822fc26830
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq265: bce0)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x809208a6 at kdb_backtrace+0x66
#1 0x808ea8be at panic+0x1ce
#2 0x80bd8240 at trap_fatal+0x290
#3 0x80bd857d at trap_pfault+0x1ed
#4 0x80bd8b9e at trap+0x3ce
#5 0x80bc315f at calltrap+0x8
#6 0x80a861d5 at udp_input+0x475
#7 0x80a043dc at ip_input+0xac
#8 0x809adafb at netisr_dispatch_src+0x20b
#9 0x809a35cd at ether_demux+0x14d
#10 0x809a38a4 at ether_nh_input+0x1f4
#11 0x809adafb at netisr_dispatch_src+0x20b
#12 0x80438fd7 at bce_intr+0x487
#13 0x808be8d4 at intr_event_execute_handlers+0x104
#14 0x808c0076 at ithread_loop+0xa6
#15 0x808bb9ef at fork_exit+0x11f
#16 0x80bc368e at fork_trampoline+0xe
Uptime: 27m20s
Dumping 1265 out of 8162 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92%

#0 doadump (textdump=Variable textdump is not available. ) at pcpu.h:224
224 pcpu.h: No such file or directory.
in pcpu.h

(kgdb) bt f
#0 doadump (textdump=Variable textdump is not available. ) at pcpu.h:224
No locals.
#1 0x808ea3a1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
_ep = Variable _ep is not available.

(kgdb) bt
#0 doadump (textdump=Variable textdump is not available. ) at pcpu.h:224
#1 0x808ea3a1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
#2 0x808ea897 in panic (fmt=0x1 Address 0x1 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:636
#3 0x80bd8240 in trap_fatal (frame=0xc, eva=Variable eva is not available. ) at /usr/src/sys/amd64/amd64/trap.c:857
#4 0x80bd857d in trap_pfault (frame=0xff822fc266f0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773
#5 0x80bd8b9e in trap (frame=0xff822fc266f0) at /usr/src/sys/amd64/amd64/trap.c:456
#6 0x80bc315f in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#7 0x80a84414 in udp_append (inp=0xfe019e2a1000, ip=0xfe00444b6c80, n=0xfe00444b6c00, off=20, udp_in=0xff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252
#8 0x80a861d5 in udp_input (m=0xfe00444b6c00, off=Variable off is not available. ) at /usr/src/sys/netinet/udp_usrreq.c:618
#9 0x80a043dc in ip_input (m=0xfe00444b6c00) at /usr/src/sys/netinet/ip_input.c:760
#10 0x809adafb in netisr_dispatch_src (proto=1, source=Variable source is not available. ) at /usr/src/sys/net/netisr.c:1013
#11 0x809a35cd in ether_demux (ifp=0xfe00053fa000, m=0xfe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940
#12 0x809a38a4 in ether_nh_input (m=Variable m is not available. ) at /usr/src/sys/net/if_ethersubr.c:759
#13 0x809adafb in netisr_dispatch_src (proto=9, source=Variable source is not available. ) at /usr/src/sys/net/netisr.c:1013
#14 0x80438fd7 in bce_intr (xsc=Variable xsc is not available. ) at /usr/src/sys/dev/bce/if_bce.c:6903
#15 0x808be8d4 in intr_event_execute_handlers (p=Variable p is not available. ) at /usr/src/sys/kern/kern_intr.c:1262
#16 0x808c0076 in ithread_loop (arg=0xfe00057424e0) at /usr/src/sys/kern/kern_intr.c:1275
#17 0x808bb9ef in fork_exit (callout=0x808bffd0 ithread_loop, arg=0xfe00057424e0, frame=0xff822fc26c40) at /usr/src/sys/kern/kern_fork.c:992
#18 0x80bc368e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602
#19 0x in ?? ()
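For readers comparing several crash dumps, the "KDB: stack backtrace" section of the panic message can be pulled apart mechanically. The sketch below is an illustration only: `parse_kdb_backtrace` and `frame_after_trap` are hypothetical helper names, and the sample frames are copied from the trace above with their addresses exactly as they appear in this archive. Note that the frame listed right after `calltrap` is the return address in the caller (`udp_input` here); the faulting instruction itself is the "instruction pointer" value, which kgdb resolves to `udp_append`.

```python
import re

# Matches KDB lines of the form "#6 0x80a861d5 at udp_input+0x475".
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+(\w+)\+(0x[0-9a-fA-F]+)")

# Sample frames copied from the panic message in this thread.
SAMPLE = """
#4 0x80bd8b9e at trap+0x3ce
#5 0x80bc315f at calltrap+0x8
#6 0x80a861d5 at udp_input+0x475
#7 0x80a043dc at ip_input+0xac
"""


def parse_kdb_backtrace(text):
    """Return a list of (frame_no, address, symbol, offset) tuples."""
    return [
        (int(num), int(addr, 16), sym, int(off, 16))
        for num, addr, sym, off in FRAME_RE.findall(text)
    ]


def frame_after_trap(frames):
    """Return the frame listed immediately after calltrap, or None if absent."""
    for i, (_, _, sym, _) in enumerate(frames):
        if sym == "calltrap" and i + 1 < len(frames):
            return frames[i + 1]
    return None


if __name__ == "__main__":
    frames = parse_kdb_backtrace(SAMPLE)
    print(frame_after_trap(frames))
```

Everything before `calltrap` in such a trace is the panic and trap handling path; the frames after it show what the kernel was doing when the page fault occurred, which in this dump is UDP input processing from the bce interrupt thread.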