Re: VM errors on -current
On 2015-11-06 20:02, Michael Butler wrote: > Is anyone else seeing VM errors from X on -current with an old Intel > display adapter? > > agp0: on vgapci0 > agp0: aperture size is 256M, detected 7932k stolen memory > > [ .. snip .. ] > > kernel: vm_fault: pager read error, pid 1103 (Xorg) <- ?? > kdm[993]: X server for display :0 terminated unexpectedly > kernel: pid 1103 (Xorg), uid 0: exited on signal 6 (core dumped) > devd: notify_clients: send() failed; dropping unresponsive client > > imb > > ___ > freebsd-current@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org" > That pager error may suggest a failing disk where it was unable to read back some swapped out data. Check /var/log/messages for disk errors -- Allan Jude signature.asc Description: OpenPGP digital signature
Re: VM errors on -current
On 11/06/15 20:04, Allan Jude wrote: > On 2015-11-06 20:02, Michael Butler wrote: >> Is anyone else seeing VM errors from X on -current with an old Intel >> display adapter? >> >> agp0: on vgapci0 >> agp0: aperture size is 256M, detected 7932k stolen memory >> >> [ .. snip .. ] >> >> kernel: vm_fault: pager read error, pid 1103 (Xorg) <- ?? >> kdm[993]: X server for display :0 terminated unexpectedly >> kernel: pid 1103 (Xorg), uid 0: exited on signal 6 (core dumped) >> devd: notify_clients: send() failed; dropping unresponsive client >> >> imb >> >> ___ >> freebsd-current@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-current >> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org" >> > > That pager error may suggest a failing disk where it was unable to read > back some swapped out data. Check /var/log/messages for disk errors No swap in use - this is a video-RAM paging issue, imb ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
VM errors on -current
Is anyone else seeing VM errors from X on -current with an old Intel display adapter? agp0: on vgapci0 agp0: aperture size is 256M, detected 7932k stolen memory [ .. snip .. ] kernel: vm_fault: pager read error, pid 1103 (Xorg) <- ?? kdm[993]: X server for display :0 terminated unexpectedly kernel: pid 1103 (Xorg), uid 0: exited on signal 6 (core dumped) devd: notify_clients: send() failed; dropping unresponsive client imb ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Panic with PF on current.
> On 06 Nov 2015, at 01:12, Daniel Dettlaffwrote: > I have interesting verbose output with backtrace (not panic) from one of my > VMs: http://s.verknowsys.com/f0d457ce9420399baaf531012c33eb81.png > It’s triggered by autostarting jail on bridged vlan interface (no VNET > feature enabled) > This seems to be a lock ordering issue between ZFS and devfs. It appears to have been around for a while too. I found PR 142878 with the same witness warning. Regards, Kristof ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: nanoBSD: buidlworld failure
On Fri, 6 Nov 2015 00:43:35 -0800 NGie Cooperwrote: > > On Nov 5, 2015, at 22:51, O. Hartmann wrote: > > > > Since two days, I receive the error shown below on a build machine building > > nanoBSD. The recent CURRENT (nanoBSD sources AND the building host) is > > > > FreeBSD 11.0-CURRENT #2 r290394: Thu Nov 5 16:30:20 CET 2015 amd64 > > > > > > [...] > > cc -O3 -pipe -I/empty/src/ALG/CURRENT/usr.sbin/cron/lib/../cron > > -DLOGIN_CAP -DPAM -DNDEBUG -std=gnu99 -fstack-protector-strong > > -Wsystem-headers -Wall -Wno-format-y2k -W -Wno-unused-parameter > > -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized > > -Wno-pointer-sign -Wno-empty-body -Wno-string-plus-int > > -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value > > -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion > > -Wno-unused-local-typedef -Qunused-arguments > > -c /empty/src/ALG/CURRENT/usr.sbin/cron/lib/env.c -o env.o --- > > all_subdir_usr.bin --- --- all_subdir_netstat --- if.o: In function > > `intpr': /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x40f): > > undefined reference to > > `MAX' /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x472): undefined > > reference to `MAX' cc: error: linker command failed with exit code 1 (use > > -v to see invocation) *** [netstat] Error code 1 > > Fixed in r290437 — thank you for the report. > Take care! > -NGie Thank you. oh ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: nanoBSD: buidlworld failure
> On Nov 5, 2015, at 22:51, O. Hartmannwrote: > > Since two days, I receive the error shown below on a build machine building > nanoBSD. The recent CURRENT (nanoBSD sources AND the building host) is > > FreeBSD 11.0-CURRENT #2 r290394: Thu Nov 5 16:30:20 CET 2015 amd64 > > > [...] > cc -O3 -pipe -I/empty/src/ALG/CURRENT/usr.sbin/cron/lib/../cron > -DLOGIN_CAP > -DPAM -DNDEBUG -std=gnu99 -fstack-protector-strong -Wsystem-headers -Wall > -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes > -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign > -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable > -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality > -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef > -Qunused-arguments -c /empty/src/ALG/CURRENT/usr.sbin/cron/lib/env.c -o env.o > --- all_subdir_usr.bin --- --- all_subdir_netstat --- if.o: In function > `intpr': /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x40f): undefined > reference to `MAX' /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x472): > undefined reference to `MAX' cc: error: linker command failed with exit code 1 > (use -v to see invocation) *** [netstat] Error code 1 Fixed in r290437 — thank you for the report. Take care! -NGie ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
Ideally there'd be both behaviours: * You'd specify whether a timer/sleep needs to be exact or can withstand some jitter (which is what linux provides); and * You can communicate to the kernel its aggressiveness for coalescing wakeups. Teaching powerd to flip on/off a sysctl for this isn't that tricky. I remember seeing someone else in #bsdmips having issues figuring out sleep-in-kernel durations and it being exactly this stuff (and it being you, Ian, that figured it out) - so I'd like to at least fix it in -head this time around! Also - I thought the other issue was that precise callouts (versus the default for callouts now which is "fuzzy" unless you specify precision) were still jittering. Is that the case? Thanks, -adrian ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
On Fri, 6 Nov 2015, Ian Lepore wrote: On Fri, 2015-11-06 at 17:51 +0100, Hans Petter Selasky wrote: On 11/06/15 17:43, Ian Lepore wrote: On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote: Hi, Do the test II results change with this setting? sysctl kern.timecounter.alloweddeviation=0 Yes, it looks much better: debug.total: 10013 -> 0 debug.total: 10013 -> 0 ... This isn't the first time that the alloweddeviation feature has led people (including me in the past) to think there is a timing bug. I think the main purpose of the feature is to help save battery power on laptops by clustering nearby scheduled wakeups to all happen at the same time and then allow for longer sleeps between each wakeup. I was trying to remember the flag for turning off that "feature". It gives the bizarre behaviour that on an old system with a timer resolution of 10 msec, "time sleep 1" sleeps for 1 second with an average error of < 10 msec, but with a timer resolution of 1 msec for hardclock and finer for short timeouts, "time sleep 1" sleeps for an average of an extra 30 msec (worst case 1.069 seconds IIRC). Thus high resolution timers give much lower resolution for medium-sized timeouts. (For "sleep 10", the average error is again 30 msec but this is relatively smaller, and for "sleep .001" the average error must be less than 1 msec to work at all, though it is likely to be relatively large.) I've been wondering lately whether this might also be behind the unexplained "load average is always 0.60" problem people have noticed on some systems. If load average is calculated by sampling what work is happening when a timer interrupt fires, and the system is working hard to ensure that a timer interrupt only happens when there is actual work to do, you'd end up with statistics reporting that there is work being done most of the time when it took a sample. I use HZ = 100 and haven't seen this. Strangely, HZ = 100 gives the same 69 msec max error for "sleep 1" as HZ = 1000. Schedulers should mostly use the actual thread runtimes to avoid sampling biases. That might even be faster. But it doesn't work so well for the load average, or at all for resource usages that are averages, or for the usr/sys/intr splitting of the runtime. It is good enough for scheduling since the splitting is not need for scheduling. Bruce ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [PATCH] microoptimize by trying to avoid locking a locked mutex
On 11/06/15 01:55, Mateusz Guzik wrote: On Thu, Nov 05, 2015 at 04:35:22PM -0700, Ian Lepore wrote: On Thu, 2015-11-05 at 14:19 -0800, John Baldwin wrote: On Thursday, November 05, 2015 01:45:19 PM Adrian Chadd wrote: On 5 November 2015 at 11:26, Mateusz Guzikwrote: On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote: On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov wrote: On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik wrote: mtx_lock will unconditionally try to grab the lock and if that fails, will call __mtx_lock_sleep which will immediately try to do the same atomic op again. So, the obvious microoptimization is to check the state in __mtx_lock_sleep and avoid the operation if the lock is not free. This gives me ~40% speedup in a microbenchmark of 40 find processes traversing tmpfs and contending on mount mtx (only used as an easy benchmark, I have WIP patches to get rid of it). Second part of the patch is optional and just checks the state of the lock prior to doing any atomic operations, but it gives a very modest speed up when applied on top of the __mtx_lock_sleep change. As such, I'm not going to defend this part. Shouldn't the same consideration applied to all spinning loops, i.e. also to the spin/thread mutexes, and to the spinning parts of sx and lockmgr ? I agree. I think both changes are good and worth doing in our other primitives. I glanced over e.g. rw_rlock and it did not have the issue, now that I see _sx_xlock_hard it wuld indeed use fixing. Expect a patch in few h for all primitives I'll find. I'll stress test the kernel, but it is unlikely I'll do microbenchmarks for remaining primitives. Is this stuff you're proposing still valid for non-x86 platforms? Yes. It just does a read before trying the atomic compare and swap and falls through to the hard case as if the atomic op failed if the result of the read would result in a compare failure. The atomic ops include barriers, the new pre-read of the variable doesn't. Will that cause problems, especially for code inside a loop where the compiler may decide to shuffle things around? Lock value is volatile, so the compiler should not screw us up. I removed the store to the variable due to a different concern. In worst case we would have slept instead of spinning. (see below) I suspect the performance gain will be biggest on the platforms where atomic ops are expensive (I gather from various code comments that's the case on x86). I don't know about other architectures, on x86 atomic op performance on contended cachelines deteriorates drastically. === For the most port the patch below does 2 simple kinds of changes: 1. in macros for lock/unlock primitives the lock value is fetched and compared against relevant _UNLOCKED macro prior to the atomic op 2. in functions: before: while (!_mtx_obtain_lock(m, tid)) { v = m->mtx_lock; if (v != MTX_UNOWNED) { } . } after: for (;;) { if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid)) break; v = m->mtx_lock; if (v != MTX_UNOWNED) { } . } The original patch preloaded the 'v' variable. If the value was MTX_UNOWNED and _mtx_obtain_lock failed, we just lost the race. In order to spin waiting for the other thread to release the lock, we have to re-read the variable. Thus for simplicity it is no longer stored in 'v'. Note this is contrary to kib's earlier suggestion which would put such threads directly to sleep. I don't have proper measurements to have any strong opinion here, I went the aforementioned route to minimise changes. Note this is a trivial patch to improve the situation a little bit, I have no interest in trying to polish these primitives at this time. For overall results, on a machine with 32GB of RAM and the following: CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (2800.06-MHz K8-class CPU) FreeBSD/SMP: 2 package(s) x 10 core(s) x 2 SMT threads this reduced make -j 40 kernel in a 10.2 jail from ~2:15 to ~2:05. === diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c index aa67180..198e93e 100644 --- a/sys/kern/kern_lock.c +++ b/sys/kern/kern_lock.c @@ -787,8 +787,10 @@ __lockmgr_args(struct lock *lk, u_int flags, struct lock_object *ilk, break; } - while (!atomic_cmpset_acq_ptr(>lk_lock, LK_UNLOCKED, - tid)) { + for (;;) { + if (lk->lk_lock == LK_UNLOCKED && + atomic_cmpset_acq_ptr(>lk_lock, LK_UNLOCKED, tid)) + break; #ifdef HWPMC_HOOKS PMC_SOFT_CALL( , , lock, failed); #endif @@ -1124,7 +1126,11 @@ __lockmgr_args(struct lock *lk, u_int flags, struct
Re: strange kernel crash
On Fri, Nov 06, 2015 at 01:20:13PM +0200, Andriy Gapon wrote: > Unread portion of the kernel message buffer: > > Fatal trap 1: privileged instruction fault while in kernel mode > cpuid = 0; apic id = 00 > instruction pointer = 0x20:0x80619a1e > stack pointer = 0x28:0xfe04f57856f0 > frame pointer = 0x28:0xfe04f57857b0 > code segment= base 0x0, limit 0xf, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags= interrupt enabled, resume, IOPL = 0 > current process = 2658 (firefox) > trap number = 1 > panic: privileged instruction fault > cpuid = 0 > curthread: 0xf803270b6000 > stack: 0xfe04f5782000 - 0xfe04f5786000 > stack pointer: 0xfe04f5785320 > KDB: stack backtrace: > db_trace_self_wrapper() at 0x8041e86b = > db_trace_self_wrapper+0x2b/frame > 0xfe04f5785250 > kdb_backtrace() at 0x80669f39 = kdb_backtrace+0x39/frame > 0xfe04f5785300 > vpanic() at 0x8063531c = vpanic+0x14c/frame 0xfe04f5785340 > panic() at 0x80635063 = panic+0x43/frame 0xfe04f57853a0 > trap_fatal() at 0x8081fc0f = trap_fatal+0x33f/frame 0xfe04f5785400 > trap() at 0x8081f872 = trap+0x7d2/frame 0xfe04f5785610 > trap_check() at 0x8081ff2a = trap_check+0x2a/frame 0xfe04f5785630 > calltrap() at 0x80807ea0 = calltrap+0x8/frame 0xfe04f5785630 > --- trap 0x1, rip = 0x80619a1e, rsp = 0xfe04f5785700, rbp = > 0xfe04f57857b0 --- > __mtx_lock_flags() at 0x80619a1e = __mtx_lock_flags+0x2ee/frame > 0xfe04f57857b0 > uma_dbg_getslab() at 0x807df15c = uma_dbg_getslab+0x3c/frame > 0xfe04f57857d0 > uma_dbg_alloc() at 0x807df08d = uma_dbg_alloc+0x2d/frame > 0xfe04f5785800 > uma_zalloc_arg() at 0x807dacf1 = uma_zalloc_arg+0x4b1/frame > 0xfe04f5785890 > uma_zalloc() at 0x8068b040 = uma_zalloc+0x10/frame 0xfe04f57858a0 > selfdalloc() at 0x8068aa12 = selfdalloc+0x22/frame 0xfe04f57858c0 > pollscan() at 0x8068a615 = pollscan+0x95/frame 0xfe04f5785910 > kern_poll() at 0x8068a4b1 = kern_poll+0x1f1/frame 0xfe04f5785a70 > sys_poll() at 0x8068a2b9 = sys_poll+0x79/frame 0xfe04f5785a90 > syscallenter() at 0x80820560 = syscallenter+0x320/frame > 0xfe04f5785b00 > amd64_syscall() at 0x8082012f = amd64_syscall+0x1f/frame > 0xfe04f5785bf0 > Xfast_syscall() at 0x8080818b = Xfast_syscall+0xfb/frame > 0xfe04f5785bf0 > --- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x80146342a, rsp = > 0x7fffd8e8, rbp = 0x7fffd920 --- > Uptime: 1d12h57m32s > > > Now the strange part: > >0x80619a18 <+744>: jne0x80619a61 > <__mtx_lock_flags+817> >0x80619a1a <+746>: mov%rbx,(%rsp) > => 0x80619a1e <+750>: movq $0x0,0x18(%rsp) >0x80619a27 <+759>: movq $0x0,0x10(%rsp) >0x80619a30 <+768>: movq $0x0,0x8(%rsp) > > RSP value seems to be sane and consistent with the stack information above: > (kgdb) i reg > rax0x4 4 > rbx0xf80126ea54f0 -8791145163536 > rcx0x8099a600 -2137414144 > rdx0xf803270b6000 -8782553063424 > rsi0x4 4 > rdi0xf80027f41318 -8795422715112 > rbp0xfe04f57857b0 0xfe04f57857b0 > rsp0xfe04f5785700 0xfe04f5785700 > r8 0x809a7727 -2137360601 > r9 0xf80126ea54f0 -8791145163536 > r100x3e81000 > r110xfe04f5785cc0 -2177725080384 > r120x1 1 > r130xf803270b6000 -8782553063424 > r140xf80027f41318 -8795422715112 > r150x0 0 > rip0x80619a1e 0x80619a1e > <__mtx_lock_flags+750> > eflags 0x10246 [ PF ZF IF RF ] > cs 0x20 32 > ss 0x28 40 > ds > es > fs > gs > > (kgdb) x/a $rsp > 0xfe04f5785700: 0xf80126ea54f0 > (kgdb) x/a $rsp + 0x18 > 0xfe04f5785718: 0x0 > > I have no idea what could have caused the #GP. This is certainly not a stack > overflow. This is a second report, please take a look at https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057975.html I have no idea as well. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: r289932 causes pf reversion - breaks rules with broadcast destination
On 2015-11-05 12:39:22 (-0500), Tom Uffnerwrote: > So, if my rule was "working" due to false positive in a comparison that has > now been fixed, how many other address comparisons were affected by this > error? > > There are 36 occurrences of PF_ANEQ in pf.c and 2 in if_pfsync.c > Most of them are an optimisation check. They're used in the NAT paths to see if addresses need to be rewritten (and checksums updated) or not. That's probably part of the reason it took so long to notice the bug in the macro: in most cases a false positive only slowed things down a little, it didn't actually produce an incorrect result. I think I've reproduced your problem with very simple rules: pass out block out proto icmp pass out log on vtnet0 proto icmp from any to vtnet0:broadcast pass out log on vtnet0 proto icmp from any to 172.16.2.1 With those rules I can ping to ping 172.16.2.255 (vtnet0 has 172.16.2.2/24), but not to 172.16.2.1. If I remove the broadcast rule I suddenly can ping to 172.16.2.1. I suspect I've also found the source of the problem: pf_addr_wrap_neq() uses PF_ANEQ(), but sets address family 0. As a result of the fix that now means we always return false there. Can you give this a quick test: diff --git a/sys/netpfil/pf/pf.c b/sys/netpfil/pf/pf.c index 1dfc37d..762b82e 100644 --- a/sys/netpfil/pf/pf.c +++ b/sys/netpfil/pf/pf.c @@ -1973,9 +1973,9 @@ pf_addr_wrap_neq(struct pf_addr_wrap *aw1, struct pf_addr_wrap *aw2) switch (aw1->type) { case PF_ADDR_ADDRMASK: case PF_ADDR_RANGE: - if (PF_ANEQ(>v.a.addr, >v.a.addr, 0)) + if (PF_ANEQ(>v.a.addr, >v.a.addr, AF_INET6)) return (1); - if (PF_ANEQ(>v.a.mask, >v.a.mask, 0)) + if (PF_ANEQ(>v.a.mask, >v.a.mask, AF_INET6)) return (1); return (0); case PF_ADDR_DYNIFTL: Regards, Kristof ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: strange kernel crash
On 06/11/2015 17:35, Konstantin Belousov wrote: > This is a second report, please take a look at > https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057975.html > I have no idea as well. In my case it does not look like the code was overwritten, though. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
Hi, I spent some time to write a test application to investigate this issue and I found some irregularities, that when kern.eventtimer.periodic=0, the timer appears to run very irregular. Test software: == fetch http://home.selasky.org:8192/privat/callout_test_dummynet.tar.gz tar -zxvf callout_test_dummynet.tar.gz cd callout_test_dummynet make -m /usr/src/share/mk SYSDIR=/usr/src/sys all install Test I: === The following settings are placed in /boot/loader.conf and computer rebooted. kern.hz=1 kern.eventtimer.periodic=1 kldload callout_test ./test.sh debug.total: 32604 -> 0 debug.total: 10021 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10019 -> 0 debug.total: 10021 -> 0 debug.total: 10020 -> 0 debug.total: 10021 -> 0 debug.total: 10019 -> 0 debug.total: 10020 -> 0 debug.total: 10018 -> 0 debug.total: 10021 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 debug.total: 10020 -> 0 Test II: The following settings are placed in /boot/loader.conf and computer rebooted. kern.hz=1 kern.eventtimer.periodic=0 kldload callout_test ./test.sh debug.total: 20337 -> 0 debug.total: 10091 -> 0 debug.total: 10018 -> 1 debug.total: 10041 -> 0 debug.total: 10019 -> 0 debug.total: 10644 -> 0 debug.total: 10014 -> 1 debug.total: 10020 -> 0 debug.total: 10019 -> 0 debug.total: 10644 -> 0 debug.total: 10014 -> 0 debug.total: 10644 -> 0 debug.total: 10640 -> 0 debug.total: 10210 -> 0 debug.total: 10015 -> 0 debug.total: 10020 -> 1 debug.total: 10020 -> 0 debug.total: 10453 -> 0 debug.total: 10642 -> 0 If you load the computer, like a multi-CPU buildkernel, the value flattens out again. Test III: = The following settings are placed in /boot/loader.conf and computer rebooted. kern.hz=1000 kern.eventtimer.periodic=1 kldload callout_test ./test.sh debug.total: 5238 -> 0 debug.total: 1005 -> 0 debug.total: 1003 -> 0 debug.total: 1006 -> 0 debug.total: 1003 -> 0 debug.total: 1003 -> 0 debug.total: 1005 -> 0 debug.total: 1006 -> 0 debug.total: 1003 -> 0 debug.total: 1013 -> 0 debug.total: 1005 -> 0 debug.total: 1006 -> 0 debug.total: 1004 -> 0 debug.total: 1005 -> 0 debug.total: 1003 -> 0 debug.total: 1004 -> 0 debug.total: 1006 -> 0 debug.total: 1004 -> 0 debug.total: 1004 -> 0 debug.total: 1007 -> 0 debug.total: 1003 -> 0 debug.total: 1004 -> 0 The difference between 100021 and 10642 interrupts in a second for test II is quite big, close to 6.5% deviation. Does anyone have any clue if this is the expected behaviour when kern.eventtimer.periodic=1 or not? MAV ? --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote: > Hi, > > I spent some time to write a test application to investigate this > issue > and I found some irregularities, that when > kern.eventtimer.periodic=0, > the timer appears to run very irregular. > > Test software: > == > > fetch > http://home.selasky.org:8192/privat/callout_test_dummynet.tar.gz > tar -zxvf callout_test_dummynet.tar.gz > cd callout_test_dummynet > make -m /usr/src/share/mk SYSDIR=/usr/src/sys all install > > Test I: > === > > The following settings are placed in /boot/loader.conf and computer > rebooted. > kern.hz=1 > kern.eventtimer.periodic=1 > > kldload callout_test > > ./test.sh > debug.total: 32604 -> 0 > debug.total: 10021 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10019 -> 0 > debug.total: 10021 -> 0 > debug.total: 10020 -> 0 > debug.total: 10021 -> 0 > debug.total: 10019 -> 0 > debug.total: 10020 -> 0 > debug.total: 10018 -> 0 > debug.total: 10021 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > debug.total: 10020 -> 0 > > Test II: > > > The following settings are placed in /boot/loader.conf and computer > rebooted. > kern.hz=1 > kern.eventtimer.periodic=0 > > kldload callout_test > > ./test.sh > debug.total: 20337 -> 0 > debug.total: 10091 -> 0 > debug.total: 10018 -> 1 > debug.total: 10041 -> 0 > debug.total: 10019 -> 0 > debug.total: 10644 -> 0 > debug.total: 10014 -> 1 > debug.total: 10020 -> 0 > debug.total: 10019 -> 0 > debug.total: 10644 -> 0 > debug.total: 10014 -> 0 > debug.total: 10644 -> 0 > debug.total: 10640 -> 0 > debug.total: 10210 -> 0 > debug.total: 10015 -> 0 > debug.total: 10020 -> 1 > debug.total: 10020 -> 0 > debug.total: 10453 -> 0 > debug.total: 10642 -> 0 > > If you load the computer, like a multi-CPU buildkernel, the value > flattens out again. > > Test III: > = > The following settings are placed in /boot/loader.conf and computer > rebooted. > kern.hz=1000 > kern.eventtimer.periodic=1 > > kldload callout_test > > ./test.sh > debug.total: 5238 -> 0 > debug.total: 1005 -> 0 > debug.total: 1003 -> 0 > debug.total: 1006 -> 0 > debug.total: 1003 -> 0 > debug.total: 1003 -> 0 > debug.total: 1005 -> 0 > debug.total: 1006 -> 0 > debug.total: 1003 -> 0 > debug.total: 1013 -> 0 > debug.total: 1005 -> 0 > debug.total: 1006 -> 0 > debug.total: 1004 -> 0 > debug.total: 1005 -> 0 > debug.total: 1003 -> 0 > debug.total: 1004 -> 0 > debug.total: 1006 -> 0 > debug.total: 1004 -> 0 > debug.total: 1004 -> 0 > debug.total: 1007 -> 0 > debug.total: 1003 -> 0 > debug.total: 1004 -> 0 > > > The difference between 100021 and 10642 interrupts in a second for > test > II is quite big, close to 6.5% deviation. Does anyone have any clue > if > this is the expected behaviour when kern.eventtimer.periodic=1 or > not? > > MAV ? > > --HPS Do the test II results change with this setting? sysctl kern.timecounter.alloweddeviation=0 -- Ian ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
On 11/06/15 17:43, Ian Lepore wrote: On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote: Hi, Do the test II results change with this setting? sysctl kern.timecounter.alloweddeviation=0 Yes, it looks much better: debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10012 -> 0 debug.total: 10012 -> 0 debug.total: 10012 -> 0 debug.total: 10013 -> 0 debug.total: 10012 -> 1 debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10012 -> 0 debug.total: 10013 -> 0 debug.total: 10013 -> 0 --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
On 11/06/15 17:51, Hans Petter Selasky wrote: On 11/06/15 17:43, Ian Lepore wrote: On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote: Hi, Do the test II results change with this setting? sysctl kern.timecounter.alloweddeviation=0 Yes, it looks much better: debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10012 -> 0 debug.total: 10012 -> 0 debug.total: 10012 -> 0 debug.total: 10013 -> 0 debug.total: 10012 -> 1 debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10012 -> 0 debug.total: 10013 -> 0 debug.total: 10013 -> 0 --HPS Thought I still see some unexpected dips, as the test runs for a longer amount of time. Not sure what the cause might be. debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 9844 -> 0 debug.total: 10013 -> 0 debug.total: 10013 -> 0 debug.total: 10012 -> 0 --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
On Fri, 2015-11-06 at 17:51 +0100, Hans Petter Selasky wrote: > On 11/06/15 17:43, Ian Lepore wrote: > > On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote: > > > Hi, > > > > > Do the test II results change with this setting? > > > >sysctl kern.timecounter.alloweddeviation=0 > > > > Yes, it looks much better: > > debug.total: 10013 -> 0 > debug.total: 10013 -> 0 > debug.total: 10012 -> 0 > debug.total: 10012 -> 0 > debug.total: 10012 -> 0 > debug.total: 10013 -> 0 > debug.total: 10012 -> 1 > debug.total: 10013 -> 0 > debug.total: 10013 -> 0 > debug.total: 10013 -> 0 > debug.total: 10012 -> 0 > debug.total: 10013 -> 0 > debug.total: 10013 -> 0 > > --HPS This isn't the first time that the alloweddeviation feature has led people (including me in the past) to think there is a timing bug. I think the main purpose of the feature is to help save battery power on laptops by clustering nearby scheduled wakeups to all happen at the same time and then allow for longer sleeps between each wakeup. I've been wondering lately whether this might also be behind the unexplained "load average is always 0.60" problem people have noticed on some systems. If load average is calculated by sampling what work is happening when a timer interrupt fires, and the system is working hard to ensure that a timer interrupt only happens when there is actual work to do, you'd end up with statistics reporting that there is work being done most of the time when it took a sample. Maybe it would make sense to only enable the feature by default on battery-powered systems? -- Ian ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
strange kernel crash
Unread portion of the kernel message buffer: Fatal trap 1: privileged instruction fault while in kernel mode cpuid = 0; apic id = 00 instruction pointer = 0x20:0x80619a1e stack pointer = 0x28:0xfe04f57856f0 frame pointer = 0x28:0xfe04f57857b0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 2658 (firefox) trap number = 1 panic: privileged instruction fault cpuid = 0 curthread: 0xf803270b6000 stack: 0xfe04f5782000 - 0xfe04f5786000 stack pointer: 0xfe04f5785320 KDB: stack backtrace: db_trace_self_wrapper() at 0x8041e86b = db_trace_self_wrapper+0x2b/frame 0xfe04f5785250 kdb_backtrace() at 0x80669f39 = kdb_backtrace+0x39/frame 0xfe04f5785300 vpanic() at 0x8063531c = vpanic+0x14c/frame 0xfe04f5785340 panic() at 0x80635063 = panic+0x43/frame 0xfe04f57853a0 trap_fatal() at 0x8081fc0f = trap_fatal+0x33f/frame 0xfe04f5785400 trap() at 0x8081f872 = trap+0x7d2/frame 0xfe04f5785610 trap_check() at 0x8081ff2a = trap_check+0x2a/frame 0xfe04f5785630 calltrap() at 0x80807ea0 = calltrap+0x8/frame 0xfe04f5785630 --- trap 0x1, rip = 0x80619a1e, rsp = 0xfe04f5785700, rbp = 0xfe04f57857b0 --- __mtx_lock_flags() at 0x80619a1e = __mtx_lock_flags+0x2ee/frame 0xfe04f57857b0 uma_dbg_getslab() at 0x807df15c = uma_dbg_getslab+0x3c/frame 0xfe04f57857d0 uma_dbg_alloc() at 0x807df08d = uma_dbg_alloc+0x2d/frame 0xfe04f5785800 uma_zalloc_arg() at 0x807dacf1 = uma_zalloc_arg+0x4b1/frame 0xfe04f5785890 uma_zalloc() at 0x8068b040 = uma_zalloc+0x10/frame 0xfe04f57858a0 selfdalloc() at 0x8068aa12 = selfdalloc+0x22/frame 0xfe04f57858c0 pollscan() at 0x8068a615 = pollscan+0x95/frame 0xfe04f5785910 kern_poll() at 0x8068a4b1 = kern_poll+0x1f1/frame 0xfe04f5785a70 sys_poll() at 0x8068a2b9 = sys_poll+0x79/frame 0xfe04f5785a90 syscallenter() at 0x80820560 = syscallenter+0x320/frame 0xfe04f5785b00 amd64_syscall() at 0x8082012f = amd64_syscall+0x1f/frame 0xfe04f5785bf0 Xfast_syscall() at 0x8080818b = Xfast_syscall+0xfb/frame 0xfe04f5785bf0 --- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x80146342a, rsp = 0x7fffd8e8, rbp = 0x7fffd920 --- Uptime: 1d12h57m32s Now the strange part: 0x80619a18 <+744>: jne0x80619a61 <__mtx_lock_flags+817> 0x80619a1a <+746>: mov%rbx,(%rsp) => 0x80619a1e <+750>: movq $0x0,0x18(%rsp) 0x80619a27 <+759>: movq $0x0,0x10(%rsp) 0x80619a30 <+768>: movq $0x0,0x8(%rsp) RSP value seems to be sane and consistent with the stack information above: (kgdb) i reg rax0x4 4 rbx0xf80126ea54f0 -8791145163536 rcx0x8099a600 -2137414144 rdx0xf803270b6000 -8782553063424 rsi0x4 4 rdi0xf80027f41318 -8795422715112 rbp0xfe04f57857b0 0xfe04f57857b0 rsp0xfe04f5785700 0xfe04f5785700 r8 0x809a7727 -2137360601 r9 0xf80126ea54f0 -8791145163536 r100x3e81000 r110xfe04f5785cc0 -2177725080384 r120x1 1 r130xf803270b6000 -8782553063424 r140xf80027f41318 -8795422715112 r150x0 0 rip0x80619a1e 0x80619a1e <__mtx_lock_flags+750> eflags 0x10246 [ PF ZF IF RF ] cs 0x20 32 ss 0x28 40 ds es fs gs (kgdb) x/a $rsp 0xfe04f5785700: 0xf80126ea54f0 (kgdb) x/a $rsp + 0x18 0xfe04f5785718: 0x0 I have no idea what could have caused the #GP. This is certainly not a stack overflow. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
On Fri, 2015-11-06 at 17:57 +0100, Hans Petter Selasky wrote: > On 11/06/15 17:51, Hans Petter Selasky wrote: > > On 11/06/15 17:43, Ian Lepore wrote: > > > On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote: > > > > Hi, > > > > > > > > Do the test II results change with this setting? > > > > > >sysctl kern.timecounter.alloweddeviation=0 > > > > > > > Yes, it looks much better: > > > > debug.total: 10013 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10012 -> 0 > > debug.total: 10012 -> 0 > > debug.total: 10012 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10012 -> 1 > > debug.total: 10013 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10012 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10013 -> 0 > > > > --HPS > > > > Thought I still see some unexpected dips, as the test runs for a > longer > amount of time. Not sure what the cause might be. > > > debug.total: 10013 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 9844 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10013 -> 0 > > debug.total: 10012 -> 0 > > --HPS Is it possible your machine is occasionally falling into a deeper sleep state (C3 or whatever)? I think it can take hundreds of microseconds to wake up from some of the deeper power-saving modes. If not power-saving, just plain old system-is-busy could lead to the reduced counts occasionally, since each callout is scheduled as a delta from the current actual wakeup time, not a delta from the current scheduled wakeup time (i.e., there is a phase shift in the firing of events over time). Raising the priority of the test thread into the realtime range might help with system-busy variation. It's also a bit iffy that you're using eventtimers to measure the performance of eventtimers (by using sleep 1 in the script). That explains the +12/13 count every second, not the reduced counts unless we speculate that sleep is returning early (which I have NEVER caught it doing). If you have a PPS source available, that can make a good way to sleep based on an off-box timing signal and avoid using a clock to measure itself. But I doubt it would make a big difference in this case. -- Ian ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: strange kernel crash
On 11/06/15 12:20, Andriy Gapon wrote: Now the strange part: 0x80619a18 <+744>: jne0x80619a61 <__mtx_lock_flags+817> 0x80619a1a <+746>: mov%rbx,(%rsp) => 0x80619a1e <+750>: movq $0x0,0x18(%rsp) 0x80619a27 <+759>: movq $0x0,0x10(%rsp) 0x80619a30 <+768>: movq $0x0,0x8(%rsp) Were these instructions dumped from RAM or from the kernel ELF file? --HPS ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
panic in zfs on reboot
I updated my machine that tracks head and rebooted it, and it panic'd during the 'shutdown -r' execution. Panic String: solaris assert: zrl->zr_refcount == 0 (0x2 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c, line: 64 (kgdb) bt #0 doadump (textdump=1) at pcpu.h:221 #1 0x80708ec5 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:364 #2 0x8070949b in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:757 #3 0x807094e3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:688 #4 0x81b3c25f in assfail3 (a=, lv=, op=, rv=, f=, l=optimized out>) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91 #5 0x8187fe54 in zrl_destroy (zrl=0xf8001d429820) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c:64 #6 0x81804432 in dnode_special_close (dnh=0xf8001d429820) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1005 #7 0x817fa608 in dmu_objset_evict_done (os=0xf8001d429800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:726 #8 0x8180f191 in dsl_dataset_evict (dbu=0xf8000f957000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:287 #9 0x80756ae0 in taskqueue_run_locked (queue=0xf8000d00cd00) at /usr/src/sys/kern/subr_taskqueue.c:430 #10 0x807575f8 in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:683 #11 0x806cf6b4 in fork_exit ( callout=0x80757570 , arg=0xf8000d01e020, frame=0xfe085c058ac0) at /usr/src/sys/kern/kern_fork.c:1011 #12 0x809ae70e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:609 #13 0x in ?? () I have a vmcore for this panic. I have saved off the installed kernel directory for future examination. -Kurt ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: strange kernel crash
On 6 Nov, Konstantin Belousov wrote: > On Fri, Nov 06, 2015 at 01:20:13PM +0200, Andriy Gapon wrote: >> Unread portion of the kernel message buffer: >> >> Fatal trap 1: privileged instruction fault while in kernel mode >> cpuid = 0; apic id = 00 >> instruction pointer = 0x20:0x80619a1e >> stack pointer = 0x28:0xfe04f57856f0 >> frame pointer = 0x28:0xfe04f57857b0 >> code segment= base 0x0, limit 0xf, type 0x1b >> = DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags= interrupt enabled, resume, IOPL = 0 >> current process = 2658 (firefox) >> trap number = 1 >> panic: privileged instruction fault >> cpuid = 0 >> curthread: 0xf803270b6000 >> stack: 0xfe04f5782000 - 0xfe04f5786000 >> stack pointer: 0xfe04f5785320 >> KDB: stack backtrace: >> db_trace_self_wrapper() at 0x8041e86b = >> db_trace_self_wrapper+0x2b/frame >> 0xfe04f5785250 >> kdb_backtrace() at 0x80669f39 = kdb_backtrace+0x39/frame >> 0xfe04f5785300 >> vpanic() at 0x8063531c = vpanic+0x14c/frame 0xfe04f5785340 >> panic() at 0x80635063 = panic+0x43/frame 0xfe04f57853a0 >> trap_fatal() at 0x8081fc0f = trap_fatal+0x33f/frame >> 0xfe04f5785400 >> trap() at 0x8081f872 = trap+0x7d2/frame 0xfe04f5785610 >> trap_check() at 0x8081ff2a = trap_check+0x2a/frame 0xfe04f5785630 >> calltrap() at 0x80807ea0 = calltrap+0x8/frame 0xfe04f5785630 >> --- trap 0x1, rip = 0x80619a1e, rsp = 0xfe04f5785700, rbp = >> 0xfe04f57857b0 --- >> __mtx_lock_flags() at 0x80619a1e = __mtx_lock_flags+0x2ee/frame >> 0xfe04f57857b0 >> uma_dbg_getslab() at 0x807df15c = uma_dbg_getslab+0x3c/frame >> 0xfe04f57857d0 >> uma_dbg_alloc() at 0x807df08d = uma_dbg_alloc+0x2d/frame >> 0xfe04f5785800 >> uma_zalloc_arg() at 0x807dacf1 = uma_zalloc_arg+0x4b1/frame >> 0xfe04f5785890 >> uma_zalloc() at 0x8068b040 = uma_zalloc+0x10/frame 0xfe04f57858a0 >> selfdalloc() at 0x8068aa12 = selfdalloc+0x22/frame 0xfe04f57858c0 >> pollscan() at 0x8068a615 = pollscan+0x95/frame 0xfe04f5785910 >> kern_poll() at 0x8068a4b1 = kern_poll+0x1f1/frame 0xfe04f5785a70 >> sys_poll() at 0x8068a2b9 = sys_poll+0x79/frame 0xfe04f5785a90 >> syscallenter() at 0x80820560 = syscallenter+0x320/frame >> 0xfe04f5785b00 >> amd64_syscall() at 0x8082012f = amd64_syscall+0x1f/frame >> 0xfe04f5785bf0 >> Xfast_syscall() at 0x8080818b = Xfast_syscall+0xfb/frame >> 0xfe04f5785bf0 >> --- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x80146342a, rsp = >> 0x7fffd8e8, rbp = 0x7fffd920 --- >> Uptime: 1d12h57m32s >> >> >> Now the strange part: >> >>0x80619a18 <+744>: jne0x80619a61 >> <__mtx_lock_flags+817> >>0x80619a1a <+746>: mov%rbx,(%rsp) >> => 0x80619a1e <+750>: movq $0x0,0x18(%rsp) >>0x80619a27 <+759>: movq $0x0,0x10(%rsp) >>0x80619a30 <+768>: movq $0x0,0x8(%rsp) >> >> RSP value seems to be sane and consistent with the stack information above: >> (kgdb) i reg >> rax0x4 4 >> rbx0xf80126ea54f0 -8791145163536 >> rcx0x8099a600 -2137414144 >> rdx0xf803270b6000 -8782553063424 >> rsi0x4 4 >> rdi0xf80027f41318 -8795422715112 >> rbp0xfe04f57857b0 0xfe04f57857b0 >> rsp0xfe04f5785700 0xfe04f5785700 >> r8 0x809a7727 -2137360601 >> r9 0xf80126ea54f0 -8791145163536 >> r100x3e81000 >> r110xfe04f5785cc0 -2177725080384 >> r120x1 1 >> r130xf803270b6000 -8782553063424 >> r140xf80027f41318 -8795422715112 >> r150x0 0 >> rip0x80619a1e 0x80619a1e >> <__mtx_lock_flags+750> >> eflags 0x10246 [ PF ZF IF RF ] >> cs 0x20 32 >> ss 0x28 40 >> ds >> es >> fs >> gs >> >> (kgdb) x/a $rsp >> 0xfe04f5785700: 0xf80126ea54f0 >> (kgdb) x/a $rsp + 0x18 >> 0xfe04f5785718: 0x0 >> >> I have no idea what could have caused the #GP. This is certainly not a stack >> overflow. > > This is a second report, please take a look at > https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057975.html > I have no idea as well. Whatever the problem is, it appears to be hard to trigger. I haven't had a recurrence. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to
Re: panic in zfs on reboot
On 11/6/15 1:12 PM, Kurt Lidl wrote: I updated my machine that tracks head and rebooted it, and it panic'd during the 'shutdown -r' execution. Panic String: solaris assert: zrl->zr_refcount == 0 (0x2 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c, line: 64 (kgdb) bt #0 doadump (textdump=1) at pcpu.h:221 #1 0x80708ec5 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:364 #2 0x8070949b in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:757 #3 0x807094e3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:688 #4 0x81b3c25f in assfail3 (a=, lv=, op=, rv=, f=, l=) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91 #5 0x8187fe54 in zrl_destroy (zrl=0xf8001d429820) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c:64 #6 0x81804432 in dnode_special_close (dnh=0xf8001d429820) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1005 #7 0x817fa608 in dmu_objset_evict_done (os=0xf8001d429800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:726 #8 0x8180f191 in dsl_dataset_evict (dbu=0xf8000f957000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:287 #9 0x80756ae0 in taskqueue_run_locked (queue=0xf8000d00cd00) at /usr/src/sys/kern/subr_taskqueue.c:430 #10 0x807575f8 in taskqueue_thread_loop (arg=) at /usr/src/sys/kern/subr_taskqueue.c:683 #11 0x806cf6b4 in fork_exit ( callout=0x80757570 , arg=0xf8000d01e020, frame=0xfe085c058ac0) at /usr/src/sys/kern/kern_fork.c:1011 #12 0x809ae70e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:609 #13 0x in ?? () I have a vmcore for this panic. I have saved off the installed kernel directory for future examination. -Kurt I should have copied in the whole info.2 file: Dump header from device: /dev/gpt/swap0 Architecture: amd64 Architecture Version: 2 Dump Length: 1404231680 Blocksize: 512 Dumptime: Fri Nov 6 12:10:25 2015 Hostname: busybox.pix.net Magic: FreeBSD Kernel Dump Version String: FreeBSD 11.0-CURRENT #86 r290447M: Fri Nov 6 11:00:18 EST 2015 l...@busybox.pix.net:/usr/obj/usr/src/sys/BUSYBOX Panic String: solaris assert: zrl->zr_refcount == 0 (0x2 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c, line: 64 Dump Parity: 375889703 Bounds: 2 Dump Status: good The only delta in my source tree on this machine is a patch to the pagezero.S file, which I've been running for several months, so I know it's not that. (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199151 for the delta, if you care. I need to benchmark this more fully, but just haven't gotten around to doing it yet.) -Kurt ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Timing issue with Dummynet on high kernel timer interrupt
Adrian Chadd wrote this message on Fri, Nov 06, 2015 at 11:15 -0800: > Ideally there'd be both behaviours: > > * You'd specify whether a timer/sleep needs to be exact or can > withstand some jitter (which is what linux provides); and Isn't that what the precision argument in callout is for? See callout_reset_sbt(9): The sbt, pr, and flags arguments provide more control over the scheduled time including support for higher resolution times, specifying the precision of the scheduled time, and setting an absolute deadline instead of a relative timeout. The callout is scheduled to execute in a time window which begins at the time specified in sbt and extends for the amount of time specified in pr. If sbt specifies a time in the past, the window is adjusted to start at the current time. A non-zero value for pr allows the callout subsystem to coalesce callouts scheduled close to each other into fewer timer interrupts, reducing processing overhead and power consumption. These flags may be specified to adjust the interpretation of sbt and pr: -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [PATCH] microoptimize by trying to avoid locking a locked mutex
On 5 Nov 2015, at 16:55, Mateusz Guzikwrote: > >> Will that cause problems, especially for code inside a loop >> where the compiler may decide to shuffle things around? >> > > Lock value is volatile, so the compiler should not screw us up. Note: volatile means do not reorder loads/stores *to this value*, the compiler is completely free to reorder loads/stores to other values around this one. _Atomic exists in C11 to do the right thing when you want to enforce stricter barriers. In this case, we’re currently relying on the compiler not being able to see through the assembly. At some point in the future, I imagine that we’ll move to using C11 atomic operations, which will allow the compiler more freedom. The new reads that you’re currently doing should be replaced by explicit atomic reads with a relaxed memory order (to enforce atomicity with respect to that variable). If we were to replace the mtx_lock field with an _Atomic variant, then the current code would make it a sequentially consistent read, which would add a lot of extra overhead. Note also that it is undefined behaviour in C11 to do atomic and non-atomic accesses to the same underlying memory location. Please consider the future maintainability of the code when making this change. David ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: FYI: SVN to GIT converter currently broken, github is falling behind
2015-11-05 15:46 GMT+01:00 Ulrich Spörlein: > 2015-11-04 18:57 GMT+01:00 Ulrich Spörlein : >> The recent SVN update on the cluster broke svn2git in certain circumstances. >> >> To fix this and make sure no content was dropped, the converter has >> been stopped and we're working on bringing a fixed version online, as >> well as vetting the correctness of the published git repositories. >> >> ETA is currently unknown, expect an update to this thread within 24h. >> Sorry for the inconvenience! >> >> Uli, on behalf of git-admin > > An independent run of the converter produces a different git > repository starting at the commit following this one: > https://github.com/freebsd/freebsd/commit/bf66c97c4a64e64410bf0223d221a54ca9555f52 > > This is from 9d ago and will likely require a force push to github > that will necessitate people to rebase or merge there work (a > fast-forward merge will fail). > > This is the preliminary inspection and a third verification run is > currently underway. Expect another update within 24h. > > Uli We've rolled back to SVN 1.8 and have resumed normal operations. We're still working on getting svn2git to work nice with SVN 1.9 and produce the same repository checksums. Sorry for the inconvenience! Uli ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: strange kernel crash
On 06/11/2015 20:02, Hans Petter Selasky wrote: > On 11/06/15 12:20, Andriy Gapon wrote: >> Now the strange part: >> >> 0x80619a18 <+744>: jne0x80619a61 >> <__mtx_lock_flags+817> >> 0x80619a1a <+746>: mov%rbx,(%rsp) >> => 0x80619a1e <+750>: movq $0x0,0x18(%rsp) >> 0x80619a27 <+759>: movq $0x0,0x10(%rsp) >> 0x80619a30 <+768>: movq $0x0,0x8(%rsp) > > Were these instructions dumped from RAM or from the kernel ELF file? Whatever minidump and kgdb (libkvm) do for the text section. Just in case, in addition to 'disassemble' I also did this: (kgdb) x/i 0x80619a1e => 0x80619a1e <__mtx_lock_flags+750>: movq $0x0,0x18(%rsp) -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"