Re: VM errors on -current

2015-11-06 Thread Allan Jude
On 2015-11-06 20:02, Michael Butler wrote:
> Is anyone else seeing VM errors from X on -current with an old Intel
> display adapter?
> 
> agp0:  on vgapci0
> agp0: aperture size is 256M, detected 7932k stolen memory
> 
>  [ .. snip .. ]
> 
> kernel: vm_fault: pager read error, pid 1103 (Xorg) <- ??
> kdm[993]: X server for display :0 terminated unexpectedly
> kernel: pid 1103 (Xorg), uid 0: exited on signal 6 (core dumped)
> devd: notify_clients: send() failed; dropping unresponsive client
> 
>   imb
> 
> ___
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
> 

That pager error may suggest a failing disk where it was unable to read
back some swapped out data. Check /var/log/messages for disk errors

-- 
Allan Jude



signature.asc
Description: OpenPGP digital signature


Re: VM errors on -current

2015-11-06 Thread Michael Butler
On 11/06/15 20:04, Allan Jude wrote:
> On 2015-11-06 20:02, Michael Butler wrote:
>> Is anyone else seeing VM errors from X on -current with an old Intel
>> display adapter?
>>
>> agp0:  on vgapci0
>> agp0: aperture size is 256M, detected 7932k stolen memory
>>
>>  [ .. snip .. ]
>>
>> kernel: vm_fault: pager read error, pid 1103 (Xorg) <- ??
>> kdm[993]: X server for display :0 terminated unexpectedly
>> kernel: pid 1103 (Xorg), uid 0: exited on signal 6 (core dumped)
>> devd: notify_clients: send() failed; dropping unresponsive client
>>
>>  imb
>>
>> ___
>> freebsd-current@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
>>
> 
> That pager error may suggest a failing disk where it was unable to read
> back some swapped out data. Check /var/log/messages for disk errors

No swap in use - this is a video-RAM paging issue,

imb


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


VM errors on -current

2015-11-06 Thread Michael Butler
Is anyone else seeing VM errors from X on -current with an old Intel
display adapter?

agp0:  on vgapci0
agp0: aperture size is 256M, detected 7932k stolen memory

 [ .. snip .. ]

kernel: vm_fault: pager read error, pid 1103 (Xorg) <- ??
kdm[993]: X server for display :0 terminated unexpectedly
kernel: pid 1103 (Xorg), uid 0: exited on signal 6 (core dumped)
devd: notify_clients: send() failed; dropping unresponsive client

imb

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Panic with PF on current.

2015-11-06 Thread Kristof Provost

> On 06 Nov 2015, at 01:12, Daniel Dettlaff  wrote:
> I have interesting verbose output with backtrace (not panic) from one of my 
> VMs: http://s.verknowsys.com/f0d457ce9420399baaf531012c33eb81.png
> It’s triggered by autostarting jail on bridged vlan interface (no VNET 
> feature enabled)
> 
This seems to be a lock ordering issue between ZFS and devfs.
It appears to have been around for a while too. I found PR 142878 with the same 
witness warning.

Regards,
Kristof
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: nanoBSD: buidlworld failure

2015-11-06 Thread O. Hartmann
On Fri, 6 Nov 2015 00:43:35 -0800
NGie Cooper  wrote:

> > On Nov 5, 2015, at 22:51, O. Hartmann  wrote:
> > 
> > Since two days, I receive the error shown below on a build machine building
> > nanoBSD. The recent CURRENT (nanoBSD sources AND the building host) is
> > 
> > FreeBSD 11.0-CURRENT #2 r290394: Thu Nov  5 16:30:20 CET 2015 amd64
> > 
> > 
> > [...]
> > cc   -O3 -pipe   -I/empty/src/ALG/CURRENT/usr.sbin/cron/lib/../cron
> > -DLOGIN_CAP -DPAM -DNDEBUG -std=gnu99 -fstack-protector-strong
> > -Wsystem-headers -Wall -Wno-format-y2k -W -Wno-unused-parameter
> > -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized
> > -Wno-pointer-sign -Wno-empty-body -Wno-string-plus-int
> > -Wno-unused-const-variable -Wno-tautological-compare -Wno-unused-value
> > -Wno-parentheses-equality -Wno-unused-function -Wno-enum-conversion
> > -Wno-unused-local-typedef -Qunused-arguments
> > -c /empty/src/ALG/CURRENT/usr.sbin/cron/lib/env.c -o env.o ---
> > all_subdir_usr.bin --- --- all_subdir_netstat --- if.o: In function
> > `intpr': /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x40f):
> > undefined reference to
> > `MAX' /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x472): undefined
> > reference to `MAX' cc: error: linker command failed with exit code 1 (use
> > -v to see invocation) *** [netstat] Error code 1  
> 
> Fixed in r290437 — thank you for the report.
> Take care!
> -NGie

Thank you.

oh
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: nanoBSD: buidlworld failure

2015-11-06 Thread NGie Cooper

> On Nov 5, 2015, at 22:51, O. Hartmann  wrote:
> 
> Since two days, I receive the error shown below on a build machine building
> nanoBSD. The recent CURRENT (nanoBSD sources AND the building host) is
> 
> FreeBSD 11.0-CURRENT #2 r290394: Thu Nov  5 16:30:20 CET 2015 amd64
> 
> 
> [...]
> cc   -O3 -pipe   -I/empty/src/ALG/CURRENT/usr.sbin/cron/lib/../cron 
> -DLOGIN_CAP
> -DPAM -DNDEBUG -std=gnu99 -fstack-protector-strong -Wsystem-headers -Wall
> -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes
> -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign
> -Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable
> -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality
> -Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef
> -Qunused-arguments -c /empty/src/ALG/CURRENT/usr.sbin/cron/lib/env.c -o env.o
> --- all_subdir_usr.bin --- --- all_subdir_netstat --- if.o: In function
> `intpr': /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x40f): undefined
> reference to `MAX' /empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x472):
> undefined reference to `MAX' cc: error: linker command failed with exit code 1
> (use -v to see invocation) *** [netstat] Error code 1

Fixed in r290437 — thank you for the report.
Take care!
-NGie

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Adrian Chadd
Ideally there'd be both behaviours:

* You'd specify whether a timer/sleep needs to be exact or can
withstand some jitter (which is what linux provides); and
* You can communicate to the kernel its aggressiveness for coalescing wakeups.

Teaching powerd to flip on/off a sysctl for this isn't that tricky. I
remember seeing someone else in #bsdmips having issues figuring out
sleep-in-kernel durations and it being exactly this stuff (and it
being you, Ian, that figured it out) - so I'd like to at least fix it
in -head this time around!

Also - I thought the other issue was that precise callouts (versus the
default for callouts now which is "fuzzy" unless you specify
precision) were still jittering. Is that the case?

Thanks,

-adrian
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Bruce Evans

On Fri, 6 Nov 2015, Ian Lepore wrote:


On Fri, 2015-11-06 at 17:51 +0100, Hans Petter Selasky wrote:

On 11/06/15 17:43, Ian Lepore wrote:

On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote:

Hi,




Do the test II results change with this setting?

   sysctl kern.timecounter.alloweddeviation=0


Yes, it looks much better:

debug.total: 10013 -> 0
debug.total: 10013 -> 0
...

This isn't the first time that the alloweddeviation feature has led
people (including me in the past) to think there is a timing bug.  I
think the main purpose of the feature is to help save battery power on
laptops by clustering nearby scheduled wakeups to all happen at the
same time and then allow for longer sleeps between each wakeup.


I was trying to remember the flag for turning off that "feature".  It
gives the bizarre behaviour that on an old system with a timer resolution
of 10 msec, "time sleep 1" sleeps for 1 second with an average error of
< 10 msec, but with a timer resolution of 1 msec for hardclock and finer
for short timeouts, "time sleep 1" sleeps for an average of an extra 30
msec (worst case 1.069 seconds IIRC).  Thus high resolution timers give
much lower resolution for medium-sized timeouts.  (For "sleep 10", the
average error is again 30 msec but this is relatively smaller, and for
"sleep .001" the average error must be less than 1 msec to work at all,
though it is likely to be relatively large.)


I've been wondering lately whether this might also be behind the
unexplained "load average is always 0.60" problem people have noticed
on some systems.  If load average is calculated by sampling what work
is happening when a timer interrupt fires, and the system is working
hard to ensure that a timer interrupt only happens when there is actual
work to do, you'd end up with statistics reporting that there is work
being done most of the time when it took a sample.


I use HZ = 100 and haven't seen this.  Strangely, HZ = 100 gives the same
69 msec max error for "sleep 1" as HZ = 1000.

Schedulers should mostly use the actual thread runtimes to avoid
sampling biases.  That might even be faster.  But it doesn't work so
well for the load average, or at all for resource usages that are
averages, or for the usr/sys/intr splitting of the runtime.  It is
good enough for scheduling since the splitting is not need for
scheduling.

Bruce
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-06 Thread Hans Petter Selasky

On 11/06/15 01:55, Mateusz Guzik wrote:

On Thu, Nov 05, 2015 at 04:35:22PM -0700, Ian Lepore wrote:

On Thu, 2015-11-05 at 14:19 -0800, John Baldwin wrote:

On Thursday, November 05, 2015 01:45:19 PM Adrian Chadd wrote:

On 5 November 2015 at 11:26, Mateusz Guzik 
wrote:

On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote:

On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov
wrote:

On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik
wrote:

mtx_lock will unconditionally try to grab the lock and if
that fails,
will call __mtx_lock_sleep which will immediately try to do
the same
atomic op again.

So, the obvious microoptimization is to check the state in
__mtx_lock_sleep and avoid the operation if the lock is not
free.

This gives me ~40% speedup in a microbenchmark of 40 find
processes
traversing tmpfs and contending on mount mtx (only used as
an easy
benchmark, I have WIP patches to get rid of it).

Second part of the patch is optional and just checks the
state of the
lock prior to doing any atomic operations, but it gives a
very modest
speed up when applied on top of the __mtx_lock_sleep
change. As such,
I'm not going to defend this part.

Shouldn't the same consideration applied to all spinning
loops, i.e.
also to the spin/thread mutexes, and to the spinning parts of
sx and
lockmgr ?


I agree.  I think both changes are good and worth doing in our
other
primitives.



I glanced over e.g. rw_rlock and it did not have the issue, now
that I
see _sx_xlock_hard it wuld indeed use fixing.

Expect a patch in few h for all primitives I'll find. I'll stress
test
the kernel, but it is unlikely I'll do microbenchmarks for
remaining
primitives.


Is this stuff you're proposing still valid for non-x86 platforms?


Yes.  It just does a read before trying the atomic compare and swap
and
falls through to the hard case as if the atomic op failed if the
result
of the read would result in a compare failure.



The atomic ops include barriers, the new pre-read of the variable
doesn't.  Will that cause problems, especially for code inside a loop
where the compiler may decide to shuffle things around?



Lock value is volatile, so the compiler should not screw us up. I removed
the store to the variable due to a different concern. In worst case we
would have slept instead of spinning. (see below)


I suspect the performance gain will be biggest on the platforms where
atomic ops are expensive (I gather from various code comments that's
the case on x86).



I don't know about other architectures, on x86 atomic op performance on
contended cachelines deteriorates drastically.

===

For the most port the patch below does 2 simple kinds of changes:
1. in macros for lock/unlock primitives the lock value is fetched and
   compared against relevant _UNLOCKED macro prior to the atomic op
2. in functions:

before:
while (!_mtx_obtain_lock(m, tid)) {
v = m->mtx_lock;
if (v != MTX_UNOWNED) {

}
.
}

after:
for (;;) {
if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid))
break;
v = m->mtx_lock;
if (v != MTX_UNOWNED) {

}
.
}

The original patch preloaded the 'v' variable. If the value was
MTX_UNOWNED and _mtx_obtain_lock failed, we just lost the race.  In
order to spin waiting for the other thread to release the lock, we have
to re-read the variable. Thus for simplicity it is no longer stored in
'v'. Note this is contrary to kib's earlier suggestion which would put
such threads directly to sleep. I don't have proper measurements to have
any strong opinion here, I went the aforementioned route to minimise
changes.

Note this is a trivial patch to improve the situation a little bit, I
have no interest in trying to polish these primitives at this time.

For overall results, on a machine with 32GB of RAM and the following:
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (2800.06-MHz K8-class
CPU)
FreeBSD/SMP: 2 package(s) x 10 core(s) x 2 SMT threads

this reduced make -j 40 kernel in a 10.2 jail from ~2:15 to ~2:05.

===

diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c
index aa67180..198e93e 100644
--- a/sys/kern/kern_lock.c
+++ b/sys/kern/kern_lock.c
@@ -787,8 +787,10 @@ __lockmgr_args(struct lock *lk, u_int flags, struct 
lock_object *ilk,
break;
}

-   while (!atomic_cmpset_acq_ptr(>lk_lock, LK_UNLOCKED,
-   tid)) {
+   for (;;) {
+   if (lk->lk_lock == LK_UNLOCKED &&
+   atomic_cmpset_acq_ptr(>lk_lock, LK_UNLOCKED, 
tid))
+   break;
  #ifdef HWPMC_HOOKS
PMC_SOFT_CALL( , , lock, failed);
  #endif
@@ -1124,7 +1126,11 @@ __lockmgr_args(struct lock *lk, u_int flags, struct 

Re: strange kernel crash

2015-11-06 Thread Konstantin Belousov
On Fri, Nov 06, 2015 at 01:20:13PM +0200, Andriy Gapon wrote:
> Unread portion of the kernel message buffer:
> 
> Fatal trap 1: privileged instruction fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer = 0x20:0x80619a1e
> stack pointer   = 0x28:0xfe04f57856f0
> frame pointer   = 0x28:0xfe04f57857b0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 2658 (firefox)
> trap number = 1
> panic: privileged instruction fault
> cpuid = 0
> curthread: 0xf803270b6000
> stack: 0xfe04f5782000 - 0xfe04f5786000
> stack pointer: 0xfe04f5785320
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8041e86b = 
> db_trace_self_wrapper+0x2b/frame
> 0xfe04f5785250
> kdb_backtrace() at 0x80669f39 = kdb_backtrace+0x39/frame 
> 0xfe04f5785300
> vpanic() at 0x8063531c = vpanic+0x14c/frame 0xfe04f5785340
> panic() at 0x80635063 = panic+0x43/frame 0xfe04f57853a0
> trap_fatal() at 0x8081fc0f = trap_fatal+0x33f/frame 0xfe04f5785400
> trap() at 0x8081f872 = trap+0x7d2/frame 0xfe04f5785610
> trap_check() at 0x8081ff2a = trap_check+0x2a/frame 0xfe04f5785630
> calltrap() at 0x80807ea0 = calltrap+0x8/frame 0xfe04f5785630
> --- trap 0x1, rip = 0x80619a1e, rsp = 0xfe04f5785700, rbp =
> 0xfe04f57857b0 ---
> __mtx_lock_flags() at 0x80619a1e = __mtx_lock_flags+0x2ee/frame
> 0xfe04f57857b0
> uma_dbg_getslab() at 0x807df15c = uma_dbg_getslab+0x3c/frame
> 0xfe04f57857d0
> uma_dbg_alloc() at 0x807df08d = uma_dbg_alloc+0x2d/frame 
> 0xfe04f5785800
> uma_zalloc_arg() at 0x807dacf1 = uma_zalloc_arg+0x4b1/frame
> 0xfe04f5785890
> uma_zalloc() at 0x8068b040 = uma_zalloc+0x10/frame 0xfe04f57858a0
> selfdalloc() at 0x8068aa12 = selfdalloc+0x22/frame 0xfe04f57858c0
> pollscan() at 0x8068a615 = pollscan+0x95/frame 0xfe04f5785910
> kern_poll() at 0x8068a4b1 = kern_poll+0x1f1/frame 0xfe04f5785a70
> sys_poll() at 0x8068a2b9 = sys_poll+0x79/frame 0xfe04f5785a90
> syscallenter() at 0x80820560 = syscallenter+0x320/frame 
> 0xfe04f5785b00
> amd64_syscall() at 0x8082012f = amd64_syscall+0x1f/frame 
> 0xfe04f5785bf0
> Xfast_syscall() at 0x8080818b = Xfast_syscall+0xfb/frame 
> 0xfe04f5785bf0
> --- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x80146342a, rsp =
> 0x7fffd8e8, rbp = 0x7fffd920 ---
> Uptime: 1d12h57m32s
> 
> 
> Now the strange part:
> 
>0x80619a18 <+744>:   jne0x80619a61 
> <__mtx_lock_flags+817>
>0x80619a1a <+746>:   mov%rbx,(%rsp)
> => 0x80619a1e <+750>:   movq   $0x0,0x18(%rsp)
>0x80619a27 <+759>:   movq   $0x0,0x10(%rsp)
>0x80619a30 <+768>:   movq   $0x0,0x8(%rsp)
> 
> RSP value seems to be sane and consistent with the stack information above:
> (kgdb) i reg
> rax0x4  4
> rbx0xf80126ea54f0   -8791145163536
> rcx0x8099a600   -2137414144
> rdx0xf803270b6000   -8782553063424
> rsi0x4  4
> rdi0xf80027f41318   -8795422715112
> rbp0xfe04f57857b0   0xfe04f57857b0
> rsp0xfe04f5785700   0xfe04f5785700
> r8 0x809a7727   -2137360601
> r9 0xf80126ea54f0   -8791145163536
> r100x3e81000
> r110xfe04f5785cc0   -2177725080384
> r120x1  1
> r130xf803270b6000   -8782553063424
> r140xf80027f41318   -8795422715112
> r150x0  0
> rip0x80619a1e   0x80619a1e 
> <__mtx_lock_flags+750>
> eflags 0x10246  [ PF ZF IF RF ]
> cs 0x20 32
> ss 0x28 40
> ds 
> es 
> fs 
> gs 
> 
> (kgdb) x/a $rsp
> 0xfe04f5785700: 0xf80126ea54f0
> (kgdb) x/a $rsp + 0x18
> 0xfe04f5785718: 0x0
> 
> I have no idea what could have caused the #GP.  This is certainly not a stack
> overflow.

This is a second report, please take a look at
https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057975.html
I have no idea as well.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: r289932 causes pf reversion - breaks rules with broadcast destination

2015-11-06 Thread Kristof Provost
On 2015-11-05 12:39:22 (-0500), Tom Uffner  wrote:
> So, if my rule was "working" due to false positive in a comparison that has
> now been fixed, how many other address comparisons were affected by this
> error?
> 
> There are 36 occurrences of PF_ANEQ in pf.c and 2 in if_pfsync.c
> 
Most of them are an optimisation check. They're used in the NAT paths to
see if addresses need to be rewritten (and checksums updated) or not.
That's probably part of the reason it took so long to notice the bug in
the macro: in most cases a false positive only slowed things down a
little, it didn't actually produce an incorrect result.

I think I've reproduced your problem with very simple rules:
pass out
block out proto icmp
pass out log on vtnet0 proto icmp from any to vtnet0:broadcast
pass out log on vtnet0 proto icmp from any to 172.16.2.1

With those rules I can ping to ping 172.16.2.255 (vtnet0 has
172.16.2.2/24), but not to 172.16.2.1.
If I remove the broadcast rule I suddenly can ping to 172.16.2.1.

I suspect I've also found the source of the problem:
pf_addr_wrap_neq() uses PF_ANEQ(), but sets address family 0.
As a result of the fix that now means we always return false there.

Can you give this a quick test:

diff --git a/sys/netpfil/pf/pf.c b/sys/netpfil/pf/pf.c
index 1dfc37d..762b82e 100644
--- a/sys/netpfil/pf/pf.c
+++ b/sys/netpfil/pf/pf.c
@@ -1973,9 +1973,9 @@ pf_addr_wrap_neq(struct pf_addr_wrap *aw1, struct 
pf_addr_wrap *aw2)
switch (aw1->type) {
case PF_ADDR_ADDRMASK:
case PF_ADDR_RANGE:
-   if (PF_ANEQ(>v.a.addr, >v.a.addr, 0))
+   if (PF_ANEQ(>v.a.addr, >v.a.addr, AF_INET6))
return (1);
-   if (PF_ANEQ(>v.a.mask, >v.a.mask, 0))
+   if (PF_ANEQ(>v.a.mask, >v.a.mask, AF_INET6))
return (1);
return (0);
case PF_ADDR_DYNIFTL:

Regards,
Kristof
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: strange kernel crash

2015-11-06 Thread Andriy Gapon
On 06/11/2015 17:35, Konstantin Belousov wrote:
> This is a second report, please take a look at
> https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057975.html
> I have no idea as well.

In my case it does not look like the code was overwritten, though.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Hans Petter Selasky

Hi,

I spent some time to write a test application to investigate this issue 
and I found some irregularities, that when kern.eventtimer.periodic=0, 
the timer appears to run very irregular.


Test software:
==

fetch http://home.selasky.org:8192/privat/callout_test_dummynet.tar.gz
tar -zxvf callout_test_dummynet.tar.gz
cd callout_test_dummynet
make -m /usr/src/share/mk SYSDIR=/usr/src/sys all install

Test I:
===

The following settings are placed in /boot/loader.conf and computer 
rebooted.

kern.hz=1
kern.eventtimer.periodic=1

kldload callout_test

./test.sh
debug.total: 32604 -> 0
debug.total: 10021 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10019 -> 0
debug.total: 10021 -> 0
debug.total: 10020 -> 0
debug.total: 10021 -> 0
debug.total: 10019 -> 0
debug.total: 10020 -> 0
debug.total: 10018 -> 0
debug.total: 10021 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0
debug.total: 10020 -> 0

Test II:


The following settings are placed in /boot/loader.conf and computer 
rebooted.

kern.hz=1
kern.eventtimer.periodic=0

kldload callout_test

./test.sh
debug.total: 20337 -> 0
debug.total: 10091 -> 0
debug.total: 10018 -> 1
debug.total: 10041 -> 0
debug.total: 10019 -> 0
debug.total: 10644 -> 0
debug.total: 10014 -> 1
debug.total: 10020 -> 0
debug.total: 10019 -> 0
debug.total: 10644 -> 0
debug.total: 10014 -> 0
debug.total: 10644 -> 0
debug.total: 10640 -> 0
debug.total: 10210 -> 0
debug.total: 10015 -> 0
debug.total: 10020 -> 1
debug.total: 10020 -> 0
debug.total: 10453 -> 0
debug.total: 10642 -> 0

If you load the computer, like a multi-CPU buildkernel, the value 
flattens out again.


Test III:
=
The following settings are placed in /boot/loader.conf and computer 
rebooted.

kern.hz=1000
kern.eventtimer.periodic=1

kldload callout_test

./test.sh
debug.total: 5238 -> 0
debug.total: 1005 -> 0
debug.total: 1003 -> 0
debug.total: 1006 -> 0
debug.total: 1003 -> 0
debug.total: 1003 -> 0
debug.total: 1005 -> 0
debug.total: 1006 -> 0
debug.total: 1003 -> 0
debug.total: 1013 -> 0
debug.total: 1005 -> 0
debug.total: 1006 -> 0
debug.total: 1004 -> 0
debug.total: 1005 -> 0
debug.total: 1003 -> 0
debug.total: 1004 -> 0
debug.total: 1006 -> 0
debug.total: 1004 -> 0
debug.total: 1004 -> 0
debug.total: 1007 -> 0
debug.total: 1003 -> 0
debug.total: 1004 -> 0


The difference between 100021 and 10642 interrupts in a second for test 
II is quite big, close to 6.5% deviation. Does anyone have any clue if 
this is the expected behaviour when kern.eventtimer.periodic=1 or not?


MAV ?

--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Ian Lepore
On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote:
> Hi,
> 
> I spent some time to write a test application to investigate this
> issue 
> and I found some irregularities, that when
> kern.eventtimer.periodic=0, 
> the timer appears to run very irregular.
> 
> Test software:
> ==
> 
> fetch 
> http://home.selasky.org:8192/privat/callout_test_dummynet.tar.gz
> tar -zxvf callout_test_dummynet.tar.gz
> cd callout_test_dummynet
> make -m /usr/src/share/mk SYSDIR=/usr/src/sys all install
> 
> Test I:
> ===
> 
> The following settings are placed in /boot/loader.conf and computer 
> rebooted.
> kern.hz=1
> kern.eventtimer.periodic=1
> 
> kldload callout_test
> 
> ./test.sh
> debug.total: 32604 -> 0
> debug.total: 10021 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10019 -> 0
> debug.total: 10021 -> 0
> debug.total: 10020 -> 0
> debug.total: 10021 -> 0
> debug.total: 10019 -> 0
> debug.total: 10020 -> 0
> debug.total: 10018 -> 0
> debug.total: 10021 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> debug.total: 10020 -> 0
> 
> Test II:
> 
> 
> The following settings are placed in /boot/loader.conf and computer 
> rebooted.
> kern.hz=1
> kern.eventtimer.periodic=0
> 
> kldload callout_test
> 
> ./test.sh
> debug.total: 20337 -> 0
> debug.total: 10091 -> 0
> debug.total: 10018 -> 1
> debug.total: 10041 -> 0
> debug.total: 10019 -> 0
> debug.total: 10644 -> 0
> debug.total: 10014 -> 1
> debug.total: 10020 -> 0
> debug.total: 10019 -> 0
> debug.total: 10644 -> 0
> debug.total: 10014 -> 0
> debug.total: 10644 -> 0
> debug.total: 10640 -> 0
> debug.total: 10210 -> 0
> debug.total: 10015 -> 0
> debug.total: 10020 -> 1
> debug.total: 10020 -> 0
> debug.total: 10453 -> 0
> debug.total: 10642 -> 0
> 
> If you load the computer, like a multi-CPU buildkernel, the value 
> flattens out again.
> 
> Test III:
> =
> The following settings are placed in /boot/loader.conf and computer 
> rebooted.
> kern.hz=1000
> kern.eventtimer.periodic=1
> 
> kldload callout_test
> 
> ./test.sh
> debug.total: 5238 -> 0
> debug.total: 1005 -> 0
> debug.total: 1003 -> 0
> debug.total: 1006 -> 0
> debug.total: 1003 -> 0
> debug.total: 1003 -> 0
> debug.total: 1005 -> 0
> debug.total: 1006 -> 0
> debug.total: 1003 -> 0
> debug.total: 1013 -> 0
> debug.total: 1005 -> 0
> debug.total: 1006 -> 0
> debug.total: 1004 -> 0
> debug.total: 1005 -> 0
> debug.total: 1003 -> 0
> debug.total: 1004 -> 0
> debug.total: 1006 -> 0
> debug.total: 1004 -> 0
> debug.total: 1004 -> 0
> debug.total: 1007 -> 0
> debug.total: 1003 -> 0
> debug.total: 1004 -> 0
> 
> 
> The difference between 100021 and 10642 interrupts in a second for
> test 
> II is quite big, close to 6.5% deviation. Does anyone have any clue
> if 
> this is the expected behaviour when kern.eventtimer.periodic=1 or
> not?
> 
> MAV ?
> 
> --HPS

Do the test II results change with this setting?

  sysctl kern.timecounter.alloweddeviation=0

-- Ian

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Hans Petter Selasky

On 11/06/15 17:43, Ian Lepore wrote:

On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote:

Hi,




Do the test II results change with this setting?

   sysctl kern.timecounter.alloweddeviation=0



Yes, it looks much better:

debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10012 -> 0
debug.total: 10012 -> 0
debug.total: 10012 -> 0
debug.total: 10013 -> 0
debug.total: 10012 -> 1
debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10012 -> 0
debug.total: 10013 -> 0
debug.total: 10013 -> 0

--HPS

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Hans Petter Selasky

On 11/06/15 17:51, Hans Petter Selasky wrote:

On 11/06/15 17:43, Ian Lepore wrote:

On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote:

Hi,




Do the test II results change with this setting?

   sysctl kern.timecounter.alloweddeviation=0



Yes, it looks much better:

debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10012 -> 0
debug.total: 10012 -> 0
debug.total: 10012 -> 0
debug.total: 10013 -> 0
debug.total: 10012 -> 1
debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10012 -> 0
debug.total: 10013 -> 0
debug.total: 10013 -> 0

--HPS



Thought I still see some unexpected dips, as the test runs for a longer 
amount of time. Not sure what the cause might be.



debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 9844 -> 0
debug.total: 10013 -> 0
debug.total: 10013 -> 0
debug.total: 10012 -> 0


--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Ian Lepore
On Fri, 2015-11-06 at 17:51 +0100, Hans Petter Selasky wrote:
> On 11/06/15 17:43, Ian Lepore wrote:
> > On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote:
> > > Hi,
> 
> > 
> > Do the test II results change with this setting?
> > 
> >sysctl kern.timecounter.alloweddeviation=0
> > 
> 
> Yes, it looks much better:
> 
> debug.total: 10013 -> 0
> debug.total: 10013 -> 0
> debug.total: 10012 -> 0
> debug.total: 10012 -> 0
> debug.total: 10012 -> 0
> debug.total: 10013 -> 0
> debug.total: 10012 -> 1
> debug.total: 10013 -> 0
> debug.total: 10013 -> 0
> debug.total: 10013 -> 0
> debug.total: 10012 -> 0
> debug.total: 10013 -> 0
> debug.total: 10013 -> 0
> 
> --HPS

This isn't the first time that the alloweddeviation feature has led
people (including me in the past) to think there is a timing bug.  I
think the main purpose of the feature is to help save battery power on
laptops by clustering nearby scheduled wakeups to all happen at the
same time and then allow for longer sleeps between each wakeup.

I've been wondering lately whether this might also be behind the
unexplained "load average is always 0.60" problem people have noticed
on some systems.  If load average is calculated by sampling what work
is happening when a timer interrupt fires, and the system is working
hard to ensure that a timer interrupt only happens when there is actual
work to do, you'd end up with statistics reporting that there is work
being done most of the time when it took a sample.

Maybe it would make sense to only enable the feature by default on
battery-powered systems?

-- Ian

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


strange kernel crash

2015-11-06 Thread Andriy Gapon
Unread portion of the kernel message buffer:

Fatal trap 1: privileged instruction fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0x80619a1e
stack pointer   = 0x28:0xfe04f57856f0
frame pointer   = 0x28:0xfe04f57857b0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 2658 (firefox)
trap number = 1
panic: privileged instruction fault
cpuid = 0
curthread: 0xf803270b6000
stack: 0xfe04f5782000 - 0xfe04f5786000
stack pointer: 0xfe04f5785320
KDB: stack backtrace:
db_trace_self_wrapper() at 0x8041e86b = db_trace_self_wrapper+0x2b/frame
0xfe04f5785250
kdb_backtrace() at 0x80669f39 = kdb_backtrace+0x39/frame 
0xfe04f5785300
vpanic() at 0x8063531c = vpanic+0x14c/frame 0xfe04f5785340
panic() at 0x80635063 = panic+0x43/frame 0xfe04f57853a0
trap_fatal() at 0x8081fc0f = trap_fatal+0x33f/frame 0xfe04f5785400
trap() at 0x8081f872 = trap+0x7d2/frame 0xfe04f5785610
trap_check() at 0x8081ff2a = trap_check+0x2a/frame 0xfe04f5785630
calltrap() at 0x80807ea0 = calltrap+0x8/frame 0xfe04f5785630
--- trap 0x1, rip = 0x80619a1e, rsp = 0xfe04f5785700, rbp =
0xfe04f57857b0 ---
__mtx_lock_flags() at 0x80619a1e = __mtx_lock_flags+0x2ee/frame
0xfe04f57857b0
uma_dbg_getslab() at 0x807df15c = uma_dbg_getslab+0x3c/frame
0xfe04f57857d0
uma_dbg_alloc() at 0x807df08d = uma_dbg_alloc+0x2d/frame 
0xfe04f5785800
uma_zalloc_arg() at 0x807dacf1 = uma_zalloc_arg+0x4b1/frame
0xfe04f5785890
uma_zalloc() at 0x8068b040 = uma_zalloc+0x10/frame 0xfe04f57858a0
selfdalloc() at 0x8068aa12 = selfdalloc+0x22/frame 0xfe04f57858c0
pollscan() at 0x8068a615 = pollscan+0x95/frame 0xfe04f5785910
kern_poll() at 0x8068a4b1 = kern_poll+0x1f1/frame 0xfe04f5785a70
sys_poll() at 0x8068a2b9 = sys_poll+0x79/frame 0xfe04f5785a90
syscallenter() at 0x80820560 = syscallenter+0x320/frame 
0xfe04f5785b00
amd64_syscall() at 0x8082012f = amd64_syscall+0x1f/frame 
0xfe04f5785bf0
Xfast_syscall() at 0x8080818b = Xfast_syscall+0xfb/frame 
0xfe04f5785bf0
--- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x80146342a, rsp =
0x7fffd8e8, rbp = 0x7fffd920 ---
Uptime: 1d12h57m32s


Now the strange part:

   0x80619a18 <+744>:   jne0x80619a61 <__mtx_lock_flags+817>
   0x80619a1a <+746>:   mov%rbx,(%rsp)
=> 0x80619a1e <+750>:   movq   $0x0,0x18(%rsp)
   0x80619a27 <+759>:   movq   $0x0,0x10(%rsp)
   0x80619a30 <+768>:   movq   $0x0,0x8(%rsp)

RSP value seems to be sane and consistent with the stack information above:
(kgdb) i reg
rax0x4  4
rbx0xf80126ea54f0   -8791145163536
rcx0x8099a600   -2137414144
rdx0xf803270b6000   -8782553063424
rsi0x4  4
rdi0xf80027f41318   -8795422715112
rbp0xfe04f57857b0   0xfe04f57857b0
rsp0xfe04f5785700   0xfe04f5785700
r8 0x809a7727   -2137360601
r9 0xf80126ea54f0   -8791145163536
r100x3e81000
r110xfe04f5785cc0   -2177725080384
r120x1  1
r130xf803270b6000   -8782553063424
r140xf80027f41318   -8795422715112
r150x0  0
rip0x80619a1e   0x80619a1e 
<__mtx_lock_flags+750>
eflags 0x10246  [ PF ZF IF RF ]
cs 0x20 32
ss 0x28 40
ds 
es 
fs 
gs 

(kgdb) x/a $rsp
0xfe04f5785700: 0xf80126ea54f0
(kgdb) x/a $rsp + 0x18
0xfe04f5785718: 0x0

I have no idea what could have caused the #GP.  This is certainly not a stack
overflow.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread Ian Lepore
On Fri, 2015-11-06 at 17:57 +0100, Hans Petter Selasky wrote:
> On 11/06/15 17:51, Hans Petter Selasky wrote:
> > On 11/06/15 17:43, Ian Lepore wrote:
> > > On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote:
> > > > Hi,
> > 
> > > 
> > > Do the test II results change with this setting?
> > > 
> > >sysctl kern.timecounter.alloweddeviation=0
> > > 
> > 
> > Yes, it looks much better:
> > 
> > debug.total: 10013 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10012 -> 0
> > debug.total: 10012 -> 0
> > debug.total: 10012 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10012 -> 1
> > debug.total: 10013 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10012 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10013 -> 0
> > 
> > --HPS
> > 
> 
> Thought I still see some unexpected dips, as the test runs for a
> longer 
> amount of time. Not sure what the cause might be.
> 
> > debug.total: 10013 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 9844 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10013 -> 0
> > debug.total: 10012 -> 0
> 
> --HPS

Is it possible your machine is occasionally falling into a deeper sleep
state (C3 or whatever)?  I think it can take hundreds of microseconds
to wake up from some of the deeper power-saving modes.

If not power-saving, just plain old system-is-busy could lead to the
reduced counts occasionally, since each callout is scheduled as a delta
from the current actual wakeup time, not a delta from the current
scheduled wakeup time (i.e., there is a phase shift in the firing of
events over time).  Raising the priority of the test thread into the
realtime range might help with system-busy variation.

It's also a bit iffy that you're using eventtimers to measure the
performance of eventtimers (by using sleep 1 in the script).  That
explains the +12/13 count every second, not the reduced counts unless
we speculate that sleep is returning early (which I have NEVER caught
it doing).  If you have a PPS source available, that can make a good
way to sleep based on an off-box timing signal and avoid using a clock
to measure itself.  But I doubt it would make a big difference in this
case.

-- Ian

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: strange kernel crash

2015-11-06 Thread Hans Petter Selasky

On 11/06/15 12:20, Andriy Gapon wrote:

Now the strange part:

0x80619a18 <+744>:   jne0x80619a61 
<__mtx_lock_flags+817>
0x80619a1a <+746>:   mov%rbx,(%rsp)
=> 0x80619a1e <+750>:   movq   $0x0,0x18(%rsp)
0x80619a27 <+759>:   movq   $0x0,0x10(%rsp)
0x80619a30 <+768>:   movq   $0x0,0x8(%rsp)


Were these instructions dumped from RAM or from the kernel ELF file?

--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


panic in zfs on reboot

2015-11-06 Thread Kurt Lidl

I updated my machine that tracks head and rebooted it,
and it panic'd during the 'shutdown -r' execution.

  Panic String: solaris assert: zrl->zr_refcount == 0 (0x2 == 0x0), 
file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c, 
line: 64


(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:221
#1  0x80708ec5 in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:364
#2  0x8070949b in vpanic (fmt=,
ap=) at /usr/src/sys/kern/kern_shutdown.c:757
#3  0x807094e3 in panic (fmt=0x0)
at /usr/src/sys/kern/kern_shutdown.c:688
#4  0x81b3c25f in assfail3 (a=,
lv=, op=,
rv=, f=, l=optimized out>)

at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#5  0x8187fe54 in zrl_destroy (zrl=0xf8001d429820)
at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c:64
#6  0x81804432 in dnode_special_close (dnh=0xf8001d429820)
at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1005
#7  0x817fa608 in dmu_objset_evict_done (os=0xf8001d429800)
at 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:726

#8  0x8180f191 in dsl_dataset_evict (dbu=0xf8000f957000)
at 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:287

#9  0x80756ae0 in taskqueue_run_locked (queue=0xf8000d00cd00)
at /usr/src/sys/kern/subr_taskqueue.c:430
#10 0x807575f8 in taskqueue_thread_loop (arg=)
at /usr/src/sys/kern/subr_taskqueue.c:683
#11 0x806cf6b4 in fork_exit (
callout=0x80757570 ,
arg=0xf8000d01e020, frame=0xfe085c058ac0)
at /usr/src/sys/kern/kern_fork.c:1011
#12 0x809ae70e in fork_trampoline ()
at /usr/src/sys/amd64/amd64/exception.S:609
#13 0x in ?? ()

I have a vmcore for this panic.  I have saved off the installed kernel
directory for future examination.

-Kurt
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: strange kernel crash

2015-11-06 Thread Don Lewis
On  6 Nov, Konstantin Belousov wrote:
> On Fri, Nov 06, 2015 at 01:20:13PM +0200, Andriy Gapon wrote:
>> Unread portion of the kernel message buffer:
>> 
>> Fatal trap 1: privileged instruction fault while in kernel mode
>> cpuid = 0; apic id = 00
>> instruction pointer = 0x20:0x80619a1e
>> stack pointer   = 0x28:0xfe04f57856f0
>> frame pointer   = 0x28:0xfe04f57857b0
>> code segment= base 0x0, limit 0xf, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags= interrupt enabled, resume, IOPL = 0
>> current process = 2658 (firefox)
>> trap number = 1
>> panic: privileged instruction fault
>> cpuid = 0
>> curthread: 0xf803270b6000
>> stack: 0xfe04f5782000 - 0xfe04f5786000
>> stack pointer: 0xfe04f5785320
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at 0x8041e86b = 
>> db_trace_self_wrapper+0x2b/frame
>> 0xfe04f5785250
>> kdb_backtrace() at 0x80669f39 = kdb_backtrace+0x39/frame 
>> 0xfe04f5785300
>> vpanic() at 0x8063531c = vpanic+0x14c/frame 0xfe04f5785340
>> panic() at 0x80635063 = panic+0x43/frame 0xfe04f57853a0
>> trap_fatal() at 0x8081fc0f = trap_fatal+0x33f/frame 
>> 0xfe04f5785400
>> trap() at 0x8081f872 = trap+0x7d2/frame 0xfe04f5785610
>> trap_check() at 0x8081ff2a = trap_check+0x2a/frame 0xfe04f5785630
>> calltrap() at 0x80807ea0 = calltrap+0x8/frame 0xfe04f5785630
>> --- trap 0x1, rip = 0x80619a1e, rsp = 0xfe04f5785700, rbp =
>> 0xfe04f57857b0 ---
>> __mtx_lock_flags() at 0x80619a1e = __mtx_lock_flags+0x2ee/frame
>> 0xfe04f57857b0
>> uma_dbg_getslab() at 0x807df15c = uma_dbg_getslab+0x3c/frame
>> 0xfe04f57857d0
>> uma_dbg_alloc() at 0x807df08d = uma_dbg_alloc+0x2d/frame 
>> 0xfe04f5785800
>> uma_zalloc_arg() at 0x807dacf1 = uma_zalloc_arg+0x4b1/frame
>> 0xfe04f5785890
>> uma_zalloc() at 0x8068b040 = uma_zalloc+0x10/frame 0xfe04f57858a0
>> selfdalloc() at 0x8068aa12 = selfdalloc+0x22/frame 0xfe04f57858c0
>> pollscan() at 0x8068a615 = pollscan+0x95/frame 0xfe04f5785910
>> kern_poll() at 0x8068a4b1 = kern_poll+0x1f1/frame 0xfe04f5785a70
>> sys_poll() at 0x8068a2b9 = sys_poll+0x79/frame 0xfe04f5785a90
>> syscallenter() at 0x80820560 = syscallenter+0x320/frame 
>> 0xfe04f5785b00
>> amd64_syscall() at 0x8082012f = amd64_syscall+0x1f/frame 
>> 0xfe04f5785bf0
>> Xfast_syscall() at 0x8080818b = Xfast_syscall+0xfb/frame 
>> 0xfe04f5785bf0
>> --- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x80146342a, rsp =
>> 0x7fffd8e8, rbp = 0x7fffd920 ---
>> Uptime: 1d12h57m32s
>> 
>> 
>> Now the strange part:
>> 
>>0x80619a18 <+744>:   jne0x80619a61 
>> <__mtx_lock_flags+817>
>>0x80619a1a <+746>:   mov%rbx,(%rsp)
>> => 0x80619a1e <+750>:   movq   $0x0,0x18(%rsp)
>>0x80619a27 <+759>:   movq   $0x0,0x10(%rsp)
>>0x80619a30 <+768>:   movq   $0x0,0x8(%rsp)
>> 
>> RSP value seems to be sane and consistent with the stack information above:
>> (kgdb) i reg
>> rax0x4  4
>> rbx0xf80126ea54f0   -8791145163536
>> rcx0x8099a600   -2137414144
>> rdx0xf803270b6000   -8782553063424
>> rsi0x4  4
>> rdi0xf80027f41318   -8795422715112
>> rbp0xfe04f57857b0   0xfe04f57857b0
>> rsp0xfe04f5785700   0xfe04f5785700
>> r8 0x809a7727   -2137360601
>> r9 0xf80126ea54f0   -8791145163536
>> r100x3e81000
>> r110xfe04f5785cc0   -2177725080384
>> r120x1  1
>> r130xf803270b6000   -8782553063424
>> r140xf80027f41318   -8795422715112
>> r150x0  0
>> rip0x80619a1e   0x80619a1e 
>> <__mtx_lock_flags+750>
>> eflags 0x10246  [ PF ZF IF RF ]
>> cs 0x20 32
>> ss 0x28 40
>> ds 
>> es 
>> fs 
>> gs 
>> 
>> (kgdb) x/a $rsp
>> 0xfe04f5785700: 0xf80126ea54f0
>> (kgdb) x/a $rsp + 0x18
>> 0xfe04f5785718: 0x0
>> 
>> I have no idea what could have caused the #GP.  This is certainly not a stack
>> overflow.
> 
> This is a second report, please take a look at
> https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057975.html
> I have no idea as well.

Whatever the problem is, it appears to be hard to trigger.  I haven't
had a recurrence.

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to 

Re: panic in zfs on reboot

2015-11-06 Thread Kurt Lidl

On 11/6/15 1:12 PM, Kurt Lidl wrote:

I updated my machine that tracks head and rebooted it,
and it panic'd during the 'shutdown -r' execution.

   Panic String: solaris assert: zrl->zr_refcount == 0 (0x2 == 0x0),
file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c,
line: 64

(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:221
#1  0x80708ec5 in kern_reboot (howto=260)
 at /usr/src/sys/kern/kern_shutdown.c:364
#2  0x8070949b in vpanic (fmt=,
 ap=) at /usr/src/sys/kern/kern_shutdown.c:757
#3  0x807094e3 in panic (fmt=0x0)
 at /usr/src/sys/kern/kern_shutdown.c:688
#4  0x81b3c25f in assfail3 (a=,
 lv=, op=,
 rv=, f=, l=)
 at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#5  0x8187fe54 in zrl_destroy (zrl=0xf8001d429820)
 at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c:64
#6  0x81804432 in dnode_special_close (dnh=0xf8001d429820)
 at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1005
#7  0x817fa608 in dmu_objset_evict_done (os=0xf8001d429800)
 at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:726
#8  0x8180f191 in dsl_dataset_evict (dbu=0xf8000f957000)
 at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c:287
#9  0x80756ae0 in taskqueue_run_locked (queue=0xf8000d00cd00)
 at /usr/src/sys/kern/subr_taskqueue.c:430
#10 0x807575f8 in taskqueue_thread_loop (arg=)
 at /usr/src/sys/kern/subr_taskqueue.c:683
#11 0x806cf6b4 in fork_exit (
 callout=0x80757570 ,
 arg=0xf8000d01e020, frame=0xfe085c058ac0)
 at /usr/src/sys/kern/kern_fork.c:1011
#12 0x809ae70e in fork_trampoline ()
 at /usr/src/sys/amd64/amd64/exception.S:609
#13 0x in ?? ()

I have a vmcore for this panic.  I have saved off the installed kernel
directory for future examination.

-Kurt


I should have copied in the whole info.2 file:

Dump header from device: /dev/gpt/swap0
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1404231680
  Blocksize: 512
  Dumptime: Fri Nov  6 12:10:25 2015
  Hostname: busybox.pix.net
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 11.0-CURRENT #86 r290447M: Fri Nov  6 
11:00:18 EST 2015

l...@busybox.pix.net:/usr/obj/usr/src/sys/BUSYBOX
  Panic String: solaris assert: zrl->zr_refcount == 0 (0x2 == 0x0), 
file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c, 
line: 64

  Dump Parity: 375889703
  Bounds: 2
  Dump Status: good

The only delta in my source tree on this machine is a patch
to the pagezero.S file, which I've been running for several months,
so I know it's not that.

(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199151 for the
delta, if you care.  I need to benchmark this more fully, but just
haven't gotten around to doing it yet.)

-Kurt



___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Timing issue with Dummynet on high kernel timer interrupt

2015-11-06 Thread John-Mark Gurney
Adrian Chadd wrote this message on Fri, Nov 06, 2015 at 11:15 -0800:
> Ideally there'd be both behaviours:
> 
> * You'd specify whether a timer/sleep needs to be exact or can
> withstand some jitter (which is what linux provides); and

Isn't that what the precision argument in callout is for?

See callout_reset_sbt(9):
 The sbt, pr, and flags arguments provide more control over the scheduled
 time including support for higher resolution times, specifying the
 precision of the scheduled time, and setting an absolute deadline instead
 of a relative timeout.  The callout is scheduled to execute in a time
 window which begins at the time specified in sbt and extends for the
 amount of time specified in pr.  If sbt specifies a time in the past, the
 window is adjusted to start at the current time.  A non-zero value for pr
 allows the callout subsystem to coalesce callouts scheduled close to each
 other into fewer timer interrupts, reducing processing overhead and power
 consumption.  These flags may be specified to adjust the interpretation
 of sbt and pr:


-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-06 Thread David Chisnall
On 5 Nov 2015, at 16:55, Mateusz Guzik  wrote:
> 
>> Will that cause problems, especially for code inside a loop
>> where the compiler may decide to shuffle things around?
>> 
> 
> Lock value is volatile, so the compiler should not screw us up.

Note: volatile means do not reorder loads/stores *to this value*, the compiler 
is completely free to reorder loads/stores to other values around this one.  
_Atomic exists in C11 to do the right thing when you want to enforce stricter 
barriers.  In this case, we’re currently relying on the compiler not being able 
to see through the assembly.

At some point in the future, I imagine that we’ll move to using C11 atomic 
operations, which will allow the compiler more freedom.  The new reads that 
you’re currently doing should be replaced by explicit atomic reads with a 
relaxed memory order (to enforce atomicity with respect to that variable).  If 
we were to replace the mtx_lock field with an _Atomic variant, then the current 
code would make it a sequentially consistent read, which would add a lot of 
extra overhead.  Note also that it is undefined behaviour in C11 to do atomic 
and non-atomic accesses to the same underlying memory location.  Please 
consider the future maintainability of the code when making this change.

David

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: FYI: SVN to GIT converter currently broken, github is falling behind

2015-11-06 Thread Ulrich Spörlein
2015-11-05 15:46 GMT+01:00 Ulrich Spörlein :
> 2015-11-04 18:57 GMT+01:00 Ulrich Spörlein :
>> The recent SVN update on the cluster broke svn2git in certain circumstances.
>>
>> To fix this and make sure no content was dropped, the converter has
>> been stopped and we're working on bringing a fixed version online, as
>> well as vetting the correctness of the published git repositories.
>>
>> ETA is currently unknown, expect an update to this thread within 24h.
>> Sorry for the inconvenience!
>>
>> Uli, on behalf of git-admin
>
> An independent run of the converter produces a different git
> repository starting at the commit following this one:
> https://github.com/freebsd/freebsd/commit/bf66c97c4a64e64410bf0223d221a54ca9555f52
>
> This is from 9d ago and will likely require a force push to github
> that will necessitate people to rebase or merge there work (a
> fast-forward merge will fail).
>
> This is the preliminary inspection and a third verification run is
> currently underway. Expect another update within 24h.
>
> Uli

We've rolled back to SVN 1.8 and have resumed normal operations. We're
still working on getting svn2git to work nice with SVN 1.9 and produce
the same repository checksums.

Sorry for the inconvenience!
Uli
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: strange kernel crash

2015-11-06 Thread Andriy Gapon
On 06/11/2015 20:02, Hans Petter Selasky wrote:
> On 11/06/15 12:20, Andriy Gapon wrote:
>> Now the strange part:
>>
>> 0x80619a18 <+744>:   jne0x80619a61 
>> <__mtx_lock_flags+817>
>> 0x80619a1a <+746>:   mov%rbx,(%rsp)
>> => 0x80619a1e <+750>:   movq   $0x0,0x18(%rsp)
>> 0x80619a27 <+759>:   movq   $0x0,0x10(%rsp)
>> 0x80619a30 <+768>:   movq   $0x0,0x8(%rsp)
> 
> Were these instructions dumped from RAM or from the kernel ELF file?

Whatever minidump and kgdb (libkvm) do for the text section.
Just in case, in addition to 'disassemble' I also did this:

(kgdb) x/i 0x80619a1e
=> 0x80619a1e <__mtx_lock_flags+750>:   movq   $0x0,0x18(%rsp)


-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"