Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread Konstantin Belousov
On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik wrote:
> mtx_lock will unconditionally try to grab the lock and if that fails,
> will call __mtx_lock_sleep which will immediately try to do the same
> atomic op again.
> 
> So, the obvious microoptimization is to check the state in
> __mtx_lock_sleep and avoid the operation if the lock is not free.
> 
> This gives me ~40% speedup in a microbenchmark of 40 find processes
> traversing tmpfs and contending on mount mtx (only used as an easy
> benchmark, I have WIP patches to get rid of it).
> 
> Second part of the patch is optional and just checks the state of the
> lock prior to doing any atomic operations, but it gives a very modest
> speed up when applied on top of the __mtx_lock_sleep change. As such,
> I'm not going to defend this part.
Shouldn't the same consideration be applied to all spinning loops, i.e.
also to the spin/thread mutexes, and to the spinning parts of sx and
lockmgr?

> 
> x vanilla
> + patched
> (ministat distribution plot elided)
>     N           Min           Max        Median           Avg        Stddev
> x   9        13.845        16.148        15.271     15.133889    0.75997096
> +   9         8.363          9.56         9.126         9.064    0.34198501
> Difference at 95.0% confidence
>   -6.06956 +/- 0.588917
>   -40.1057% +/- 3.89138%
>   (Student's t, pooled s = 0.589283)
> 
> x patched
> + patched2
> (ministat distribution plot elided)
>     N           Min           Max        Median           Avg        Stddev
> x   9         8.363          9.56         9.126         9.064    0.34198501
> +   9         7.563         9.038         8.611     8.5256667    0.43365885
> Difference at 95.0% confidence
>   -0.538667 +/- 0.390278
>   -5.94271% +/- 4.30565%
>   (Student's t, pooled s = 0.390521)
> 
> diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c
> index bec8f6b..092aaae 100644
> --- a/sys/kern/kern_mutex.c
> +++ b/sys/kern/kern_mutex.c
> @@ -419,7 +419,10 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t tid, int opts,
>   all_time -= lockstat_nsecs(&m->lock_object);
>  #endif
>  
> - while (!_mtx_obtain_lock(m, tid)) {
> + for (;;) {
> + v = m->mtx_lock;
> + if (v == MTX_UNOWNED && _mtx_obtain_lock(m, tid))
> + break;
>  #ifdef KDTRACE_HOOKS
>   spin_cnt++;
>  #endif
> @@ -428,7 +431,6 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t tid, int opts,
>* If the owner is running on another CPU, spin until the
>* owner stops running or the state of the lock changes.
>*/
> - v = m->mtx_lock;
>   if (v != MTX_UNOWNED) {
You could restructure the code to only do one comparison with MTX_UNOWNED,
effectively using 'else' instead of the if() on the previous line.
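
For illustration, the single-comparison restructuring suggested here might
look roughly like the sketch below (a sketch only, not the committed code,
and it omits the KDTRACE/lockstat bookkeeping of the real function):

	for (;;) {
		v = m->mtx_lock;
		if (v == MTX_UNOWNED) {
			/* Lock looks free: one atomic attempt, then re-loop. */
			if (_mtx_obtain_lock(m, tid))
				break;
		} else {
			/* The same comparison already told us it is owned. */
			owner = (struct thread *)(v & ~MTX_FLAGMASK);
			if (TD_IS_RUNNING(owner)) {
				/* ... spin until the owner stops running ... */
			}
		}
		/* ... rest of the slow path ... */
	}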

>   owner = (struct thread *)(v & ~MTX_FLAGMASK);
>   if (TD_IS_RUNNING(owner)) {
> diff --git a/sys/sys/mutex.h b/sys/sys/mutex.h
> index a9ec072..4208d5f 100644
> --- a/sys/sys/mutex.h
> +++ b/sys/sys/mutex.h
> @@ -184,12 +184,11 @@ void	thread_lock_flags_(struct thread *, int, const char *, int);
>  /* Lock a normal mutex. */
>  #define __mtx_lock(mp, tid, opts, file, line) do {   \
>   uintptr_t _tid = (uintptr_t)(tid);  \
> - \
> - if (!_mtx_obtain_lock((mp), _tid))  \
> - _mtx_lock_sleep((mp), _tid, (opts), (file), (line));\
> - else
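
Purely as a hypothetical sketch of the fast-path pre-check that the optional
second part of the patch describes (an assumption about its shape, not the
actual change, which is truncated above, and ignoring whatever the removed
'else' branch did), the idea is roughly:

	/* Hypothetical sketch only; the real diff is truncated above. */
	#define __mtx_lock(mp, tid, opts, file, line) do {		\
		uintptr_t _tid = (uintptr_t)(tid);			\
									\
		if ((mp)->mtx_lock != MTX_UNOWNED ||			\
		    !_mtx_obtain_lock((mp), _tid))			\
			_mtx_lock_sleep((mp), _tid, (opts), (file), (line)); \
	} while (0)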

Re: r289932 causes pf reversion - breaks rules with broadcast destination

2015-11-05 Thread Kristof Provost
On 2015-11-04 20:31:35 (-0500), Tom Uffner  wrote:
> Commit r289932 causes pf rules with broadcast destinations (and some but not 
> all rules after them in pf.conf) to be silently ignored. This is bad.
> 
Thanks for the report.

What version did you test exactly?

There was an issue with r289932 that was fixed in r289940, so if you're
in between those two can you test with something after r289940?

Thanks,
Kristof


Re: pf NAT and VNET Jails

2015-11-05 Thread Shawn Webb
On Tuesday, 03 November 2015 12:44:19 AM Kristof Provost wrote:
> > On 02 Nov 2015, at 15:07, Shawn Webb  wrote:
> > 
> > On Monday, 02 November 2015 02:59:03 PM Kristof Provost wrote:
> >> Can you add your pf.conf too?
> >> 
> >> I’ll try upgrading my machine to something beyond 290228 to see if I can
> >> reproduce it. It’s on r289635 now, and seems to be fine. My VNET jails
> >> certainly get their traffic NATed.
> > 
> > Sorry about that! I should've included it. It's pasted here:
> > http://ix.io/lLI
> > 
> > It's probably not the most concise. This is a laptop that can have one of
> > three interfaces online: re0 (ethernet on the laptop), wlan0 (you can
> > guess
> > what that is), or ue0 (usb tethering from my phone). I used to be able to
> > specify NATing like that and pf would automatically figure out which
> > outgoing device to use. Seems like that's broken now.
> 
> I’ve updated my machine and things still seem to be working.
> As you said, it’s probably related to the multiple nat entries.
> 
> I’ll have to make a test setup, which’ll take a bit of time, especially
> since I’m messing with  the host machine at the moment.

I've figured it out. I removed all the rules and went with a barebones config.

Right now, the laptop I'm using for NAT has an outbound interface of wlan0 
with an IP of 129.6.251.181 (from DHCP). The following line works:

nat on wlan0 from any to any -> 129.6.251.181

The following line doesn't:

nat on wlan0 from any to any -> (wlan0)

Nor does this:

nat on wlan0 from any to any -> wlan0

From the Handbook, the lines that don't work are the preferred ones, especially
the first non-working line, since using (wlan0) would cause pf to pick up
wlan0's IP dynamically (which is good, since wlan0 is DHCP'd).

So it seems that at some point, doing NAT dynamically broke.

-- 
Shawn Webb
HardenedBSD

GPG Key ID:0x6A84658F52456EEE
GPG Key Fingerprint: 2ABA B6BD EF6A F486 BE89  3D9E 6A84 658F 5245 6EEE



Re: r289932 causes pf reversion - breaks rules with broadcast destination

2015-11-05 Thread Tom Uffner

Tom Uffner wrote:
> Commit r289932 causes pf rules with broadcast destinations (and some but not
> all rules after them in pf.conf) to be silently ignored. This is bad.
> 
> I do not understand the pf code well enough to see why this change caused
> the breakage, but I suspect that it might expose some deeper problem and
> should not simply be reverted.


OK, so here is why I don't want to simply back this out and have a "working"
firewall again:

Apparently PF_ANEQ was prone to false positives when comparing IPv4 addrs.
This is what r289932 and r289940 fixed. For IPv4 it does not matter where
in bits 32-127 the address mismatch occurs, or in what order the garbage data
is tested; that is all the parenthesization fix in r289940 changes. It might
be relevant for IPv6, but it doesn't matter here.

So, if my rule was "working" due to a false positive in a comparison that has
now been fixed, how many other address comparisons were affected by this
error?

There are 36 occurrences of PF_ANEQ in pf.c and 2 in if_pfsync.c.

About half of them appear to be testing IPv4 addresses. How many of them may
have been influenced by uninitialized data returning a false positive result?
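
To make the false-positive mechanism concrete, here is a small, self-contained
illustration (not the pf source; the struct and function names are stand-ins):
an IPv4 address only occupies the first 32-bit word of a 128-bit pf_addr-style
value, so a "not equal" test that blindly compares all four words can report a
mismatch purely because of garbage in the unused upper words.

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	struct addr128 {		/* stand-in for struct pf_addr */
		uint32_t addr32[4];
	};

	/* Old-style "any word differs" check, ignoring the address family. */
	static int
	aneq_all_words(const struct addr128 *a, const struct addr128 *b)
	{
		return (a->addr32[0] != b->addr32[0] ||
		    a->addr32[1] != b->addr32[1] ||
		    a->addr32[2] != b->addr32[2] ||
		    a->addr32[3] != b->addr32[3]);
	}

	int
	main(void)
	{
		struct addr128 a, b;

		/* Simulate uninitialized upper words holding different garbage. */
		memset(&a, 0xaa, sizeof(a));
		memset(&b, 0xbb, sizeof(b));
		/* Identical IPv4 addresses in the only word that matters. */
		a.addr32[0] = b.addr32[0] = 0xc0a80101;

		/* Prints 1: "not equal", even though the IPv4 parts match. */
		printf("%d\n", aneq_all_words(&a, &b));
		return (0);
	}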


Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread Mateusz Guzik
On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote:
> On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov wrote:
> > On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik wrote:
> > > mtx_lock will unconditionally try to grab the lock and if that fails,
> > > will call __mtx_lock_sleep which will immediately try to do the same
> > > atomic op again.
> > > 
> > > So, the obvious microoptimization is to check the state in
> > > __mtx_lock_sleep and avoid the operation if the lock is not free.
> > > 
> > > This gives me ~40% speedup in a microbenchmark of 40 find processes
> > > traversing tmpfs and contending on mount mtx (only used as an easy
> > > benchmark, I have WIP patches to get rid of it).
> > > 
> > > Second part of the patch is optional and just checks the state of the
> > > lock prior to doing any atomic operations, but it gives a very modest
> > > speed up when applied on top of the __mtx_lock_sleep change. As such,
> > > I'm not going to defend this part.
> > Shouldn't the same consideration applied to all spinning loops, i.e.
> > also to the spin/thread mutexes, and to the spinning parts of sx and
> > lockmgr ?
> 
> I agree.  I think both changes are good and worth doing in our other
> primitives.
> 

I glanced over e.g. rw_rlock and it did not have the issue, but now that I
see _sx_xlock_hard, it could indeed use fixing.

Expect a patch in a few hours for all the primitives I find. I'll stress test
the kernel, but it is unlikely I'll do microbenchmarks for the remaining
primitives.

-- 
Mateusz Guzik 


Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread John Baldwin
On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov wrote:
> On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik wrote:
> > mtx_lock will unconditionally try to grab the lock and if that fails,
> > will call __mtx_lock_sleep which will immediately try to do the same
> > atomic op again.
> > 
> > So, the obvious microoptimization is to check the state in
> > __mtx_lock_sleep and avoid the operation if the lock is not free.
> > 
> > This gives me ~40% speedup in a microbenchmark of 40 find processes
> > traversing tmpfs and contending on mount mtx (only used as an easy
> > benchmark, I have WIP patches to get rid of it).
> > 
> > Second part of the patch is optional and just checks the state of the
> > lock prior to doing any atomic operations, but it gives a very modest
> > speed up when applied on top of the __mtx_lock_sleep change. As such,
> > I'm not going to defend this part.
> Shouldn't the same consideration applied to all spinning loops, i.e.
> also to the spin/thread mutexes, and to the spinning parts of sx and
> lockmgr ?

I agree.  I think both changes are good and worth doing in our other
primitives.

-- 
John Baldwin


Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread Ian Lepore
On Thu, 2015-11-05 at 14:19 -0800, John Baldwin wrote:
> On Thursday, November 05, 2015 01:45:19 PM Adrian Chadd wrote:
> > On 5 November 2015 at 11:26, Mateusz Guzik 
> > wrote:
> > > On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote:
> > > > On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov
> > > > wrote:
> > > > > On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik
> > > > > wrote:
> > > > > > mtx_lock will unconditionally try to grab the lock and if
> > > > > > that fails,
> > > > > > will call __mtx_lock_sleep which will immediately try to do
> > > > > > the same
> > > > > > atomic op again.
> > > > > > 
> > > > > > So, the obvious microoptimization is to check the state in
> > > > > > __mtx_lock_sleep and avoid the operation if the lock is not
> > > > > > free.
> > > > > > 
> > > > > > This gives me ~40% speedup in a microbenchmark of 40 find
> > > > > > processes
> > > > > > traversing tmpfs and contending on mount mtx (only used as
> > > > > > an easy
> > > > > > benchmark, I have WIP patches to get rid of it).
> > > > > > 
> > > > > > Second part of the patch is optional and just checks the
> > > > > > state of the
> > > > > > lock prior to doing any atomic operations, but it gives a
> > > > > > very modest
> > > > > > speed up when applied on top of the __mtx_lock_sleep
> > > > > > change. As such,
> > > > > > I'm not going to defend this part.
> > > > > Shouldn't the same consideration applied to all spinning
> > > > > loops, i.e.
> > > > > also to the spin/thread mutexes, and to the spinning parts of
> > > > > sx and
> > > > > lockmgr ?
> > > > 
> > > > I agree.  I think both changes are good and worth doing in our
> > > > other
> > > > primitives.
> > > > 
> > > 
> > > I glanced over e.g. rw_rlock and it did not have the issue, now
> > > that I
> > > see _sx_xlock_hard it wuld indeed use fixing.
> > > 
> > > Expect a patch in few h for all primitives I'll find. I'll stress
> > > test
> > > the kernel, but it is unlikely I'll do microbenchmarks for
> > > remaining
> > > primitives.
> > 
> > Is this stuff you're proposing still valid for non-x86 platforms?
> 
> Yes.  It just does a read before trying the atomic compare and swap
> and
> falls through to the hard case as if the atomic op failed if the
> result
> of the read would result in a compare failure.
> 

The atomic ops include barriers; the new pre-read of the variable
doesn't.  Will that cause problems, especially for code inside a loop
where the compiler may decide to shuffle things around?

I suspect the performance gain will be biggest on the platforms where
atomic ops are expensive (I gather from various code comments that's
the case on x86).

-- Ian


Panic with MAC_PORTACL on current.

2015-11-05 Thread Daniel Dettlaff
Hello.

I have hit my second kernel panic, related to loading the “MAC_PORTACL” kernel
module in CURRENT.
The only thing needed to trigger it is to put mac_portacl_load=“YES” in
loader.conf and boot the machine.

I built kernel using this config: 
https://github.com/VerKnowSys/ServeD-OS/blob/master/kernel/VERKNOWSYS-11.0
My make.conf: https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/make.conf
My src.conf: https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/src.conf
My loader.conf: 
https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/loader.conf.served
My sysctl.conf: 
https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/sysctl.conf.served

I’m using VMware Fusion 7.0 Pro as the host.

I caught that panic on the main system console (verbose boot turned on):

http://s.verknowsys.com/33551a89eda736059df6dcb35ea4eda3.png
with bt:
http://s.verknowsys.com/caeb3389d9e7399793a12c44f5760466.png

Thank you :) I hope this helps someone; let me know if I can help further.

--
kind regards

Daniel (dmilith) Dettlaff





Panic with PF on current.

2015-11-05 Thread Daniel Dettlaff
Hello.

I have some interesting verbose output with a backtrace (not a panic) from one
of my VMs: http://s.verknowsys.com/f0d457ce9420399baaf531012c33eb81.png
It’s triggered by autostarting a jail on a bridged vlan interface (no VNET
feature enabled).

I built kernel using this config: 
https://github.com/VerKnowSys/ServeD-OS/blob/master/kernel/VERKNOWSYS-11.0
My make.conf: https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/make.conf
My src.conf: https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/src.conf
My loader.conf: 
https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/loader.conf.served
My sysctl.conf: 
https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/sysctl.conf.served

I’m using VMware Fusion 7.0 Pro as the host.

Thank you :) I hope this helps someone; let me know if I can help further.

--
kind regards

Daniel (dmilith) Dettlaff





Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread Adrian Chadd
On 5 November 2015 at 11:26, Mateusz Guzik  wrote:
> On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote:
>> On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov wrote:
>> > On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik wrote:
>> > > mtx_lock will unconditionally try to grab the lock and if that fails,
>> > > will call __mtx_lock_sleep which will immediately try to do the same
>> > > atomic op again.
>> > >
>> > > So, the obvious microoptimization is to check the state in
>> > > __mtx_lock_sleep and avoid the operation if the lock is not free.
>> > >
>> > > This gives me ~40% speedup in a microbenchmark of 40 find processes
>> > > traversing tmpfs and contending on mount mtx (only used as an easy
>> > > benchmark, I have WIP patches to get rid of it).
>> > >
>> > > Second part of the patch is optional and just checks the state of the
>> > > lock prior to doing any atomic operations, but it gives a very modest
>> > > speed up when applied on top of the __mtx_lock_sleep change. As such,
>> > > I'm not going to defend this part.
>> > Shouldn't the same consideration applied to all spinning loops, i.e.
>> > also to the spin/thread mutexes, and to the spinning parts of sx and
>> > lockmgr ?
>>
>> I agree.  I think both changes are good and worth doing in our other
>> primitives.
>>
>
> I glanced over e.g. rw_rlock and it did not have the issue, now that I
> see _sx_xlock_hard it wuld indeed use fixing.
>
> Expect a patch in few h for all primitives I'll find. I'll stress test
> the kernel, but it is unlikely I'll do microbenchmarks for remaining
> primitives.

Is this stuff you're proposing still valid for non-x86 platforms?



-adrian


Re: pf NAT and VNET Jails

2015-11-05 Thread Kristof Provost

> On 05 Nov 2015, at 17:25, Shawn Webb  wrote:
> I've figured it out. I've removed all rules and went with a barebones config.
> 
> Right now, the laptop I'm using for NAT has an outbound interface of wlan0
> with an IP of 129.6.251.181 (from DHCP). The following line works:
> 
> nat on wlan0 from any to any -> 129.6.251.181
> 
> The following line doesn't:
> 
> nat on wlan0 from any to any -> (wlan0)
> 
> Nor does this:
> 
> nat on wlan0 from any to any -> wlan0
> 
> From the Handbook, the lines that don't work are prefered especially the first
> non-working line, since using (wlan0) would cause pf to pick up wlan0's IP
> dynamically (which is good, since wlan0 is DHCP'd).
> 
> So it seems at some point of time, doing NAT dynamically broke.
> 

So far I’ve had no luck reproducing this.
With pf.conf:
nat on vtnet0 from any to any -> (vtnet0)
pass in
pass out

And setup code:
ifconfig bridge0 create
ifconfig epair0 create
ifconfig epair0a up
ifconfig epair0b up
ifconfig bridge0 addm epair0a

jail -c name=test host.hostname=test vnet persist
ifconfig epair0b vnet test

ifconfig bridge0 inet 10.0.0.1/24

jexec test ifconfig epair0b 10.0.0.2/23
jexec test route add default 10.0.0.1

# Activate routing
sysctl net.inet.ip.forwarding=1

pfctl -e
pfctl -g -f pf.conf

Then I run jexec test ping 8.8.8.8, which works as expected.

My home router is running CURRENT, uses vnet jails, and also doesn’t seem to be
triggering the problem.

Perhaps we’re still missing a component of the problem, but right now I have no 
idea what that would be.

Hmm. Perhaps… do you happen to know in what order things are done during 
startup?
Perhaps it’s related to the fact that wlan0 is both wifi and DHCP, in the sense 
that pf is configured before the IP is assigned to the interface.

Can you try reloading pf with the (wlan0) rule? (Just pfctl -g -f /etc/pf.conf 
should do the trick).

Regards,
Kristof




Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread John Baldwin
On Thursday, November 05, 2015 01:45:19 PM Adrian Chadd wrote:
> On 5 November 2015 at 11:26, Mateusz Guzik  wrote:
> > On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote:
> >> On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov wrote:
> >> > On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik wrote:
> >> > > mtx_lock will unconditionally try to grab the lock and if that fails,
> >> > > will call __mtx_lock_sleep which will immediately try to do the same
> >> > > atomic op again.
> >> > >
> >> > > So, the obvious microoptimization is to check the state in
> >> > > __mtx_lock_sleep and avoid the operation if the lock is not free.
> >> > >
> >> > > This gives me ~40% speedup in a microbenchmark of 40 find processes
> >> > > traversing tmpfs and contending on mount mtx (only used as an easy
> >> > > benchmark, I have WIP patches to get rid of it).
> >> > >
> >> > > Second part of the patch is optional and just checks the state of the
> >> > > lock prior to doing any atomic operations, but it gives a very modest
> >> > > speed up when applied on top of the __mtx_lock_sleep change. As such,
> >> > > I'm not going to defend this part.
> >> > Shouldn't the same consideration applied to all spinning loops, i.e.
> >> > also to the spin/thread mutexes, and to the spinning parts of sx and
> >> > lockmgr ?
> >>
> >> I agree.  I think both changes are good and worth doing in our other
> >> primitives.
> >>
> >
> > I glanced over e.g. rw_rlock and it did not have the issue, now that I
> > see _sx_xlock_hard it wuld indeed use fixing.
> >
> > Expect a patch in few h for all primitives I'll find. I'll stress test
> > the kernel, but it is unlikely I'll do microbenchmarks for remaining
> > primitives.
> 
> Is this stuff you're proposing still valid for non-x86 platforms?

Yes.  It just does a read before trying the atomic compare and swap, and
falls through to the hard case, as if the atomic op had failed, when the
value read would make the compare fail.

-- 
John Baldwin


Re: r289932 causes pf reversion - breaks rules with broadcast destination

2015-11-05 Thread Kristof Provost

> On 05 Nov 2015, at 18:39, Tom Uffner  wrote:
> 
> Tom Uffner wrote:
>> Commit r289932 causes pf rules with broadcast destinations (and some but not
>> all rules after them in pf.conf) to be silently ignored. This is bad.
> 
>> I do not understand the pf code well enough to see why this change caused
>> the breakage, but I suspect that it might expose some deeper problem and
>> should not simply be reverted.
> 
> OK, so here is why I don't want to simply back this out and have a "working"
> firewall again:
> 
> Apparently PF_ANEQ was prone to false positives when comparing IPv4 addrs.
> This is what r289932 and r289940 fixed. For IPv4 it does not matter where
> in bits 32-127 the address mismatch occurs or what order the garbage data
> is tested. That is all the paren fix in r289940 changes. It might be relevant
> for v6, but doesn't matter here.
> 
Yes, that’s right. 

I haven’t yet had the time to look at your problem in any depth.
I’m currently working on a different pf issue, but this one is also high on my 
priority list. Hopefully I’ll get round to it in the next few days, but please 
do prod me 
if you hear nothing.

Regards,
Kristof


nanoBSD: buildworld failure

2015-11-05 Thread O. Hartmann
For the past two days, I have been getting the error shown below on a build
machine building nanoBSD. The CURRENT revision (for both the nanoBSD sources
AND the build host) is

 FreeBSD 11.0-CURRENT #2 r290394: Thu Nov  5 16:30:20 CET 2015 amd64


[...]
cc   -O3 -pipe   -I/empty/src/ALG/CURRENT/usr.sbin/cron/lib/../cron -DLOGIN_CAP
-DPAM -DNDEBUG -std=gnu99 -fstack-protector-strong -Wsystem-headers -Wall
-Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign
-Wno-empty-body -Wno-string-plus-int -Wno-unused-const-variable
-Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality
-Wno-unused-function -Wno-enum-conversion -Wno-unused-local-typedef
-Qunused-arguments -c /empty/src/ALG/CURRENT/usr.sbin/cron/lib/env.c -o env.o
--- all_subdir_usr.bin ---
--- all_subdir_netstat ---
if.o: In function `intpr':
/empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x40f): undefined reference to `MAX'
/empty/src/ALG/CURRENT/usr.bin/netstat/if.c:(.text+0x472): undefined reference to `MAX'
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [netstat] Error code 1


Re: Panic with PF on current.

2015-11-05 Thread NGie Cooper
On Thu, Nov 5, 2015 at 4:12 PM, Daniel Dettlaff  wrote:
> Hello.
>
> I have interesting verbose output with backtrace (not panic) from one of my 
> VMs: http://s.verknowsys.com/f0d457ce9420399baaf531012c33eb81.png
> It’s triggered by autostarting jail on bridged vlan interface (no VNET 
> feature enabled)
>
> I built kernel using this config: 
> https://github.com/VerKnowSys/ServeD-OS/blob/master/kernel/VERKNOWSYS-11.0
> My make.conf: 
> https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/make.conf
> My src.conf: https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/src.conf
> My loader.conf: 
> https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/loader.conf.served
> My sysctl.conf: 
> https://github.com/VerKnowSys/ServeD-OS/blob/master/etc/sysctl.conf.served
>
> I’m using Vmware Fusion 7.0 pro as host.
>
> Thank you :) Hope this will help someone, let me know if I can help somehow 
> further.

Please email kp@ [next time] if you run into issues with pf(4). If the
issues have not been fixed/identified already, please file bugs.

Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread John Baldwin
On Thursday, November 05, 2015 04:35:22 PM Ian Lepore wrote:
> On Thu, 2015-11-05 at 14:19 -0800, John Baldwin wrote:
> > On Thursday, November 05, 2015 01:45:19 PM Adrian Chadd wrote:
> > > On 5 November 2015 at 11:26, Mateusz Guzik 
> > > wrote:
> > > > On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote:
> > > > > On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov
> > > > > wrote:
> > > > > > On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik
> > > > > > wrote:
> > > > > > > mtx_lock will unconditionally try to grab the lock and if
> > > > > > > that fails,
> > > > > > > will call __mtx_lock_sleep which will immediately try to do
> > > > > > > the same
> > > > > > > atomic op again.
> > > > > > > 
> > > > > > > So, the obvious microoptimization is to check the state in
> > > > > > > __mtx_lock_sleep and avoid the operation if the lock is not
> > > > > > > free.
> > > > > > > 
> > > > > > > This gives me ~40% speedup in a microbenchmark of 40 find
> > > > > > > processes
> > > > > > > traversing tmpfs and contending on mount mtx (only used as
> > > > > > > an easy
> > > > > > > benchmark, I have WIP patches to get rid of it).
> > > > > > > 
> > > > > > > Second part of the patch is optional and just checks the
> > > > > > > state of the
> > > > > > > lock prior to doing any atomic operations, but it gives a
> > > > > > > very modest
> > > > > > > speed up when applied on top of the __mtx_lock_sleep
> > > > > > > change. As such,
> > > > > > > I'm not going to defend this part.
> > > > > > Shouldn't the same consideration applied to all spinning
> > > > > > loops, i.e.
> > > > > > also to the spin/thread mutexes, and to the spinning parts of
> > > > > > sx and
> > > > > > lockmgr ?
> > > > > 
> > > > > I agree.  I think both changes are good and worth doing in our
> > > > > other
> > > > > primitives.
> > > > > 
> > > > 
> > > > I glanced over e.g. rw_rlock and it did not have the issue, now
> > > > that I
> > > > see _sx_xlock_hard it wuld indeed use fixing.
> > > > 
> > > > Expect a patch in few h for all primitives I'll find. I'll stress
> > > > test
> > > > the kernel, but it is unlikely I'll do microbenchmarks for
> > > > remaining
> > > > primitives.
> > > 
> > > Is this stuff you're proposing still valid for non-x86 platforms?
> > 
> > Yes.  It just does a read before trying the atomic compare and swap
> > and
> > falls through to the hard case as if the atomic op failed if the
> > result
> > of the read would result in a compare failure.
> > 
> 
> The atomic ops include barriers, the new pre-read of the variable
> doesn't.  Will that cause problems, especially for code inside a loop
> where the compiler may decide to shuffle things around?

I do not believe so.  Eventually you have to go through a barrier to
break out of the loop.
 
> I suspect the performance gain will be biggest on the platforms where
> atomic ops are expensive (I gather from various code comments that's
> the case on x86).

Yes, and where you have contention. :-/
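
To make that concrete, here is a sketch of the pattern under discussion in
C11 atomics (not the FreeBSD atomic(9) API), assuming a lock word where 0
means unowned: the plain pre-read that filters out an obviously held lock
carries no barrier of its own, but the compare-and-swap that actually takes
the lock still has acquire semantics, so the only way out of the loop is
through a barrier.

	#include <stdatomic.h>
	#include <stdint.h>

	#define LOCK_UNOWNED	((uintptr_t)0)

	static void
	lock_acquire(_Atomic uintptr_t *lock, uintptr_t tid)
	{
		uintptr_t expected;

		for (;;) {
			/* Cheap read: skip the atomic op while the lock looks held. */
			if (atomic_load_explicit(lock, memory_order_relaxed) !=
			    LOCK_UNOWNED)
				continue;	/* a real lock would spin politely here */
			expected = LOCK_UNOWNED;
			/* The winning CAS is the acquire barrier that ends the loop. */
			if (atomic_compare_exchange_strong_explicit(lock, &expected,
			    tid, memory_order_acquire, memory_order_relaxed))
				break;
		}
	}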

-- 
John Baldwin


Re: [PATCH] microoptimize by trying to avoid locking a locked mutex

2015-11-05 Thread Mateusz Guzik
On Thu, Nov 05, 2015 at 04:35:22PM -0700, Ian Lepore wrote:
> On Thu, 2015-11-05 at 14:19 -0800, John Baldwin wrote:
> > On Thursday, November 05, 2015 01:45:19 PM Adrian Chadd wrote:
> > > On 5 November 2015 at 11:26, Mateusz Guzik 
> > > wrote:
> > > > On Thu, Nov 05, 2015 at 11:04:13AM -0800, John Baldwin wrote:
> > > > > On Thursday, November 05, 2015 04:26:28 PM Konstantin Belousov
> > > > > wrote:
> > > > > > On Thu, Nov 05, 2015 at 12:32:18AM +0100, Mateusz Guzik
> > > > > > wrote:
> > > > > > > mtx_lock will unconditionally try to grab the lock and if
> > > > > > > that fails,
> > > > > > > will call __mtx_lock_sleep which will immediately try to do
> > > > > > > the same
> > > > > > > atomic op again.
> > > > > > > 
> > > > > > > So, the obvious microoptimization is to check the state in
> > > > > > > __mtx_lock_sleep and avoid the operation if the lock is not
> > > > > > > free.
> > > > > > > 
> > > > > > > This gives me ~40% speedup in a microbenchmark of 40 find
> > > > > > > processes
> > > > > > > traversing tmpfs and contending on mount mtx (only used as
> > > > > > > an easy
> > > > > > > benchmark, I have WIP patches to get rid of it).
> > > > > > > 
> > > > > > > Second part of the patch is optional and just checks the
> > > > > > > state of the
> > > > > > > lock prior to doing any atomic operations, but it gives a
> > > > > > > very modest
> > > > > > > speed up when applied on top of the __mtx_lock_sleep
> > > > > > > change. As such,
> > > > > > > I'm not going to defend this part.
> > > > > > Shouldn't the same consideration applied to all spinning
> > > > > > loops, i.e.
> > > > > > also to the spin/thread mutexes, and to the spinning parts of
> > > > > > sx and
> > > > > > lockmgr ?
> > > > > 
> > > > > I agree.  I think both changes are good and worth doing in our
> > > > > other
> > > > > primitives.
> > > > > 
> > > > 
> > > > I glanced over e.g. rw_rlock and it did not have the issue, now
> > > > that I
> > > > see _sx_xlock_hard it wuld indeed use fixing.
> > > > 
> > > > Expect a patch in few h for all primitives I'll find. I'll stress
> > > > test
> > > > the kernel, but it is unlikely I'll do microbenchmarks for
> > > > remaining
> > > > primitives.
> > > 
> > > Is this stuff you're proposing still valid for non-x86 platforms?
> > 
> > Yes.  It just does a read before trying the atomic compare and swap
> > and
> > falls through to the hard case as if the atomic op failed if the
> > result
> > of the read would result in a compare failure.
> > 
> 
> The atomic ops include barriers, the new pre-read of the variable
> doesn't.  Will that cause problems, especially for code inside a loop
> where the compiler may decide to shuffle things around?
> 

The lock value is volatile, so the compiler should not screw us up. I removed
the store to the variable due to a different concern: in the worst case we
would have slept instead of spinning (see below).

> I suspect the performance gain will be biggest on the platforms where
> atomic ops are expensive (I gather from various code comments that's
> the case on x86).
> 

I don't know about other architectures, but on x86 atomic op performance on
contended cache lines deteriorates drastically.

===

For the most part the patch below makes two simple kinds of changes:
1. in the macros for lock/unlock primitives, the lock value is fetched and
  compared against the relevant _UNLOCKED macro prior to the atomic op
2. in the functions:

before:
	while (!_mtx_obtain_lock(m, tid)) {
		v = m->mtx_lock;
		if (v != MTX_UNOWNED) {
			...
		}
		...
	}

after:
	for (;;) {
		if (m->mtx_lock == MTX_UNOWNED && _mtx_obtain_lock(m, tid))
			break;
		v = m->mtx_lock;
		if (v != MTX_UNOWNED) {
			...
		}
		...
	}

The original patch preloaded the 'v' variable. If the value was
MTX_UNOWNED and _mtx_obtain_lock failed, we just lost the race.  In
order to spin waiting for the other thread to release the lock, we have
to re-read the variable. Thus for simplicity it is no longer stored in
'v'. Note this is contrary to kib's earlier suggestion which would put
such threads directly to sleep. I don't have proper measurements to have
any strong opinion here, so I went the aforementioned route to minimise
changes.

Note this is a trivial patch to improve the situation a little bit; I
have no interest in trying to polish these primitives at this time.

For overall results, on a machine with 32GB of RAM and the following:
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (2800.06-MHz K8-class
CPU)
FreeBSD/SMP: 2 package(s) x 10 core(s) x 2 SMT threads

this reduced make -j 40 kernel in a 10.2 jail from ~2:15 to ~2:05.

===

diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c
index aa67180..198e93e 100644
--- a/sys/kern/kern_lock.c
+++ b/sys/kern/kern_lock.c
@@ -787,8 +787,10 @@ 

Re: FYI: SVN to GIT converter currently broken, github is falling behind

2015-11-05 Thread Ulrich Spörlein
2015-11-04 18:57 GMT+01:00 Ulrich Spörlein :
> The recent SVN update on the cluster broke svn2git in certain circumstances.
>
> To fix this and make sure no content was dropped, the converter has
> been stopped and we're working on bringing a fixed version online, as
> well as vetting the correctness of the published git repositories.
>
> ETA is currently unknown, expect an update to this thread within 24h.
> Sorry for the inconvenience!
>
> Uli, on behalf of git-admin

An independent run of the converter produces a different git
repository starting at the commit following this one:
https://github.com/freebsd/freebsd/commit/bf66c97c4a64e64410bf0223d221a54ca9555f52

This is from 9 days ago and will likely require a force push to github
that will require people to rebase or merge their work (a
fast-forward merge will fail).

This is a preliminary finding, and a third verification run is
currently underway. Expect another update within 24h.

Uli

Re: r289932 causes pf reversion - breaks rules with broadcast destination

2015-11-05 Thread Tom Uffner

Kristof Provost wrote:
> On 2015-11-04 20:31:35 (-0500), Tom Uffner wrote:
> > Commit r289932 causes pf rules with broadcast destinations (and some but not
> > all rules after them in pf.conf) to be silently ignored. This is bad.
> 
> What version did you test exactly?
> 
> There was an issue with r289932 that was fixed in r289940, so if you're
> in between those two can you test with something after r289940?


Thanks for your response.

r289940 does not fix the problem that I am seeing.

I first discovered it when I updated a -current system (from Jun 30, don't
know the exact rev) to r290174 on Oct 30. After finding that many of my net
services no longer worked, I isolated rules w/ broadcast addresses as the 
specific cause of the problem.


Then I looked up every commit that touched sys/netpfil/pf from 6/30 to 10/30
and tested a kernel from before & after each one. When r290160 unexpectedly
failed, I looked a little deeper and came up with sys/net/pfvars.h and r289932.

As I said, I don't know why this change causes a problem (and don't really
have time to figure it out at the moment).

I just know that <=r289931 works, and that r289932 and greater do not.

thanks,
tom