Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Mateusz Guzik
On Sun, Sep 10, 2017 at 06:51:31PM +0100, Mindaugas Rasiukevicius wrote:
> Mateusz Guzik  wrote:
> > 1. exclusive vnode locking (genfs_lock)
> >
> > ...
> >
> > 2. uvm_fault_internal
> >
> > ...
> >
> > 4. vm locks in general
> >
>
> We know these points of lock contention, but they are not really that
> trivial to fix.  Breaking down the UVM pagequeue locks would generally
> be a major project, as it would be the first step towards NUMA support.
> In any case, patches are welcome. :)
>

Breaking locks is of course the preferred long term solution, but also
time consuming. On the other hand there are most likely reasonably easy
fixes consisting of collapsing lock/unlock cycles into just one lock/unlock
etc.

FreeBSD is no saint here either with one global lock for free pages, yet
it manages to work OK-ish with 80 hardware threads and is quite nice
with 40.

That said, I had enough problems $elsewhere to not be interested in
looking too hard here. :>

> > 3. pmap
> >
> > It seems most issues stem from slow pmap handling. Chances are there are
> > perfectly avoidable shootdowns and in fact cases where there is no need
> > to alter KVA in the first place.
>
> At least x86 pmap already performs batching and has quite efficient
> synchronisation logic.  You are right that there are some key places
> where avoiding KVA map/unmap would have a major performance improvement,
> e.g. UBC and mbuf zero-copy mechanisms (it could operate on physical
> pages for I/O).  However, these changes are not really related to pmap.
> Some subsystems just need an alternative to temporary KVA mappings.
>

I was predominantly looking at teardown of ubc mappings. The flamegraph
suggests overly high cost there.

> >
> > I would like to add a remark about locking primitives.
> >
> > Today the rage is with MCS locks, which are fine but not trivial to
> > integrate with sleepable locks like your mutexes. Even so, the current
> > implementation is significantly slower than it has to be.
> >
> > ...
> >
> > Spinning mutexes should probably be handled by a different routine.
> >
> > ...
> >
>
> I disagree, because this is a wrong approach to the problem.  Instead of
> marginally optimising the slow-path (and the more contended is the lock,
> the less impact these micro-optimisations have), the subsystems should be
> refactored to eliminate the lock contention in the first place.  Yes, it
> is much more work, but it is the long term fix.  Having said that, I can
> see some use cases where MCS locks could be useful, but it is really a low
> priority in the big picture.
>

Locks are fundamentally about damage control. As noted earlier, a spurious
bus transaction due to an avoidable read makes performance unnecessarily
a tad worse. That was minor anyway; the more important bit was the
backoff.

Even on systems modest by today's standards the quality of locking
primitives can be the difference between a system which is slower than
ideal but perfectly usable and one which is just dog slow.

That said, making backoff parameters autoscale on cpus with some kind of
upper cap is definitely warranted.
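
To make the backoff remark concrete, here is a minimal sketch of a test-and-set spin lock whose backoff window doubles on each failed attempt, up to a cap that scales with the CPU count and is itself clamped. All names and constants are hypothetical and chosen for illustration only; this is not the NetBSD mutex code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical tuning constants, not taken from any kernel. */
#define BACKOFF_MIN      4u     /* initial number of delay iterations */
#define BACKOFF_PER_CPU  16u    /* the cap grows by this much per CPU */
#define BACKOFF_HARD_CAP 2048u  /* upper cap regardless of CPU count */

/* Cap on the backoff window: scaled by CPU count, then clamped. */
static unsigned backoff_cap(unsigned ncpu)
{
    assert(ncpu > 0);
    unsigned cap = ncpu * BACKOFF_PER_CPU;
    return cap > BACKOFF_HARD_CAP ? BACKOFF_HARD_CAP : cap;
}

/* Next window: double the current one until the cap is reached. */
static unsigned backoff_next(unsigned cur, unsigned cap)
{
    unsigned next = cur * 2;
    return next > cap ? cap : next;
}

/* Test-and-set spin lock with capped exponential backoff. */
static void spin_lock_backoff(atomic_flag *lock, unsigned ncpu)
{
    unsigned cap = backoff_cap(ncpu);
    unsigned window = BACKOFF_MIN;

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire)) {
        /* Busy-wait; real code would issue a CPU pause hint here. */
        for (volatile unsigned i = 0; i < window; i++)
            ;
        window = backoff_next(window, cap);
    }
}

static void spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

With these made-up constants a 40-thread machine gets a cap of 640 iterations, while very large machines are clamped to 2048.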

-- 
Mateusz Guzik
Swearing Maintenance Engineer


Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Mateusz Guzik
On Sun, Sep 10, 2017 at 07:29:11PM +0200, Maxime Villard wrote:
> On 09/09/2017 at 20:48, Mateusz Guzik wrote:
> > [...]
> > I installed the 7.1 release, downloaded recent git snapshot and built the
> > trunk kernel while using config stolen from the release (had to edit out
> > something about 3g modems to make it compile). I presume this is enough
> > to not have debug of any sort enabled.
>
> Not sure I understand; did you test a kernel from the netbsd-7.1 branch, or
> from netbsd-current? You might want to test netbsd-current, I know that several
> performance-related improvements were made.
>

I noted it's a current kernel. The 7.1 release bits were there to ensure
I don't run into userspace/kernel debug.

> > 3. pmap
> >
> > It seems most issues stem from slow pmap handling. Chances are there are
> > perfectly avoidable shootdowns and in fact cases where there is no need
> > to alter KVA in the first place.
>
> This seems rather surprising to me. I tried to reduce the number of shootdowns
> some time ago, but they were already optimized, and my attempts just made them
> slower to process. The only related thing I fixed was making sure there is no
> kernel page that gets flushed under a local shootdown, but as far as I
> remember, it didn't significantly improve performance (on a somewhat old
> hardware, I must admit).
>

Note this was tested on kvm, where shootdowns are more expensive than on
bare metal, so the result is probably worse than it would be on bare metal
(still, kvm is a perfectly fine production vm deployment, so I don't
feel bad for testing on it).

I did not investigate in detail (I'll have to), but I believe
DragonFly BSD went to great lengths to reduce/eliminate IPIs in
general. Most definitely worth looking at.

-- 
Mateusz Guzik
Swearing Maintenance Engineer


re: how to tell if a process is 64-bit

2017-09-10 Thread matthew green
Thor Lancelot Simon writes:
> On Sun, Sep 10, 2017 at 03:29:22PM +, paul.kon...@dell.com wrote:
> > 
> > MIPS has four ABIs, if you include "O64".  Whether a particular OS allows
> > all four concurrently is another matter; it isn't clear that would make
> > sense.  Mixing "O" and "N" ABIs is rather messy.
> > 
> > Would you call N32 a 64-bit ABI?  It has 64 bit registers, so if a value
> > is passed to the kernel in a register it comes across as 64 bits.  But it
> > has 32 bit addresses.
> 
> I wouldn't, because if an address is passed to the kernel, it comes across
> as 32 bits.  But what _do_ we do on modern, 32-bit MIPS?  Are we still O32?
> It does kind of look like it -- all our 32-bit MIPS ports' sets files seem
> to be linked to ../../../shared/mipsel/ which must be O32 since it is also
> used for the pmax sets.

as i mentioned earlier in this thread, our mips64 defaults to n32
userland for everything except kvm-only using utils (that all need
to be fixed.)

o32, n32 and n64 all are supported, though n64 dynamic is currently
broken for some reason i haven't looked closely at.

any mips port without "64" in it is o32-only, because it's built to
only support a 32 bit register-size CPU.


.mrg.


Re: how to tell if a process is 64-bit

2017-09-10 Thread Thor Lancelot Simon
On Sun, Sep 10, 2017 at 03:29:22PM +, paul.kon...@dell.com wrote:
> 
> MIPS has four ABIs, if you include "O64".  Whether a particular OS allows
> all four concurrently is another matter; it isn't clear that would make
> sense.  Mixing "O" and "N" ABIs is rather messy.
> 
> Would you call N32 a 64-bit ABI?  It has 64 bit registers, so if a value
> is passed to the kernel in a register it comes across as 64 bits.  But it
> has 32 bit addresses.

I wouldn't, because if an address is passed to the kernel, it comes across
as 32 bits.  But what _do_ we do on modern, 32-bit MIPS?  Are we still O32?
It does kind of look like it -- all our 32-bit MIPS ports' sets files seem
to be linked to ../../../shared/mipsel/ which must be O32 since it is also
used for the pmax sets.

-- 
  Thor Lancelot Simon  t...@panix.com

  "We cannot usually in social life pursue a single value or a single moral
   aim, untroubled by the need to compromise with others."  - H.L.A. Hart


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Manuel Bouyer
On Sun, Sep 10, 2017 at 09:03:27PM +0200, Maxime Villard wrote:
> If you have a fix to untangle this mess, be my guest. I proposed to reimplement
> the 43* functions separately into compat_linux, people disagreed.

Others have proposed to move it to a compat_common module, and this is
the way to go I guess.  But I won't do it as I'm happy with COMPAT_LINUX
being enabled by default (despite possible bugs).

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 20:51, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 08:46:52PM +0200, Maxime Villard wrote:
> > On 10/09/2017 at 19:59, Manuel Bouyer wrote:
> > > There's something I don't understand in this thread: what is the point
> > > of having the code in kernel if you still have to use modload to make it
> > > available? Why not comment it out in kernel and have users modload it
> > > if they want to?
> >
> > said earlier, but on a different list, see
> >
> > http://mail-index.netbsd.org/source-changes-d/2017/08/04/msg009366.html
>
> OK. So you want this because (some?) compat modules can't be dynamically
> loaded. This problem should be fixed, instead of being worked around in
> such an ugly way.

If you have a fix to untangle this mess, be my guest. I proposed to reimplement
the 43* functions separately into compat_linux, people disagreed.


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Manuel Bouyer
On Sun, Sep 10, 2017 at 08:46:52PM +0200, Maxime Villard wrote:
> On 10/09/2017 at 19:59, Manuel Bouyer wrote:
> > There's something I don't understand in this thread: what is the point
> > of having the code in kernel if you still have to use modload to make it
> > available? Why not comment it out in kernel and have users modload it
> > if they want to?
> 
> said earlier, but on a different list, see
> 
> http://mail-index.netbsd.org/source-changes-d/2017/08/04/msg009366.html

OK. So you want this because (some?) compat modules can't be dynamically
loaded. This problem should be fixed, instead of being worked around in
such an ugly way.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 19:59, Manuel Bouyer wrote:
> There's something I don't understand in this thread: what is the point
> of having the code in kernel if you still have to use modload to make it
> available? Why not comment it out in kernel and have users modload it
> if they want to?

said earlier, but on a different list, see

http://mail-index.netbsd.org/source-changes-d/2017/08/04/msg009366.html


Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Joerg Sonnenberger
On Sun, Sep 10, 2017 at 07:56:11PM +0200, Maxime Villard wrote:
> On 10/09/2017 at 19:50, Joerg Sonnenberger wrote:
> > On Sun, Sep 10, 2017 at 07:17:51PM +0200, Joerg Sonnenberger wrote:
> > > That's true, but changing this also has quite a significant downside on
> > > some workloads for second order effects. I don't think it is a good idea
> > > to change this right now, as it doesn't even fix the real problem.
> > 
> > Just to quantify this part, for a current release build on tmpfs, I see:
> > 
> > After:
> > 4267
> > 4280
> > 4261
> > 4247
> > 4300
> > 
> > Before:
> > 3915
> > 3951
> > 3991
> > 3961
> > 3968
> 
> That's the cacheline alignment on the uvm locks, right? In that case, what do
> you think are the "second order effects"?

Yes, it is adding the alignment in uvm_init.c. So an isolated build of
GENERIC on tmpfs gives:

https://www.netbsd.org/~joerg/lockstat-generic.txt

(that's without DIAGNOSTICS; hannken added a very heavy assert in genfs
recently, which needs to be investigated separately). What I strongly
suspect is that the major factor for the lock contention in
uvm_fault_internal is still the uvm_fpageqlock contention. While a
change to the contention of that might be locally positive, it can just
as well increase the contention on the vmobjlock.

Joerg


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Manuel Bouyer
There's something I don't understand in this thread: what is the point
of having the code in kernel if you still have to use modload to make it
available? Why not comment it out in kernel and have users modload it
if they want to?

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 19:50, Joerg Sonnenberger wrote:
> On Sun, Sep 10, 2017 at 07:17:51PM +0200, Joerg Sonnenberger wrote:
> > That's true, but changing this also has quite a significant downside on
> > some workloads for second order effects. I don't think it is a good idea
> > to change this right now, as it doesn't even fix the real problem.
>
> Just to quantify this part, for a current release build on tmpfs, I see:
>
> After:
> 4267
> 4280
> 4261
> 4247
> 4300
>
> Before:
> 3915
> 3951
> 3991
> 3961
> 3968

That's the cacheline alignment on the uvm locks, right? In that case, what do
you think are the "second order effects"?

Maxime


Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Mindaugas Rasiukevicius
Mateusz Guzik  wrote:
> ...
> 
> 1) #define UBC_NWINS 1024
> 
> The parameter was set in 2001 and is used on amd64 to this very day.
> 
> lockstat says:
>  51.63  585505 321201.06 e4011d8304c0   
>  40.39  291550 251302.17 e4011d8304c0   ubc_alloc+69
>   9.13  255967  56776.26 e4011d8304c0   ubc_release+a5
>   1.72   35632  10680.06 e4011d8304c0   uvm_fault_internal+532
> [snip]
> 
> The contention is on the global ubc vmobj lock just prior to hash lookup.
> I recompiled the kernel with randomly slapped value of 65536 and the
> problem cleared itself with ubc_alloc going way down.
> 
> I made no attempts to check what value makes sense or how to autoscale it.
> ...

Yes, ubc_nwins should be auto-tuned, I'd say depending on the physical
memory size and the number of CPUs (as some weighted multiplier).
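
As a rough illustration of what such auto-tuning could look like, here is a sketch that derives the window count from CPU count and memory size and clamps it between the historical 1024 and the 65536 tried above. The weights and the function name are pure assumptions; nothing like this exists in the tree as far as this thread shows.

```c
#include <assert.h>
#include <stdint.h>

#define UBC_NWINS_MIN 1024u   /* the historical compile-time value */
#define UBC_NWINS_MAX 65536u  /* the value tried in the quoted experiment */

/* Hypothetical weights: windows granted per CPU and per GiB of RAM. */
#define NWINS_PER_CPU 256u
#define NWINS_PER_GIB 512u

/* Weighted sum of CPU count and memory size, clamped to [MIN, MAX]. */
static unsigned ubc_nwins_autotune(unsigned ncpu, uint64_t physmem_bytes)
{
    assert(ncpu > 0);
    uint64_t gib = physmem_bytes >> 30;
    uint64_t n = (uint64_t)ncpu * NWINS_PER_CPU + gib * NWINS_PER_GIB;

    if (n < UBC_NWINS_MIN)
        n = UBC_NWINS_MIN;
    if (n > UBC_NWINS_MAX)
        n = UBC_NWINS_MAX;
    return (unsigned)n;
}
```

With these made-up weights, the 40-thread, 16GB vm from the original report would get 18432 windows; the right weights would have to come from benchmarks.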

> 2. uvm_pageidlezero
> 
> Idle zeroing these days definitely makes no sense on amd64. Any amount of
> pages possibly prepared is quickly shredded and vast majority of all
> allocations end up zeroing in place. With rep stosb this is even less of
> a problem.

My feeling is the same: on heavily loaded systems the pressures are too
high and for idling systems it's not worth the hassle.  However, I guess
others might have a different feeling.  More benchmarks and analysis could
settle this.

> 3. false sharing
> 
> Followed the issue noted earlier I __cacheline_aligned aforementioned
> locks. But also moved atomically updated counters out of uvmexp.
> 
> uvmexp is full of counters updated with mere increments possibly by
> multiple threads, thus the issue of this obj was not resolved.
> 
> Nonetheless, said annotations applied combined with the rest give the
> improvement mentioned earlier.

Yes, although if they get significantly contended, they should be moved
out to struct uvm_cpu and/or percpu(9) API and aggregated on collection.
It depends on the counter, of course.
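
To illustrate the "aggregate on collection" idea, here is a user-space sketch of a counter split into per-CPU slots, each on its own cache line. It does not use the real percpu(9) or struct uvm_cpu interfaces; the types, the MAXCPU bound and the 64-byte line size are assumptions made for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size */
#define MAXCPU     64   /* hypothetical upper bound for this sketch */

/* One slot per CPU, padded out to a full cache line by its alignment. */
struct counter_slot {
    alignas(CACHE_LINE) uint64_t value;
};
static_assert(sizeof(struct counter_slot) == CACHE_LINE,
    "each slot must occupy exactly one cache line");

struct percpu_counter {
    struct counter_slot slot[MAXCPU];
};

/*
 * Fast path: plain increment of the caller's own slot. This assumes the
 * caller cannot migrate between CPUs while incrementing.
 */
static void counter_inc(struct percpu_counter *c, unsigned cpu)
{
    assert(cpu < MAXCPU);
    c->slot[cpu].value++;
}

/* Slow path: sum all slots when the statistic is collected. */
static uint64_t counter_read(const struct percpu_counter *c)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < MAXCPU; i++)
        sum += c->slot[i].value;
    return sum;
}
```

The trade-off is memory (one line per CPU per counter) and a reader that may observe a slightly stale total, which is usually acceptable for statistics.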

> 1. exclusive vnode locking (genfs_lock)
> 
> ...
> 
> 2. uvm_fault_internal
> 
> ...
> 
> 4. vm locks in general
> 

We know these points of lock contention, but they are not really that
trivial to fix.  Breaking down the UVM pagequeue locks would generally
be a major project, as it would be the first step towards NUMA support.
In any case, patches are welcome. :)

> 3. pmap
> 
> It seems most issues stem from slow pmap handling. Chances are there are
> perfectly avoidable shootdowns and in fact cases where there is no need
> to alter KVA in the first place.

At least x86 pmap already performs batching and has quite efficient
synchronisation logic.  You are right that there are some key places
where avoiding KVA map/unmap would have a major performance improvement,
e.g. UBC and mbuf zero-copy mechanisms (it could operate on physical
pages for I/O).  However, these changes are not really related to pmap.
Some subsystems just need an alternative to temporary KVA mappings.

> 
> I would like to add a remark about locking primitives.
> 
> Today the rage is with MCS locks, which are fine but not trivial to
> integrate with sleepable locks like your mutexes. Even so, the current
> implementation is significantly slower than it has to be.
> 
> ...
> 
> Spinning mutexes should probably be handled by a different routine.
> 
> ...
> 

I disagree, because this is a wrong approach to the problem.  Instead of
marginally optimising the slow-path (and the more contended is the lock,
the less impact these micro-optimisations have), the subsystems should be
refactored to eliminate the lock contention in the first place.  Yes, it
is much more work, but it is the long term fix.  Having said that, I can
see some use cases where MCS locks could be useful, but it is really a low
priority in the big picture.

-- 
Mindaugas


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 17:24, Greg Troxel wrote:
> [...]
> Reading maxv@'s suggestion, I wondered about autoload of non-built-in
> modules (but maybe that is already disabled).  My quick reaction is that
> it would be nice if the "don't autoload" flag had the same behavior for
> builtin and non-builtin modules, so that builtin/not is just a linking
> style thing, and not more.

Modules can be autoloaded from the filesystem, in exec_autoload(). In such
a case, we want the kernel to do a MODULE_CMD_INIT on them, regardless of
whether they have the MODINFO_BUILTIN_NOLOAD flag set or not.

This flag must be parsed exclusively for the builtin modules, and not the rest.

> [...]
>   expand config(8) to be able to set "noautoload", so that if a module
>   is included as part of a kernel, it will be marked noautoload if and
>   only if the flag is on the line, regardless of defaults.  This would
>   not affect the modules in stand; they'd still have the default value
>   of the noautoload flag from the default


This would be good. But I guess it entails introducing a new "module" keyword,
as opposed to the current "options" used for a certain number of drivers.

Another short-term alternative would be to add options that set
MODINFO_BUILTIN_NOLOAD.

Something like:

#ifdef COMPAT_LINUX_BUILTIN_NOLOAD
MD1 MD2 MD3, MODINFO_BUILTIN_NOLOAD);
#else
MD1 MD2 MD3, 0);
#endif

options COMPAT_LINUX
options COMPAT_LINUX_BUILTIN_NOLOAD

People that want the module builtin+loaded would comment out the second line. Note
that this is similar to the notion that shipping functions for a kernel module
and dynamically registering them for use are two different unrelated options -
which is more or less what was suggested earlier in this thread.

But it indeed becomes a bit more complicated to understand and use...
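
To spell out the flag semantics described earlier in this message, here is a small sketch of the decision the kernel would make. Only the name MODINFO_BUILTIN_NOLOAD comes from the proposal; the bit value, the enums and the function are hypothetical and do not reflect the actual module(9) code.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag name from the proposal; the bit value here is hypothetical. */
#define MODINFO_BUILTIN_NOLOAD 0x1u
static_assert(MODINFO_BUILTIN_NOLOAD != 0, "flag must be a non-zero bit");

/* Hypothetical stand-ins for where a module comes from and why it is
 * being initialized. */
enum toy_source { SRC_BUILTIN, SRC_FILESYS };
enum toy_event  { EV_BOOT_INIT, EV_AUTOLOAD, EV_EXPLICIT_MODLOAD };

/* Should the kernel run the module's init step for this event? */
static bool should_init(enum toy_source src, unsigned flags, enum toy_event ev)
{
    /* The flag is consulted only for builtin modules. */
    if (src != SRC_BUILTIN)
        return true;
    if ((flags & MODINFO_BUILTIN_NOLOAD) == 0)
        return true;
    /* Builtin and marked noload: only an explicit modload activates it. */
    return ev == EV_EXPLICIT_MODLOAD;
}
```

In this reading, a module autoloaded from the filesystem is always initialized, while a builtin module carrying the flag stays dormant until someone types modload.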

Maxime


Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Joerg Sonnenberger
On Sun, Sep 10, 2017 at 07:17:51PM +0200, Joerg Sonnenberger wrote:
> That's true, but changing this also has quite a significant downside on
> some workloads for second order effects. I don't think it is a good idea
> to change this right now, as it doesn't even fix the real problem.

Just to quantify this part, for a current release build on tmpfs, I see:

After:
4267
4280
4261
4247
4300

Before:
3915
3951
3991
3961
3968

After with longer spin off:
4327
4333
4343
4331
4312

Time is in seconds. So adding the cacheline alignment slows the system
down by 8% on average. That's a -j32 release on a dual Xeon.
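
The 8% figure can be re-derived from the timings listed above. A small sketch of the arithmetic (helper and array names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Build times in seconds, copied from the lists above. */
static const int before_s[]          = { 3915, 3951, 3991, 3961, 3968 };
static const int after_s[]           = { 4267, 4280, 4261, 4247, 4300 };
static const int after_long_spin_s[] = { 4327, 4333, 4343, 4331, 4312 };

/* Arithmetic mean of n samples. */
static double mean(const int *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative slowdown of `after` versus `before`, in percent. */
static double slowdown_pct(const int *before, const int *after, size_t n)
{
    return (mean(after, n) / mean(before, n) - 1.0) * 100.0;
}
```

With these inputs the means are 3957.2s before and 4271.0s after, a slowdown of roughly 7.9%; the run with the longer spin-off comes out at roughly 9.4%.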

Joerg


Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Maxime Villard

Thanks for this analysis. I have three remarks:

On 09/09/2017 at 20:48, Mateusz Guzik wrote:
> [...]
> I installed the 7.1 release, downloaded recent git snapshot and built the
> trunk kernel while using config stolen from the release (had to edit out
> something about 3g modems to make it compile). I presume this is enough
> to not have debug of any sort enabled.

Not sure I understand; did you test a kernel from the netbsd-7.1 branch, or
from netbsd-current? You might want to test netbsd-current, I know that several
performance-related improvements were made.

> [...]
> Here it turned out to be harmful by inducing avoidable cacheline traffic.
>
> Look at nm kernel | sort -nk 1:
>
> 810b8fc0 B uvm_swap_data_lock
> 810b8fc8 B uvm_kentry_lock
> 810b8fd0 B uvm_fpageqlock
> 810b8fd8 B uvm_pageqlock
> 810b8fe0 B uvm_kernel_object

I saw exactly this too a few months ago. In fact, there are a number of
places that generate huge false sharing, typically the xpq_idx_array[MAXCPUS]
array in Xen. I've fixed only a few of them, but it is clear that they should all
be taken care of.

> [...]
> 3. pmap
>
> It seems most issues stem from slow pmap handling. Chances are there are
> perfectly avoidable shootdowns and in fact cases where there is no need
> to alter KVA in the first place.

This seems rather surprising to me. I tried to reduce the number of shootdowns
some time ago, but they were already optimized, and my attempts just made them
slower to process. The only related thing I fixed was making sure there is no
kernel page that gets flushed under a local shootdown, but as far as I
remember, it didn't significantly improve performance (on a somewhat old
hardware, I must admit).

I'll take care of some of the false sharing soon.

Maxime


Re: performance issues during build.sh -j 40 kernel

2017-09-10 Thread Joerg Sonnenberger
On Sat, Sep 09, 2017 at 08:48:19PM +0200, Mateusz Guzik wrote:
> 1) #define UBC_NWINS 1024

Yes, this one should scale automatically. Needs a bit of thought about
what a good scaling would be.

> 2. uvm_pageidlezero

I disagree on this, a lot. At best it is a band aid unless the
uvm_f?pageqlock handling is fixed. Note that unlike FreeBSD, this has
been using non-temporal stores forever, so it has very little
additional cacheline traffic beyond the free queue interaction. While it
doesn't help on a completely busy system, it does provide value for any
system that is even occasionally idle.

> 
> 810b8fc0 B uvm_swap_data_lock
> 810b8fc8 B uvm_kentry_lock
> 810b8fd0 B uvm_fpageqlock
> 810b8fd8 B uvm_pageqlock
> 810b8fe0 B uvm_kernel_object
> 
> 
> All these locks false-share a cacheline. In particular uvm_fpageqlock is
> obstructing uvm_pageqlock.

That's true, but changing this also has quite a significant downside on
some workloads for second order effects. I don't think it is a good idea
to change this right now, as it doesn't even fix the real problem.

> Doing #if 0'ing the uvm_pageidlezero call in the idle func shaved about 2
> seconds real time:
>   589.02s user 792.62s system 2541% cpu 54.365 total

There is a sysctl for it, you know?

> Followed the issue noted earlier I __cacheline_aligned aforementioned
> locks. But also moved atomically updated counters out of uvmexp.

Actually, most of them should be switched to localcounter.

Joerg


Re: how to tell if a process is 64-bit

2017-09-10 Thread Paul.Koning

> On Sep 10, 2017, at 10:31 AM, Thor Lancelot Simon  wrote:
> 
> On Fri, Sep 08, 2017 at 07:38:24AM -0400, Mouse wrote:
>>> In a cross-platform process utility tool the question came up how to
>>> decide if a process is 64-bit.
>> 
>> First, I have to ask: what does it mean to say that a particular
>> process is - or isn't - 64-bit?
> 
> I think the only simple answer is "it is 64-bit in the relevant sense if
> it uses the platform's 64-bit ABI for interaction with the kernel".
> 
> This actually raises a question for me about MIPS: do we have another
> process flag to indicate O32 vs. N32, or can we simply not run O32
> executables on 64-bit or N32 kernels (surely we don't use the O32 ABI
> for all kernel interaction by 32-bit processes)?

MIPS has four ABIs, if you include "O64".  Whether a particular OS allows
all four concurrently is another matter; it isn't clear that would make
sense.  Mixing "O" and "N" ABIs is rather messy.

Would you call N32 a 64-bit ABI?  It has 64 bit registers, so if a value
is passed to the kernel in a register it comes across as 64 bits.  But it
has 32 bit addresses.

paul



Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Marc Balmer

> On 10.09.2017 at 12:35, Maxime Villard wrote:
> 
> On 10/09/2017 at 12:24, Paul Goyette wrote:
>> On Sun, 10 Sep 2017, Maxime Villard wrote:
>>> Re-thinking about this again, it seems to me we could simply add a flags
>>> field in modinfo_t, with a bit that says "if this module is builtin, then
>>> don't load it". To use compat_xyz, you'll have to type modload, and the
>>> kernel will load the module from the builtin list.
>>> 
>>> Something like [1] (from memory, not tested at all). Obviously this patch
>>> is not complete, since we need to update each MODULE().
>>> 
>>> While it is clear that it does not solve the cross-dependency issue we're
>>> having, it does reduce the attack surface almost as much as if the module
>>> was not builtin, with very little effort. Cheap, but relevant.
>>> 
>>> [1] http://m00nbsd.net/garbage/module/noload.diff
>> Well, probably not quite what you wanted, but if a module is built-in
>> you can disable it by using modunload(8).  Any built-in module which has
>> been disabled in this manner needs to be explicitly reloaded manually, and
>> you'll need to additionally specify the -f option to modload(8).
> 
> I know.
> 
>> Perhaps /etc/rc.d/modules can be updated to have both a load and an
>> unload phase, with appropriate syntax for the associated config file.
> 
> Thought about this too, but it seemed bizarre to me to have the kernel load
> modules, then rc.d/modules unload them, and then the user reload them.
> 
>> This would be a lot cleaner IMHO than updating individual modules.
> 
> I believe per-module flags can be useful in the future, and not just in the
> noload case; a module could want to tell the kernel how it wants to be loaded.

I think "how a module should be loaded" should be left to the sysadmin's
discretion, not the module itself.

Besides that, I don't like the whole idea of built-in modules not being
activated by default, after all that is how it has been for many releases.

> 
> Maxime



performance issues during build.sh -j 40 kernel

2017-09-10 Thread Mateusz Guzik
Hello,

I have been playing a little bit with a NetBSD vm running on CentOS 7 + kvm.
I ran into severe performance issues which I partially investigated.
A bunch of total hacks was written to confirm a few problems, but there is
nothing committable without doing actual work and major problems remain.

I think the kernel is in dire need of someone to sit on the issues reported
below and see them through. I'm happy to test patches, although I won't
necessarily have access to the same hardware used for current tests.

Hardware specs:
Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz
2 sockets * 10 cores * 2 hardware threads
32GB of ram

I assigned all 40 threads to the vm + gave it 16GB of ram.

The host is otherwise idle.

I installed the 7.1 release, downloaded recent git snapshot and built the
trunk kernel while using config stolen from the release (had to edit out
something about 3g modems to make it compile). I presume this is enough
to not have debug of any sort enabled.

The filesystem is just ufs mounted with noatime.

Attempts to use virtio for storage resulted in abysmal
performance which I did not investigate. Using SATA gave read errors
and the vm failed to boot multiuser. I settled for IDE which works
reasonably fine, but inherently makes the test worse.

All tests were performed with the trunk kernel booted.

Here is a bunch of "./build.sh -j 40 kernel=MYCONF > /dev/null" on stock
kernel:
  618.65s user 1097.80s system 2502% cpu 1:08.60 total
  628.73s user 1128.71s system 2540% cpu 1:09.18 total
  629.05s user 1082.58s system 2517% cpu 1:07.99 total
  641.11s user 1081.05s system 2545% cpu 1:07.65 total
  641.18s user 1079.89s system 2522% cpu 1:08.24 total

And on kernel with total hacks:
  594.08s user 693.11s system 2459% cpu 52.331 total
  594.81s user 711.90s system 2498% cpu 52.292 total
  600.34s user 676.39s system 2486% cpu 51.336 total
  597.33s user 725.78s system 2536% cpu 52.157 total
  597.13s user 708.79s system 2510% cpu 52.011 total

i.e. it's still pretty bad, with system time being above user. However,
real time dropped from ~68s to ~52s and system time from ~1100s to ~700s.

Hacks can be seen here (wear gloves and something to protect eyes):
https://people.freebsd.org/~mjg/netbsd/hacks.diff

1) #define UBC_NWINS 1024

The parameter was set in 2001 and is used on amd64 to this very day.

lockstat says:
 51.63  585505 321201.06 e4011d8304c0   
 40.39  291550 251302.17 e4011d8304c0   ubc_alloc+69
  9.13  255967  56776.26 e4011d8304c0   ubc_release+a5
  1.72   35632  10680.06 e4011d8304c0   uvm_fault_internal+532
[snip]

The contention is on the global ubc vmobj lock just prior to hash lookup.
I recompiled the kernel with randomly slapped value of 65536 and the
problem cleared itself with ubc_alloc going way down.

I made no attempts to check what value makes sense or how to autoscale it.

This change alone accounts for most of the speed up by giving:
  586.87s user 919.99s system 2612% cpu 57.676 total

2. uvm_pageidlezero

Idle zeroing these days definitely makes no sense on amd64. Any amount of
pages possibly prepared is quickly shredded and vast majority of all
allocations end up zeroing in place. With rep stosb this is even less of
a problem.

Here it turned out to be harmful by inducing avoidable cacheline traffic.

Look at nm kernel | sort -nk 1:

810b8fc0 B uvm_swap_data_lock
810b8fc8 B uvm_kentry_lock
810b8fd0 B uvm_fpageqlock
810b8fd8 B uvm_pageqlock
810b8fe0 B uvm_kernel_object


All these locks false-share a cacheline. In particular uvm_fpageqlock is
obstructing uvm_pageqlock.
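
The sharing can be checked mechanically. A minimal sketch, taking the addresses exactly as printed by nm above and assuming 64-byte cache lines (the helper names are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed cache line size on amd64 */
static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
    "cache line size must be a power of two");

/* Addresses as printed by nm in the listing above. */
static const uint64_t uvm_syms[] = {
    0x810b8fc0, /* uvm_swap_data_lock */
    0x810b8fc8, /* uvm_kentry_lock */
    0x810b8fd0, /* uvm_fpageqlock */
    0x810b8fd8, /* uvm_pageqlock */
    0x810b8fe0, /* uvm_kernel_object */
};

/* Index of the cache line an address falls into. */
static uint64_t cache_line_of(uint64_t addr)
{
    return addr / CACHE_LINE;
}

/* Two addresses false-share if they map to the same line index. */
static bool same_cache_line(uint64_t a, uint64_t b)
{
    return cache_line_of(a) == cache_line_of(b);
}
```

All five symbols fall into the line starting at 0x810b8fc0, so a write to any of the four locks invalidates the line holding the others.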

An attempt to run zeroing performs mutex_tryenter. It unconditionally does
lock cmpxchg which dirties the cacheline, thus even if zeroing would
end up not being performed the damage was already done. Chances are
successful zeroing is also a problem, but that I did not investigate.
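
One common mitigation for this pattern is to read the lock word first and attempt the atomic operation only if the lock looks free. The sketch below uses a toy lock in C11 atomics to show the difference; it is not NetBSD's kmutex_t or mutex_tryenter, and whether the same change fits the real primitive is an open question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
struct toy_lock {
    atomic_uint word;
};

/*
 * Naive try-lock: always issues an atomic read-modify-write, which
 * requests the cache line for writing even when the attempt fails.
 */
static bool try_lock_naive(struct toy_lock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->word, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}

/*
 * Read-first try-lock: a plain load lets the line stay shared between
 * CPUs; the read-modify-write is attempted only if the lock looks free.
 */
static bool try_lock_read_first(struct toy_lock *l)
{
    if (atomic_load_explicit(&l->word, memory_order_relaxed) != 0)
        return false;   /* held: give up without writing to the line */
    return try_lock_naive(l);
}

static void toy_unlock(struct toy_lock *l)
{
    atomic_store_explicit(&l->word, 0u, memory_order_release);
}
```

The read-first variant can still lose the race after the load, in which case it behaves like the naive one; the gain is only in the case where the lock is visibly held.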

#if 0'ing the uvm_pageidlezero call in the idle func shaved about 2
seconds of real time:
  589.02s user 792.62s system 2541% cpu 54.365 total

This should definitely be disabled for amd64 altogether and probably
removed in general.

3. false sharing

Following up on the issue noted earlier, I marked the aforementioned
locks __cacheline_aligned, and also moved atomically updated counters out
of uvmexp.

uvmexp is full of counters updated with mere increments possibly by
multiple threads, thus the issue with this object was not resolved.

Nonetheless, said annotations applied combined with the rest give the
improvement mentioned earlier.
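
For readers unfamiliar with the annotation, here is a user-space sketch of what per-lock cache line alignment does to the layout, using C11 alignas in place of the kernel's __cacheline_aligned. The toy mutex type and struct names are hypothetical, and a 64-byte line is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size */

/* Hypothetical stand-in for a kernel mutex: one pointer-sized word. */
struct toy_mutex {
    void *owner;
};

/* Packed layout: consecutive locks land in the same cache line. */
struct locks_packed {
    struct toy_mutex swap_data_lock;
    struct toy_mutex kentry_lock;
    struct toy_mutex fpageq_lock;
    struct toy_mutex pageq_lock;
};

/* Aligned layout: every lock starts on its own cache line. */
struct locks_aligned {
    alignas(CACHE_LINE) struct toy_mutex swap_data_lock;
    alignas(CACHE_LINE) struct toy_mutex kentry_lock;
    alignas(CACHE_LINE) struct toy_mutex fpageq_lock;
    alignas(CACHE_LINE) struct toy_mutex pageq_lock;
};

static_assert(offsetof(struct locks_packed, pageq_lock) < CACHE_LINE,
    "packed locks all fit inside one cache line");
static_assert(offsetof(struct locks_aligned, kentry_lock) == CACHE_LINE,
    "aligned locks start one cache line apart");
static_assert(sizeof(struct locks_aligned) == 4 * CACHE_LINE,
    "alignment costs one full line per lock");
```

The price is memory and cache footprint: four small locks grow to four full lines, which is one reason the net effect has to be measured per workload.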

==

Here is a flamegraph from a fully patched kernel:
https://people.freebsd.org/~mjg/netbsd/build-kernel-j40.svg

And here are top mutex spinners:
 59.42 1560022 184255.00 e40138351180   
 57.52 1538978 178356.84 e40138351180   uvm_fault_internal+7e0
  1.23    8884   3819.43 e40138351180   uvm_unmap_remove+101
  0.67   12159   2078.61 e40138351180   cach

Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Greg Troxel

Manuel Bouyer  writes:

> On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
>> Re-thinking about this again, it seems to me we could simply add a flags
>> field in modinfo_t, with a bit that says "if this module is builtin, then
>> don't load it". To use compat_xyz, you'll have to type modload, and the
>> kernel will load the module from the builtin list.
>
> If I compile a kernel with a built-in module, I expect this module to
> be active. Otherwise I don't compile it.

But maxv@ is not talking about you deciding to compile a kernel and
putting in a line for a module.  The question is about compat modules
that are in GENERIC, and how to choose defaults so that users who want
to use them aren't inconvenienced and that users that don't want to use
them don't have reduced security.

Reading maxv@'s suggestion, I wondered about autoload of non-built-in
modules (but maybe that is already disabled).  My quick reaction is that
it would be nice if the "don't autoload" flag had the same behavior for
builtin and non-builtin modules, so that builtin/not is just a linking
style thing, and not more.

But I see your point about respecting explicit configuration.

So I wonder about (without providing a patch of course):

  having a per-compiled-module flag to disable autoload, as suggested
  (in builtin and not, unless I'm confused)

  set the noautoload flag to true in modules that are deemed an
  unnecessary risk to people who have not made a choice to use them

  [so far this is maxv's proposal, I think]

  expand config(8) to be able to set "noautoload", so that if a module
  is included as part of a kernel, it will be marked noautoload if and
  only if the flag is on the line, regardless of defaults.  This would
  not affect the modules in stand; they'd still have the default value
  of the noautoload flag from the module source

  add the noautoload flag to in-tree kernel configs for the above modules

which means that in Manuel's custom kernel he can just leave out the
noautoload flag and then that kernel will behave as always.
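
The resolution rule in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: struct mod_sketch, its fields and both functions are hypothetical names, not the existing module(9) or config(8) interfaces.

```c
#include <stdbool.h>

/* Hypothetical per-module record; not the real modinfo_t. */
struct mod_sketch {
	const char *name;
	bool default_noautoload;	/* value compiled into the module source */
	bool in_kernel_config;		/* module is listed in the kernel config */
	bool config_noautoload;		/* "noautoload" is present on that line */
};

/*
 * For a module built into a kernel, the config line alone decides,
 * regardless of the module's default.  Modules in stand keep the default.
 */
static bool
mod_noautoload(const struct mod_sketch *m)
{
	if (m->in_kernel_config)
		return m->config_noautoload;
	return m->default_noautoload;
}

/* Only autoload requests are refused; an explicit modload always passes. */
static bool
mod_may_load(const struct mod_sketch *m, bool is_autoload)
{
	return !(is_autoload && mod_noautoload(m));
}
```

With this rule an in-tree config that carries the flag refuses autoload, a custom config that omits it behaves as before, and a module in stand follows whatever default its source sets.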

People trying to run a MODULAR kernel would still need to either edit
their module sources to change the flag (which if you are a MODULAR
type, is more or less like editing GENERIC) or do manual modload.


Overall I find this disabling of things by default but leaving them in
far preferable to not building them or removing them from sources in
terms of getting to a better place in the security/usability trade
space.




Re: how to tell if a process is 64-bit

2017-09-10 Thread Kamil Rytarowski
On 10.09.2017 16:31, Thor Lancelot Simon wrote:
> On Fri, Sep 08, 2017 at 07:38:24AM -0400, Mouse wrote:
>>> In a cross-platform process utility tool the question came up how to
>>> decide if a process is 64-bit.
>>
>> First, I have to ask: what does it mean to say that a particular
>> process is - or isn't - 64-bit?
> 
> I think the only simple answer is "it is 64-bit in the relevant sense if
> it uses the platform's 64-bit ABI for interaction with the kernel".
> 
> This actually raises a question for me about MIPS: do we have another
> process flag to indicate O32 vs. N32, or can we simply not run O32
> executables on 64-bit or N32 kernels (surely we don't use the O32 ABI
> for all kernel interaction by 32-bit processes)?
> 
> Thor
> 

From a debugger's point of view it's useful to know the ABI and emulation
name of a running application.

This is also useful in core(5) files.

On Linux it is hard to determine the ABI, so it is guessed (mostly by
assuming the host one).

However, any changes in this area are premature from my side. I need to
get elementary features working properly first.





Re: how to tell if a process is 64-bit

2017-09-10 Thread Thor Lancelot Simon
On Fri, Sep 08, 2017 at 07:38:24AM -0400, Mouse wrote:
> > In a cross-platform process utility tool the question came up how to
> > decide if a process is 64-bit.
> 
> First, I have to ask: what does it mean to say that a particular
> process is - or isn't - 64-bit?

I think the only simple answer is "it is 64-bit in the relevant sense if
it uses the platform's 64-bit ABI for interaction with the kernel".

This actually raises a question for me about MIPS: do we have another
process flag to indicate O32 vs. N32, or can we simply not run O32
executables on 64-bit or N32 kernels (surely we don't use the O32 ABI
for all kernel interaction by 32-bit processes)?

Thor


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 12:43, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 12:38:52PM +0200, Maxime Villard wrote:
>> On 10/09/2017 at 12:22, Manuel Bouyer wrote:
>>> On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
>>>> Re-thinking about this again, it seems to me we could simply add a flags
>>>> field in modinfo_t, with a bit that says "if this module is builtin, then
>>>> don't load it". To use compat_xyz, you'll have to type modload, and the
>>>> kernel will load the module from the builtin list.
>>>
>>> If I compile a kernel with a built-in module, I expect this module to
>>> be active. Otherwise I don't compile it.
>>
>> This kind of all-or-nothing mindset just does not work if we want to reduce
>> the attack surface but still have features nearby. A level of indirection is
>> needed, and it didn't seem to me that having per-module flags was a really bad
>> idea.
>
> A secure system is also a system which is simple. Adding indirections
> doesn't keep the system simple.


True enough; but in this particular case, leaving compat features enabled just
for the sake of simplicity produces a system that is much more vulnerable than
if it had one level of indirection.


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 13:37, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 01:32:27PM +0200, Maxime Villard wrote:
>> On 10/09/2017 at 13:16, Manuel Bouyer wrote:
>>> On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
>>>> True enough; but in this particular case, leaving compat features enabled just
>>>> for the sake of simplicity produces a system that is much more vulnerable than
>>>> if it had one level of indirection.
>>>
>>> If you know it's vulnerable then fix it, do not spend time trying to
>>> work around it.
>>
>> Yes, compat_linux/linux32/svr4/svr4_32/ibcs2/etc are probably still vulnerable,
>
> as is the native exec path or compat_netbsd32 ...


yes, but these are critical to the functioning of the system, contrary to the
ones I'm talking about


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Manuel Bouyer
On Sun, Sep 10, 2017 at 01:32:27PM +0200, Maxime Villard wrote:
> On 10/09/2017 at 13:16, Manuel Bouyer wrote:
> > On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
> > > True enough; but in this particular case, leaving compat features enabled just
> > > for the sake of simplicity produces a system that is much more vulnerable than
> > > if it had one level of indirection.
> > 
> > If you know it's vulnerable then fix it, do not spend time trying to
> > work around it.
> 
> Yes, compat_linux/linux32/svr4/svr4_32/ibcs2/etc are probably still vulnerable,

as is the native exec path or compat_netbsd32 ...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 13:16, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
>> True enough; but in this particular case, leaving compat features enabled just
>> for the sake of simplicity produces a system that is much more vulnerable than
>> if it had one level of indirection.
>
> If you know it's vulnerable then fix it, do not spend time trying to
> work around it.


Yes, compat_linux/linux32/svr4/svr4_32/ibcs2/etc are probably still vulnerable,
but in ways that are far from being obvious. Just look at the vulnerability I
fixed in linux32 a few days ago.

It was agreed here that somehow there needs to be a way to reduce the attack
surface by default without totally "disabling" the features that have a common
use case - what I'm discussing now is how to achieve that, not whether to do
it or not.

Having said that, I can understand that my noload proposal may not be the best.

Maxime


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Manuel Bouyer
On Sun, Sep 10, 2017 at 01:13:14PM +0200, Maxime Villard wrote:
> True enough; but in this particular case, leaving compat features enabled just
> for the sake of simplicity produces a system that is much more vulnerable than
> if it had one level of indirection.

If you know it's vulnerable then fix it, do not spend time trying to
work around it.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 12:22, Manuel Bouyer wrote:
> On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
>> Re-thinking about this again, it seems to me we could simply add a flags
>> field in modinfo_t, with a bit that says "if this module is builtin, then
>> don't load it". To use compat_xyz, you'll have to type modload, and the
>> kernel will load the module from the builtin list.
>
> If I compile a kernel with a built-in module, I expect this module to
> be active. Otherwise I don't compile it.


This kind of all-or-nothing mindset just does not work if we want to reduce
the attack surface but still have features nearby. A level of indirection is
needed, and it didn't seem to me that having per-module flags was a really bad
idea.


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Manuel Bouyer
On Sun, Sep 10, 2017 at 12:38:52PM +0200, Maxime Villard wrote:
> On 10/09/2017 at 12:22, Manuel Bouyer wrote:
> > On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
> > > Re-thinking about this again, it seems to me we could simply add a flags
> > > field in modinfo_t, with a bit that says "if this module is builtin, then
> > > don't load it". To use compat_xyz, you'll have to type modload, and the
> > > kernel will load the module from the builtin list.
> > 
> > If I compile a kernel with a built-in module, I expect this module to
> > be active. Otherwise I don't compile it.
> 
> This kind of all-or-nothing mindset just does not work if we want to reduce
> the attack surface but still have features nearby. A level of indirection is
> needed, and it didn't seem to me that having per-module flags was a really bad
> idea.

A secure system is also a system which is simple. Adding indirections
doesn't keep the system simple.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

On 10/09/2017 at 12:24, Paul Goyette wrote:
> On Sun, 10 Sep 2017, Maxime Villard wrote:
>
>> Re-thinking about this again, it seems to me we could simply add a flags
>> field in modinfo_t, with a bit that says "if this module is builtin, then
>> don't load it". To use compat_xyz, you'll have to type modload, and the
>> kernel will load the module from the builtin list.
>>
>> Something like [1] (from memory, not tested at all). Obviously this patch
>> is not complete, since we need to update each MODULE().
>>
>> While it is clear that it does not solve the cross-dependency issue we're
>> having, it does reduce the attack surface almost as much as if the module
>> was not builtin, with very little effort. Cheap, but relevant.
>>
>> [1] http://m00nbsd.net/garbage/module/noload.diff
>
> Well, probably not quite what you wanted, but if a module is built-in
> you can disable it by using modunload(8).  Any built-in module which has
> been disabled in this manner needs to be explicitly reloaded manually, and
> you'll need to additionally specify the -f option to modload(8).

I know.

> Perhaps /etc/rc.d/modules can be updated to have both a load and an
> unload phase, with appropriate syntax for the associated config file.

Thought about this too, but it seemed bizarre to me to have the kernel load
modules, then rc.d/modules unload them, and then the user reload them.

> This would be a lot cleaner IMHO than updating individual modules.


I believe per-module flags can be useful in the future, and not just in the
noload case; a module could want to tell the kernel how it wants to be loaded.

Maxime


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Paul Goyette

On Sun, 10 Sep 2017, Maxime Villard wrote:

> Re-thinking about this again, it seems to me we could simply add a flags
> field in modinfo_t, with a bit that says "if this module is builtin, then
> don't load it". To use compat_xyz, you'll have to type modload, and the
> kernel will load the module from the builtin list.
>
> Something like [1] (from memory, not tested at all). Obviously this patch
> is not complete, since we need to update each MODULE().
>
> While it is clear that it does not solve the cross-dependency issue we're
> having, it does reduce the attack surface almost as much as if the module
> was not builtin, with very little effort. Cheap, but relevant.
>
> [1] http://m00nbsd.net/garbage/module/noload.diff

Well, probably not quite what you wanted, but if a module is built-in
you can disable it by using modunload(8).  Any built-in module which has
been disabled in this manner needs to be explicitly reloaded manually, and
you'll need to additionally specify the -f option to modload(8).


Perhaps /etc/rc.d/modules can be updated to have both a load and an 
unload phase, with appropriate syntax for the associated config file.


This would be a lot cleaner IMHO than updating individual modules.



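+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+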


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Manuel Bouyer
On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
> Re-thinking about this again, it seems to me we could simply add a flags
> field in modinfo_t, with a bit that says "if this module is builtin, then
> don't load it". To use compat_xyz, you'll have to type modload, and the
> kernel will load the module from the builtin list.

If I compile a kernel with a built-in module, I expect this module to
be active. Otherwise I don't compile it.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Maxime Villard

Re-thinking about this again, it seems to me we could simply add a flags
field in modinfo_t, with a bit that says "if this module is builtin, then
don't load it". To use compat_xyz, you'll have to type modload, and the
kernel will load the module from the builtin list.

Something like [1] (from memory, not tested at all). Obviously this patch
is not complete, since we need to update each MODULE().

While it is clear that it does not solve the cross-dependency issue we're
having, it does reduce the attack surface almost as much as if the module
was not builtin, with very little effort. Cheap, but relevant.

[1] http://m00nbsd.net/garbage/module/noload.diff
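
For readers without access to the diff, the idea might look roughly like the userland sketch below. It is not the contents of [1]: MODFLG_NOLOAD_SKETCH, struct modinfo_sketch and both functions are hypothetical names, and the real modinfo_t has different fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit: module is builtin but must not start at boot. */
#define MODFLG_NOLOAD_SKETCH 0x01

/* Hypothetical stand-in for modinfo_t with the proposed flags field. */
struct modinfo_sketch {
	const char *mi_name;
	unsigned    mi_flags;
	bool        active;
};

/* Boot path: start every builtin module except the ones flagged noload. */
static size_t
builtin_init_all(struct modinfo_sketch *list, size_t n)
{
	size_t started = 0;

	for (size_t i = 0; i < n; i++) {
		if (list[i].mi_flags & MODFLG_NOLOAD_SKETCH)
			continue;	/* stays in the builtin list, inactive */
		list[i].active = true;
		started++;
	}
	return started;
}

/* Explicit modload: look the name up in the builtin list and start it. */
static bool
builtin_modload(struct modinfo_sketch *list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i].mi_name, name) == 0) {
			list[i].active = true;
			return true;
		}
	}
	return false;
}
```

The flagged module costs nothing until someone asks for it by name, which is where the attack surface reduction comes from.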