Re: NVMM missing opcode REPE CMPS implementation, version 2

2020-11-01 Thread Maxime Villard

Le 31/10/2020 à 23:58, Reinoud Zandijk a écrit :

Dear folks,

I've reworked the patch using the feedback I got from Maxime and had to
refactor some minor parts of the other code. I'm in the process of writing
the test cases now, but would like feedback on this solution in the meantime.
It ought now to give a proper REPE memory-assist implementation, though it
could still use some minor enhancements.

With regards,
Reinoud



Note that the main NVMM tree is not NetBSD anymore. Also, I am no longer a
NetBSD developer; please stop spamming me.


Re: NVMM missing opcode REPE CMPS implementation

2020-10-31 Thread Maxime Villard

Le 29/10/2020 à 21:54, Reinoud Zandijk a écrit :

Dear folks,

I stumbled on an instruction that NVMM couldn't decode and thus couldn't
emulate either: the REPE CMPS (0xF3 0xA7) instruction, as documented in
https://c9x.me/x86/html/file_module_x86_id_279.html and confirmed by
disassembly with ndisasm (from nasm).

Appended is the implementation of the mentioned instruction together with
its byte-sized sibling 0xF3 0xA6. With the modified libnvmm installed, qemu
behaves as if NVMM were not used. I think the implementation does the right
thing, but feel free to double-check!

Tested and found by NetBSD/amd64 9.99.74 (19 oct) on an Intel Celeron 2957U by
executing:
qemu-system-x86_64 -accel nvmm -nographic -netdev \
user,id=n0,tftp=/usr/mdec,bootfile=pxeboot_ia32.bin -device \
e1000,netdev=n0 -boot n

With regards,
Reinoud


This is incorrect and you should revert. x86_emul_cmp deals with single memory
operands, not double. And rdi+rsi must be incremented/decremented depending on
PSL_D. Also you added printfs in the library, wtf?

As a general rule, each instruction that libnvmm can emulate should have
unit tests in t_mem_assist -- in fact, here a single test case would have
shown that the code is obviously wrong.

Maxime


Re: pg_jobc going negative?

2020-06-08 Thread Maxime Villard

Le 09/06/2020 à 02:49, Kamil Rytarowski a écrit :

pg_jobc is a process group struct member counting the number of
processes with a parent capable of doing job control. Once reaching 0,
the process group is orphaned.

With the addition of asserts checking for pg_jobc > 0 (by maxv@), we
caught cases where pg_jobc can go negative. I have landed new ATF tests
that trigger this reliably with ptrace(2)
(src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was
originally triggered with GDB.

There are also other sources of these asserts, due to races. The ptrace(2)
ATF tests reliably trigger a negative pg_jobc; there are also racy tests
doing the same, as triggered by syzkaller (mentioned at least in [1]).

Should we allow pg_jobc going negative?


I don't think so, the code is not designed to expect negative values.


(Other BSDs allow this.)


They don't "allow it", they just don't have a KASSERT against it.


Is going negative in the first place a real bug?


There were 11 bugs reported by syzbot which showed severe memory corruption
in this area. While investigating the issues, I looked at the refcounting
stuff and added KASSERTs for sanity checking, and then realized they were
actually firing when using the different reproducers generated by syzbot.

Since I added these KASSERTs, 10 of the 11 random bugs no longer trigger;
instead, it is the KASSERTs that fire earlier. They are listed as duplicates:


https://syzkaller.appspot.com/bug?id=50a4ddd341b90cf15a4814048ff51db04347279a

You can see they are all different, but all have to do with reading the
group pointer, which was either freed, overwritten, not initialized,
unmapped, or contained pure garbage. This is typical of refcounting bugs
where a resource disappears under your feet.

Only one keeps firing once in a while (KASAN); that's understandable,
because even though there is a big window where the KASSERTs can fire, the
underlying race is still there and there is still the chance we miss the
window and get memory corruptions.

In short: (1) my understanding is that the code is not designed to expect
negative values, and (2) since I added the KASSERTs, 10 of the 11 random
bugs no longer trigger. Both are big signs that the bug is indeed related
to refcounting.

It would be nice if someone with better understanding than me of the lwp
group stuff could have a look, though.

Maxime


Re: ipfilter crash in latest netbsd-9

2020-05-31 Thread Maxime Villard

Le 01/06/2020 à 03:06, Emmanuel Dreyfus a écrit :

Emmanuel Dreyfus  wrote:


After upgrading to 9.0, I experienced crashes when enabling
ipfilter (backtrace below). I tried latest netbsd-9 kernel without
improvement.


In the meantime, I filed kern/55236 about the ipfilter regression in
NetBSD 9.0 i386 Xen.

It happens with i386 XEN3PAE_DOMU kernels when ipfilter is loaded as a
module. If I add back pseudo-device ipfilter in the kernel, the problem
vanishes.

Maxime, you made the change quite a long time ago. Did nobody report
issues? Would you have an idea of where this bug stems from?
http://mail-index.netbsd.org/source-changes/2018/08/01/msg097228.html


I don't have time to check precisely right now, but if the problem arises
in a module and not when built in, then it is probably a #define mishap.

Try to rebuild the module with -DXEN and -DPAE in CFLAGS.


fault(4)

2020-02-08 Thread Maxime Villard

[I am not subscribed to this list, so if you want to answer, make sure to CC me]

In order to explore error branches, and to test the kernel's ability to cope
with failures, it is often necessary to hard-trigger such failures.

Here is an implementation [1] of fault(4), a driver that allows triggering
failures in the kernel. A similar driver exists in Linux.

The fault_inject() function decides whether to return true or false, depending
on parameters configurable by userland via ioctls on /dev/fault. The caller of
this function should then error out depending on the return value. Typically:

whatever_subsystem()
{
	...
	if (fault_inject())
		return NULL; // means failure
	...
	return non_null; // means success
}

Several modes can be available; I have implemented one for now, the N-th
mode: every N-th call to fault_inject() (N being configurable) makes it
return true.

Several scopes are available: global (i.e. system-wide), or the calling LWP.

Examples:

 - mode=NTH scope=GLOBAL: every N-th call to fault_inject() in the whole kernel
   will return true, regardless of the LWP.
 - mode=NTH scope=LWP: every N-th call to fault_inject() made by the LWP that
   enabled the mode will return true. For the other LWPs, fault_inject() always
   returns false.

fault_inject() can be introduced in any place of interest. For now I added it
in pool_cache_get():

	if (flags & PR_NOWAIT) {
		if (fault_inject())
			return NULL;
	}

Running ATF with KASAN+LOCKDEBUG+fault(4) and {N=32 scope=GLOBAL} already
gives an instant crash:

kernel diagnostic assertion "radix_tree_empty_tree_p(&pmap->pm_pvtree)"
failed: file ".../sys/arch/x86/x86/pmap.c"

Looks like radixtree.c doesn't handle allocation failures very well somewhere.

fault(4) seems like the kind of feature that would be useful for stress-testing
and fuzzing. As you can see in the diff, its code is extremely simple.

Maxime

[1] https://m00nbsd.net/garbage/fault/fault.diff


bus_dmamap_sync: bug?

2020-01-11 Thread Maxime Villard
Seems to me there is a bug in x86's bus_dmamap_sync(). When the DMA buffer
is a UIO and we bounce, the 'offset' argument is not taken into account
properly.

We should do:
	copy uio+off into bouncebuf+off
but we do:
	copy uio+0 into bouncebuf+off

A quick glance shows that other architectures have the same problem, like
ARM32's _bus_dma_uiomove().

Maxime


Re: EFI memory map

2020-01-03 Thread Maxime Villard

Le 02/01/2020 à 16:55, Emmanuel Dreyfus a écrit :

And indeed, studying the crash in ddb shows it happens when
accessing a physical address that is excluded by x86_fake_clusters()
but included by the EFI memory map.


Note that x86_fake_clusters() is unsafe. It does not exclude the MMIO
pages, because only the BIOS can tell where they are. These pages can
get returned by uvm_pagealloc(), triggering all sorts of crazy behavior,
potentially physically nuking the machine.

I think we should remove this function.

Maxime


Re: Proposal: removing urio(4), Rio 500 MP3 player (1999), and Rio-related packages

2020-01-02 Thread Maxime Villard

Le 02/01/2020 à 19:29, Jason Thorpe a écrit :

On Jan 2, 2020, at 10:16 AM, co...@sdf.org wrote:
Hi folks,

urio(4) is a driver for Rio 500, an MP3 player released in 1999.
It recently received some discussion due to problems detected
during maxv's USB-fuzzing.


Well, not exactly, the concerns were around the permissions of /dev/urio,
and following the discussion it turned out that the driver was a good
candidate for nuking.


Yah, seems like a good candidate for nuking.


Two other drivers were also discussed:

 - uyurex, which is currently disabled, and has been disabled forever. My
   concern with this driver is that it has buffer overflows. The developer
   who imported it is no longer in TNF. This driver is supposed to handle
   exotic body-sensor devices apparently sold in Japan around 2008.

 - uscanner, which was brought up by other people for an unrelated reason.
   It was removed from FreeBSD in 2009, from OpenBSD in 2013, and disabled
   in NetBSD in 2016. It has been superseded by ugen+SANE.

I'll let Maya decide whether to formulate proposals for these two
additional drivers.

Maxime


Re: [filemon] CVS commit: htdocs/support/security

2019-12-17 Thread Maxime Villard

Le 17/12/2019 à 15:44, Andrew Doran a écrit :

Typically with a character device, the kmod can get unloaded while an ioctl
is being executed on it.


That's solvable.  Conceptually I think the main stumbling block is that
there are two layers at play which need to be reconciled: specfs and devsw.
It could also be an opportunity to lessen the distinction between block and
character devices; there's no real need for cached access from userspace,
and that would simplify things too.


When it comes to syscalls, I haven't looked
closely, but the issue is likely the same.


It's atomic and side effect free if done correctly.  We have pleasing
examples of this.  This is hard to get right though, so naturally mistakes
are made.


Now that I'm looking at the syscall_{dis}establish() functions in detail,
I see that the l_sysent checking is indeed a good idea and does not require
refcounts; however, the accesses to 'sy_call' should use relaxed atomics.
The cost of these atomics is my concern.

Maxime


Re: [filemon] CVS commit: htdocs/support/security

2019-12-17 Thread Maxime Villard

Le 17/12/2019 à 12:34, Kamil Rytarowski a écrit :

On 17.12.2019 09:16, Maxime Villard wrote:

Module Name:    htdocs
Committed By:   christos
Date:   Tue Dec 17 01:03:49 UTC 2019

Modified Files:
     htdocs/support/security: advisory.html index.html

Log Message:
new advisory


To generate a diff of this commit:
cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html
cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


There is something I don't understand here. Why keep this totally useless
misfeature, when we already have many tracing facilities that can do just
about the same job without having security issues?

The recommendation in the advisory is literally to remove the kernel
module from the fs. I don't see what could possibly be the use of such a
misfeature as filemon; I would remove it completely from the kernel
source tree.

Maxime


From:
http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc

"Additionally the way filemon does filesystem interception is racy
and can lead to random crashes if the system calls are in use
while the module is unloaded."

Is this issue fixable? Not speaking of filemon in particular, I find
this ability to rewrite the syscall table on the fly to be a feature.
Keeping a functional module with this property (even if disabled by
default) seems useful to me.


As far as I can tell, there are many races caused by autoloading.

Typically with a character device, the kmod can get unloaded while an ioctl
is being executed on it. When it comes to syscalls, I haven't looked
closely, but the issue is likely the same.

You can use tricks to "narrow down" the races; eg in NVMM, I use a global
'nmachines' variable, which prevents unloading in ~most cases. But I see
no way to fix these races, except using atomic refcounts and mutexes on
the ioctls and syscalls; obviously, this won't scale.

I find this autoloading stuff to be a misfeature, too. If we want something
on by default, then we should put it in GENERIC; if we want it disabled but
accessible, then we should put it in a kmod that requires a manual modload
from root.

Putting stuff in kmods AND having the kernel load them automatically serves
no purpose; it just adds bugs, and sometimes creates the wrong feeling that
a driver is disabled while it isn't (like filemon).

Other than that, I don't really understand your point; you can still do
syscall interception with a custom kmod if you want, regardless of filemon.

Maxime


Re: [filemon] CVS commit: htdocs/support/security

2019-12-17 Thread Maxime Villard

Le 17/12/2019 à 14:32, Kamil Rytarowski a écrit :

On 17.12.2019 14:19, Maxime Villard wrote:

Le 17/12/2019 à 12:34, Kamil Rytarowski a écrit :

On 17.12.2019 09:16, Maxime Villard wrote:

Module Name:    htdocs
Committed By:   christos
Date:   Tue Dec 17 01:03:49 UTC 2019

Modified Files:
  htdocs/support/security: advisory.html index.html

Log Message:
new advisory


To generate a diff of this commit:
cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html
cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


There is something I don't understand here. Why keep this totally
useless
misfeature, when we already have many tracing facilities that can do
just
about the same job without having security issues?

The recommendation in the advisory is literally to remove the kernel
module from the fs. I don't see what could possibly be the use of such a
misfeature as filemon; I would remove it completely from the kernel
source tree.

Maxime


From:
http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc


"Additionally the way filemon does filesystem interception is racy
and can lead to random crashes if the system calls are in use
while the module is unloaded."

Is this issue fixable? Not speaking of filemon in particular, I find
this ability to rewrite the syscall table on the fly to be a feature.
Keeping a functional module with this property (even if disabled by
default) seems useful to me.


As far as I can tell, there are many races caused by autoloading.

Typically with a character device, the kmod can get unloaded while an ioctl
is being executed on it. When it comes to syscalls, I haven't looked
closely, but the issue is likely the same.

You can use tricks to "narrow down" the races; eg in NVMM, I use a global
'nmachines' variable, which prevents unloading in ~most cases. But I see
no way to fix these races, except using atomic refcounts and mutexes on
the ioctls and syscalls; obviously, this won't scale.

I find this autoloading stuff to be a misfeature, too. If we want something
on by default, then we should put it in GENERIC; if we want it disabled but
accessible, then we should put it in a kmod that requires a manual modload
from root.

Putting stuff in kmods AND having the kernel load them automatically serves
no purpose; it just adds bugs, and sometimes creates the wrong feeling that
a driver is disabled while it isn't (like filemon).

Other than that, I don't really understand your point; you can still do
syscall interception with a custom kmod if you want, regardless of filemon.



Out of filemon, I am only interested in syscall_hijack
(filemon_wrapper_install() + filemon_wrapper_deinstall()). If that can
be made reliable, I would keep it in src/sys/modules/example.

https://nxr.netbsd.org/xref/src/sys/dev/filemon/filemon_wrapper.c#90


This code is a big cancer, and I'd rather not make it an example. I would
prefer an example based on syscall_{dis}establish().


I have no personal interest in the rest of filemon. Switching this
make(1) feature to at least ptrace(2) should be straightforward.


Yes, and that would be a lot more useful and reliable than filemon. What's
clear nevertheless is that, now that we've stopped installing the filemon
kmod and told users to actually remove the file, make+filemon has no
remaining use.

I will prepare the removal of filemon.

Maxime


[filemon] CVS commit: htdocs/support/security

2019-12-17 Thread Maxime Villard

Module Name:htdocs
Committed By:   christos
Date:   Tue Dec 17 01:03:49 UTC 2019

Modified Files:
htdocs/support/security: advisory.html index.html

Log Message:
new advisory


To generate a diff of this commit:
cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html
cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


There is something I don't understand here. Why keep this totally useless
misfeature, when we already have many tracing facilities that can do just
about the same job without having security issues?

The recommendation in the advisory is literally to remove the kernel
module from the fs. I don't see what could possibly be the use of such a
misfeature as filemon; I would remove it completely from the kernel
source tree.

Maxime


Re: multiboot for amd64

2019-12-16 Thread Maxime Villard

Le 17/12/2019 à 02:53, Emmanuel Dreyfus a écrit :

Hello

The initial commit of multiboot for amd64 had a problem with kernel
linking that broke BIOS boot. I fixed the linker script, but in the
meantime, the multiboot option in amd64 was also commented out.

Is there any objection against re-enabling multiboot for amd64?


Yes, there is.

https://mail-index.netbsd.org/source-changes-d/2019/12/14/msg011884.html

Also, the code is still bloated with unnecessary stuff, and still hasn't
been moved to a separate function (or file).


Re: racy acccess in kern_runq.c

2019-12-07 Thread Maxime Villard

Le 06/12/2019 à 20:37, Mindaugas Rasiukevicius a écrit :

Maxime Villard  wrote:

Le 06/12/2019 à 17:53, Andrew Doran a écrit :

On Fri, Dec 06, 2019 at 05:22:39PM +0900, Kengo NAKAHARA wrote:


/* Update the worker */
-   worker_ci = hci;
+   atomic_swap_ptr(&worker_ci, hci);


Why atomic_swap_ptr() and not atomic_store_relaxed()?  I don't see any bug
that it fixes.  Other than that it looks OK to me.


Because I suggested it; my concern was that, if not explicitly atomic, the
CPU could make two writes internally (even though the compiler produces
only one instruction), and in that case a page fault would have been
possible because of a garbage dereference.


No, atomic_store_relaxed() is sufficient; that is what it is for.


Alright.

The patch needs to be completed though; i.e., 'spc_mcount++' too should use
relaxed atomics, UVM-style ([1]).

[1] 
https://nxr.netbsd.org/diff/src/sys/uvm/uvm_fault.c?r2=%2Fsrc%2Fsys%2Fuvm%2Fuvm_fault.c%401.209&r1=%2Fsrc%2Fsys%2Fuvm%2Fuvm_fault.c%401.208


Re: racy acccess in kern_runq.c

2019-12-06 Thread Maxime Villard

Le 06/12/2019 à 17:53, Andrew Doran a écrit :

On Fri, Dec 06, 2019 at 05:22:39PM +0900, Kengo NAKAHARA wrote:


/* Update the worker */
-   worker_ci = hci;
+   atomic_swap_ptr(&worker_ci, hci);


Why atomic_swap_ptr() and not atomic_store_relaxed()?  I don't see any bug
that it fixes.  Other than that it looks OK to me.


Because I suggested it; my concern was that, if not explicitly atomic, the
CPU could make two writes internally (even though the compiler produces
only one instruction), and in that case a page fault would have been
possible because of a garbage dereference.

To be honest, I'm not totally sure whether it is a valid concern;
theoretically, it is.


Re: racy acccess in kern_runq.c

2019-12-06 Thread Maxime Villard
Le 06/12/2019 à 11:28, Andrew Doran a écrit :
> On Fri, Dec 06, 2019 at 10:27:20AM +0100, Maxime Villard wrote:
> 
>> With 'worker_ci', there is an actual safety issue, because the compiler could
>> split the accesses and the hardware may not use atomics by default like x86.
>> This could cause random page faults; so it needs to be strictly atomic.
> 
> No I don't accept that.
> 
> The ability to load and store a native word sized int (and in more recent
> years a pointer) with a single instruction is a fundamental assumption that
> every operating system written in C rests upon.

1) That it is done with a single instruction does not necessarily mean it is
atomic. There are many stupid things microcode is allowed to do (not on x86,
however).

2) That there is a one-line assignment of a variable in C does not necessarily
mean that there will be one move instruction. There are many things the
compiler is allowed to do in optimization passes, as said already.

3) As for "every operating system written in C rests upon": I have some
difficulty with that. Linux has an equivalent of our relaxed atomics (called
{READ,WRITE}_ONCE), with 1000+ combined references in their tree. So here is
a big open-source kernel that does not rest upon unclear assumptions for
lockless operations. No way to check Windows, but I wouldn't be surprised if
they had a similar approach.

> If the compiler splits one of those acceses, then you are either using some
> other data type, or have a broken compiler on your hands.

Or you're just using a compiler that knows well enough how to arrange
instructions for optimized pipelining, and this compiler decides to re-order
the accesses based on desirable performance properties, as allowed by the
standard.

> https://www.gnu.org/software/libc/manual/html_node/Atomic-Types.html

I don't see anything in this link that is really reassuring. "you can assume"
that things are mostly ok on "systems we know of".

At least the rationale for {READ,WRITE}_ONCE is based on standards, and not
wild assumptions:

https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

To me, Kengo's 2nd patch remains needed.

Maxime


Re: racy acccess in kern_runq.c

2019-12-06 Thread Maxime Villard
Le 06/12/2019 à 10:00, Andrew Doran a écrit :
> Hi,
> 
> On Fri, Dec 06, 2019 at 03:52:23PM +0900, Kengo NAKAHARA wrote:
> 
>> There are some racy accesses in kern_runq.c detected by KCSAN.  Those
>> racy-access messages are so frequent that they drown out other messages,
>> so I want to fix them.  They can be fixed by the following patch.
> 
> Please don't commit this.  These accesses are racy by design.  There is no
> safety issue, and we do not want to disturb the activity of other CPUs in
> this code path by locking them.  We don't want to use atomics either.
> This code path is extremely hot; using atomics would impose a severe
> performance penalty on the system under certain workloads.

With 'worker_ci', there is an actual safety issue, because the compiler could
split the accesses, and the hardware may not make them atomic by default the
way x86 does. This could cause random page faults, so it needs to be strictly
atomic.

Apart from that, yes, there is no other safety issue and locking would be too
expensive. All we possibly care about is making sure the accesses aren't split,
for the sake of not basing the scheduling policy on systematic garbage, and
atomic_relaxed seems like the right thing to do (Kengo's 2nd patch),
considering that it costs ~nothing.

> Also if you change this to use strong ordering, you're quite likely to
> change the selection behaviour of LWPs.  This is something delicate that
> exhibits reasonable scheduling behaviour in large part through randomness
> i.e. by accident, so serialising it is likely to have strong effects on how
> LWPs are distributed.
> 
> Lastly I have a number of experimental changes to this code which I'll be
> committing in the near future allowing people to change the LWP selection
> policy to see if we can improve performance under different workloads.
> They will also likely show up in KCSAN as racy/dangerous accesses.
> 
> My suggestion is to find a way to teach KCSAN that we know something is
> racy, we like it being racy, and that it's not a problem, so that it no
> longer shows up in the KCSAN output.

Maybe we should indeed have a macro to say "yes this access is racy but we
don't care". Currently this macro is atomic_{store,load}_relaxed()...




Re: racy acccess in kern_runq.c

2019-12-06 Thread Maxime Villard
Le 06/12/2019 à 07:52, Kengo NAKAHARA a écrit :
> Hi,
> 
> There are some racy accesses in kern_runq.c detected by KCSAN.  Those
> racy-access messages are so frequent that they drown out other messages,
> so I want to fix them.  They can be fixed by the following patch.
> 
> 
> diff --git a/sys/kern/kern_runq.c b/sys/kern/kern_runq.c
> index 04ff1732016..bb0815689cd 100644
> --- a/sys/kern/kern_runq.c
> +++ b/sys/kern/kern_runq.c
> @@ -627,7 +627,7 @@ sched_balance(void *nocallout)
>  /* Make lockless countings */
>  for (CPU_INFO_FOREACH(cii, ci)) {
>  spc = &ci->ci_schedstate;
> -
> +    spc_lock(ci);
>  /*
>   * Average count of the threads
>   *
> @@ -643,10 +643,11 @@ sched_balance(void *nocallout)
>  hci = ci;
>  highest = spc->spc_avgcount;
>  }
> +    spc_unlock(ci);
>  }
>  
>  /* Update the worker */
> -    worker_ci = hci;
> +    atomic_swap_ptr(&worker_ci, hci);
>  
>  if (nocallout == NULL)
>  callout_schedule(&balance_ch, balance_period);
> @@ -734,12 +735,15 @@ sched_idle(void)
>  spc_unlock(ci);
>  
>  no_migration:
> +    spc_lock(ci);
>  if ((spc->spc_flags & SPCF_OFFLINE) != 0 || spc->spc_count != 0) {
> +    spc_unlock(ci);
>  return;
>  }
>  
>  /* Reset the counter, and call the balancer */
>  spc->spc_avgcount = 0;
> +    spc_unlock(ci);
>  sched_balance(ci);
>  tci = worker_ci;
>  tspc = &tci->ci_schedstate;
> 

It just so happens we were talking about this yesterday.

worker_ci indeed needs to be a real atomic, but it means we also have to
fetch it atomically here:

744 tci = worker_ci;

For the other variables, my understanding was that we don't care about
races, but only care about making sure the accesses are not split, so it
seems to me atomic_relaxed would suffice.


Re: __{read,write}_once

2019-11-22 Thread Maxime Villard

Le 21/11/2019 à 23:08, David Young a écrit :

On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote:

Le 18/11/2019 à 19:49, David Holland a écrit :

On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote:
  > David Holland  wrote:
  > >  > I see the potential source of confusion, but just think about: what
  > >  > could "atomic" possibly mean for loads or stores?  A load is just a
  > >  > load, there is no RMW cycle, conceptually; inter-locking the operation
  > >  > does not make sense.  For anybody who attempts to reason about this,
  > >  > it should not be too confusing, plus there are man pages.
  > >
  > > ...that it's not torn.
  > >
  > > As far as names... there are increasingly many slightly different
  > > types of atomic and semiatomic operations.
  > >
  > > I think it would be helpful if someone came up with a comprehensive
  > > naming scheme for all of them (including ones we don't currently have
  > > that we're moderately likely to end up with later...)
  >
  > Yes, the meaning of "atomic" has different flavours and describes different
  > set of properties in different fields (operating systems vs databases vs
  > distributed systems vs ...) and, as we can see, even within the fields.
  >
  > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the
  > dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I
wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be
distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by
whether or not multiple instances of it are globally ordered, not by
whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking
about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed()
and atomic_store_32_relaxed(), which would be used together to
generate a local counter. This is less misleading, but I'm still not
convinced it's a good choice of names given that we might reasonably
later on want to have atomic_inc_32_relaxed() and
atomic_inc_32_ordered() that differ as above.

  > I think it is pointless to attempt to reinvent the wheel
  > here.  It is terminology used by C11 (and C++11) and accepted by various
  > technical literature and, at this point, by academia (if you look at the
  > recent papers on memory models -- it's pretty much settled).  These terms
  > are not too bad; it could be worse; and this bit is certainly not the worst
  > part of C11.  So, I would just move on.

Is it settled? My experience with the academic literature has been
that everyone uses their own terminology and the same words are
routinely found to have subtly different meanings from one paper to
the next and occasionally even within the same paper. :-/  But I
haven't been paying much attention lately due to being preoccupied
with other things.


So in the end which name do we use? Are people really unhappy with _racy()?
At least it has a general meaning, and does not imply atomicity or ordering.


_racy() doesn't really get at the intended meaning, and everything in
C11 is racy unless you arrange otherwise by using mutexes, atomics, etc.
The suffix has very little content.

Names such as load_/store_fully() or load_/store_entire() or
load_/store_completely() get to the actual semantics: at the program
step implied by the load_/store_entire() expression, the memory
constituting the named variable is loaded/stored in its entirety.  In
other words, the load cannot be drawn out over more than one local
program step.  Whether or not the load/store is performed in one step
with respect to interrupts or other threads is not defined.


Sounds like the right definition of the semantics. However:


I feel like load_entire() and store_entire() get to the heart of the
matter while being easy to speak, but _fully() or _completely() seem
fine.


I'm not really a big fan of _entire(), because it gives the impression that
the load/store would not be entire otherwise - which it is, just possibly not
in one step.

How about _onestep()? Or _nosplit()?


Re: __{read,write}_once

2019-11-21 Thread Maxime Villard

Le 18/11/2019 à 19:49, David Holland a écrit :

On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote:
  > David Holland  wrote:
  > >  > I see the potential source of confusion, but just think about: what
  > >  > could "atomic" possibly mean for loads or stores?  A load is just a
  > >  > load, there is no RMW cycle, conceptually; inter-locking the operation
  > >  > does not make sense.  For anybody who attempts to reason about this,
  > >  > it should not be too confusing, plus there are man pages.
  > >
  > > ...that it's not torn.
  > >
  > > As far as names... there are increasingly many slightly different
  > > types of atomic and semiatomic operations.
  > >
  > > I think it would be helpful if someone came up with a comprehensive
  > > naming scheme for all of them (including ones we don't currently have
  > > that we're moderately likely to end up with later...)
  >
  > Yes, the meaning of "atomic" has different flavours and describes different
  > set of properties in different fields (operating systems vs databases vs
  > distributed systems vs ...) and, as we can see, even within the fields.
  >
  > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the
  > dominant terms.

Yes, but "relaxed" means something else... let me be clearer since I
wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be
distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by
whether or not multiple instances of it are globally ordered, not by
whether or not it's actually atomic relative to other cpus.

Checking the prior messages indicates we aren't currently talking
about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed()
and atomic_store_32_relaxed(), which would be used together to
generate a local counter. This is less misleading, but I'm still not
convinced it's a good choice of names given that we might reasonably
later on want to have atomic_inc_32_relaxed() and
atomic_inc_32_ordered() that differ as above.

  > I think it is pointless to attempt to reinvent the wheel
  > here.  It is terminology used by C11 (and C++11) and accepted by various
  > technical literature and, at this point, by academia (if you look at the
  > recent papers on memory models -- it's pretty much settled).  These terms
  > are not too bad; it could be worse; and this bit is certainly not the worst
  > part of C11.  So, I would just move on.

Is it settled? My experience with the academic literature has been
that everyone uses their own terminology and the same words are
routinely found to have subtly different meanings from one paper to
the next and occasionally even within the same paper. :-/  But I
haven't been paying much attention lately due to being preoccupied
with other things.


So in the end which name do we use? Are people really unhappy with _racy()?
At least it has a general meaning, and does not imply atomicity or ordering.


Re: __{read,write}_once

2019-11-16 Thread Maxime Villard

Le 16/11/2019 à 15:31, Mindaugas Rasiukevicius a écrit :

Maxime Villard  wrote:

Alright so let's add the macros with volatile (my initial patch). Which
name do we use? I actually like __{read,write}_racy().


I suggest __atomic_load_relaxed()/__atomic_store_relaxed().


What I don't like with "atomic" in the name is that the instructions generated
are not atomic strictly speaking, and I'd rather avoid the confusion with the
really atomic instructions.

But that's not a strong opinion, and I'm fine if we go with it anyway.


If these are to be provided as macros, then I also suggest to ensure that
they provide compiler-level barrier.


You mean __insn_barrier(), right? Surrounding the access (one before, one
after) also, right?


Re: __{read,write}_once

2019-11-15 Thread Maxime Villard

Alright so let's add the macros with volatile (my initial patch). Which name
do we use? I actually like __{read,write}_racy().



Re: __{read,write}_once

2019-11-11 Thread Maxime Villard

Le 11/11/2019 à 13:51, Joerg Sonnenberger a écrit :

On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote:

Typically in sys/uvm/uvm_fault.c there are several lockless stat increments
like the following:

/* Increment the counters.*/
uvmexp.fltanget++;


Wasn't the general consensus here to ideally have per-cpu counters here
that are aggregated passively?


In this specific case it's not going to work, because uvmexp is supposed to
be read from a core dump, and the stats are expected to be found in the
symbol...


I can think of three different options depending the platform:

(1) Use full atomic operations. Fine for true UP platforms and when the
overhead of per-CPU precise accounting is too high.

(2) Use uninterruptible memory operations in per CPU memory, aggregate
passively on demand.

(3) Emulate uninterruptible memory operations with kernel RAS, aggregate
passively on demand.


Generally I would do (2), but in this specific case I suggest (1).
atomic_inc_uint is supposed to be implemented on each platform already, so
it's easy to do.

(3) is a big headache for a very small reason, and it's not going to
prevent inter-CPU races.


Re: __{read,write}_once

2019-11-11 Thread Maxime Villard

Le 08/11/2019 à 13:40, Mindaugas Rasiukevicius a écrit :

Maxime Villard  wrote:

They are "atomic" in a sense that they prevent from tearing, fusing and
invented loads/stores.  Terms and conditions apply (e.g. they assume
properly aligned and word-sized accesses).  Additionally, READ_ONCE()
provides a data-dependency barrier, applicable only to DEC Alpha.  I
think it was the right decision on the Linux side (even though trying
to get all the data-dependency barriers right these days is kind of a
lost cause, IMO).

So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11
atomic load/stores routines with memory_order_relaxed (or
memory_order_consume, if you care about DEC Alpha).


But... Didn't Marco just say that 'volatile' accesses do not actually
prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only
do volatile accesses.


Let me try to clarify:

- The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to
perform atomic loads/stores (in a sense of preventing from the said
behaviours), even though they help to get the memory ordering right too.
Currently, 'volatile' is a key instrument in achieving that.  However,
as stated before, terms and conditions apply: 'volatile' just in itself
does not provide the guarantee; the loads/stores also have to be properly
aligned and word-sized (these are the pre-C11 assumptions we always had).
Note: C11 introduces atomic _types_, so that the compiler could leverage
the type system and thus provide the necessary guarantees.

- Having re-read Marco's emails in this thread, I think we are very much
in agreement.  I think he merely points out that 'volatile' in itself is
not sufficient; it does not mean it's not necessary.

- There is quite a bit of confusion regarding 'volatile' amongst the
developers.  This is partly because 'volatile' is arguably underspecified
in the C standard.  AFAIK, some people in the C standardization committee
have a view that it provides weaker guarantees; however, others maintain
that the intent has always been clear and the wording is sufficient.
Without going into the details (somewhat philosophical anyway), at least
for now, 'volatile' is a de facto ingredient (one of a few, but absolutely
necessary) in achieving atomic loads/stores.

Sorry if this is a bit repetitive, but I hope it gets the point across.


Alright, thanks.

Do you think there is a performance degradation with using explicitly atomic
operations (with the "lock" prefix on x86), compared to just using an aligned
volatile which may not be exactly atomic in that sense (even if it happens to
be on x86)?

Typically in sys/uvm/uvm_fault.c there are several lockless stat increments
like the following:

/* Increment the counters.*/
uvmexp.fltanget++;

In your (not just rmind@, but tech-kern@) opinion, is it better to switch to:
atomic_inc_uint(&uvmexp.fltanget);
or to
__add_once(uvmexp.fltanget, 1);
as I did in my patch? Considering that the latter may possibly not be atomic
on certain arches, which could possibly result in garbage when the value is
read.


To fix that, do you agree that I should
   - Remove the first branch (because no lockless fastpath possible)
   - Move down the second branch (KASSERT) right after the mutex_enter
?


Feel free to write a patch and I'll have a look at it once I have a
little bit more free time.


I've committed it right away, because this one seemed clear enough, feel free
to comment or improve if you have a better idea:

https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html

Maxime


Re: __{read,write}_once

2019-11-07 Thread Maxime Villard

Le 06/11/2019 à 23:41, Mindaugas Rasiukevicius a écrit :

Maxime Villard  wrote:

- If we do not want to stick with the C11 API (its emulation), then I
would suggest to use the similar terminology, e.g. atomic_load_relaxed()
and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
no reason to invent new terminology; it might just add more confusion.


But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that
correct? So there is a significant semantic difference.


They are "atomic" in a sense that they prevent from tearing, fusing and
invented loads/stores.  Terms and conditions apply (e.g. they assume properly
aligned and word-sized accesses).  Additionally, READ_ONCE() provides a
data-dependency barrier, applicable only to DEC Alpha.  I think it was
the right decision on the Linux side (even though trying to get all the
data-dependency barriers right these days is kind of a lost cause, IMO).

So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic
load/stores routines with memory_order_relaxed (or memory_order_consume,
if you care about DEC Alpha).


But... Didn't Marco just say that 'volatile' accesses do not actually prevent
tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile
accesses.


Also, in my original patch I marked two branches of subr_xcall.c, but it
seems to me now they are actually races: ie xc_wait(), the read of
'xc_donep' could be made by two 4-byte fetches on 32bit arches (at
least). Between the two, an xc thread could have updated the value. So
there is an actual race which could possibly result in returning early
while we shouldn't have. Does that look correct to you?


Correct.


Alright, so we have the first bug found by KCSAN :)

To fix that, do you agree that I should
 - Remove the first branch (because no lockless fastpath possible)
 - Move down the second branch (KASSERT) right after the mutex_enter
?


There is more code in the NetBSD kernel which needs fixing.  I would say
pretty much all lock-free code should be audited.


I believe KCSAN can greatly help with that, since it automatically reports
concurrent accesses. Up to us then to switch to atomic, or other kinds of
markers like READ_ONCE.


Re: __{read,write}_once

2019-11-06 Thread Maxime Villard

Le 06/11/2019 à 20:38, Mindaugas Rasiukevicius a écrit :

Maxime Villard  wrote:

There are cases in the kernel where we read/write global memory
locklessly, and accept the races either because it is part of the design
(eg low-level scheduling) or we simply don't care (eg global stats).

In these cases, we want to access the memory only once, and need to ensure
the compiler does not split that access into several pieces, some of which
may be changed by concurrent accesses. There is a Linux article [1] about
this, and also [2]. I'd like to introduce the following macros:

<...>


A few comments for everybody here:

- There is a general need in the NetBSD kernel for atomic load/store
semantics.  This is because plain accesses (loads/stores) are subject
to _tearing_, _fusing_ and _invented_ loads/stores.  I created a new
thread which might help to clarify these and various other aspects:

http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html


Thanks


- In C11, this can be handled with atomic_{load,store}_explicit() and
memory_order_relaxed (or stronger memory order).

- If we do not want to stick with the C11 API (its emulation), then I
would suggest to use the similar terminology, e.g. atomic_load_relaxed()
and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE().  There is
no reason to invent new terminology; it might just add more confusion.


But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct?
So there is a significant semantic difference.

Also, in my original patch I marked two branches of subr_xcall.c, but it
seems to me now they are actually races: ie xc_wait(), the read of 'xc_donep'
could be made by two 4-byte fetches on 32bit arches (at least). Between the
two, an xc thread could have updated the value. So there is an actual race
which could possibly result in returning early while we shouldn't have. Does
that look correct to you?

Thanks


Re: __{read,write}_once

2019-11-06 Thread Maxime Villard

Le 06/11/2019 à 17:37, Marco Elver a écrit :

On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski  wrote:


On 06.11.2019 16:44, Kamil Rytarowski wrote:

On 06.11.2019 15:57, Jason Thorpe wrote:




On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski  wrote:

On 06.11.2019 14:37, Jason Thorpe wrote:




On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski  wrote:

I propose __write_relaxed() / __read_relaxed().


...except that seems to imply the opposite of what these do.

-- thorpej



Rationale?

This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do
not deal with atomics here.


Fair enough.  To me, the names suggest "compiler is allowed to apply relaxed
constraints and tear the access if it wants". But apparently the common meaning
is "relax, bro, I know what I'm doing". If that's the case, I can roll with it.


See below.


-- thorpej



Unless I mean something this is exactly about relaxed constraints.


miss*



"Relaxed operation: there are no synchronization or ordering constraints
imposed on other reads or writes" and without "operation's atomicity is
guaranteed".


In the memory consistency model world, "relaxed" has a very specific
meaning for loads/stores. It simply means that the compiler and
architecture are free to reorder the memory operation with respect to
previous or following memory operations in program order. But the
load/store still happens as one atomic operation.

For programming-language memory models, "relaxed" appears in the
context of atomic memory operations, to define their ordering w.r.t.
other operations in program order. There is a distinction between
"relaxed atomic" and plain (unannotated) memory operations at the
programming language level.

For plain operations, the compiler, in addition to the target
architecture, is free to reorder operations or even apply
optimizations that turn single plain operations into multiple
loads/stores [1, 2].
[1] https://lwn.net/Articles/793253/
[2] https://lwn.net/Articles/799218/

In the Linux kernel READ_ONCE/WRITE_ONCE are one way to specify
"relaxed atomic read/write" for accesses that fit in a word (_ONCE
also works on things larger than a word, but probably aren't atomic
anymore). For most architectures the implementation is the same, and
merely avoids compiler optimizations; the exception is Alpha, where
READ_ONCE adds a memory barrier [3].
[3] https://www.kernel.org/doc/Documentation/memory-barriers.txt


This is also similar to what Google suggested we apply to NetBSD in our
internal thread, but with slightly different naming.


What you call the ops is entirely up to you. I would not use
'{read,write}_relaxed', since as you pointed out above, is probably
more confusing. This is why I suggested 'atomic_{read,write}_relaxed'.
If you do not like the 'atomic' in there, the next best thing is
probably '{read,write}_once'.

Just note that you're now defining your own memory consistency model.
This is a complex topic that spans decades of work. The safest thing
is to stick as closely to the C11/C++11 memory model as possible [4].
The Linux kernel also has a memory model [5], but is significantly
more complex since it was defined after various primitives had been
introduced.
[4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm
[5] 
https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt

Best Wishes,
-- Marco


Thanks for the details.

I must say I still have a doubt about the instructions actually emited
by the compiler.

Let's take READ_ONCE for example. What it does is basically just casting
the pointer to volatile and doing the access. Two points of confusion:

 - Is 'volatile' a guarantee that the compiler will emit only one
   instruction to fetch the value?

 - Assuming there is only one instruction, strictly speaking the fetch is
   not guaranteed to be atomic, is that correct? Because the address may
   not be naturally aligned; or the cpu may not be x86 and thus may not
   provide automatic atomicity in such case.

Thanks


Re: __{read,write}_once

2019-11-06 Thread Maxime Villard

Le 06/11/2019 à 12:38, Martin Husemann a écrit :

On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote:

__read_once(x)
__write_once(x, val)


The names are really not helpful for understanding what is supposed to happen
here.


I don't know if you have a better suggestion, but it's basically the naming
convention that seems to have been used for several years already. Maybe we
could have something more explicit, like

__{read,write}_racy() or
__{read,write}_parallel() or
__{read,write}_concurrent()

But the last one is too long I think.


__{read,write}_once

2019-11-06 Thread Maxime Villard

There are cases in the kernel where we read/write global memory locklessly,
and accept the races either because it is part of the design (eg low-level
scheduling) or we simply don't care (eg global stats).

In these cases, we want to access the memory only once, and need to ensure
the compiler does not split that access into several pieces, some of which
may be changed by concurrent accesses. There is a Linux article [1] about
this, and also [2]. I'd like to introduce the following macros:

__read_once(x)
__write_once(x, val)

These macros:

 - Are supposed to ensure that reads/writes result in only one access. We
   basically just cast to volatile for now.

 - Serve as markers for KCSAN, to say "yes there is a race, but it's
   expected and you don't need to care".

 - Help understanding the code, ie the reader will easily see that the area
   can legitimately race.

In short, the main role they play is markers, to inform the reader and
sanitizers.

I've made a patch [3], with several changes already to use these macros.

Feel free to comment

[1] https://lwn.net/Articles/508991/
[2] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
[3] https://m00nbsd.net/garbage/kcsan/access-once.diff


nvmm: group

2019-10-25 Thread Maxime Villard

Currently /dev/nvmm has these permissions

c rw- --- --- root wheel

Which means the user has to tune the permissions manually. I would like to
introduce an "nvmm" group, and change the perms to

c rw- r-- --- root nvmm

This means that emulators will have to be setgid nvmm, or the caller must
be part of nvmm.

Also, I want to make nvmm_init() public. Until now it was invoked during
the first NVMM call within libnvmm. From now on it's the emulators that
must invoke it before any other NVMM function. The point is that the
emulator now has the ability to manually set the proper permissions before
invoking nvmm_init() and drop them after the call has completed.

Here is a patch [1] that adds this group (are there more changes needed?),
and makes nvmm_init public.

Feel free to comment, I'll commit it soon.

[1] https://m00nbsd.net/garbage/nvmm/nvmm-group.diff


Re: make mbuf clusters coherency aligned again

2019-10-18 Thread Maxime Villard

Le 17/10/2019 à 19:39, Tobias Nygren a écrit :

Hi,

Since there were some concerns raised that this problem shouldn't be
fixed in the driver ...

https://mail-index.netbsd.org/source-changes/2019/10/15/msg109984.html

... I'm going partially back it out and commit this change instead.

To summarize why; having mbuf clusters not coherency aligned is
generally unsafe because there is as far as I have been able to
determine nothing that prevents the kernel from clobbering memory in
lower portion of the cache line whilst the upper portion of the cache
line is involved in a DMA transfer.

Index: uipc_mbuf.c
===
RCS file: /cvsroot/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.234
diff -p -u -r1.234 uipc_mbuf.c
--- uipc_mbuf.c 28 Sep 2019 16:02:12 -  1.234
+++ uipc_mbuf.c 17 Oct 2019 17:32:30 -
@@ -188,8 +188,8 @@ mbinit(void)
NULL, IPL_VM, mb_ctor, NULL, NULL);
KASSERT(mb_cache != NULL);
  
-	mcl_cache = pool_cache_init(mclbytes, 0, 0, 0, "mclpl", NULL,
-	    IPL_VM, NULL, NULL, NULL);
+	mcl_cache = pool_cache_init(mclbytes, COHERENCY_UNIT, 0, 0, "mclpl",
+	    NULL, IPL_VM, NULL, NULL, NULL);
KASSERT(mcl_cache != NULL);
  
  	pool_cache_set_drain_hook(mb_cache, mb_drain, NULL);


Yes, looks good to me.


Re: alloca() in kernel code

2019-10-11 Thread Maxime Villard

Le 12/10/2019 à 02:01, Emmanuel Dreyfus a écrit :

Hello

I recently encountered a situation where I had to deal with variable
length structure at a time where kernel dynamic allocator was not
initialized.

Using alloca() to have the data allocated on stack seemed attractive,
but unfortunately kernel build infrastructure ban that:


For good reasons.


error: stack protector not protecting local variables: variable length
buffer [-Werror=stack-protector]

As a result, I have to rely on a fixed-size buffer big enough so that I
can hope data will fit, and miserably fail otherwise.

Is it possible to relax the protection on some code section?


No. The kernel stack is rather small, we should not start allocating
variable-sized stuff on it.


Or is there an alternative way of dealing with that?


You didn't say what exactly you wanted to allocate memory for, so it's hard to
tell. The rare legitimate cases where I've had to do that, I picked buffers
big enough.

In all cases, we should never allocate variable-sized buffers on the stack in
the kernel.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

Le 26/09/2019 à 21:21, Mouse a écrit :

So why the hell are you throwing in statements like

immense trouble for little actual use.

Because this is true!  Do you understand, or not?  Pfff.


It's not me you were responding to here.  But you appear to be falling
into a relatively common trap: operating from the assumption that you
are entirely correct and that your underlying assumptions are
universally true, and that therefore anyone who disagrees with you must
not understand what you are saying.


Actually, you are right. This is the N-th time that this problem comes up. For
the N-th time, I see people slipping away, trying to find moronic
justifications like "maybe the users don't know they use compat_linux".

Of course I get upset, I've heard this crap N-1 times before. And the bigger
the N gets, the more I am confident that I'm right, yes. And the less I'm
tempted to discuss, also.

I am absolutely upset at this moronic topic that has to come up again, and
again, and again, because people can't process one minute that there is a
*RISK* in these compat features. It's been literally years of this total
nonsense. And I've heard all of the most idiotic reasons: from the "uh no
you are just paranoid", to the "but you have no proof it is buggy", then to
the "oh no but every software has bugs", down to now "maybe the users don't
know they use compat_linux".

You know who's the dumbass who is cleaning all this shit up each time? Well,
it's me. I am the one trying to fix the bugs, processing pullups, writing
security advisories, and even handling the bad press. And I did that again
two weeks ago when I set up a fuzzing VM to find bugs in compat_linux which
has resulted in all the fixes that now need even more effort to be pulled
up, documented, etc, because this buggy crap is enabled by default.

People are here creating a f*cking mess, and I'm here systematically doing
the damage control. The slighest proposal is met with the biggest nonsense.
Complete lack of engineering spirit on things as simple as disabling risky
stuff, while other OSes have done that for years without problem. Idiocy at
its finest.

That's it, I've said it.


I understand what you are claiming.  I think most of us do.  I simply
disagree with some of it and see you as having failed to present a good
case for most of the rest.  Repeating your unsubstantiated claims -
such as the one that you readily admitted was unsubstantiated when I
called you on it - is not going to convince anyone of anything,
possibly excepting that you have such a weak point that you have to
resort to appeals to emotion rather than actual technical arguments to
support it.


The "unsubstantiated" claim is the percentage which I immediately clarified to
you as being illustrative but not definitive. The only ones who are making
unsubstantiated claims are the ones who keep talking nonsense, like John Nemeth
for example, who as usual makes a great demonstration of his intelligence by
showing up late in a conversation and answering to each email in backwards
order on points that have been addressed already.


You are not worth my time.


And you're now resorting to...well, it's not a classic ad hominem, but
it's pretty close.  (It's not any individual that should be "worth
[your] time" or not, but rather the arguments presented.)  Your
arguments are convincing, but what they are convincing me, for one, of
is not what you appear to want to convince people of.


Well it appears you too have convinced me of something you were not expecting:
that you are not worth my time either. Trying to suggest I'm making
unsubstantiated claims when I'm not. Trying to say I use emotions while I put
forward the technical reasons and solution. Quoting single sentences like the
above without quoting the paragraph. Do you want to contribute and do all the
actual dirty work for once, or you're just here to talk and give your random
opinions on things you've never invested yourself in? What is your background?
What is your portfolio?

Verily, I'm glad this thread has reached this point. Because it has just
cleared away the cobwebs in my mind.

My current inability to discuss this kind of topic, and to work through the
nonsense, irrelevance and profound idiocy in the answers I'm facing, just
shows I've accumulated so much frustration and anger towards this project. I
believe it is clear, and in this regard, you are not completely wrong.

So here's what's going to happen. I will drop this thread, because in the end
I don't care a lot. I will stop doing the work I've been doing, have withdrawn
from secteam already, and won't be taking care of any of that from now on. I'm
done, this is over.

Good luck


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

Le 26/09/2019 à 18:56, Kamil Rytarowski a écrit :

Statement that Linux-compat is not much used is false at least due to
adobe-flash-plugin-11.2.202.644nb1.

I find this package as a must have for a web browser.


Uh? No, this is absolutely wrong. Flash player will be completely abandoned
by Adobe in 2020. The vast majority of the internet has switch to equivalents.
It has also been disabled by default by Firefox and other web browsers for
years. Flash player is absolutely not a justification for compat_linux.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

Le 26/09/2019 à 19:21, Edgar Fuß a écrit :

I do understand you are not proposing to remove compat_linux.
I do understand your proposal is just about disabling module auto-load
(which I don't care about because I run a monolithic kernel).

So why the hell are you throwing in statements like

immense trouble for little actual use.


Because this is true! Do you understand, or not? Pfff.

Yes, compat_linux has had vulnerabilities, repeatedly, and yes, this has
caused us immense trouble, like the situation right now, where we have things
to pull up and advisories to write but no one willing to do that work. This
has happened repeatedly, and I'm making a proposal to disable this feature,
to prevent this trouble in the future. Nothing controversial about that, it
just makes sense.

You are not worth my time.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

Le 26/09/2019 à 19:39, Martin Husemann a écrit :

Btw: I find it very strange that this proposal has no concrete proposal.
What compat options exactly do you want to disallow from auto loading?
What exact sysctls and semantics do you have in mind?


My proposal is the same as Taylor's, as I said in the initial email.

However, maybe keep the NetBSD-specific compat enabled as far as
compat_50 or compat_60.

The semantic is the one I stated in the initial email: the use of
kernel modules, now that this is doable in practice.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

Le 26/09/2019 à 18:19, Kamil Rytarowski a écrit :

On 26.09.2019 15:55, Kamil Rytarowski wrote:

On 26.09.2019 15:06, Mouse wrote:

[...] compat_linux and compat_linux32 [...]



Keeping them enabled for the <1% users interested means keeping
vulnerabilities for the >99% who don't use these features.


Are the usage numbers really that extreme?  Where'd you get them?  I
didn't think there were any mechanisms in place that would allow
tracking compat usage.



I depend on compat_linux/compat_linux32 in at least 4 applications. I
don't use them daily, but frequently.

There are use-cases where linux_compat* is to be or not to be for NetBSD
as host. (At least one commercial user of NetBSD depends on it as well.)

In general it is fine to disable linux_compat* unless we can ensure its
correctness with regression tests and continuous fuzzing. As I discussed
with one developer, we could test linux_compat with faked/light linux
libc, unfortunately it still did not realize.




I have managed to get light cross-toolchain producing Linux binaries.

The only dependency is musl (+gmake to build it).

It works almost out of the box (I had to manually build certain files).

The only thing that is required to be tuned is to add a dummy gnu_debuglink.

https://nxr.netbsd.org/xref/src/sys/compat/linux/common/linux_exec_elf32.c#214

The kernel shall have a fallback, and probably parse .intrp string and
look for 'ld-musl' in it.

$ cat m.c


#include <stdio.h>

int
main(int argc, char **argv)
{
	printf("Hello world!\n");

	return 0;
}
$ musl-gcc m.c


$ objcopy --add-gnu-debuglink=/dev/null ./a.out
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically
linked, interpreter /lib/ld-musl-x86_64.so.1, not stripped
$ ./a.out


Hello world!

Another step is to build LTP with musl-gcc and run the tests.

https://github.com/linux-test-project/ltp (LTP supports musl)

There will be some patches involved.

The question is how to setup the tests for it? anita + picking these
packages from pkgsrc?

Making a setup with a syzbot instance for linux32/linux64 target should
be not that difficult.

Unfortunately all that needs some work in various projects, integration,
upstreaming code (to LTP?) etc and maybe TNF could spare someone to do it.


Also linux_compat is getting more and more irrelevant as time pass due
to shortage in our futex code (lack of robust futexes).



This is slowly worked on.


I think you are confused.

Fuzzing does not act as a life-support drip to keep something artificially alive.

Part of the compat_linux bugs and vulns I fixed recently were found with a
simple fuzzing VM I set up. Yet, does it really change anything? My proposal
is unchanged. The attack surface is huge, I've been able to exercise only a
subset of the syscalls. Now big pullups need to be done, and no one is willing
to do them. The scanner bot also has found bugs that couldn't be found with
fuzzers.

Disabling certain compat options preserves the functionality, significantly
reduces the attack surface, and eases maintenance as well. At least the bugs
do not qualify as critical vulnerabilities if the feature is disabled.

Fuzzing does not reduce the attack surface, and does not constitute "active
maintenance" either (like pulling up, writing advisories, etc...).


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

Le 26/09/2019 à 18:15, Manuel Bouyer a écrit :

On Thu, Sep 26, 2019 at 05:28:11PM +0200, Maxime Villard wrote:

Le 26/09/2019 à 17:25, Brian Buhrow a écrit :

[...]
One implication of your proposal is that you'll disable the autoload
functionality, users will turn it back on, use it, and be more vulnerable
than they are now because the primary developers aren't concern with making
things work or secure anymore.


Nobody is making compat_linux work, nobody is making compat_linux secure.


My experience with a Linux program using forks and signals is that it does not
work at all; the children get killed randomly for no reason, the parent doesn't
receive the signals, and after some time everything needs to be killed and
restarted to work again. Completely broken. I didn't manage to find where
exactly the problem was.

Under reasonable assumptions, compat_linux indeed used to be the most
maintained compat layer we had. This isn't the case anymore. Under reasonable
assumptions as well, it has a marginal use case, and can be disabled.

Maybe Manuel can understand that for a minute? Or is he still looking for
evidence that I'm not the Pope?


Secure, I don't know.


Well Manuel can pretend everything he wants, but when it comes to security and
compat_linux, we're entering the world of facts, and not reasonable assumptions.

compat_linux has a history of being vulnerable, the main reason being that
it is poorly tested - no ATF, no reliable way to exercise the code
comprehensively - and not actively maintained either. This has led to a series
of exploitable vulnerabilities in the past, and a few weeks ago I found even
more of them using a rather simplistic QA technique.

I want to remind you, once again, that no one is talking about removing
functionality. We are talking about disabling it by default. The functionality
will still be there, and only one command away from being enabled. FreeBSD has
been doing that for a while, and OpenBSD did too before removing their version
of compat_linux. It is simple, and it makes sense, and I don't understand the
fuss and the nonsense that Manuel is trying to throw here.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

On 26/09/2019 at 16:47, Manuel Bouyer wrote:

On Thu, Sep 26, 2019 at 04:40:33PM +0200, Maxime Villard wrote:


Actually this is not clear. We have linux binaries in pkgsrc.


... And? We have 22000 packages in pkgsrc.


How is it relevant? I install fewer than 200. But there is suse_base in them


The real question, is how the fact that there are linux binaries in
pkgsrc relevant. Yes, there are packages. And? Most of them are
completely outdated and haven't been updated in the last 10 years.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

On 26/09/2019 at 17:25, Brian Buhrow wrote:

[...]
One implication of your proposal is that you'll disable the autoload
functionality, users will turn it back on, use it, and be more vulnerable
than they are now because the primary developers aren't concerned with making
things work or secure anymore.


Nobody is making compat_linux work, nobody is making compat_linux secure.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

On 26/09/2019 at 17:15, Manuel Bouyer wrote:

On Thu, Sep 26, 2019 at 05:10:01PM +0200, Maxime Villard wrote:

issues for a clearly marginal use case, and given the current general

  ^^^

This is where we disagree. You guess it's marginal but there's no
evidence of that (and there's no evidence of the opposite either).


Can you provide evidence that it is used by the majority of the users?
And that therefore keeping vulnerabilities for 100% of the people is
legitimate?

Please provide clear evidence.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

On 26/09/2019 at 17:03, Manuel Bouyer wrote:

On Thu, Sep 26, 2019 at 04:52:35PM +0200, Maxime Villard wrote:

On 26/09/2019 at 16:47, Manuel Bouyer wrote:

On Thu, Sep 26, 2019 at 04:40:33PM +0200, Maxime Villard wrote:


Actually this is not clear. We have linux binaries in pkgsrc.


... And? We have 22000 packages in pkgsrc.


How is it relevant? I install fewer than 200. But there is suse_base in them


The real question, is how the fact that there are linux binaries in
pkgsrc relevant. Yes, there are packages. And? Most of them are


it's relevant because they are in the binary repositories, so anyone
can install them with pkg_add or pkgin. And they don't even have to know
these are linux binaries.


completely outdated and haven't been updated in the last 10 years.


They were useful 10 years ago; it doesn't make them useless now.
For example I use them to run eagle, or microchip compilers, on
a regular basis.


Have you at least read my initial email?

I am not saying compat_linux is useless, I am saying that it has security
issues for a clearly marginal use case, and given the current general
unwillingness to maintain it and handle the security issues in it, it is a
good candidate for being disabled (!= removed). Same as it was in 2017 when
Taylor started the first thread.

Can we stop this moronic conversation and get back to the point? Thanks.

Disabling it doesn't mean removing it, and "modload compat_linux" will
restore the functionality entirely, while protecting against the
vulnerabilities by default.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

On 26/09/2019 at 16:36, Manuel Bouyer wrote:

On Thu, Sep 26, 2019 at 04:29:52PM +0200, Maxime Villard wrote:

On 26/09/2019 at 16:22, Mouse wrote:

Keeping them enabled for the <1% users interested means keeping
vulnerabilities for the >99% who don't use these features.

Are the usage numbers really that extreme?  Where'd you get them?  I
didn't think there were any mechanisms in place that would allow
tracking compat usage.

No, there is no strict procedure to monitor compat usage, and there
never will be.  Maybe it's not <1%, but rather 1.5%; or maybe it's
5%, 10%, 15%.



Who cares, exactly?


The short answer is "anyone who wants NetBSD to be useful".

If it really is only a tiny fraction - under ten people, say - then,
sure, yank it out.  If it's 90%, removing it would lose most of the
userbase, possibly provoke a fork.  15%, 40%, I don't think there is a
hard line between "pull it" and "keep it", and even if there were I'm
not sure it would matter because it appears nobody knows what the
actual use rate is anyway.


What is known, however, is that 100% of the users are affected by the
vulnerabilities. So, do we keep these things enabled by default just
because "uh we don't know so we shouldn't do anything"? Even as it's
already been clear that the majority doesn't use compat_linux?


Actually this is not clear. We have linux binaries in pkgsrc.


... And? We have 22000 packages in pkgsrc.


Is it such a Herculean effort to type "modload compat_linux" for the
people that want to use Linux binaries? In order to keep the majority
safe from the bugs and vulnerabilities?


Maybe some of them don't even know they are using compat_linux ...


Yeah, and maybe I'm the Pope also, who knows.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

On 26/09/2019 at 16:22, Mouse wrote:

Keeping them enabled for the <1% users interested means keeping
vulnerabilities for the >99% who don't use these features.

Are the usage numbers really that extreme?  Where'd you get them?  I
didn't think there were any mechanisms in place that would allow
tracking compat usage.

No, there is no strict procedure to monitor compat usage, and there
never will be.  Maybe it's not <1%, but rather 1.5%; or maybe it's
5%, 10%, 15%.



Who cares, exactly?


The short answer is "anyone who wants NetBSD to be useful".

If it really is only a tiny fraction - under ten people, say - then,
sure, yank it out.  If it's 90%, removing it would lose most of the
userbase, possibly provoke a fork.  15%, 40%, I don't think there is a
hard line between "pull it" and "keep it", and even if there were I'm
not sure it would matter because it appears nobody knows what the
actual use rate is anyway.


What is known, however, is that 100% of the users are affected by the
vulnerabilities. So, do we keep these things enabled by default just
because "uh we don't know so we shouldn't do anything"? Even as it's
already been clear that the majority doesn't use compat_linux?

Is it such a Herculean effort to type "modload compat_linux" for the
people that want to use Linux binaries? In order to keep the majority
safe from the bugs and vulnerabilities?


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

On 26/09/2019 at 15:06, Mouse wrote:

[...] compat_linux and compat_linux32 [...]



Keeping them enabled for the <1% users interested means keeping
vulnerabilities for the >99% who don't use these features.


Are the usage numbers really that extreme?  Where'd you get them?  I
didn't think there were any mechanisms in place that would allow
tracking compat usage.


No, there is no strict procedure to monitor compat usage, and there never will
be. Maybe it's not <1%, but rather 1.5%; or maybe it's 5%, 10%, 15%.

Who cares, exactly?

This compat topic has been discussed over and over, and the conclusion is
systematically that these compat options cause immense trouble for little
actual use. Now, we are again discussing the matter, because the problem is
still there.


Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Maxime Villard

I recently made a big set of changes to fix many bugs and vulnerabilities in
compat_linux and compat_linux32, the majority of which have a security impact
bigger than the Intel CPU bugs we hear about so much. These compat layers are
enabled by default, so everybody is affected.

Secteam is in a state where no one is willing to pull up all the changes to
the stable branches, because of the effort. No one is willing to write a
security advisory either. When I say "no one", it includes me.

The proposal and discussion held in this 2017 thread still hold and are
unchanged two years later:

https://mail-index.netbsd.org/tech-kern/2017/07/31/msg022153.html

The compat layers are largely untested, often broken, and are a security risk
for everybody. Keeping them enabled for the <1% users interested means keeping
vulnerabilities for the >99% who don't use these features.

In the conversation above, we hit the problem that there was cross-dependency
among compat modules, and we couldn't selectively disable specific layers.
Today this is possible thanks to pgoyette's work. That is, it is possible to
comment out "options COMPAT_LINUX" from GENERIC, and have a compat_linux.kmod
which will modload correctly and be able to run Linux binaries out of the box.
Under this scheme, the feature would be only one root command away from being
enabled in the kernel.

Therefore, I am making today the same proposal as Taylor in 2017, because the
problem is still there exactly as-is and we just hit it again; the solution
however is more straightforward.


Re: kmem_alloc() vs. VM_MIN_KERNEL_ADDRESS

2019-09-22 Thread Maxime Villard

On 02/07/2019 at 16:21, J. Hannken-Illjes wrote:

It would work but unfortunately dtrace is a module and cannot
use PMAP_DIRECT_BASE aka. pmap_direct_base.  These variables
are not visible from module context and don't exist if the
kernel is built with KASAN.


Not related to the issue (which has been fixed), but: while the system will
remain functional if you load a GENERIC kmod on a KASAN kernel, the KASAN
detection you get when executing the kmod is only partial.

Stack overflows in the kmod, for example, won't be detected. If you want
the detection, you need to pass the KASANFLAGS content (different on x86_64
and aarch64) as CFLAGS when building the kmod, plus -DKASAN.

I think I will add "no options MODULAR" with KASAN, so that there's no
confusion. There aren't many reasons for using modules under KASAN anyway,
and the rule when it comes to kernel compiler instrumentation is that
everything should be monolithic.


Re: pool guard

2019-09-17 Thread Maxime Villard

On 12/09/2019 at 08:21, Maxime Villard wrote:

On 06/09/2019 at 15:09, Maxime Villard wrote:

An idea for a feature similar to KMEM_GUARD - which I recently removed because
it was too weak and useless -, but this time at the pool layer, covering certain
specific pools, without memory consumption or performance cost, and enabled by
default at least on amd64. Note that this is hardening and exploit mitigation,
but not bug detection, so it will be of little interest in the context of
fuzzing. Note also that it targets 64bit arches, because they have nearly
unlimited VA.

The idea is that we can use special guard allocators on certain pools to prevent
important kernel data from being close to untrusted data in the VA space.

Suppose the kernel is parsing a received packet from the network, and there
is a buffer overflow which causes it to write beyond the mbuf. The data is in
an mbuf cluster of size 2K (on amd64). This mbuf cluster sits on a 4K page
allocated using the default pool allocator. After the 4K page in memory, there
could be critical kernel data sitting, which an attacker could overwrite.

     overflow
     -------->
 +------------+------------+----------------------+
 | 2K Cluster | 2K Cluster | Critical Kernel Data |
 +------------+------------+----------------------+
  <- usual 4K pool page --> <- another 4K page -->

This is a scenario that I already encountered when working on NetBSD's network
stack.

Now, we switch the mcl pool to use the new uvm_km_guard API (simple wrappers
to allocate buffers with unmapped pages at the beginning and the end). The pool
layer sees pages of size 128K, and packs 64 2K clusters in them.

     overflow
     ~~~~~~~~>
 +------------+-------+-------+-------+-------+-------+------------+
 |  Unmapped  | 2K C. | 2K C. | [...] | 2K C. | 2K C. |  Unmapped  |
 +------------+-------+-------+-------+-------+-------+------------+
  <-- 64K ---> <-- 128K pool page with 64 clusters --> <-- 64K --->

The pool page header is off-page, and bitmapped. Therefore, there is strictly
no kernel data in the 128K pool page.

The overflow still occurs, but this time the critical kernel data is far from
here, after the unmapped pages at the end. At worst only other clusters get
overwritten; at best we are close to the end and hit a page fault which stops
the overflow. 64K is chosen as the maximum of uint16_t.

No performance cost, because these guarded buffers are allocated only when the
pools grow, which is a rare operation that occurs almost only at boot time. No
actual memory consumption either, because unmapped areas don't consume physical
memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on
amd64, far beyond what we will ever need -, so no problem with consuming VA.

The code is here [1] for mcl, it is simple and works fine. It is not perfect
but can already prevent a lot of trouble. The principle could be applied to
other pools.

[1] https://m00nbsd.net/garbage/pool/guard.diff


If there are no further comments, I will commit it within a week.


Actually I realized there is a problem. uvm_km_guard must use uvm_map and not
the kmem arena, because the kmem arena is proportionate to the PA. The thing
is, using uvm_map is illegal, because we're sometimes in interrupt context.
Needs to be revisited.


Re: pool guard

2019-09-12 Thread Maxime Villard

On 06/09/2019 at 15:09, Maxime Villard wrote:

An idea for a feature similar to KMEM_GUARD - which I recently removed because
it was too weak and useless -, but this time at the pool layer, covering certain
specific pools, without memory consumption or performance cost, and enabled by
default at least on amd64. Note that this is hardening and exploit mitigation,
but not bug detection, so it will be of little interest in the context of
fuzzing. Note also that it targets 64bit arches, because they have nearly
unlimited VA.

The idea is that we can use special guard allocators on certain pools to prevent
important kernel data from being close to untrusted data in the VA space.

Suppose the kernel is parsing a received packet from the network, and there
is a buffer overflow which causes it to write beyond the mbuf. The data is in
an mbuf cluster of size 2K (on amd64). This mbuf cluster sits on a 4K page
allocated using the default pool allocator. After the 4K page in memory, there
could be critical kernel data sitting, which an attacker could overwrite.

     overflow
     -------->
 +------------+------------+----------------------+
 | 2K Cluster | 2K Cluster | Critical Kernel Data |
 +------------+------------+----------------------+
  <- usual 4K pool page --> <- another 4K page -->

This is a scenario that I already encountered when working on NetBSD's network
stack.

Now, we switch the mcl pool to use the new uvm_km_guard API (simple wrappers
to allocate buffers with unmapped pages at the beginning and the end). The pool
layer sees pages of size 128K, and packs 64 2K clusters in them.

     overflow
     ~~~~~~~~>
 +------------+-------+-------+-------+-------+-------+------------+
 |  Unmapped  | 2K C. | 2K C. | [...] | 2K C. | 2K C. |  Unmapped  |
 +------------+-------+-------+-------+-------+-------+------------+
  <-- 64K ---> <-- 128K pool page with 64 clusters --> <-- 64K --->

The pool page header is off-page, and bitmapped. Therefore, there is strictly
no kernel data in the 128K pool page.

The overflow still occurs, but this time the critical kernel data is far from
here, after the unmapped pages at the end. At worst only other clusters get
overwritten; at best we are close to the end and hit a page fault which stops
the overflow. 64K is chosen as the maximum of uint16_t.

No performance cost, because these guarded buffers are allocated only when the
pools grow, which is a rare operation that occurs almost only at boot time. No
actual memory consumption either, because unmapped areas don't consume physical
memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on
amd64, far beyond what we will ever need -, so no problem with consuming VA.

The code is here [1] for mcl, it is simple and works fine. It is not perfect
but can already prevent a lot of trouble. The principle could be applied to
other pools.

[1] https://m00nbsd.net/garbage/pool/guard.diff


If there are no further comments, I will commit it within a week.


Re: pool guard

2019-09-08 Thread Maxime Villard

On 08/09/2019 at 08:54, Jason Thorpe wrote:

On Sep 8, 2019, at 9:29 AM, Maxime Villard  wrote:
Don't confuse VA and PA. NetBSD-amd64 supports 16TB of PA, so even if
you have 48TB nvdimms it gets truncated to 16TB. Then, we have 32TB
of VA, twice more than the maximum PA. So again, we are fine.


Yes, but obviously we should fix that at some point.


"Fixing" this entails first having a UVM that can scale up to and work
reasonably well with such gigantic amounts of RAM, which is far from being the
case currently. We will need to have SMP, NUMA, large pages, and in short, we
will likely have to rewrite UVM entirely.

By the time we have done all of that, the 64bit CPUs will already ship with
5-level page trees which multiply by 512 the size of the VA space, so again,
we will be largely fine.

Even otherwise, revisiting pool guards to increase the mapped/unmapped ratio
is a very insignificant and easy task.

I see no problem related to VA/PA amounts.


Re: pool guard

2019-09-07 Thread Maxime Villard

On 07/09/2019 at 23:47, matthew green wrote:

No performance cost, because these guarded buffers are allocated only when the
pools grow, which is a rare operation that occurs almost only at boot time. No
actual memory consumption either, because unmapped areas don't consume physical
memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on
amd64, far beyond what we will ever need -, so no problem with consuming VA.


i like this idea, but i would like to point out that HPE already
sell a machine with 48TiB ram and nvdimms are going to explode
the apparently memory in the coming years, so "32TiB" is very far
from "plenty".  we have many challenges to get beyond 8TiB tho,
since we count pages in 'int' all over uvm.


Don't confuse VA and PA. NetBSD-amd64 supports 16TB of PA, so even if
you have 48TB nvdimms it gets truncated to 16TB. Then, we have 32TB
of VA, twice more than the maximum PA. So again, we are fine.


pool guard

2019-09-06 Thread Maxime Villard

An idea for a feature similar to KMEM_GUARD - which I recently removed because
it was too weak and useless -, but this time at the pool layer, covering certain
specific pools, without memory consumption or performance cost, and enabled by
default at least on amd64. Note that this is hardening and exploit mitigation,
but not bug detection, so it will be of little interest in the context of
fuzzing. Note also that it targets 64bit arches, because they have nearly
unlimited VA.

The idea is that we can use special guard allocators on certain pools to prevent
important kernel data from being close to untrusted data in the VA space.

Suppose the kernel is parsing a received packet from the network, and there
is a buffer overflow which causes it to write beyond the mbuf. The data is in
an mbuf cluster of size 2K (on amd64). This mbuf cluster sits on a 4K page
allocated using the default pool allocator. After the 4K page in memory, there
could be critical kernel data sitting, which an attacker could overwrite.

    overflow
    -------->
+------------+------------+----------------------+
| 2K Cluster | 2K Cluster | Critical Kernel Data |
+------------+------------+----------------------+
 <- usual 4K pool page --> <- another 4K page -->

This is a scenario that I already encountered when working on NetBSD's network
stack.

Now, we switch the mcl pool to use the new uvm_km_guard API (simple wrappers
to allocate buffers with unmapped pages at the beginning and the end). The pool
layer sees pages of size 128K, and packs 64 2K clusters in them.

    overflow
    ~~~~~~~~>
+------------+-------+-------+-------+-------+-------+------------+
|  Unmapped  | 2K C. | 2K C. | [...] | 2K C. | 2K C. |  Unmapped  |
+------------+-------+-------+-------+-------+-------+------------+
 <-- 64K ---> <-- 128K pool page with 64 clusters --> <-- 64K --->

The pool page header is off-page, and bitmapped. Therefore, there is strictly
no kernel data in the 128K pool page.

The overflow still occurs, but this time the critical kernel data is far from
here, after the unmapped pages at the end. At worst only other clusters get
overwritten; at best we are close to the end and hit a page fault which stops
the overflow. 64K is chosen as the maximum of uint16_t.

No performance cost, because these guarded buffers are allocated only when the
pools grow, which is a rare operation that occurs almost only at boot time. No
actual memory consumption either, because unmapped areas don't consume physical
memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on
amd64, far beyond what we will ever need -, so no problem with consuming VA.

The code is here [1] for mcl, it is simple and works fine. It is not perfect
but can already prevent a lot of trouble. The principle could be applied to
other pools.

[1] https://m00nbsd.net/garbage/pool/guard.diff


Re: re-enabling debugging of 32 bit processes with 64 bit debugger

2019-07-20 Thread Maxime Villard

https://www.netbsd.org/~christos/ptrace32.diff


looks good to me, however the function should be called something like
cpu_mcontext32from64_validate, to make it clear that it's just
cpu_mcontext32_validate with a 64bit structure

and in process_write_regs(), add {}s in the else too


Re: [PATCH v2] Include XSTATE note in x86 core dumps

2019-07-14 Thread Maxime Villard

On 05/07/2019 at 17:22, Michał Górny wrote:

+#ifdef EXEC_ELF32
+#include 
+#endif
+#define PT64_GETXSTATE PT_GETXSTATE
+#define COREDUMP_MACHDEP_LWP_NOTES \
+{  \
+   struct xstate xstate;   \


memset needed, otherwise we leak stuff


+   error = process_read_xstate(l, &xstate);\
+   if (error)  \
+   return (error); \
+   ELFNAMEEND(coredump_savenote)(ns,   \
+   CONCAT(CONCAT(PT, ELFSIZE), _GETXSTATE), name, &xstate, \
+   sizeof(xstate));\
+}


Re: kmem_alloc() vs. VM_MIN_KERNEL_ADDRESS

2019-07-02 Thread Maxime Villard

On 02/07/2019 at 16:21, J. Hannken-Illjes wrote:

On 1. Jul 2019, at 20:12, Maxime Villard  wrote:

On 01/07/2019 at 19:40, J. Hannken-Illjes wrote:

Sometimes kmem_alloc() returns an address below
VM_MIN_KERNEL_ADDRESS, something like
   kmem_alloc() -> 0xaba25800a5a8 with
   VM_MIN_KERNEL_ADDRESS 0xad80
It doesn't happen after every reboot and breaks
dtrace which treats the region 0.. VM_MIN_KERNEL_ADDRESS
as toxic.
How could we test against the lowest address the kernel
may use?
--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig


Mmh, this seems to be a side effect of 1/2-KASLR which is enabled by default
in GENERIC. Reminder: in GENERIC, we now randomize by default every kernel
memory region.

It is now possible that the direct map is below the main memory map -- it was
not the case before. kmem_alloc uses the direct map for certain allocations.

First, please confirm by also dumping PMAP_DIRECT_BASE and PMAP_DIRECT_END to
make sure the area returned by kmem_alloc is indeed in the dmap when dtrace
doesn't work.

Then, it seems that dtrace should be taught to consider as toxic everything
that is neither in the main map nor in the dmap. That is, only these ranges
are valid:

VM_MIN_KERNEL_ADDRESS -> VM_MAX_KERNEL_ADDRESS
PMAP_DIRECT_BASE -> PMAP_DIRECT_END

There is no actual "lowest" address the kernel may use; there are just two
separate regions that are valid. (Plus the kernel image itself, if it
matters.)


It would work but unfortunately dtrace is a module and cannot
use PMAP_DIRECT_BASE aka. pmap_direct_base.  These variables
are not visible from module context and don't exist if the
kernel is built with KASAN.


Ok.


Solaris uses the toxic areas 0 .. _userlimit and all regions
mapped from devices.

Will the kernel ever map anything below
VM_MIN_KERNEL_ADDRESS_DEFAULT?


No.


Do we have a mapping that contains all device-mapped memory?


No, it's either the main map or the direct map.


Re: kmem_alloc() vs. VM_MIN_KERNEL_ADDRESS

2019-07-01 Thread Maxime Villard

On 01/07/2019 at 19:40, J. Hannken-Illjes wrote:

Sometimes kmem_alloc() returns an address below
VM_MIN_KERNEL_ADDRESS, something like

   kmem_alloc() -> 0xaba25800a5a8 with
   VM_MIN_KERNEL_ADDRESS 0xad80

It doesn't happen after every reboot and breaks
dtrace which treats the region 0.. VM_MIN_KERNEL_ADDRESS
as toxic.

How could we test against the lowest address the kernel
may use?

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig


Mmh, this seems to be a side effect of 1/2-KASLR which is enabled by default
in GENERIC. Reminder: in GENERIC, we now randomize by default every kernel
memory region.

It is now possible that the direct map is below the main memory map -- it was
not the case before. kmem_alloc uses the direct map for certain allocations.

First, please confirm by also dumping PMAP_DIRECT_BASE and PMAP_DIRECT_END to
make sure the area returned by kmem_alloc is indeed in the dmap when dtrace
doesn't work.

Then, it seems that dtrace should be taught to consider as toxic everything
that is neither in the main map nor in the dmap. That is, only these ranges
are valid:

VM_MIN_KERNEL_ADDRESS -> VM_MAX_KERNEL_ADDRESS
PMAP_DIRECT_BASE -> PMAP_DIRECT_END

There is no actual "lowest" address the kernel may use; there are just two
separate regions that are valid. (Plus the kernel image itself, if it
matters.)


Re: nvmm: change the api

2019-06-30 Thread Maxime Villard

On 05/06/2019 at 21:26, Kamil Rytarowski wrote:

On 05.06.2019 08:08, Maxime Villard wrote:

I intend to change the NVMM API in order to reduce the data movements. The
patches are available here [1].



This looks good and it will make the interfaces simpler.

Could we merge the demo example into /usr/share/examples/nvmm ?

https://www.netbsd.org/~maxv/nvmm/nvmm-demo.zip

It will be easier and quicker to navigate on disk whenever needed.


I didn't feel like consuming disk space


Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE

2019-06-25 Thread Maxime Villard

On 22/06/2019 at 09:38, Michał Górny wrote:

Hi,

Attached is the next version of unified patch.  I've updated the boolean
logic as requested, and fixed indentations.


Looks good to me. Please don't use ()s when not needed, like

-   memset(&(xstate->xs_fxsave), 0, sizeof(xstate->xs_fxsave));
-   process_s87_to_xmm(&fpu_save->sv_87, &(xstate->xs_fxsave));
+   memset(&xstate->xs_fxsave, 0, sizeof(xstate->xs_fxsave));
+   process_s87_to_xmm(&fpu_save->sv_87, &xstate->xs_fxsave);

but whatever, this is detail


Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE

2019-06-21 Thread Maxime Villard

On 22/06/2019 at 07:42, Michał Górny wrote:

On Sat, 2019-06-22 at 07:35 +0200, Maxime Villard wrote:

On 19/06/2019 at 14:20, Michał Górny wrote:
> On Wed, 2019-06-19 at 06:32 +0200, Maxime Villard wrote:

Can you provide a unified patch so I can review quickly? Sorry for the delay


Here you are.  Note that the manpage is not updated yet, as I'm waiting
for your comments on the xs_rfbm approach.
+   memcpy(&(xstate -> field),                                  \
+       (char*)fpu_save + x86_xsave_offsets[xsave_val],         \
+       sizeof(xstate -> field));                                \


Please use proper KNF here and in other places


Are you talking of indentation?  It would be really helpful to be more
specific here.


Yes

memcpy(&xstate->field,                                  \
    (char*)fpu_save + x86_xsave_offsets[xsave_val],     \
    sizeof(xstate->field));                              \

But that's just cosmetic


+   fpu_save->sv_xsave_hdr.xsh_xstate_bv =
+       (~xstate->xs_rfbm | xstate->xs_xstate_bv) &
+       (xstate->xs_rfbm | fpu_save->sv_xsave_hdr.xsh_xstate_bv);


Can't we just do this? Seems simpler

fpu_save->sv_xsave_hdr.xsh_xstate_bv =
    (fpu_save->sv_xsave_hdr.xsh_xstate_bv & ~xstate->xs_rfbm) |
    xstate->xs_xstate_bv;


Unless I'm mistaken, this would prevent the user from being able to
force xs_xstate_bv to 0 if it was set previously, and therefore clear
registers to 'unmodified' state.


If the user gives RFBM=all and XSTATE_BV=0, it does force the result to zero


Unfortunately, in this code you will likely hit a race; between the moment you
set fpu_save->sv_xsave_hdr.xsh_xstate_bv and the moment you reach COPY_COMPONENT,
the thread could have been context-switched, and xstate_bv could have been
clobbered.


Unless I'm mistaken, this code should only be executed on stopped
processes.  Or at least that was the idea Kamil suggested to me.


Mmh yes indeed, I even said it earlier in the thread; forget it, I was trying
to remember where I had seen races in the FPU code, and it's not a problem
here


Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE

2019-06-21 Thread Maxime Villard

On 19/06/2019 at 14:20, Michał Górny wrote:
> On Wed, 2019-06-19 at 06:32 +0200, Maxime Villard wrote:

Can you provide a unified patch so I can review quickly? Sorry for the delay


Here you are.  Note that the manpage is not updated yet, as I'm waiting
for your comments on the xs_rfbm approach.



+   memcpy(&(xstate -> field),                                  \
+       (char*)fpu_save + x86_xsave_offsets[xsave_val],         \
+       sizeof(xstate -> field));                                \


Please use proper KNF here and in other places


+   fpu_save->sv_xsave_hdr.xsh_xstate_bv =
+       (~xstate->xs_rfbm | xstate->xs_xstate_bv) &
+       (xstate->xs_rfbm | fpu_save->sv_xsave_hdr.xsh_xstate_bv);


Can't we just do this? Seems simpler

fpu_save->sv_xsave_hdr.xsh_xstate_bv =
    (fpu_save->sv_xsave_hdr.xsh_xstate_bv & ~xstate->xs_rfbm) |
    xstate->xs_xstate_bv;

Unfortunately, in this code you will likely hit a race; between the moment you
set fpu_save->sv_xsave_hdr.xsh_xstate_bv and the moment you reach COPY_COMPONENT,
the thread could have been context-switched, and xstate_bv could have been
clobbered.

I think you can work around this race by adding kpreempt_disable/enable at
the beginning/end of the function. ISTR there is the same problem in the
other fpregs ptrace functions. In fact, there are a lot of problems in our
FPU code right now.

Otherwise the rfbm approach seems good to me.


Re: ehci: fix error handling?

2019-06-14 Thread Maxime Villard

On 25/05/2019 at 08:53, Iain Hibbert wrote:

On Sat, 25 May 2019, Maxime Villard wrote:


To fix this problem, we need to fix the error handling in ehci_pci_attach(),
so that goto fail does not leave 'sc' in an inconsistent state.

What is the best way to do that? Should I somehow forcibly unregister the
device, or should I add a flag in 'sc' to indicate whether it is initialized?


the latter


It looks like there is no clean way to handle attach failures.


attach always 'succeeds' in the sense that after attach has been called,
detach will always be called. The detach routine should tear down
everything that needs tearing down and not do things that will fail.
Perhaps the init could simply be done before the attach routine gets the
chance to fail?

iain


Ok, thanks, I've fixed it and the machine shuts down correctly now.

However there seem to be other problems in the attach/detach. ehci_init()
and ehci_detach() are not symmetrical, the former initializes more stuff
than the latter deinitializes. ehci_pci.c deinitializes the extra stuff
correctly, but the others (cardbus, arm, mips) don't.


Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE

2019-06-09 Thread Maxime Villard

Le 09/06/2019 à 09:46, Michał Górny a écrit :

On Sun, 2019-06-09 at 09:08 +0200, Maxime Villard wrote:

Le 07/06/2019 à 20:33, Michał Górny a écrit :

[...]
+int
+process_machdep_doxstate(struct lwp *curl, struct lwp *l, struct uio *uio)
+   /* curl: tracer */
+   /* l:traced */
+{
+   int error;
+   struct xstate r;
+   char *kv;
+   ssize_t kl;
+
+   kl = MIN(uio->uio_iov->iov_len, sizeof(r));
+   kv = (char *) &r;
+
+   kv += uio->uio_offset;
+   kl -= uio->uio_offset;
+   if (kl > uio->uio_resid)
+   kl = uio->uio_resid;
+
+   if (kl < 0)
+   error = EINVAL;
+   else
+   error = process_machdep_read_xstate(l, &r);
+   if (error == 0)
+   error = uiomove(kv, kl, uio);
+   if (error == 0 && uio->uio_rw == UIO_WRITE) {
+   if (l->l_proc->p_stat != SSTOP)
+   error = EBUSY;


Isn't it supposed to always be the case?


To be honest, I've followed suit with all other getter-setters there.
I can't say if it's fail-safe that's never supposed to fire or if there
could be some mistaken use possible that would trigger this.


I don't see other places that do that. If it was needed there would be a
race, because it doesn't seem to me we lock the proc, so what if the proc
goes != SSTOP after the check.


+   /* Copy MXCSR if either SSE or AVX state is requested */
+   if (xstate->xs_xstate_bv & (XCR0_SSE|XCR0_YMM_Hi128)) {
+   memcpy(&fpu_save->sv_xmm.fx_mxcsr, &xstate->xs_fxsave.fx_mxcsr, 8);
+
+   /*
+* Invalid bits in mxcsr or mxcsr_mask will cause faults.
+*/
+   fpu_save->sv_xmm.fx_mxcsr_mask &= x86_fpu_mxcsr_mask;
+   fpu_save->sv_xmm.fx_mxcsr &= fpu_save->sv_xmm.fx_mxcsr_mask;


Please use a simple assignment instead of memcpy, and also filter out 'xstate'
and not 'fpu_save'.


Will do.  Actually, simple assignment means I can filter it out while
assigning.


Also, it would be nice to clarify the use case here. On x86, mxcsr gets reloaded
*regardless* of whether xstate_bv contains SSE|AVX. Should the ptrace api also
reload mxcsr regardless of whether the user requested SSE|AVX in xstate_bv?


I was following the Intel programmer's manual.  XRSTOR uses mxcsr either
if SSE or AVX is requested via EAX.


Yes, but EAX is the RFBM, not XSTATE_BV.


+   }
+
+   /* Copy SSE state if requested. */
+   if (xstate->xs_xstate_bv & XCR0_SSE) {
+   if (x86_fpu_save >= FPU_SAVE_XSAVE) {
+   KASSERT(fpu_save->sv_xsave_hdr.xsh_xstate_bv & XCR0_SSE);


Mmh, maybe it is possible that this KASSERT fires, if the LWP hasn't used SSE.


I don't think that can actually happen with XSAVE.  I wouldn't be sure
with XSAVEOPT but we're not using it.


In fact, what is the purpose?


The exact purpose is to verify that a request to write unsupported
component didn't get through process_verify_xstate().  It's an assert,
so it means to check for something-we-really-don't-expect-to-happen.


The KASSERT is on 'fpu_save', not 'xstate', so process_verify_xstate() doesn't
play any role here. My point was: if the traced proc never used SSE, then
hardware.xstate_bv[SSE]=0 (which means, the "init" state). And the KASSERT
fires.

A few other things:

 * In process_machdep_doxstate(), need to add a memset(&r, 0, sizeof(r));

 * In process_write_xstate(), need to OR the xstate_bv in 'fpu_save' with
   the xstate_bv in 'xstate', otherwise the next XRSTOR of the traced proc
   may not install the registers correctly

 * Is there any urgency in adding support for XSTATE in ptrace? Because I am
   currently rewriting the FPU code entirely

 * I think it would be easier if you hosted your patches online, because the
   repeated threads make it hard to follow what's going on


Re: [PATCH v3 1/2] Fetch XSAVE area component offsets and sizes when initializing x86 CPU

2019-06-09 Thread Maxime Villard

Le 09/06/2019 à 09:33, Michał Górny a écrit :

On Sun, 2019-06-09 at 09:20 +0200, Maxime Villard wrote:

In fact, the whole loop seems wrong: CPUID uses the XCR0_ constants, not the
XSAVE_ constants. Eg HDC is 13 in XCR0_ but 10 in XSAVE_, so you never iterate
over it.


It's intentional.  I only iterate up to XSAVE_MAX_COMPONENT,
i.e. the last component actually used by the kernel.  I don't skip
earlier unused components to avoid index madness but I see no purpose to
go beyond last component actually used.


Ok, that's fine for now and we can revisit that later for future states.
However, maybe rename XSAVE_ -> XCRO_SHIFT_, or something else, to make it
clear that this depends on the hardware layout.


Re: [PATCH v3 1/2] Fetch XSAVE area component offsets and sizes when initializing x86 CPU

2019-06-09 Thread Maxime Villard

Le 07/06/2019 à 20:33, Michał Górny a écrit :

+#define XSAVE_MAX_COMPONENT XSAVE_Hi16_ZMM
[...]
+size_t x86_xsave_offsets[XSAVE_MAX_COMPONENT] __read_mostly;
+size_t x86_xsave_sizes[XSAVE_MAX_COMPONENT] __read_mostly;


Need +1, but see below


+   /* Get component offsets and sizes for the save area */
+   for (i = XSAVE_YMM_Hi128; i < __arraycount(x86_xsave_offsets); i++) {
+   if (x86_xsave_features & ((uint64_t)1 << i)) {


I would use __BIT(i).

In fact, the whole loop seems wrong: CPUID uses the XCR0_ constants, not the
XSAVE_ constants. Eg HDC is 13 in XCR0_ but 10 in XSAVE_, so you never iterate
over it.

I would drop XSAVE_* entirely and just use XCR0_*, it is forward compatible.


-#define __NetBSD_Version__  899004200   /* NetBSD 8.99.42 */
+#define __NetBSD_Version__  899004300   /* NetBSD 8.99.43 */


No version bump needed for that


Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE

2019-06-09 Thread Maxime Villard

Le 07/06/2019 à 20:33, Michał Górny a écrit :

[...]
+int
+process_machdep_doxstate(struct lwp *curl, struct lwp *l, struct uio *uio)
+   /* curl: tracer */
+   /* l:traced */
+{
+   int error;
+   struct xstate r;
+   char *kv;
+   ssize_t kl;
+
+   kl = MIN(uio->uio_iov->iov_len, sizeof(r));
+   kv = (char *) &r;
+
+   kv += uio->uio_offset;
+   kl -= uio->uio_offset;
+   if (kl > uio->uio_resid)
+   kl = uio->uio_resid;
+
+   if (kl < 0)
+   error = EINVAL;
+   else
+   error = process_machdep_read_xstate(l, &r);
+   if (error == 0)
+   error = uiomove(kv, kl, uio);
+   if (error == 0 && uio->uio_rw == UIO_WRITE) {
+   if (l->l_proc->p_stat != SSTOP)
+   error = EBUSY;


Isn't it supposed to always be the case?


+   /* Copy MXCSR if either SSE or AVX state is requested */
+   if (xstate->xs_xstate_bv & (XCR0_SSE|XCR0_YMM_Hi128)) {
+   memcpy(&fpu_save->sv_xmm.fx_mxcsr, &xstate->xs_fxsave.fx_mxcsr, 8);
+
+   /*
+* Invalid bits in mxcsr or mxcsr_mask will cause faults.
+*/
+   fpu_save->sv_xmm.fx_mxcsr_mask &= x86_fpu_mxcsr_mask;
+   fpu_save->sv_xmm.fx_mxcsr &= fpu_save->sv_xmm.fx_mxcsr_mask;


Please use a simple assignment instead of memcpy, and also filter out 'xstate'
and not 'fpu_save'.

Also, it would be nice to clarify the use case here. On x86, mxcsr gets reloaded
*regardless* of whether xstate_bv contains SSE|AVX. Should the ptrace api also
reload mxcsr regardless of whether the user requested SSE|AVX in xstate_bv?


+   }
+
+   /* Copy SSE state if requested. */
+   if (xstate->xs_xstate_bv & XCR0_SSE) {
+   if (x86_fpu_save >= FPU_SAVE_XSAVE) {
+   KASSERT(fpu_save->sv_xsave_hdr.xsh_xstate_bv & XCR0_SSE);


Mmh, maybe it is possible that this KASSERT fires, if the LWP hasn't used SSE.
In fact, what is the purpose?

Also general notes:

 * Please KNF
 * Don't add too many comments when the code is simple enough


nvmm: change the api

2019-06-04 Thread Maxime Villard

I intend to change the NVMM API in order to reduce the data movements. The
patches are available here [1].

Basically until now the API expected the user to create structures and then
pass them to a libnvmm function. Typically:

/* The structures. */
struct nvmm_machine mach;
struct nvmm_x64_state state;
struct nvmm_event event;
struct nvmm_exit exit;

/* Create the machine and VCPU0. */
nvmm_machine_create(&mach);
nvmm_vcpu_create(&mach, 0);

/* Set RAX=123. */
nvmm_vcpu_getstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);
state.gprs[NVMM_X64_GPR_RAX] = 123;
nvmm_vcpu_setstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);

/* Inject an #UD exception. */
event.type = NVMM_EXCEPTION;
event.vector = 6;
nvmm_vcpu_inject(&mach, 0, &event);

/* Run. */
nvmm_vcpu_run(&mach, 0, &exit);
switch (exit.reason) {
...
}

This design has some problems: the structures can occupy a lot of stack
(nvmm_x64_state is > 1000 bytes), and must be copied in and out of the
comm page by libnvmm. We can avoid this by giving the user direct access
to the comm page.

A new nvmm_vcpu structure is introduced, and contains pointers to the comm
page. The new usage is:

/* The structures. */
struct nvmm_machine mach;
struct nvmm_vcpu vcpu;

/* Create the machine and VCPU0. */
nvmm_machine_create(&mach);
nvmm_vcpu_create(&mach, 0, &vcpu);

/* Set RAX=123. getstate populates vcpu.state->gprs[]. */
nvmm_vcpu_getstate(&mach, &vcpu, NVMM_X64_STATE_GPRS);
vcpu.state->gprs[NVMM_X64_GPR_RAX] = 123;
nvmm_vcpu_setstate(&mach, &vcpu, NVMM_X64_STATE_GPRS);

/* Inject an #UD exception. */
vcpu.event->type = NVMM_EXCEPTION;
vcpu.event->vector = 6;
nvmm_vcpu_inject(&mach, &vcpu);

/* Run. */
nvmm_vcpu_run(&mach, &vcpu);
switch (vcpu.exit->reason) {
...
}

No kernel changes are required. This should be part of NetBSD 9.

[1] https://m00nbsd.net/garbage/nvmm/nvmm-nocopy.zip


ehci: fix error handling?

2019-05-24 Thread Maxime Villard

Recent Lenovo machines have a Realtek USB 1.1 controller with EHCI. NetBSD
tries to attach it in dev/pci/ehci_pci.c::ehci_pci_attach(), and does goto
fail with "pre-2.0 USB rev".

The thing is, the device remains registered. When I try to shut down the
machine, ehci_pci_detach() gets called and crashes, because we didn't
completely initialize 'sc' in ehci_pci_attach(). In particular, it triggers
a KASSERT in callout_halt(), because callout_init() was not called before.

To fix this problem, we need to fix the error handling in ehci_pci_attach(),
so that goto fail does not leave 'sc' in an inconsistent state.

What is the best way to do that? Should I somehow forcibly unregister the
device, or should I add a flag in 'sc' to indicate whether it is initialized?
It looks like there is no clean way to handle attach failures.

Currently it is impossible to cleanly shut down or reboot this machine.


Re: Removing PF

2019-04-04 Thread Maxime Villard

Le 02/04/2019 à 22:29, Mouse a écrit :

I continue to use pf and not npf because : [...]

However, I must say I'm still a bit confused by this answer (and the
others I've seen).  Do you understand that PF is a clear security
risk for your system?


Is it?  Do you know MLH's systems enough to know whether any of the
known vulnerabilities are relevant?  I don't.


That's a really simplistic answer; if by any chance he happens to avoid
every vuln found until now then it's fine to use PF? What about the next
one that comes in, will he still be lucky? No one will fix it, and it's
not viable. Or, he could be using PF in an entirely offline network, in
which case it is indeed fine (but that doesn't seem to be his particular
case).


If you really want to use PF, I would recommend that you switch to
another OS, for your own safety.  PF has no future in NetBSD.


It doesn't?  It seems to me, from the lack of consensus I'm seeing
here, that that remains to be seen.


We've been seeing it for eleven years, but we can wait another eleven
years in case you still have a doubt about whether anyone wants to work
on PF...


Re: Removing PF

2019-04-02 Thread Maxime Villard

Le 02/04/2019 à 20:46, MLH a écrit :

I continue to use pf and not npf because :

1) I couldn't get std rulesets to seem to work (been a while though)
2) no port redirection
3) dynamic ruleset use didn't appear to be adequate
4) greylisting (not just email) for custom stuff that I can't see
how to support in npf.
5) Needs far more documentation and help than I have seen.

I would like to move to npf as some future features look nice (SYN
floods, DoS attacks, etc). However, in addition to std rulesets,
etc, I use log followers to block attacks. While not the main
security, they really help hold down traffic, etc. and I'm not
anywhere near willing to give them up. I tried using blacklistd
but never could get it to work (also been a while).


At least people answer to the question that was asked, so thanks for that
already.

However, I must say I'm still a bit confused by this answer (and the others
I've seen). Do you understand that PF is a clear security risk for your
system? Or, you obviously understand, but don't care much? Sure, PF has
features NPF doesn't have; but a firewall is supposed to stop the fire,
not create the conditions for it to spread. And sure, each software has bugs,
but you don't need to have a nobel prize to understand that 11yo unmaintained
software has much more bugs than its up-to-date version, in the case of PF
it is obviously proven.

In essence, if it's that you don't care, then indeed keeping PF may not be a
real problem for us, except looking a bit irresponsible. I mean, we don't
care either if you give your credit card number to every stranger that calls
you on the phone... some responsibility is on the user's side. However, I do
believe that our responsibility is still to prevent confusion, even when it
implies removing some features. Yes, it is sad if you can't use ftp-proxy on
NPF for now, yes NPF's syntax is not the same as PF's, and so on. But NPF
equally has many advanced benefits, that you don't get with PF. If you really
want to use PF, I would recommend that you switch to another OS, for your own
safety. PF has no future in NetBSD.

It's been one decade of this, at some point we need to cut the crap.


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 19:22, Michael van Elst a écrit :

On Sat, Mar 30, 2019 at 04:55:53PM +0100, Maxime Villard wrote:


Of course, if it's really about rototilling things actively in other places
and to limit the amount of work to fix the fallout, it suddenly becomes
reasonable to remove something you personally don't care about.


... yes, that's part of the goal, removing dead wood makes


So, whatever you do not care about, you call dead wood.


... I'm still waiting for you to provide a list of people who maintain PF in
NetBSD ... still waiting ... I'm also waiting for you to provide actual
answers rather than one-line quotes from my emails, taken out of context, and
used to draw glaringly wrong conclusions (if any) ... but I guess no point
waiting for that to happen ...


We should remember that.


... sure, meanwhile you didn't really answer to the core of the issue, which
I think was stated clearly by Sevan ...


Re: GSoC 2019 - Porting wine to amd64

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 20:14, zerous a écrit :

On Sat, Mar 30, 2019 at 05:15:27PM +0100, Maxime Villard wrote:


Kindly find the draft attached along with this mail. I would be glad
to receive feedback on the same.


This sounds really good. However, I have to warn you: this project is
more a userland project than a kernel project, and the amount of kernel
work will be limited.

But if that's fine on your side, that's fine on ours.


That is absolutely fine with me. I hope it will help me know where to look
when I find myself lost in the source code and moreover, I will get the chance
to interact with the developers who built it. So, I am really looking forward
to doing this project.


Great, in this case I believe the next step is that you submit your proposal
on the GSoC website, and we'll pick it up from there.


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 20:26, Michael van Elst a écrit :

On Sat, Mar 30, 2019 at 08:10:21PM +0100, Maxime Villard wrote:


... sure, meanwhile you didn't really answer to the core of the issue, which
I think was stated clearly by Sevan ...


The issue is that we need to work on npf before we can drop other code.


... the questions raised were: why would someone use an insecure firewall? ...
and isn't it irresponsible to provide an insecure firewall? ... you still
fail to answer ... I see fewer and fewer reasons to keep talking to you,
given your clear inability to answer in good faith ...


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 16:24, Michael van Elst a écrit :

On Sat, Mar 30, 2019 at 03:11:22PM +0100, Maxime Villard wrote:



The best way forward to drop PF is to actively develop NPF.


... and the best way to actively develop NPF is to stop splitting effort on
three firewalls


So now you stop maintaining PF and IPF and redirect your efforts to NPF ?
If not you, who does ? Somehow this doesn't fit into your narrative that
there are no maintainers.


... yes, when PF is not there I don't have to worry about PF-related PRs
that report huge vulns in the general indifference, and I don't need to
waste time investigating, fixing, and documenting the vulns because I
feel like I should care for the poor vulnerable people that use PF ...
there indeed are no maintainers, just people like me who make (or rather,
have made, and don't want to anymore) changes to prevent the code from
collapsing completely, or who mechanically change APIs without ever
testing for real, thereby possibly adding even more bugs and vulns than
there already are ...


Of course, if it's really about rototilling things actively in other places
and to limit the amount of work to fix the fallout, it suddenly becomes
reasonable to remove something you personally don't care about.


... yes, that's part of the goal, removing dead wood makes stuff easier to
maintain and improve, and as discussed in other threads, things that have
legitimate reasons to go, have legitimate reasons to go ...


and stop directing users to the three of them ...


It's a choice, users like that, in particular if that choice makes
a difference. Someone like you should understand this.


... I've already said I understand the "missing features" aspect of things,
but I've also already questioned whether it was a really good argument,
because I don't understand how someone can possibly want a flawed
firewall, which, in the case of PF, can open more security holes than it
plugs ... so, why? ... Sevan has summed up pretty well the underlying
question: isn't it irresponsible to ship an insecure firewall and give our
users the impression that they can use it, or even the possibility to use
it ... so, isn't it irresponsible?


Re: GSoC 2019 - Porting wine to amd64

2019-03-30 Thread Maxime Villard

Le 29/03/2019 à 12:13, zerous a écrit :

Hi,

I strongly feel like commiting to the project of porting Wine to amd64
on NetBSD. My reasons for this liking include my experience with
BSD/UNIX, C and shell in general. I have always wanted to do systems
engineering which deals with the kernel and other subsystems. I have 4
years of experience in system administration and programming in C. I
decided to take up this gig primarily because it deals with emulation
and, as a KERNEL option is involved, I might as well use this
opportunity to learn more about kernel modules.

I have written a draft-proposal for the project after consulting with
some of the netbsd developers like @medfly, @martin and @leot.

Kindly find the draft attached along with this mail. I would be glad
to receive feedback on the same.


This sounds really good. However, I have to warn you: this project is
more a userland project than a kernel project, and the amount of kernel
work will be limited.

But if that's fine on your side, that's fine on ours.


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 14:34, Michael van Elst a écrit :

On Sat, Mar 30, 2019 at 02:18:29PM +0100, Piotr Meyer wrote:

On Sat, Mar 30, 2019 at 12:59:00PM -, Michael van Elst wrote:

Just to get the facts straight: NPF has a bigger market share *outside*
NetBSD, at least certainly for commercial users.  They also contribute.


Ironically, the same is also true for PF...


But contributions to PF are made against OpenBSD kernel, true? And the
essence of problem is that:
- NetBSD version of PF is too old to adapt them without ton of work,


That's self-evident, the "ton of work" is of course just the same that
would have been needed if you had tracked pf development.


... which doesn't withdraw anything from the fact that no one has been
willing to do this work over the last 10 years, and likely not in the
next 10 years either ...


- even current PF in OpenBSD doesn't fit very well into MP kernels


Whatever "MP kernel" is. It no longer fits that well to our recent
MP changes of the network stack, but that mostly results in being
less efficient, not in being more difficult to port or track.


... reports from FreeBSD on the matter indicate that it is more difficult
to port, unsurprisingly, not to mention that having PF will then be an
obstacle to enabling full MP-safety in GENERIC ... so no, it's not just
less efficient, it becomes a big obstacle to full MP-safety in GENERIC ...


   and that makes "fresh" import cumbersome and fruitless


It's always cumbersome to import foreign code. NPF won't be any different,
but since it is a newer development, it has some headstart. Don't believe
NPF is something "native" and thus easier to track, see the recent change
from our "native" proplib to its proprietary library.


... NPF is NetBSD's native firewall and is well integrated in the NetBSD
kernel, the recent switch from proplib to libnv is very anecdotical, if
anything ...

Le 30/03/2019 à 14:47, Michael van Elst a écrit :

That's a myth. When I switched from IPF to NPF, the first thing required
was to fix NPF and NPF-related bugs. The NetBSD version of it was pretty
much unmaintained, it still is to a lesser degree.


... unmaintained as in "it was 2 months behind the Github version", not as
in "it was 11 years behind the OpenBSD version" ... the NPF development is
jumping between NetBSD and Github, with changes being regularly merged in
both directions by Mindaugas from time to time ...


The best way forward to drop PF is to actively develop NPF.


... and the best way to actively develop NPF is to stop splitting effort on
three firewalls and stop directing users to the three of them ... it's also
to stop this illusional/irrational approach by which it is fine to use
inflammable firewalls ... yes, each software has bugs, but maintained
software benefits from improvements, outdated software doesn't ... outdated
software must be kept synchronized, which brings us back to the first point,
no one wants to work on that ...


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 11:06, Manuel Bouyer a écrit :

On Sat, Mar 30, 2019 at 06:07:01AM +0100, Maxime Villard wrote:

[...]
If the effort hadn't been split on three firewalls in the last 10 years, but
rather had been focused on only one, don't you think NPF would now be
featureful?


Exactly. If work had been done to improve ipf instead of reinventing the
wheel, ipf would probably be in much better shape now.


Well, you're not wrong, but there were also many good reasons for developing a
firewall from scratch, and NPF is now used in commercial products, while IPF
is being dropped by commercial products. NPF is not a failure.


I'm still using ipf because npf misses the features I need (essentially
groups). Each time I mentioned this, no comments on this topic were made.


This is one of the reasons why we keep IPF for now, to wait for NPF to reach
feature-parity. This point hasn't been ignored.

But this thread is about PF, not IPF, and was prompted by the two recent PF
vulns that were not fixed in NetBSD by anyone.

I'm talking about the current state of affairs, and how to move forward. The
current state of affairs is that PF is in a deplorable state compared to NPF,
which "only" lacks a few features.

The argument made about "ftp-proxy", while understandable, does remain highly
questionable. PF hasn't received any security updates in NetBSD, so do people
really wish to still use it for ftp-proxy? It seems silly to use an outdated
firewall, because it creates more security holes than it plugs. Firewalls are
not supposed to be inflammable.

Please do not confuse, this thread is about PF, not IPF. I believe we can
easily resolve the ftp-proxy support issue on NPF.


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 10:06, BERTRAND Joël a écrit :

Michael van Elst a écrit :

joel.bertr...@systella.fr (BERTRAND Joël) writes:


Just a question. I use NPF for a while on NetBSD-8 that provides router
functionalities (over a lot of network interfaces with agr, bridge, vpn
and some others...). NPF runs fine but I'm unable to configure
ftp-proxy. Last time I have checked, ftp proxy doesn't work and I have
tried to fix ftp-proxy (src/dist/pf/usr.sbin/ftp-proxy) without any
success. A lot of npf code is missing or broken.


NPF so far doesn't implement a ftp proxy. The ftp-proxy you found is for PF.



I know. But without a function like ftp-proxy, npf cannot replace pf.
ftp-proxy seems to be very simple code. Before dropping pf, I think
ftp-proxy has to run with npf.

JB


Fair enough, this _is_ an actual reason for not removing PF right now. I'll
look at how the ftp-proxy stuff can work on NPF.


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 08:51, Michael van Elst a écrit :

On Sat, Mar 30, 2019 at 08:17:44AM +0100, Maxime Villard wrote:

Le 30/03/2019 à 08:07, Michael van Elst a écrit :

m...@m00nbsd.net (Maxime Villard) writes:


If the effort hadn't been split on three firewalls in the last 10 years, but
rather had been focused on only one, don't you think NPF would now be
featureful?


Now if the effort of the world hadn't been split among Windows, MacOS
and Linux, but rather had been focused on only NetBSD, don't you think
NetBSD would now be featureful ?


Yes, if the effort had been on one firewall instead of three, the one
chosen would be more functional. Period. You have absolutely no point,
do you?


You just don't understand the point. Period. Hey, I said Period, it must
be true.

Do you believe people would give up their efforts on Windows, or
MacOS or Linux to focus on NetBSD ?

Ah, silly me, too abstract. Do you really believe the maintainers
of ipf and npf would give up their projects to work on pf ?


Ah because there are maintainers of ipf? There is _no_ maintainer of ipf
whatsoever. Do you understand, or not? Give me a list of people who maintain
ipf. Just like, give me a list of people who maintain pf in NetBSD.


The point is that you only alienate people that are not following
your decision. Period. Hah, again.


Decision? I haven't taken any decision, I've made a clear and logical proposal
to drop PF, that I am sending to a public mailing list for further discussion.

You, on the other side, have put forward zero clear argument, and have just
thrown random, inaccurate and irrelevant comparisons, that leave my proposal
completely unchallenged.


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 10:40, Maxime Villard a écrit :

Le 30/03/2019 à 10:06, BERTRAND Joël a écrit :

Michael van Elst a écrit :

joel.bertr...@systella.fr (BERTRAND Joël) writes:


Just a question. I use NPF for a while on NetBSD-8 that provides router
functionalities (over a lot of network interfaces with agr, bridge, vpn
and some others...). NPF runs fine but I'm unable to configure
ftp-proxy. Last time I have checked, ftp proxy doesn't work and I have
tried to fix ftp-proxy (src/dist/pf/usr.sbin/ftp-proxy) without any
success. A lot of npf code is missing or broken.


NPF so far doesn't implement a ftp proxy. The ftp-proxy you found is for PF.



I know. But without a function like ftp-proxy, npf cannot replace pf.
ftp-proxy seems to be very simple code. Before dropping pf, I think
ftp-proxy has to run with npf.

JB


Fair enough, this _is_ an actual reason for not removing PF right now. I'll
look at how the ftp-proxy stuff can work on NPF.


Hum, wait a minute here. Am I being totally dumb, or has NPF supported
ftp-proxy since 2011? See src/dist/pf/usr.sbin/ftp-proxy/npf.c, and the
associated commit message.


Re: Removing PF

2019-03-30 Thread Maxime Villard

Le 30/03/2019 à 08:07, Michael van Elst a écrit :

m...@m00nbsd.net (Maxime Villard) writes:


If the effort hadn't been split on three firewalls in the last 10 years, but
rather had been focused on only one, don't you think NPF would now be
featureful?


Now if the effort of the world hadn't been split among Windows, MacOS
and Linux, but rather had been focused on only NetBSD, don't you think
NetBSD would now be featureful ?


Yes, if the effort had been on one firewall instead of three, the one
chosen would be more functional. Period. You have absolutely no point,
do you?


Re: Removing PF

2019-03-29 Thread Maxime Villard

Le 29/03/2019 à 22:10, Brian Buhrow a écrit :

...
And, pray tell, why is it harder to maintain a modern version
of PF than NPF?
...


It is about the _fourth_ time you ask this question, and for the fourth time
I am going to provide the exact same answer. PF has a legacy design that makes
it difficult to import in MP-safe kernels like NetBSD or FreeBSD. NPF is
already MP-safe and well integrated in the NetBSD kernel, and that's why it is
a lot easier to work on. Is it clear now, or do you want to ask a fifth time?

Again, if it was as easy as you think it is, we would likely have maintained
it. The reality is that no one is willing to maintain PF.


At the risk of being extremely rude, although that is not my intention, if
as much effort had been put into improving the functionality of NPF as has
been put into the effort to get pf removed, I dare say, NPF would now be up
to PF in functionality and the conversation would be easier to have.


If the effort hadn't been split on three firewalls in the last 10 years, but
rather had been focused on only one, don't you think NPF would now be
featureful? The fragmented effort that has gone into fixing here and there
crazy bugs in PF and IPF, just because random people (like you) wanted their
outdated firewall not to collapse, is precisely what has brought us to the
current state, that is, three half-baked firewalls, none of which is 100%
functional or featureful.

I'm even talking from experience here; last year I was looking at network-
related PRs, and I found a PR from 2010 where a guy had reported that a
single TCP packet could crash the kernel under PF. It turned out that our
version of PF had a signedness bug that could cause a length check to pass,
which caused a fatal memory corruption. Checking the guy's github, he even
had written in 2010 an exploit for it. So for 8 years, the vuln had been
sitting in our Gnats with a public exploit, and strictly no one giving a fuck
about it. I proceeded to write a reproducer, fix the vuln, then pull it up,
then write a security advisory for it. All of that is an awful waste of time
and energy, for everybody.

Perpetuating this crazy status quo is not going to make the situation any
better, it will just make us continue splitting efforts. If we already manage
to go from three firewalls to only two, it's a good progress, and again, it
reduces the maintenance burden on the kernel.

In the case of PF, I've already said it in the internal discussion (but not
on tech-kern), if we wanted to import a new version, it would likely be better
to remove the one we have, and make a clean import from scratch.

In any case, this proposal (about removing PF) is not illegitimate or stupid,
and it makes sense, given the awful state of PF. If you want to use outdated
firewalls that haven't received security fixes, you can as well follow the
NetBSD-6 branch, or older.


And, once done, all those features would have to be maintained going forward,


No, NPF is homegrown, each feature added doesn't need to be regularly
maintained, contrary to PF.


not to mention that there would be absolutely no testing outside of the
NetBSD environment with NPF.


As a matter of fact this is incorrect, NPF works on Linux too, there have
been tests/development made on CentOS. I also have a vague memory of NPF
being embedded in DPDK. So no, NPF is definitely not NetBSD-specific, and
has received testing outside.



Removing PF

2019-03-29 Thread Maxime Villard

There have been internal discussions about removing PF from NetBSD. Currently,
NetBSD's PF is 11 years old, has received no maintenance, and has accumulated
bugs and vulnerabilities that were fixed upstream but not in NetBSD. The
latest examples are two vulnerabilities recently discovered in PF, that
haven't been fixed in NetBSD's PF by lack of interest.

Importing recent versions of PF in scalable/performant kernels is a huge work
because of PF's legacy design, and there have been reports that FreeBSD is
also considering dropping PF.

Just like other kind of dead wood, NetBSD's PF consumes APIs, makes stuff
harder to change, and has now reached a point where it is lagging behind
upstream way too much to still be considered a functional or secure firewall
on NetBSD.

NetBSD provides NPF, a clean, secure and scalable firewall, enabled by default,
that can be used instead, even if it doesn't have all the features PF has
for now. It is to be noted that IPF too is present in NetBSD, although its
use is not recommended (for other reasons).

Given NPF's advanced design and good integration in the NetBSD kernel, trying
to maintain PF seems like a huge effort for little benefit, and the resources
would be better spent on NPF.

Even if we overcame the effort needed to import a new version of PF, we would
still have to maintain it and regularly synchronize against upstream. Overall,
it is not viable to keep PF, and has already proven not to be in the past,
given the state its code finds itself in today.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-23 Thread Maxime Villard

On 24/03/2019 at 00:41, Thor Lancelot Simon wrote:

On Sun, Mar 17, 2019 at 05:41:18AM +0800, Paul Goyette wrote:

On Sat, 16 Mar 2019, Maxime Villard wrote:


Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's
COMPAT_LINUX uses COMPAT_OSF1 as dependency (even if there is no proper
dependency in the module), because there are osf1_* calls. Some more
compat mess to untangle, it seems...

In all cases, it's only a few functions that are just wrappers, so it
probably shouldn't be too complicated to solve.


It's a total of 15 functions (I generated this list by building an alpha
GENERIC kernel with COMPAT_OSF1 removed):

 osf1_sys_wait4
 osf1_sys_mount


[...]

There's a funny historical reason for this -- Linux on Alpha originally
used OSF/1 syscall numbering and args for its own native syscalls.

No, our implementation need not actually reflect that -- but it's an
interesting historical detail perhaps we should write down somewhere.


Does that mean we don't need these syscalls?


Re: Regarding the ULTRIX and OSF1 compats

2019-03-23 Thread Maxime Villard

On 16/03/2019 at 22:41, Paul Goyette wrote:

On Sat, 16 Mar 2019, Maxime Villard wrote:


Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's
COMPAT_LINUX uses COMPAT_OSF1 as dependency (even if there is no proper
dependency in the module), because there are osf1_* calls. Some more
compat mess to untangle, it seems...

In all cases, it's only a few functions that are just wrappers, so it
probably shouldn't be too complicated to solve.


It's a total of 15 functions (I generated this list by building an alpha 
GENERIC kernel with COMPAT_OSF1 removed):

     osf1_sys_wait4
     osf1_sys_mount
     osf1_sys_set_program_attributes
     osf1_sys_setitimer
     osf1_sys_select
     osf1_sys_gettimeofday
     osf1_sys_getrusage
     osf1_sys_settimeofday
     osf1_sys_utimes
     osf1_sys_statfs
     osf1_sys_fstatfs
     osf1_sys_sysinfo
     osf1_sys_usleep_thread
     osf1_sys_getsysinfo
     osf1_sys_setsysinfo

All of these are references from linux_sysent.o


Here is a patch [1] that untangles the dependency. Cross-compile-tested on
alpha from amd64.

[1] https://m00nbsd.net/garbage/compat/alpha.diff


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 22:41, Paul Goyette wrote:

On Sat, 16 Mar 2019, Maxime Villard wrote:

Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's
COMPAT_LINUX uses COMPAT_OSF1 as dependency (even if there is no proper
dependency in the module), because there are osf1_* calls. Some more
compat mess to untangle, it seems...

In all cases, it's only a few functions that are just wrappers, so it
probably shouldn't be too complicated to solve.


It's a total of 15 functions (I generated this list by building an alpha GENERIC 
kernel with COMPAT_OSF1 removed):


     osf1_sys_wait4
     osf1_sys_mount
     osf1_sys_set_program_attributes
     osf1_sys_setitimer
     osf1_sys_select
     osf1_sys_gettimeofday
     osf1_sys_getrusage
     osf1_sys_settimeofday
     osf1_sys_utimes
     osf1_sys_statfs
     osf1_sys_fstatfs
     osf1_sys_sysinfo
     osf1_sys_usleep_thread
     osf1_sys_getsysinfo
     osf1_sys_setsysinfo

All of these are references from linux_sysent.o


Yes. Most of these functions are basic wrappers, that I think we can just
gather into a linux_misc.c or similar.

Do we know if COMPAT_OSF1 has actually ever been supported on HPPA? It is
commented out in GENERIC since rev1, and there is some Alpha-specific code
in osf1_misc.c that could hardly have compiled on HPPA.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 20:49, m...@netbsd.org wrote:

It makes me compelled to delete more of it.
COMPAT_LINUX doesn't work without matching PAGE_SIZE.


What does that mean?


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 17:44, Dave McGuire wrote:

On 3/16/19 11:04 AM, Maxime Villard wrote:

I think that what Robert, and others (including me) argue is
actually that things should not be removed, and the reason would be
that this is the core mission, purpose, reason (or whatever you want
to call it) for NetBSD's existence. Instead it should be fixed, because
that is what it all is about. Make it work - don't remove it.


It seems our disagreement comes down to that, indeed. It seems to me
this has
already been answered to, too, in previous discussions.


   Sigh.  So I see you're at it again.

   You are obviously quite obsessed with chopping out functionality that
you have unilaterally decided is not worth having in NetBSD anymore.


Could you possibly be more dishonest than this? Impressive. Keep up the good
work.


   Just who the hell do you think you are, anyway, and why are you so
obsessed with trashing everything that existed before you graced us with
your presence?


I am one of the very few people that have continuously and extensively worked
on the NetBSD kernel over the last five years, to bring stability, performance
and security improvements in all of the system. I've redesigned a good part of
our x86 port, I've developed all the complex fixes for the speculation bugs,
I've worked on securing our network stack, our syscalls, our allocators, I've
developed comprehensive hypervisor support, the strongest KASLR existing to
date, and apart from that, I've developed numerous bug detection systems that
have fixed several hundreds of bugs in NetBSD, among many other miscellaneous
things I've done in and for NetBSD.

And you, who are you exactly? You sound like a clueless fatass.


   If you have such a problem with the COMPAT_* layers, go fix them.  I
realize that would be more work, and it'd be easier to just trash them,
but that's not what NetBSD is about, and your attitude about this is
most unwelcome, coming from this NetBSD user since v0.9.

   You need a serious adjustment of perspective.  As others have
suggested, please go read about what NetBSD is, its clearly-stated
mission, and its spirit.


As I said, keep up the good work. Anyway you've just lost what was left
of your credibility, your mails go directly to the trash from now on.


Re: pool: removing ioff?

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 18:00, Greg Troxel wrote:

Maxime Villard  writes:


I would like to remove the 'ioff' argument from pool_init() and friends,
documented as 'align_offset' in the man page. This parameter allows the
caller to specify that the alignment given in 'align' is to be applied at
the offset 'ioff' within the buffer.

I think we're better-off with hard-aligned structures, ie with __aligned(32)
in the case of XSCALE. Then we just pass align=32 in the pool, and that's it.

I would prefer to avoid any confusion in the pool initializers and drop ioff,
rather than having this kind of marginal and not-well-defined features that
add complexity with no real good reason.

Note also that, as far as I can tell, our policy in the kernel has always
been to hard-align the structures, and then pass the same alignment in the
allocators.


I am not objecting as I can't make a performance/complexity argument.

But, I wonder if this comes from the Solaris allocation design, and that
the ioff notion is not about alignment for 4/8 objects to fit the way
the CPU wants, but for say 128 byte objects to be lined up on various
different offsets in different pages to make caching work better.  But
perhaps that doesn't exist in NetBSD, or is done differently, or my
memory of the paper is off.


Indeed, Sun's SLAB had cache coloring.

We do too in our pool subsystem, that's not related to ioff, we don't
lose it as a result of removing ioff.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 17:00, Jason Thorpe wrote:

On Mar 16, 2019, at 5:09 AM, m...@netbsd.org wrote:

Most likely, COMPAT_ULTRIX and COMPAT_OSF1 have the same type of bugs
that we have seen in compatibility layers elsewhere.
Is it worth his time to test them?


Folks, PLEASE.  This is a point I've tried to make repeatedly...

These need to be taken on a case-by-case basis.

[...]


Yes... -_-

Anyway, people, let's wrap it up. The status right now is that I've disabled
COMPAT_OSF1 on Alpha, and I'm not sure if there is an improper dependency of
COMPAT_LINUX on COMPAT_OSF1, because I see calls like osf1_sys_wait4 in
sys/compat/linux/arch/alpha/, but there is no dependency in the modules.

Right now I'm waiting to see if the build cluster fails, to know for sure.
If there is indeed a dependency, we'll have to untangle it before retiring
COMPAT_OSF1. I don't think it is really complicated, the functions used are
just wrappers, that we can easily move into Alpha's COMPAT_LINUX.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 16:12, Robert Elz wrote:

 Date:    Sat, 16 Mar 2019 14:28:27 +0100
 From:    Maxime Villard
 Message-ID:  <17ba7752-793d-d352-09ef-c43676d2f...@m00nbsd.net>

   | Ok. So you believe that dead wood should hold back all of the development
   | process?

No, but only if it truly is dead.   Just because you don't see the
utility does not make it so.

   | I believe you misread,

You're right, I did, I just skimmed the patch you made, and did not
look carefully enough.   But the point is unchanged.   There was a bug.
When it was reported it was fixed.   That's what's supposed to happen,
and isn't evidence of anything being wrong.

   | As I said, this is one example among others.

Yes, so you keep saying.   But those others are in one of three
groups.   Either they're like this one, and have now been fixed,
in which case they're irrelevant.   Or they have been reported,
and no-one is fixing them (which can happen for many reasons) in
which case you should just supply the PR numbers for all of these
problems, that might stir some activity around some of them, or it
might not.   Or you know of bugs that no-one else knows about, and
you're just sitting on.   That would be all on you.

   | And yes, this bug was introduced because the API was being replaced,
   | and yes, the person who committed this code likely did all he could
   | to make sure no bug was introduced, and yes, he did still add a bug
   | because a certain number of these compat layers are dead wood that
   | can't be tested, not even by those who claim they are so important.

But this one (the one above) is in code that is apparently not dead
wood, and that people do use.   But that people use code does not mean
that every bug is going to be encountered - bugs often stay undetected for
a very long time, just because no-one happens to do whatever it takes
to trigger them.

   | In fact, all of your "rules" were respected, yet the bug still inevitably
   | came in. Why is that?

Because bugs happen.   Always have, always will.   The only way to
avoid bugs in NetBSD would be to shut it down.   Delete everything.
Then it would have no remaining bugs.   It wouldn't be useful though.
But it would end this discussion!


   | This is what happens with dead wood, you just don't want to face it.

But, apparently, COMPAT_ULTRIX isn't dead woood, and people still use it.
And that's what the one bug (now fixed bug) you have cited, that I have
seen, was in.

   | And I'm not even speaking for myself, you just have to see the answers to
   | "does anyone use compat_xxx". You can even try as hard as I did, with
   | COMPAT_SVR4, ask on several different mailing lists, repeatedly, over and
   | over. No one ever answers.

And most likely no-one ever will.   You cannot get answers that way,
the people on our mailing lists mostly only care about this week's (or
this year's anyway) code, and wouldn't consider running anything that
didn't come from available sources.

But NetBSD (and all the other systems) are used by more people that that.
I know of several NetBSD users who would never go anywhere near any of
our mailing lists ... the closest they ever come is looking at the web
site from time to time to see if there's been a new release.

The only way to find out whether something is being used is to remove
it from the default config, but leave it relatively easy to turn back
on again (and ideally ask - where it is to be enabled - for anyone who
needs it, and enables it, to send e-mail to x...@netbsd.org)

That was done with the removal of the ISA graphics card support (I mean,
who uses ancient PCs, new ones are cheap, and much better...) but then
it turned out that someone still did.

   | If UVM was of questionable utility,

Everything is of questionable utility.   We used to not have UVM
so we know it is possible to survive without it.

Note I am not questioning keeping that, just your certainty that you
know everything that is required, and useful, and that everything else
is "dead wood" and should be deleted.

   | I stated my point clearly and logically, about why certain things have
   | legitimate reasons to go away,

Sorry, I must have missed that.   All I ever seem to see is that xxx is
unmaintained and full of unspecified bugs, and that obviously no-one cares,
and so we should delete it.   That's not an argument for anything.

Also note I am not arguing for keeping anything in particular, just
against your reasons for deleting it.   If some functionality really
is of no further use (and some of the drivers that were recently removed
seem to be in that category) then by all means, we should remove it.

But that you don't use it, and no-one on the mailing lists admits to
using it, does not mean no-one uses it.   You need to move much more
slowly.

I'd suggest perhap

Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 14:43, Johnny Billquist wrote:

On 2019-03-16 14:28, Maxime Villard wrote:

I stated my point clearly and logically, about why certain things have
legitimate reasons to go away, regardless of whether they are compat layers,
or drivers, or something else. Rather than giving clear, logical counter
arguments, you are just repeating "XXX is YYY years old, remove it".

You are going to have to provide better arguments I'm afraid, because you're
not going to convince _*anyone*_ with that.


I think that what Robert, and others (including me) argue is actually that
things should not be removed, and the reason would be that this is the core
mission, purpose, reason (or whatever you want to call it) for NetBSD's
existence. Instead it should be fixed, because that is what it all is about.
Make it work - don't remove it.


It seems our disagreement comes down to that, indeed. It seems to me this has
already been answered to, too, in previous discussions.

"Make it work" -> No one has volunteered to make it work. No one is maintaining
it, no one is actively using it, no one is testing it. It's simple, no one is
here to "make it work".

"Don't remove it" -> If you have a suggestion on what to do, feel free to tell
me about it. As was said numerous times, dead wood consumes APIs and makes
stuff harder to change; the dead wood naturally deprecates even more because we
all make blind changes. It then reaches a state where it's broken beyond
repair, and there's just nothing else to do than retiring it. It may not be the
case for OSF1 right now, but it was certainly the case for SVR4, and it will
inevitably be the case for the few esoteric compat layers we still have.

It's good to think what you think, but you need to face reality. The reality is
that no one wants to take care of our compat layers (compat_linux being an
exception, maybe). No one has taken care of that in the last 10 years either.
Compat layers may have been a USP 20+ years ago, but a good number of them are
in a deplorable state nowadays, in addition to being of a questionable utility,
to say the least.

And don't even try to say "uh but this is YOUR reality and not mine", it would
just be ridiculous, these things have been discussed many times, and it's no
rocket science.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 13:53, Robert Elz wrote:

Whoever is changing them should be fixing all the users of those APIs,
including the ones in the compat code.   Consider all the work PaulG
did as part of the kenel module changes recently -- that's an example of
how it should be done.   Simply deciding an API should alter, and changing
the few places that you care about is not good enough.   (generic "you" not
personal there).
[... and the rest ...]


Ok. So you believe that dead wood should hold back all of the development
process? See below:


   | I've already given this example (and it is one among many others):
   |https://mail-index.netbsd.org/source-changes/2017/12/03/msg090179.html

That's not an example of anything useful.   There was a bug.   It was
(apparently) reported.   It was fixed (you fixed it!)

That it hadn't been reported for a long time merely meant that no-one
had noticed it.   In that case, that's not surprising, the bug was when
copyout() failed, which in practice is a very rare event - as it normally
requires bugs (and often really dumb bugs) in the application as well.


I believe you misread, 'iter' was not initialized, there is a goto earlier,
this could have been an exploitable vulnerability. As I said, this is one
example among others.

And yes, this bug was introduced because the API was being replaced, and yes,
the person who committed this code likely did all he could to make sure no
bug was introduced, and yes, he did still add a bug because a certain number
of these compat layers are dead wood that can't be tested, not even by those
who claim they are so important.

In fact, all of your "rules" were respected, yet the bug still inevitably
came in. Why is that? This is what happens with dead wood, you just don't
want to face it. The reality is that very, very, very few (if any) people
care about our esoteric compat layers; no one is testing them, no one is
actively using them, no one even ever reads their code.

And I'm not even speaking for myself, you just have to see the answers to
"does anyone use compat_xxx". You can even try as hard as I did, with
COMPAT_SVR4, ask on several different mailing lists, repeatedly, over and
over. No one ever answers.


A similar example would be the mlock(addr, 0) bug that was fixed the
other day.   That had also been there for ages (perhaps since UVM was
added).  Should the existence of that bug for such a long time be a
justification for removing UVM ?


If UVM was of questionable utility, yes, absolutely. Except it isn't. The
compat layers we've talked about (SVR4, IBCS2, now OSF1) _are_ of questionable
utility, as discussed.

I stated my point clearly and logically, about why certain things have
legitimate reasons to go away, regardless of whether they are compat layers,
or drivers, or something else. Rather than giving clear, logical counter
arguments, you are just repeating "XXX is YYY years old, remove it".

You are going to have to provide better arguments I'm afraid, because you're
not going to convince _*anyone*_ with that.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 12:05, Johnny Billquist wrote:

I don't see any confusion.

My question was about what NetBSD is about.

If we are serious about being interoperable, implementing many standard APIs,
and care about different platforms, then buggy or broken code should be fixed.
Or at worst left for someone else to fix later.

If we don't care about the interoperability, then indeed, we delete the code.


This misses absolutely 100% of the points that were given in the many
discussions that have been had on this topic. Please re-read the threads.

In short, there are things that are a burden in terms of maintenance, and that
no one is willing to fix, because no one is interested in doing so. These
things have legitimate reasons to go away. Similarly there are things that are
hindrances to overall design improvement, like MPification; and when such
things have questionable utility and receive zero maintenance, they also have
legitimate reasons to go away.

This has been said very clearly multiple times.


If that's your definition of interoperable, then indeed, we used to be - and
still are a bit, actually - a very, very interoperable system.


Do you mean that deleting the code is your definition of interoperable?


You are confused.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 10:12, Johnny Billquist wrote:

On 2019-03-16 09:45, Maxime Villard wrote:

On 16/03/2019 at 06:23, John Nemeth wrote:

On Mar 15, 10:31pm, Michael Kronsteiner wrote:
}
} i have this discussion today aswell... considering 64/32bit machines.
} if you want ultrix, install ultrix. if you want osf1/dec unix/tru64
} install that. being able to run ummm nearly 20 year old binaries...
} well. if thats what you want be prepared for a ride. i never ran
} "foreign" binaries on a BSD. and i often compile myself even on more
} "user friendly" systems.

  By any chance, have you seen our About page:
http://www.netbsd.org/about/ ?  The second paragraph reads thus:

-
One of the primary focuses of the NetBSD project has been to make
the base OS highly portable. This has resulted in NetBSD being
ported to a large number of hardware platforms. NetBSD is also
interoperable, implementing many standard APIs and network protocols,
and emulating many other systems' ABIs.
-

Emulating other systems is fundamental to what NetBSD is about.


This is a really simplistic answer. It is not difficult to see that our
website does not reflect reality at all.


So, what is the reality then, in your opinion?


The reality is that our compat layers are largely unmaintained and broken.
One would have to be very dumb to believe that this constitutes a USP.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 12:17, Robert Elz wrote:

If there are bug reports that are not being attended to (open PRs),
that's different.  Otherwise unmaintained code is simply code that
doesn't need changes (which for emulation of ancient systems is not
a huge surprise - those systems aren't changing any more, if the
emulation is adequate for its purpose - which does not mean it has
to be complete - then there is no reason for it to ever change again.)


I understand this point. But to me it is deeply wrong: the compat layers use
system APIs, and these APIs do change regularly. You can't change the VFS
locking without changing the implementations of each compat layer; yet indeed
the emulated systems aren't changing anymore.

I've already given this example (and it is one among many others):

https://mail-index.netbsd.org/source-changes/2017/12/03/msg090179.html

From my point of view (of someone who did want to change the mbuf API for
instance, and who realized it was just impossible given the ton of dead wood
that uses it), in these cases, we are forced to take the time to make blind
changes which may be wrong. It costs us the effort, and makes us add bugs.

If the situation was as you described, I wouldn't mind. But to me, the old
unmaintained code is a significant barrier to many improvements; so when the
code in question has a questionable utility or is even known to be broken,
there need to be talks about removing it.

Hence my emails.

Also, this thread seems to have diverted towards "interoperability", but I
don't see what is the debate here. Dead wood is dead wood, regardless of
whether it is a compat layer, or a driver, or a network protocol, or even a
complete architecture (as has been done in the past already). We can disagree
on the deadness of it, but I don't see the debate about interoperability.

And as I said, I believe it is futile to talk about interoperability in the
case of broken features (eg SVR4); the thing was completely broken, so it
didn't make NetBSD more interoperable. Maybe the only thing it did was making
NetBSD more buggy.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 11:26, Johnny Billquist wrote:

If the answer is that we remove the code, then indeed, the whole webpage is
incorrect, and we should change it to state that we do not try to be
interoperable, implementing many standard APIs, or care about other platforms.


There seems to be some confusion here. Being "interoperable" doesn't mean
"keeping unmaintained code that is so broken it can't even emulate a dumbass
mmap". (Thinking about SVR4 here.)

If that's your definition of interoperable, then indeed, we used to be - and
still are a bit, actually - a very, very interoperable system.

Regarding the web site, if my memory is correct, not too long ago we did change
the page to put less emphasis on the compat layers, but the changes were backed
out because one person wanted to keep the old version; if my memory is correct,
this person was John Nemeth.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-16 Thread Maxime Villard

On 16/03/2019 at 06:23, John Nemeth wrote:

On Mar 15, 10:31pm, Michael Kronsteiner wrote:
}
} i have this discussion today aswell... considering 64/32bit machines.
} if you want ultrix, install ultrix. if you want osf1/dec unix/tru64
} install that. being able to run ummm nearly 20 year old binaries...
} well. if thats what you want be prepared for a ride. i never ran
} "foreign" binaries on a BSD. and i often compile myself even on more
} "user friendly" systems.

  By any chance, have you seen our About page:
http://www.netbsd.org/about/ ?  The second paragraph reads thus:

-
One of the primary focuses of the NetBSD project has been to make
the base OS highly portable. This has resulted in NetBSD being
ported to a large number of hardware platforms. NetBSD is also
interoperable, implementing many standard APIs and network protocols,
and emulating many other systems' ABIs.
-

Emulating other systems is fundamental to what NetBSD is about.


This is a really simplistic answer. It is not difficult to see that our
website does not reflect reality at all.

Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's
COMPAT_LINUX uses COMPAT_OSF1 as dependency (even if there is no proper
dependency in the module), because there are osf1_* calls. Some more
compat mess to untangle, it seems...

In all cases, it's only a few functions that are just wrappers, so it
probably shouldn't be too complicated to solve.


pool: removing ioff?

2019-03-16 Thread Maxime Villard

I would like to remove the 'ioff' argument from pool_init() and friends,
documented as 'align_offset' in the man page. This parameter allows the
caller to specify that the alignment given in 'align' is to be applied at
the offset 'ioff' within the buffer.

The two users I could find are:

 * sys/arch/arm/xscale/iopaau.c: iopaau_desc_4_cache and iopaau_desc_8_cache.

 * sys/arch/arm/arm32/pmap.c: pmap_l2ptp_cache, at a time, and it was actually
   a bug, the arguments were inverted...

I think we're better-off with hard-aligned structures, ie with __aligned(32)
in the case of XSCALE. Then we just pass align=32 in the pool, and that's it.

I would prefer to avoid any confusion in the pool initializers and drop ioff,
rather than having this kind of marginal and not-well-defined features that
add complexity with no real good reason.

Note also that, as far as I can tell, our policy in the kernel has always
been to hard-align the structures, and then pass the same alignment in the
allocators.


Re: Regarding the ULTRIX and OSF1 compats

2019-03-15 Thread Maxime Villard

On 10/03/2019 at 17:39, Maxime Villard wrote:

There's been more emphasis on the fact that COMPAT_OSF1 should go. Do people
have anything to say about that?


So, no one? I will remove it soon...


Re: Regarding the ULTRIX and OSF1 compats

2019-03-10 Thread Maxime Villard

Please read _*all*_ the answers, because I've already said that I was fine
with keeping COMPAT_ULTRIX. If someone has been able to even run a full
ULTRIX userland, then it's probably functional enough, and we can keep it
for the time being. Case closed.

