Re: Potential new syscall

2018-04-03 Thread Kamil Rytarowski
On 03.04.2018 16:57, Mouse wrote:
>> From the GDB protocol point of view,
> ...what does gdb have to do with it?  Did I miss something?
> 

We need to track fork and its variations. Just a note that, from the
existing protocol's point of view, we can handle this new syscall.



Re: Potential new syscall

2018-04-03 Thread Mouse
> From the GDB protocol point of view,

...what does gdb have to do with it?  Did I miss something?

> I think there is needed prior verification of its stability and
> benchmarking before the final decision.

I would expect such work to be done before it goes into the main NetBSD
tree.  What I have is a proof-of-concept implementation and, for anyone
willing to run my 5.2 variant, or willing to port what I've done to
stock 5.2 (which would probably be easy), or to port what I have to
-current (which I can only speculate about), it can provide something
to test and benchmark.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML  mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Potential new syscall

2018-04-03 Thread Mouse
>> [...] - "just use fork" is a very common response, but no matter how
>> fork gets implemented, vfork() when used correctly always performs
>> better by huge margins.
> But most of those cases are handled just as well by posix_spawn.

Possibly - but most of a system's operation is handled perfectly well
by no more than a few dozen syscalls.  Is that a reason to get rid of
the rest?

If you want, sure, use posix_spawn when it's applicable.  But it's also
nice to have something that can handle the cases where it _isn't_
applicable - which is, in a sense, what fork() is for, but it's also
nice to not cripple performance unnecessarily.  And, in my case, the
only easy answer was to make vfork() equivalent to fork() in the
_emulated_ system, which I consider a last-ditch fallback.  The new
syscall is almost as easy (for me) and much closer to correct.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML  mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Potential new syscall

2018-04-03 Thread Mouse
>> Basically, what I want is a syscall that a vfork()ed child can call
>> to have the unsharing effects of execve(2) or _exit(2) [...]
> I have considered (and think I mentioned on some list some time ago)
> the same capability to improve sh performance.
> [...]
> Having the mechanism available for testing (even if it was not
> committed to the standard NetBSD sources (yet?))

Certainly not yet; at present, it exists on only one machine.  I don't
run -current, so it's not appropriate for me to try to put it into
-current, but I would be fine with someone else doing so.

The "one machine" that has it now is running (my evolution of) 5.2.  On
my morning commute today I tested it, and it works, for very
rudimentary smoke-test values of "works".

> Kamil - "just use fork" is a very common response, but no matter how
> fork gets implemented, vfork() when used correctly always performs
> better by huge margins.

Which of course is why vfork exists at all. :-)

> You are of course correct that there is a very limited set of
> functions possible in a vfork()'d child -

I disagree.  The set of functions usable in a vfork()ed child is
actually quite wide on most systems - but you have to know a good deal
about the implementation of vfork() and the functions in question to
know which ones are safe and why, and how to safely use the ones you
can.  (For example, on the 5.2 I'm working on, I can printf() from the
child just fine, provided I fflush() at suitable times so that stdio's
internals don't get confused.)  The set of functions you can use
narrows as you care about wider and wider portability, to the point
where, if you're trying to be portable to anything POSIX, vfork() is
basically useless (because you can't do any of the usual post-fork
pre-exec prep).
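
To make that concrete, here is roughly the pattern I mean - a minimal
sketch assuming a BSD-ish vfork() and stdio, not something strict POSIX
promises to allow:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int
main(void)
{
        pid_t pid;

        /*
         * Flush stdio before vfork() so the buffers shared with the
         * child are empty; this is what makes printf() in the child
         * survivable on an implementation like the one I'm using.
         */
        fflush(NULL);

        pid = vfork();
        if (pid == -1) {
                perror("vfork");
                return 1;
        }
        if (pid == 0) {
                /* Child: still running on the parent's address space. */
                printf("child: about to exec\n");
                fflush(stdout);
                execlp("true", "true", (char *)NULL);
                _exit(127);     /* exec failed; never call exit(3) here */
        }
        /* Parent resumes only after the child execs or _exits. */
        (void)waitpid(pid, NULL, 0);
        return 0;
}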

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML  mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Potential new syscall

2018-04-03 Thread Mouse
>> Basically, what I want is a syscall that a vfork()ed child can call
>> to have the unsharing effects of execve(2) or _exit(2) (return the
> vmspace to the parent and let it continue), while the child
>> carries on with a clone of the vmspace [...]
> That sounds suspiciously like Linux's unshare(2) call, with the
> CLONE_VM option.

Yes, except that (based on reading the unshare(2) manpage on a work
machine) unshare(CLONE_VM) doesn't have the "let the parent continue"
semantic that my putative syscall does.  What I want could perhaps be
called unshare(CLONE_VFORK) (which doesn't seem to exist).
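
To illustrate the intended usage: something along these lines, where
vfork_unshare() is a purely hypothetical name and signature (the real
thing exists only on my machine and has no settled interface):

#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

/*
 * Hypothetical wrapper for the proposed syscall: hand the borrowed
 * vmspace back to the parent (unblocking it, exactly as execve() or
 * _exit() would) while this process keeps running on a private clone.
 */
extern int vfork_unshare(void);

int
main(void)
{
        pid_t pid;

        pid = vfork();
        if (pid == -1)
                err(1, "vfork");
        if (pid == 0) {
                /*
                 * While still sharing the parent's vmspace, stick to
                 * the usual restricted vfork() rules.
                 */
                if (vfork_unshare() == -1)
                        _exit(127);
                /*
                 * The parent is running again; this child continues as
                 * an ordinary process with its own address space.
                 */
                printf("child: detached from parent's vmspace\n");
                _exit(0);
        }
        /* Parent resumes as soon as the child unshares, execs or _exits. */
        (void)waitpid(pid, NULL, 0);
        return 0;
}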

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML  mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Potential new syscall

2018-04-03 Thread Joerg Sonnenberger
On Tue, Apr 03, 2018 at 09:08:15AM +0700, Robert Elz wrote:
> Kamil - "just use fork" is a very common response, but no matter how
> fork gets implemented, vfork() when used correctly always performs
> better by huge margins.

But most of those cases are handled just as well by posix_spawn, which
doesn't have any of the thread-safety issues that vfork has.
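
For the common case - spawn a program, tweak a couple of descriptors,
exec - something like this is all that's needed (a minimal sketch; the
program, file name and flags are just placeholders):

#include <fcntl.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

extern char **environ;

int
main(void)
{
        posix_spawn_file_actions_t fa;
        char *argv[] = { "ls", "-l", NULL };
        pid_t pid;
        int error;

        /* Redirect the child's stdout without touching our own fds. */
        posix_spawn_file_actions_init(&fa);
        posix_spawn_file_actions_addopen(&fa, STDOUT_FILENO,
            "/tmp/ls.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        error = posix_spawnp(&pid, "ls", &fa, NULL, argv, environ);
        posix_spawn_file_actions_destroy(&fa);
        if (error != 0) {
                fprintf(stderr, "posix_spawnp: %s\n", strerror(error));
                return 1;
        }
        (void)waitpid(pid, NULL, 0);
        return 0;
}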

Joerg


re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-04-03 Thread matthew green
> 4GB of KVA; and in addition pmap_kernel will consume some KVA too. i386 for
> example has only 4GB of ram and 4GB of KVA, so we'll just never add a direct
> map there.

note that it changes your direct map point, but i386 GENERIC_PAE
works fine for at least 16GB ram.  it should work up to 64GB.

actually, it only strengthens your direct map point, since it's
even harder to fit 64GB phys into 4GB virt :-)

> As opposed to that, emaps can be implemented everywhere with no constraint on
> the arch. I think they are better than the direct map for uvm_bio.

IIRC, the main reason we stopped using emap is that there were
performance issues.  rmind?


.mrg.


Re: Fixing excessive shootdowns during FFS read (kern/53124) on x86 - emap, direct map

2018-04-03 Thread Maxime Villard

On 02/04/2018 at 21:28, Jaromír Doleček wrote:

> 2018-03-31 13:42 GMT+02:00 Jaromír Doleček:

>> 2018-03-25 17:27 GMT+02:00 Joerg Sonnenberger:

>>> Yeah, that's what ephemeral mappings were supposed to be for. The other
>>> question is whether we can't just use the direct map for this on amd64
>>> and similar platforms?


>> Right, we could/should use emap. I hadn't realized emap is actually already
>> implemented. It's currently used by the pipe code for the loan/"direct" write.
>>
>> I don't know anything about emap though. Are there any known issues, and
>> do you reckon it's ready to be used for general I/O handling?


> Okay, so I've hacked together a patch to switch uvm_bio.c to ephemeral
> mappings:
>
> http://www.netbsd.org/~jdolecek/uvm_bio_emap.diff


-   pmap_emap_enter(va, pa, VM_PROT_READ);
+   pmap_emap_enter(va, pa, VM_PROT_READ | VM_PROT_WRITE);

Mmh no, sys_pipe wanted it read-only; we shouldn't make it writable by default.
Adding a prot argument to uvm_emap_enter would be better.
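
Something like this, roughly - a sketch only, the existing
uvm_emap_enter() parameter list below is from memory and unverified:

/*
 * Sketch: uvm_emap_enter() grows a prot argument instead of
 * pmap_emap_enter() changing its default.  Assumed parameters: the
 * kernel va, the page array and the page count.
 */
void
uvm_emap_enter(vaddr_t va, struct vm_page **pgs, u_int npages,
    vm_prot_t prot)
{
        paddr_t pa;
        u_int i;

        for (i = 0; i < npages; i++) {
                pa = VM_PAGE_TO_PHYS(pgs[i]);
                pmap_emap_enter(va + ptoa(i), pa, prot);
        }
}

sys_pipe would then keep passing VM_PROT_READ, and uvm_bio would ask
for VM_PROT_READ | VM_PROT_WRITE.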


> Looking at the state of usage though, emap is only used in a disabled code
> path in sys_pipe and nowhere else. That code had several on-and-off passes
> at being enabled in 2009, and there has been no further use since then. That
> doesn't inspire much confidence.

> The only port that actually has optimizations for emap is x86. Since amd64
> is also the only one supporting a direct map, we are really at liberty to pick
> either one. I'd lean towards the direct map, since that doesn't require
> adding/removing any mappings in pmap_kernel() at all. From looking at the
> code, I gather a direct map is quite easy to implement for other archs like
> sparc64. I'd say significantly easier than adding the necessary emap hooks
> into the MD pmaps.


There is a good number of architectures where implementing a direct map is not
possible, because of KVA consumption. A direct map consumes at least as much
KVA as there is physical memory. If you have 4GB of ram, the direct map will
consume 4GB of KVA; and in addition pmap_kernel will consume some KVA too. i386
for example has only 4GB of ram and 4GB of KVA, so we'll just never add a
direct map there.

Direct maps are good when the architecture has much, much more KVA than it has
physical space.

I saw some low-KVA architectures have a "partial direct map", where only a
(tiny) area of the physical space is direct-mapped. There, we would have to
adapt uvm_bio to use pmap_kernel instead, which seems ugly.

As opposed to that, emaps can be implemented everywhere with no constraint on
the arch. I think they are better than the direct map for uvm_bio.