Re: makesyscalls

2020-06-09 Thread Thor Lancelot Simon
Python is essentially uncrosscompilable and its maintainers have repeatedly
rudely rejected efforts to make it so.

If that weren't the case, and the way installed Python modules were "managed"
(I use the term liberally) were made sane, I'd think it were a fine thing to
use in base.  But it is the case, and that won't be made sane, and so I think
it belongs nowhere near NetBSD.  Could you translate your prototype into a
different language, one that's less basically hostile to our build system
and its goals and benefits?



re: makesyscalls

2020-06-09 Thread matthew green
i'm not very interested in a solution that doesn't use tools
available to the build.  you've not shown that there is
sufficient pain here to force an external solution.  i'm not
sure i buy your claims about awk and size of program.  IME,
it just requires that one is strict to rules.

if you want to use python for creating files in the tree
for _our_ things, then i think you have to propose adding
python to src/tools.  [*]

why aren't you willing to discuss a lua version?  it has
most of the features you complained awk is missing, and would
make it relatively easy to unit-test the components easily.


.mrg.


[*] i support this generally, and as a non-public visible
library that our gdb can use as well.  i do not think we
should add it to /usr/bin (but i might be convinceable that
it can be installed in a non-standard location, using non-
standard paths for libararies, as long as it does not have
anything to do with or interfere with pkgsrc or other ways
of using python.)


Re: makesyscalls

2020-06-09 Thread Kamil Rytarowski
On 10.06.2020 01:13, David Holland wrote:
> The question is: do we want the Python version in the tree now

For this, I would say "NO", at least as long Python is out of base and
IMO it shall not be there.

But it is fine to put into othersrc/.

On 10.06.2020 01:13, David Holland wrote:
> Rewriting in C is a possible future step. The code generator I have in
> mind going forward should not be done in Python. But again, more on
> that later.

I would like to have mksyscalls (and at some point makeioctls) much more
flexible and as a tool scriptable. I had to iterate a dozen of times
over all our syscalls in various fuzzers, sanitizers, debuggers, tracers
etc.

Something that is very needed is knowing the full serialized struct
passed as a pointer to each syscall. It's a lot of work to teach the
tool about it, but it could be finally centralized and time saved of
repeatedly teaching all the other programs about this property of syscalls.



signature.asc
Description: OpenPGP digital signature


makesyscalls

2020-06-09 Thread David Holland
Between various forms of prompting I got tired of waiting for a gsoc
student to take up code generation for syscall argument handling. We
should really be generating all the calls to copyin/copyout, not to
mention all the compat32 and compat_* glue, not maintaining them by
hand. More on that later.

In the meantime the first step on this was to assess the current
makesyscalls.sh and decide whether to (a) leave it alone as a separate
mechanism, (b) integrate it into the new stuff, or (c) ignore it and
write new stuff from scratch.

My conclusion after wading into it is that between additions for rump,
dtrace, and other things it's become quite a bit too large for an awk
script. (Experience suggests that the limit is about 1000 lines,
depending on how well structured the script is and what it's doing.
Beyond this point, between there being no variable declarations,
function arguments being unchcked, and other properties of awk, it
becomes difficult to modify the script safely and alterations made
regardless tend to resort to cutpaste and make the situation worse.
This script is 1200 lines and it's definitely showing signs of
deterioration.)

This means that regarding (a) it shouldn't be left alone because it's
itself becoming a problem. Meanwhile, regarding (c), it does too many
things and has too many tentacles to be safely ignored or easily
reimplemented. Consequently, as a step towards (b) I have translated
it into Python; Python is still untyped but is substantially more
robust than awk and has a decidedly larger size limit.

The question is: do we want the Python version in the tree now, or
should I just treat it as an intermediate development prototype that
nobody has to see?

Arguments pro:
   - this is definitely a step forward for maintainability
   - having it in the tree means other people can work on it readily
   - having it in the tree means it's there even if my future plans
 get derailed or don't materialize

Arguments con:
   - it's Python and we don't have Python in base (and don't
 particularly want it)
   - Python being Python all the code paths I haven't managed to test
 by the time it's committed will probably crash the first time
 they're reached
   - definitely a risk of having introduced bugs (also found some, but
 so far all of them have been cosmetic things that awk doesn't
 trap on, like passing extra arguments to printf)

Arguments I'm not interested in listening to:
   - it's Python and Python is a terrible language (might be so but
 see below)
   - it should be Lua (don't see any reason Lua beats awk for this)
   - sh/awk is fine for this and you're doing it wrong (no I'm not, I
 think I have as much experience using sh and awk on nontrivial
 things as anyone)

There are not many languages to choose from: in base we have sh, awk,
lua, and C (or C++), and of languages not in base I think Python is
the only one sufficiently ubiquitous to justify using it for important
build infrastructure. Well, maybe Perl is, but I'm trying to make the
script _more_ maintainable :-)

Note before flaming that it doesn't actually run during builds, only
when someone regenerates the outputs after changing syscalls.master.
Needing to have Python installed for this is not a showstopper.

Rewriting in C is a possible future step. The code generator I have in
mind going forward should not be done in Python. But again, more on
that later.

-- 
David A. Holland
dholl...@netbsd.org


Re: TSC improvement

2020-06-09 Thread Joerg Sonnenberger
On Tue, Jun 09, 2020 at 05:16:27PM +, Taylor R Campbell wrote:
> It's great to see improvements to our calibration of the TSC (and I
> tend to agree that cpu_counter should be serializing, so that, e.g.,
> cpu_counter(); ...; cpu_counter() reliably measures time taken in the
> ellipsis).

I'm pretty sure we want to have both variants.

Joerg


Re: pg_jobc going negative?

2020-06-09 Thread Christos Zoulas
The FreeBSD refactoring LGTM. It also simplifies the code.

christos

> On Jun 9, 2020, at 2:11 PM, Robert Elz  wrote:
> 
>Date:Tue, 9 Jun 2020 17:04:54 +0200
>From:Kamil Rytarowski 
>Message-ID:  
> 
> 
>  | Yes... syzkaller had like 12 different ways to reproduce it.
> 
> OK, thanks.
> 
>  | There is still a race and we randomly go to negative pg_jobc.
> 
> I am not at all surprised...
> 
> I will look at it over the next couple of days.   No guarantees...
> 
> kre



signature.asc
Description: Message signed with OpenPGP


Re: pg_jobc going negative?

2020-06-09 Thread Robert Elz
Date:Tue, 9 Jun 2020 17:04:54 +0200
From:Kamil Rytarowski 
Message-ID:  


  | Yes... syzkaller had like 12 different ways to reproduce it.

OK, thanks.

  | There is still a race and we randomly go to negative pg_jobc.

I am not at all surprised...

I will look at it over the next couple of days.   No guarantees...

kre



Re: TSC improvement

2020-06-09 Thread Taylor R Campbell
It's great to see improvements to our calibration of the TSC (and I
tend to agree that cpu_counter should be serializing, so that, e.g.,
cpu_counter(); ...; cpu_counter() reliably measures time taken in the
ellipsis).

At the same time, I wonder whether we should _also_:

1. Modify the tsc timecounter so that it uses a global atomic to
   ensure that there is a global view of time as counted by the tsc.
   This is what the timecounter(9) API per se expects of timecounters,
   and right now tsc (along with various other per-CPU cycle counters)
   fails to guarantee that.

2. Introduce an API for a local timecounter -- a per-CPU timecounter
   that never goes backwards on a single CPU, but:
   (a) measures units of wall clock time, unlike cpu_counter();
   (b) need not be synchronized between CPUs; and
   (c) may be cheaper to read than a global timecounter.
   We could then use that for, e.g., rusage calculations.



TSC improvement

2020-06-09 Thread SAITOH Masanobu

Hi, all.

 I wrote a code to improve x86's TSC.x86/tsc.c rev. 1.67 reduced cache
problem, but it still has room. I measured the effect of lfence, mfence,
cpuid and rdtscp. The impact to TSC skew and/or drift is:

AMD:   mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify
Intel: lfence > rdtscp > cpuid > nomodify

So, mfence is the best on AMD and lfence is the best on Intel. If it has no
SSE2, we can use cpuid. The logs of TSC calibration
(from "dmesg -t |grep TSC") is at:

http://www.netbsd.org/~msaitoh/tsc/tsc-20200605log.tgz

Diff is at:

http://www.netbsd.org/~msaitoh/tsc/tsc-20200609-0.dif

In this diff, cpu_counter*(those functions are MI API) uses serializing.
We can provide both cpu_counter() and cpu_counter_serializing(). I think
almost all usecases requires serializing on x86. For RNG, serializing is
not required but the overhead is trivial. I think users who use
cpu_counter() use the function to get time difference precisely. The
serializing is required to get precised value, so I think just adding it
to x86's cpu_counter() is enough. If it's acceptable, I'd like to commit
this change.

 Any comments/advice are welcomed.


NOTE:
  - An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for
serializing, but it's not so good.
  - On Intel i386(not amd64), it seems the improvement is very little.
  - rdtscp instruct can be used as serializing instruction + rdtsc, but
it's not good as [lm]fence. Both Intel and AMD's document say that
the latency of rdtscp is bigger than rdtsc, so I suspect the difference
of the result comes from it.

And, thanks ad@, knakahara@, nonaka@ and some other people to help this work.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: pg_jobc going negative?

2020-06-09 Thread Kamil Rytarowski
On 09.06.2020 16:35, Robert Elz wrote:
> Date:Tue, 9 Jun 2020 14:13:56 +0200
> From:Kamil Rytarowski 
> Message-ID:  <85d5e51f-afd1-1038-fd68-2366ff073...@netbsd.org>
> 
>   | Here is the simplest reproducer crashing the kernel on negative pg_jobc:
> 
> I have not looked at this closely yet, but this is likely because
> ptrace() fiddles p_pptr which the routines that manipulate the pg_jobc
> more or less expect to be a constant.
> 
> Is there any known reproducer of this problem which does not involve ptrace() 
> ?
> 

Yes... syzkaller had like 12 different ways to reproduce it.

As far as I can tell, in the syzkaller case they all are about races.

One of them is here:

https://syzkaller.appspot.com/text?tag=ReproC=128060f610

After adding the asserts, all look similar to me: forking + setpgid(0, 0).

> At first glance, the manipulations of pg_jobc looks a bit dodgy to me, but I 
> haven't investigated enough to be able to spot a definite problem yet
> (possible ptrace() generated issue aside - and yes, those need to work as
> well).
> 
> I doubt very much that adding a new mutex will make a difference, all the
> manipulations are done with proc_lock held, which is kind of the "big lock"
> for process manipulation - adding finer grained locking might improve
> performance, by improving concurrency, but is unlikely (at this stage,
> nothing is impossible) to be a fix for this problem.
> 

There is still a race and we randomly go to negative pg_jobc.

> kre
> 



Re: pg_jobc going negative?

2020-06-09 Thread Robert Elz
Date:Tue, 9 Jun 2020 14:13:56 +0200
From:Kamil Rytarowski 
Message-ID:  <85d5e51f-afd1-1038-fd68-2366ff073...@netbsd.org>

  | Here is the simplest reproducer crashing the kernel on negative pg_jobc:

I have not looked at this closely yet, but this is likely because
ptrace() fiddles p_pptr which the routines that manipulate the pg_jobc
more or less expect to be a constant.

Is there any known reproducer of this problem which does not involve ptrace() ?

At first glance, the manipulations of pg_jobc looks a bit dodgy to me, but I 
haven't investigated enough to be able to spot a definite problem yet
(possible ptrace() generated issue aside - and yes, those need to work as
well).

I doubt very much that adding a new mutex will make a difference, all the
manipulations are done with proc_lock held, which is kind of the "big lock"
for process manipulation - adding finer grained locking might improve
performance, by improving concurrency, but is unlikely (at this stage,
nothing is impossible) to be a fix for this problem.

kre



Re: pg_jobc going negative?

2020-06-09 Thread Kamil Rytarowski
On 09.06.2020 08:38, Maxime Villard wrote:
>> Should we allow pg_jobc going negative?
> 
> I don't think so, the code is not designed to expect negative values.

Here is the simplest reproducer crashing the kernel on negative pg_jobc:

http://netbsd.org/~kamil/ptrace/pg_jobc-crash.c

On 09.06.2020 08:38, Maxime Villard wrote:
> In short, (1) my understanding of it is that the code is not designed to
> expect negative values, and (2) since I added the KASSERTs 10/11 of the
> random bugs didn't trigger. Big signs the bug is indeed related to
> refcounting.
>
> It would be nice if someone with better understanding than me of the lwp
> group stuff could have a look, though.


Generally, pg_jobc looks like a ref counting mechanism. FreeBSD reworked
the code and added a dedicated pgrp mutex.

I don't know which path is the best for us, especially regarding the SMP
properties.

+ ad@


Re: pg_jobc going negative?

2020-06-09 Thread Kamil Rytarowski
On 09.06.2020 10:23, Michael van Elst wrote:
> m...@m00nbsd.net (Maxime Villard) writes:
> 
>> You can see they are all different, but all have to do with reading the
>> group pointer, which was either freed, overwritten, not initialized,
>> unmapped, or contained pure garbage. This is typical of refcounting bugs
>> where a resource disappears under your feet.
> 
> pg_jobc is not a reference counter. The assertion probably stopped
> a bug in a different place by coincidence.
> 

As the first step, is it fine to replace all pg_jobc == 0/ != 0 checks
with pg_jobc > 0 / <= 0?

This should not make anything worse than it is now.

The remaining code assumes that pg_jobc never goes below 0.

And then, follow up with the removal of the assert. We will check with
syzkaller whether the races/crashes are gone.



signature.asc
Description: OpenPGP digital signature


Re: pg_jobc going negative?

2020-06-09 Thread Michael van Elst
m...@m00nbsd.net (Maxime Villard) writes:

>You can see they are all different, but all have to do with reading the
>group pointer, which was either freed, overwritten, not initialized,
>unmapped, or contained pure garbage. This is typical of refcounting bugs
>where a resource disappears under your feet.

pg_jobc is not a reference counter. The assertion probably stopped
a bug in a different place by coincidence.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: pg_jobc going negative?

2020-06-09 Thread Michael van Elst
mo...@rodents-montreal.org (Mouse) writes:

>> pg_jobc is a process group struct member counting the number of
>> processes with [...]

>> [W]e caught issues that pg_jobc can go negative.  [...]

>> Is going negative in the first place a real bug?

>I would say, yes.  A "count of processes with [condition]" cannot ever
>truly be negative.

Unfortunately it doesn't "count processes with [condition]", that's
more a coincidence on how the mechanism is used.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: pg_jobc going negative?

2020-06-09 Thread Maxime Villard

Le 09/06/2020 à 02:49, Kamil Rytarowski a écrit :

pg_jobc is a process group struct member counting the number of
processes with a parent capable of doing job control. Once reaching 0,
the process group is orphaned.

With the addition of asserts checking for pg_jobc > 0 (by maxv@), we
caught issues that pg_jobc can go negative. I have landed new ATF tests
that trigger this reliably with ptrace(2)
(src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was
originally triggered with GDB.

There are also other sources of these asserts due to races
The ptrace(2) ATF tests are reliable triggering a negative pg_jobc,
however there are racy tests doing the same as triggered by syzkaller
(mentioned at least in [1]).

Should we allow pg_jobc going negative?


I don't think so, the code is not designed to expect negative values.


(Other BSDs allow this.)


They don't "allow it", they just don't have a KASSERT against it.


Is going negative in the first place a real bug?


There were 11 bugs reported by syzbot which showed severe memory corruption
in this area. While investigating the issues, I looked at the recfounting
stuff and added KASSERTs for sanity checking, and then realized they were
actually firing when using the different reproducers generated by syzbot.

Since I put these KASSERTs, 10 of the 11 random bugs didn't trigger, and
instead it is the KASSERTs that fire earlier. Listed as duplicates:


https://syzkaller.appspot.com/bug?id=50a4ddd341b90cf15a4814048ff51db04347279a

You can see they are all different, but all have to do with reading the
group pointer, which was either freed, overwritten, not initialized,
unmapped, or contained pure garbage. This is typical of refcounting bugs
where a resource disappears under your feet.

Only one keeps firing once in a while (KASAN); that's understandable,
because even though there is a big window where the KASSERTs can fire, the
underlying race is still there and there is still the chance we miss the
window and get memory corruptions.

In short, (1) my understanding of it is that the code is not designed to
expect negative values, and (2) since I added the KASSERTs 10/11 of the
random bugs didn't trigger. Big signs the bug is indeed related to
refcounting.

It would be nice if someone with better understanding than me of the lwp
group stuff could have a look, though.

Maxime