Re: panic in amap_wipeout()

2018-07-10 Thread Maxime Villard

On 10/07/2018 at 16:44, Edgar Fuß wrote:

So the machine [...] just panicked:

It's running 6.1/amd64.


You are using NetBSD 6 and IPF; that's about the least bug-free configuration
I can think of. Really, you should switch to NetBSD 8 - even if IPF is not
maintained, at least the rest of the system is.

And basically no one is going to investigate what's wrong with your
system if the kernel you're using reaches EOL in one or two months -
unless you can reproduce the issue on NetBSD 8.


Re: Removing bitrotted sys/dev/pci/n8 (NetOctave NSP2000)

2018-07-10 Thread coypu
Thanks for confirming! :-)
I'll still keep my original promise of waiting a month before doing so.


Re: Removing bitrotted sys/dev/pci/n8 (NetOctave NSP2000)

2018-07-10 Thread Thor Lancelot Simon
On Tue, Jul 10, 2018 at 06:56:53PM +, co...@sdf.org wrote:
> Hi,
> 
> The code in sys/dev/pci/n8 has bitrotted - it still references LKM_*
> things from the old loadable-module system, so it is unlikely that it
> builds. This has been the case since netbsd-6.

I still have the hardware, but I seriously doubt anyone's using it.  We
imported this driver mostly as a testbed for hardware crypto improvements
(it was not really performance-competitive any more even at the time; but
we had a good relationship with the IC designer and extensive documentation
on their SDK and OpenSSL modifications).

The hardware is no longer made and was, as far as I know, used under NetBSD
only within the engineering organization at Coyote Point, which doesn't
exist any more.

I think the clock's up on this one; take it out please.

Thor


Re: Removing viadrm

2018-07-10 Thread maya
On Mon, Jul 02, 2018 at 12:18:51PM +, co...@sdf.org wrote:
> Hi folks,
> 
> we have two ports of the Linux DRM code. The old drm exists because not
> all devices/drivers work with the newer one, and non-x86 architectures
> use it.
> 
> The new drm ("drm2") is what we hopefully will transition to.
> 
> there are two via drivers:
> viadrmums (drm2)
> viadrm (old drm)
> 
> according to PR port-i386/53364, viadrm doesn't work any more.
> viadrmums does, and uses the newer drm code.
> 
> viadrmums shouldn't be significantly different in terms of support (the
> driver was almost abandoned upstream until recently).
> 
> I don't know of any possible users of via graphics on non-x86.
> 
> If nobody objects, I will delete viadrm in 1-2 weeks.

Removed.
https://mail-index.netbsd.org/source-changes/2018/07/10/msg096670.html

And while testing it, I found out that the Xorg driver was dysfunctional
too, and fixed it.



Removing bitrotted sys/dev/pci/n8 (NetOctave NSP2000)

2018-07-10 Thread coypu
Hi,

The code in sys/dev/pci/n8 has bitrotted - it still references LKM_*
things from the old loadable-module system, so it is unlikely that it
builds. This has been the case since netbsd-6.

I am interested in removing this because, while playing with a
text-processing tool to look for bugs, I came across this code and spent
some time chasing what looked like a bug before realizing the code
doesn't build.

Thoughts? Objections?

If no one objects in a month, I will remove all the files in
sys/dev/pci/n8 and all related references.


Re: Console on both VGA/Keyboard and IPMI

2018-07-10 Thread Edgar Fuß
I asked
> Is there any way to have the console (more precisely: the thing where panic
> messages go and that DDB operates on) both on VGA/Physical Keyboard (for
> on-site access) and something like IPMI SOL (for off-site access)?

How does DDB select the device it communicates on?


Re: panic in amap_wipeout()

2018-07-10 Thread Edgar Fuß
> Since it's a development server, I let it sit in DDB in case someone wants me
> to examine something.
I tried to dump (reboot 104), but that froze, so I had to press The Button,
so the above no longer holds, sorry.


Re: panic in amap_wipeout()

2018-07-10 Thread Edgar Fuß
> So the machine [...] just panicked:
It's running 6.1/amd64.


panic in amap_wipeout() (was: ipnat ftp proxy suddenly stopped working)

2018-07-10 Thread Edgar Fuß
So the machine where ipnat's ftp proxy misbehaved just panicked:

uvm_fault(0x8076d460, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 8045102e cs 8 rflags 10212 cr2 8 cpl 0 rsp fe812a095a90
kernel: page fault trap, code=0
Stopped in pid 27044.1 (perl) at netbsd:amap_wipeout+0x81:  movq    8(%rax),%rdx
db{0}> bt
amap_wipeout() at netbsd:amap_wipeout+0x81
uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x43
uvmspace_free() at netbsd:uvmspace_free+0xe7
exit1() at netbsd:exit1+0x175
sys_exit() at netbsd:sys_exit+0x3e
syscall() at netbsd:syscall+0xc4
db{0}> show reg
ds  92d8
es  0
fs  2c80
gs  4b73
rdi 8075e100    amap_list_lock
rsi 0
rbp fe812a095ab0
rbx fe8294b648e8
rdx 2
rcx fe841338a140
rax 0
r8  fe83f3c192d8
r9  2
r10 0
r11 1
r12 2
r13 fe825f43cb88
r14 0
r15 fe811d870520
rip 8045102e    amap_wipeout+0x81
cs  8
rflags  10212
rsp fe812a095a90
ss  10
netbsd:amap_wipeout+0x81:   movq    8(%rax),%rdx
db{0}>

Since it's a development server, I let it sit in DDB in case someone wants me 
to examine something.


Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Martin Husemann
On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
> After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
> the ld(1) linker has serious issues linking single Clang/LLVM
> libraries within 20 minutes. This causes frequent timeouts on the NetBSD
> buildbot in the LLVM buildfarm. Timeouts were never observed in the
> past; today there might be a few of them daily.

Sounds like a binutils issue (or something like too little RAM available
on the machine).

> Another observation is that grep(1) on one NetBSD server is
> significantly slower since the switch from -7 to 8RC1.

Please file separate PRs for each (and maybe provide some input files
to reproduce the issue).

Martin




Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Kamil Rytarowski
On 10.07.2018 11:01, Martin Husemann wrote:
> On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
>> I have no scientific data yet, but I just noticed that build times on the
>> auto-build cluster have risen very dramatically since it was updated
>> to run NetBSD 8.0 RC2.
>>
>> Since builds move around build slaves sometimes (not exactly randomly,
>> but anyway) I picked the alpha port as an example (the first few
>> architectures in the alphabetical list get build slaves assigned pretty
>> consistently).
> 
> Here is an intermediate result from further experiments and statistics:
> 
>  - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
>and not what will be in the final 8.0 release) has a measurable performance
>impact - but it is not the big issue here.
> 
>  - if we ignore netbsd-7* branches, the performance loss is reasonably
>explainable by the SVS penalty - we are going to check that theory soon.
> 
>  - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
>interaction with SVS, making those build times skyrocket - if turning
>off SVS does not solve this, we will need to dig deeper.
> 
> So stay tuned - maybe only Intel is to blame ;-)
> 
> If anyone has concrete pointers for the last issue (or ideas what to change/
> measure) please speak up.
> 
> Martin
> 

After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
the ld(1) linker has serious issues linking single Clang/LLVM
libraries within 20 minutes. This causes frequent timeouts on the NetBSD
buildbot in the LLVM buildfarm. Timeouts were never observed in the
past; today there might be a few of them daily.

We experimented with SVS disabled, but it didn't help.

Another observation is that grep(1) on one NetBSD server is
significantly slower since the switch from -7 to 8RC1.





Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Maxime Villard

On 10/07/2018 at 11:01, Martin Husemann wrote:

On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:

I have no scientific data yet, but I just noticed that build times on the
auto-build cluster have risen very dramatically since it was updated
to run NetBSD 8.0 RC2.

Since builds move around build slaves sometimes (not exactly randomly,
but anyway) I picked the alpha port as an example (the first few
architectures in the alphabetical list get build slaves assigned pretty
consistently).


Here is an intermediate result from further experiments and statistics:

  - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
and not what will be in the final 8.0 release) has a measurable performance
impact - but it is not the big issue here.


For the record: EagerFPU has a fixed performance cost, namely
saving+restoring the FPU context during each context switch. LazyFPU, however,
had a variable performance cost: during context switches the FPU state was
kept on the CPU, in the hope that if we switched back to the owner lwp we
would not have to do a save+restore at all. If we did have to, however, we
needed to send an IPI, and cost(IPI+save+restore) > cost(save+restore).

So LazyFPU may have been less/more expensive than EagerFPU, depending on the
workload/scheduling.

The reason it is more expensive for you may be that on your machine each
lwp ("make" thread) stays on the same CPU, and the kpreemptions cause a
save+restore that is not actually necessary, since each CPU always comes back
to the owner lwp. (As you said, you also have the old version of EagerFPU in
RC2, which is more expensive than the one in the current -current, so that's
part of the problem too.)

I've already said it, but XSAVEOPT actually eliminates this problem, since it
performs the equivalent of LazyFPU (not saving+restoring when not needed)
without requiring an IPI.
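
To make the trade-off concrete, here is a minimal sketch of the two
strategies in pseudo-C (the names are hypothetical, not the actual x86
fpu code):

    struct lwp;
    void fpu_save(struct lwp *);        /* store FPU context to the PCB */
    void fpu_restore(struct lwp *);     /* load FPU context from the PCB */
    void fpu_disable(void);             /* make the next FPU use trap */

    struct lwp *fpu_owner;      /* lwp whose state is live on this CPU */

    /* EagerFPU: fixed cost, one save+restore on every context switch. */
    void
    cpu_switch_eager(struct lwp *oldl, struct lwp *newl)
    {
            fpu_save(oldl);
            fpu_restore(newl);
            fpu_owner = newl;
    }

    /* LazyFPU: leave the state on the CPU. Free if we come back to the
     * owner lwp; but if the state must later be pulled off a remote
     * CPU, that takes an IPI, and
     * cost(IPI+save+restore) > cost(save+restore). */
    void
    cpu_switch_lazy(struct lwp *oldl, struct lwp *newl)
    {
            (void)oldl; (void)newl; /* state stays where it is */
            fpu_disable();          /* newl's first FPU use will trap */
            /* fpu_owner still points at oldl; nothing saved yet */
    }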


  - if we ignore netbsd-7* branches, the performance loss is reasonably
explainable by the SVS penalty - we are going to check that theory soon.

  - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
interaction with SVS, making those build times skyrocket - if turning
off SVS does not solve this, we will need to dig deeper.

So stay tuned - maybe only Intel is to blame ;-)

If anyone has concrete pointers for the last issue (or ideas what to change/
measure) please speak up.


Not sure this is related, but it seems that the performance of build.sh on
netbsd-current is oscillating as fuck. Maybe I just hadn't noticed that it
was oscillating this much before, but right now a

build.sh -j 4 kernel=GENERIC

is in [5min; 5min35s], even with SpectreV2/SpectreV4/SVS/EagerFPU all
disabled. It seems to me that at one time doing two or three builds was
enough and you would get an oscillation of <5s. Now it looks like I have
to do more than 5 builds to get a relevant average.

Maxime



Re: interesting skylake perf tidbit

2018-07-10 Thread Maxime Villard

On 06/07/2018 at 21:47, Maxime Villard wrote:

I guess we should do both; use "monitor" when possible, and in the places
that still need to use "pause", use a lower BACKOFF_MIN (set at
boot time, depending on the CPU model) to compensate for the increased
PAUSE latency.


Here are two patches [1] [2]. We reduce the backoff values for PAUSE, and use
MWAIT instead when possible.

The code for MWAIT is not very beautiful, because we need to pass in the
condition, and therefore we need a macro. And we do a 64-bit mwait, while
the monitored value could actually be smaller.
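
For reference, the overall shape is something like this (a hypothetical
sketch of the idea, not the code in the diff; MONITOR takes the address
in RAX and MWAIT takes its hints in EAX/ECX):

    static inline void
    cpu_monitor(const void *addr)
    {
            __asm volatile("monitor" :: "a"(addr), "c"(0), "d"(0)
                : "memory");
    }

    static inline void
    cpu_mwait(void)
    {
            __asm volatile("mwait" :: "a"(0), "c"(0) : "memory");
    }

    /* The condition must be re-evaluated between MONITOR and MWAIT
     * (a wakeup may already have happened), which is why this has to
     * be a macro rather than a function. */
    #define SPINWAIT_MWAIT(addr, cond)                  \
    do {                                                \
            while (!(cond)) {                           \
                    cpu_monitor(addr);                  \
                    if (!(cond))                        \
                            cpu_mwait();                \
            }                                           \
    } while (/*CONSTCOND*/0)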

I've benchmarked the latency of PAUSE on a Kabylake (Core i5). It seems that
Kabylake indeed has the same latency: on average PAUSE takes ~136
cycles, which matches the ~140 cycles documented for Skylake.
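
(A measurement of this kind can be reproduced with a simple RDTSC loop;
this is just a sketch of the method, not the exact benchmark used:)

    #include <stdio.h>
    #include <inttypes.h>
    #include <x86intrin.h>      /* __rdtsc(), _mm_pause() */

    #define ITERS   1000000

    int
    main(void)
    {
            uint64_t start = __rdtsc();
            for (int i = 0; i < ITERS; i++)
                    _mm_pause();
            uint64_t end = __rdtsc();

            /* Loop overhead adds a few cycles; on Skylake/Kabylake
             * this should come out around ~140 cycles per PAUSE. */
            printf("~%" PRIu64 " cycles per PAUSE\n",
                (end - start) / ITERS);
            return 0;
    }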

The reason for the uncertainty is that the Kabylake microarchitecture is an
optimization of Skylake, and it is not documented whether the increased
latency is present (as inherited from Skylake); it looks like it is.

I guess we'll have to add the CPU models manually in probe_intel_slowpause().

I'm not very confident about this part... If you could measure the
performance difference with the patches applied, that would be nice.

Maxime

[1] http://m00nbsd.net/garbage/idle/slowpause.diff
[2] http://m00nbsd.net/garbage/idle/spinwmait.diff


Re: CVS commit: src/sys/arch/x86/x86

2018-07-10 Thread Kamil Rytarowski
On 08.07.2018 17:44, Kamil Rytarowski wrote:
> I will try to scratch a new header unaligned.h with the set of macros
> and submit it to evaluation.

I've prepared a scratch of unaligned.h with get_unaligned():

http://netbsd.org/~kamil/kubsan/unaligned.h

There are at least two problems to proceed:

1. GCC 8.x is required for the no_sanitize attributes

https://gcc.gnu.org/gcc-8/changes.html
This version will also ship with the NetBSD code for sanitization.

The base-system GCC version in HEAD (6.4.0) is too old.

2. get_unaligned() is oriented toward fundamental types (char, int, long, etc.)

A large part of the issues detected in the kernel are due to a
misaligned pointer to a struct being passed (like disklabel or in6_addr).

I think these cases should be addressed directly in the kernel
code and treated as bugs.
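
For illustration, a type-generic get_unaligned() for the fundamental
types can be built on memcpy(), letting the compiler emit whatever
unaligned-safe load the target supports (a sketch of the idea only, not
necessarily what the draft header above does):

    #include <string.h>

    /* GCC/Clang statement-expression version; ptr is evaluated once,
     * by the memcpy() (__typeof__ and sizeof do not evaluate it). */
    #define get_unaligned(ptr)                          \
    ({                                                  \
            __typeof__(*(ptr)) __tmp;                   \
            memcpy(&__tmp, (ptr), sizeof(__tmp));       \
            __tmp;                                      \
    })

    /* e.g.: uint32_t v = get_unaligned((const uint32_t *)p); */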


I'm deferring the work on unaligned.h for now and will wait for the
required minimum version of GCC in base. I will keep KUBSan reports
non-fatal for now.





Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Martin Husemann
On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
> I have no scientific data yet, but I just noticed that build times on the
> auto-build cluster have risen very dramatically since it was updated
> to run NetBSD 8.0 RC2.
> 
> Since builds move around build slaves sometimes (not exactly randomly,
> but anyway) I picked the alpha port as an example (the first few
> architectures in the alphabetical list get build slaves assigned pretty
> consistently).

Here is an intermediate result from further experiments and statistics:

 - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
   and not what will be in the final 8.0 release) has a measurable performance
   impact - but it is not the big issue here.

 - if we ignore netbsd-7* branches, the performance loss is reasonably
   explainable by the SVS penalty - we are going to check that theory soon.

 - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
   interaction with SVS, making those build times skyrocket - if turning
   off SVS does not solve this, we will need to dig deeper.

So stay tuned - maybe only Intel is to blame ;-)

If anyone has concrete pointers for the last issue (or ideas what to change/
measure) please speak up.

Martin