Re: CVS commit: src/sys/kern

2020-01-21 Thread Simon Burge
"Christos Zoulas" wrote:

> Log Message:
>
> Don't crash if we are on a hippie trail, head full of zombie

+1 for any Australian references in a commit message :)

Cheers,
Simon.


re: CVS commit: src/sys/uvm

2020-01-21 Thread matthew green
Andrew Doran writes:
> I also recommend disabling ACPI idle, at least until it can be made less
> aggressive by default.  It causes a significant slowdown.  It can be done
> by detaching all acpicpu devices using "drvctl -d" on each.
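
For a machine with 32 CPUs that boils down to something like the following
(a minimal, untested sketch assuming the instances are named acpicpu0
through acpicpu31; run as root):

  # Detach every acpicpu instance so the ACPI idle states stop being used.
  for n in $(seq 0 31); do
      drvctl -d acpicpu$n
  done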

this disables cpufreq support, doesn't it?  that seems like
an unfortunate side effect..


.mrg.


Re: CVS commit: src/sys/uvm

2020-01-21 Thread Andrew Doran
On Fri, Jan 10, 2020 at 10:21:25PM +, Andrew Doran wrote:
> Hi Frank,
> 
> On Fri, Jan 10, 2020 at 01:10:02PM +0100, Frank Kardel wrote:
> 
> > Hi !
> > 
> > With this state of January 2nd we ran some tests for robustness and timing
> > with our database setup:
> > 
> > Machine:
> > 
> > Mainboard: S2600WFT
> > 
> > CPU: 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
> > 
> > machdep.spectre_v1.mitigated = 0
> > machdep.spectre_v2.hwmitigated = 1
> > machdep.spectre_v2.swmitigated = 1
> > machdep.spectre_v2.method = [GCC retpoline] + [Intel IBRS]
> > machdep.spectre_v4.mitigated = 0
> > machdep.spectre_v4.method = (none)
> > machdep.mds.mitigated = 0
> > machdep.mds.method = (none)
> > machdep.taa.mitigated = 0
> > machdep.taa.method = [MDS]
> > 
> > Memory:
> > 
> > hw.physmem64 = 549446447104
> > hw.usermem64 = 549438365696
> > 
> > This machine is/has been a challenge for NetBSD as it has 0.5 TB of memory
> > and 32 cores.
> > 
> > Test case is restoring a 1 TB PostgreSQL 11 database with varying degrees
> > of PostgreSQL pg_restore parallelism.
> > 
> > Why did we do the tests? The machine was installed with 8.99.24 as that
> > supported the memory setup.
> > 
> > The machine was not able to reliably cope with many db/restore processes and
> > large memory - see
> > 
> >  PR kern/54209: NetBSD 8 large memory performance extremely low
> >  PR kern/54210: NetBSD-8 processes presumably not exiting
> > 
> > for details.
> > 
> > With Andrew Doran's work on the vm system we restarted the tests.
> > 
> > The baseline is 8.99.24 from around  Sep  3 04:10:20 UTC 2018:
> > TEST 1
> > FRESH BOOT
> > time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> > 1826.599u 1752.878s 10:36:03.83 9.3% 0+0k 397+0io 1789pf+0w
> > 
> > Higher levels of restore parallelism lead to a higher probability of
> > catatonic systems.
> > Trouble starts around -j8 and gets worse at higher levels.
> > 
> > TEST 2
> > 9.99.33 from around Fri Jan  3 16:14:02 CET 2020
> > FRESH BOOT
> > time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
> > 2047.925u 1191.878s 14:24:15.23 6.2% 0+0k 0+0io 5784pf+0w
> > 
> > This survived a -j28 run that was not possible with 8.99.24 - this is a big
> > step forward, but ~4h slower in real time.
> > 
> > TEST 3
> > FRESH BOOT
> > 9.99.34 from around Mon Jan  6 14:43:01
> > time pg_restore -Upgsql -p5433 -Fd -d db -j28 20200103-db.dmpdir
> > 1816.348u 1792.530s 10:56:02.56 9.1% 0+0k 395+0io 5620pf+0w
> > 
> > -j28 run to compare to 9.99.33 - big improvement in real run time though
> > system time went up.
> > 
> > TEST 4
> > State after TEST 3 run to compare to 8.99.24
> > time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> > 1706.548u 1748.623s 11:26:38.87 8.3% 0+0k 0+0io 1420pf+0w
> > 
> > This ran faster than -j28 - probably due to less contention, but 50 min
> > slower than 8.99.24 after a fresh boot.
> > 
> > TEST 5:
> > re-run TEST 4 with fresh boot for 8.99.24 comparison
> > time pg_restore -Upgsql -p5433 -Fd -d db -j5 20200103-db.dmpdir
> > 1710.665u 1611.083s 9:14:56.86 9.9% 0+0k 398+0io 1504pf+0w
> > 
> > Better than 8.99.24 in real time.
> > 
> > There seems to be no big difference in system time between 8.99.24 and
> > 9.99.34, but a big improvement in robustness.
> > The lockups don't seem to happen any more, there are fewer short-term
> > system freezes, and the system remains responsive with 9.99.34.
> > 
> > The big differences in real time are interesting, but the cause may not be
> > easy to pinpoint. The database runs on an NVMe SSD:
> > nvme0 at pci10 dev 0 function 0: Intel SSD DC P4500 (rev. 0x00)
> > nvme0: NVMe 1.2
> > nvme0: for admin queue interrupting at msix4 vec 0
> > nvme0: INTEL SSDPE2KX040T8, firmware VDV10131, serial ...
> > nvme0: for io queue 1 interrupting at msix4 vec 1 affinity to cpu0
> > [...]
> > nvme0: for io queue 32 interrupting at msix4 vec 32 affinity to cpu31
> > ld0 at nvme0 nsid 1
> > ld0: 3726 GB, 486401 cyl, 255 head, 63 sec, 512 bytes/sect x 7814037168 sectors
> > 
> > And we are seeing transfer rates of up to 300 MB/s and up to 80% busy on the
> > complex I/O (load) and CPU (build index) workload.
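
Figures like these can be sampled while the restore runs, for example with
iostat(8) - a sketch, assuming ld0 is the database drive (the report does not
say which tool was used):

  # Print ld0 statistics every 5 seconds (KB/t, xfers/s, MB/s).
  iostat -w 5 ld0
  # If the installed iostat supports -x, the extended display adds busy time.
  iostat -x -w 5 ld0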
> > 
> > So in summary we see a big step forward in robustness.
> > 
> > Thanks to Andrew for the big improvements here.
> 
> Thank you for the detailed testing, and report.
> 
> Many of the changes to the VM system came from Takashi Yamamoto's
> yamt-pagecache branch, so it's not all my work.
> 
> I'm glad to hear that this has worked well for you.  There are a couple of
> things that, time permitting, I would like to get in place over the next few
> weeks which should help a little with this workload (and then I am done, for
> now).
> 
> The first is enabling Jaromir Dolecek's vm.ubc_direct by default, which may
> help with such a high I/O rate.  There is a possible deadlock condition with
> this that needs to be fixed first.
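
Until that happens it can be toggled by hand on a kernel that has the code
(a sketch; the knob is the vm.ubc_direct sysctl mentioned above, and it is
only writable where the kernel supports the direct map):

  # Query the current setting (0 = disabled).
  sysctl vm.ubc_direct
  # Enable direct UBC for this boot only; the change is not persistent.
  sysctl -w vm.ubc_direct=1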

ubc_direct didn't make it in yet.  As an interim measure I 

Re: CVS commit: src/sys [freeze on boot]

2020-01-21 Thread SAITOH Masanobu
On 2020/01/21 20:20, Patrick Welche wrote:
> On Tue, Jan 21, 2020 at 03:48:27PM +0900, Masanobu SAITOH wrote:
>> I suspect the location of your panic is after the following message
>> (because of ixgbe_allocate_msix()'s failure):
>>
>>> aprint_normal(" ETrackID %08x\n", ((uint32_t)high << 16) | low);
> 
> Exactly right:
> 
> ixg0 at pci8 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, 
> Version - 4.0.1-k
> ixg0: device 82599EB
> ixg0: ETrackID 81a5
> ixg0: autoconfiguration error: failed to allocate MSI-X interrupt
> 
>> If so, could you try the following diff?
> 
> That fixed it, thanks! (with ad@'s rwlock fix and your patch all permutations
> work)

Committed.

Thanks!


> Cheers,
> 
> Patrick
> 


-- 
---
SAITOH Masanobu (msai...@execsw.org, msai...@netbsd.org)


Re: CVS commit: src/sys [freeze on boot]

2020-01-21 Thread Patrick Welche
On Tue, Jan 21, 2020 at 03:48:27PM +0900, Masanobu SAITOH wrote:
> I suspect the location of your panic is after the following message
> (because of ixgbe_allocate_msix()'s failure):
> 
> > aprint_normal(" ETrackID %08x\n", ((uint32_t)high << 16) | low);

Exactly right:

ixg0 at pci8 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, 
Version - 4.0.1-k
ixg0: device 82599EB
ixg0: ETrackID 81a5
ixg0: autoconfiguration error: failed to allocate MSI-X interrupt

> If so, could you try the following diff?

That fixed it, thanks! (with ad@'s rwlock fix and your patch all permutations
work)

Cheers,

Patrick