Re: WANTED: nvme(4) driver testing on MP systems on -current

2016-10-22 Thread Chavdar Ivanov
VirtualBox

On Fri, 21 Oct 2016, 04:17 Thor Lancelot Simon,  wrote:

> On Thu, Oct 20, 2016 at 11:09:02PM +0200, Jaromír Doleček wrote:
> > I've now committed my fixes for the NVMe driver; it should be more stable
> > now, give it a try.
> >
> > With those fixes, the driver works without any problem, even under
> > fairly heavy I/O load, when nvme.c and ld_nvme.c are compiled with -O0,
> > on both virtual and real MP machines. An -O2 kernel also works on a virtual
> > machine, but I've had an I/O lockup on a real hardware machine with an -O2
> > kernel. It may have been unrelated; I'm still investigating.
>
> I keep forgetting to ask -- what kind of virtual machine has NVMe
> as an emulated device?
>
> Thor
>


Re: WANTED: nvme(4) driver testing on MP systems on -current

2016-10-20 Thread Jaromír Doleček
I've now committed my fixes for the NVMe driver; it should be more stable
now, give it a try.

With those fixes, the driver works without any problem, even under
fairly heavy I/O load, when nvme.c and ld_nvme.c are compiled with -O0,
on both virtual and real MP machines. An -O2 kernel also works on a virtual
machine, but I've had an I/O lockup on a real hardware machine with an -O2
kernel. It may have been unrelated; I'm still investigating.

Jaromir

2016-10-18 22:01 GMT+02:00 Jaromír Doleček :
> Hey,
>
> thank you. This iostat_unbusy panic is a typical symptom of the current
> MP issues: the command completion queue gets corrupted, and
> nvme_q_complete() delivers some commands twice. This causes either this
> panic (due to a duplicate lddone() for a stale buf) or a random kernel
> crash.
>
> I've been working on debugging this for the past two weeks or so. I have
> some local changes (mainly some volatile qualifiers) which seem to
> fix this issue, at least on my MP VirtualBox test machine. But these
> changes still do not fix the issue completely on another real system I
> have access to. I guess it would be useful to share the ongoing work,
> at least. I'll polish and commit what I have, today or tomorrow.
>
> Jaromir
>
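The failure mode described above (nvme_q_complete() handing the same command to lddone() twice) can be illustrated with a simplified user-space model of how an NVMe host walks a completion queue. This is a sketch under stated assumptions, not the nvme(4) code: `struct sim_cqe`, `struct sim_cq`, `sim_post()` and `sim_complete()` are hypothetical names. The model relies on the NVMe phase tag: the controller inverts the phase bit it writes on each pass through the ring, so the host accepts an entry only when its phase matches the value it expects, and flips that expected value whenever its head index wraps. The ring is reached through a `volatile` pointer so every poll re-reads memory the device may have rewritten; `volatile` only stops the compiler from reusing an earlier load, while the CPU/DMA ordering in a real driver comes from bus_dmamap_sync(9) and memory barriers, plus per-queue serialization so two CPUs never walk the same queue at once.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified completion queue entry.  Field order matches
 * the 16-byte NVMe CQE on a little-endian host: the phase tag is bit 0
 * of the upper half of dword 3, the status code is in bits 15:1.
 */
struct sim_cqe {
	uint32_t cmd_specific;
	uint32_t reserved;
	uint16_t sqhd;      /* submission queue head pointer */
	uint16_t sqid;      /* submission queue identifier */
	uint16_t cid;       /* command identifier */
	uint16_t flags;     /* bit 0: phase tag, bits 15:1: status */
};

#define SIM_CQE_PHASE 0x0001u

struct sim_cq {
	volatile struct sim_cqe *ring;  /* memory the "controller" writes */
	uint32_t entries;               /* ring size */
	uint32_t head;                  /* next slot the host inspects */
	uint16_t phase;                 /* phase value that marks a new entry */
};

/* Controller side: post completion 'cid' in slot 'tail' with the given phase. */
void sim_post(struct sim_cq *q, uint32_t tail, uint16_t cid, int phase)
{
	assert(tail < q->entries);
	q->ring[tail].cid = cid;
	q->ring[tail].flags = phase ? SIM_CQE_PHASE : 0;
}

/*
 * Host side: copy the command IDs of all new completions into out[]
 * (at most max of them) and return how many were consumed.  In an MP
 * driver this loop must be serialized per queue; otherwise two CPUs can
 * read the same head slot and both complete the same command.
 */
uint32_t sim_complete(struct sim_cq *q, uint16_t *out, uint32_t max)
{
	uint32_t n = 0;

	while (n < max) {
		volatile struct sim_cqe *cqe = &q->ring[q->head];
		uint16_t flags = cqe->flags;    /* fresh load on every poll */

		if ((flags & SIM_CQE_PHASE) != q->phase)
			break;                  /* slot not (re)written yet */
		/* A real driver syncs the DMA map / issues a read barrier here
		 * before trusting the other fields of the entry. */
		out[n++] = cqe->cid;
		if (++q->head == q->entries) {
			q->head = 0;
			q->phase ^= SIM_CQE_PHASE;  /* wrapped: expect the other phase */
		}
	}
	/* A real driver would now write q->head to the CQ head doorbell. */
	return n;
}
```

On a 4-entry ring, after the host wraps around once, slot 1 still holds the first lap's entry with the old phase, so `sim_complete()` stops there instead of delivering that command again. A consumer that ignored the phase, or acted on a stale copy of it, would complete the command a second time, which is the kind of duplicate completion that can end in the iostat_unbusy panic.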



Re: WANTED: nvme(4) driver testing on MP systems on -current

2016-10-18 Thread Masanobu SAITOH

With nvme.c rev. 1.16:


Oct 18 17:14:02 five savecore: reboot after panic: panic: ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts  SLPOyLW E RN


and,


five# crash -M netbsd.36.core -N /netbsd
Crash version 7.99.39, image version 7.99.39.
System panicked: iostat_unbusy
Backtrace from time of crash is available.
crash> trace
_KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
?() at 80008f0e5240
vpanic() at vpanic+0x149
snprintf() at snprintf
iostat_isbusy() at iostat_isbusy
dk_done1() at dk_done1+0xab
lddone() at lddone+0xf
nvme_q_complete() at nvme_q_complete+0xc6
softint_dispatch() at softint_dispatch+0xd3
DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e919ff0
Xsoftintr() at Xsoftintr+0x4f
--- interrupt ---
0:


Again, the panic message was:


Oct 18 17:14:02 five savecore: reboot after panic: panic: ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts  SLPOyLW E RN


-> panic: iostat_unbusy
-> WARNINWG:A RSNPILN GNO:T  SLPOLW E RN

  -> WARNING: SPL NOT LOWER
  -> WARNING: SPL N

The full dmesg is at:

http://www.netbsd.org/~msaitoh/nvme-20161018-0.log

Any test code is welcome!

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



WANTED: nvme(4) driver testing on MP systems on -current

2016-09-21 Thread Jaromír Doleček
Hello,

The NVMe driver in NetBSD-current was recently tweaked to fix several MP and
locking issues, and the driver is now marked as MPSAFE by default.

Most of this work was done on emulators since I lack the hardware,
so it's not clear whether everything works properly on real systems too.

If you have the hardware, I'd appreciate it if you could check the driver
out, try to punish the drive with some heavy I/O tests, with parallel load
if possible, and report the results.

The driver should work on i386 and amd64, and is enabled in the
INSTALL/GENERIC kernels there, so you could just boot the install ISO
from the NetBSD daily builds and send-pr any issues.

I'd especially welcome it if someone with a sparc64 system could test
the driver, too. The driver originates from OpenBSD, where nvme(4) is
enabled in the GENERIC sparc64 kernel, so it should work, but this has not
been confirmed on NetBSD/sparc64 yet. Note that you might need a fairly
modern system: at least some Intel NVMe cards require PCIe Generation 3
to actually work, which rules out e.g. T1s.

I'd also very much welcome any benchmark results; it would be very
interesting to see some IOPS figures.
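For the heavy parallel I/O and the IOPS figures asked for above, a small multi-threaded random-read generator is one way to load the drive. This is an assumption-laden sketch, not an official test tool: `rr_run()`, `struct rr_job` and the 64-thread cap are hypothetical, the caller must supply the number of blocks to sample from (querying a raw device's size is left out), and it only ever reads, so it is safe to point at a disk holding data. On NetBSD the target would typically be the raw ld(4) device attached to nvme(4); a name such as `/dev/rld0d` is an assumption, so check dmesg for the actual unit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define RR_MAX_THREADS 64   /* arbitrary cap for this sketch */

/* Per-thread work description and result (hypothetical names). */
struct rr_job {
	int fd;               /* shared read-only descriptor; pread() needs no seek */
	off_t nblocks;        /* pick offsets from blocks [0, nblocks) */
	size_t blksz;         /* bytes per read; keep a multiple of the sector size */
	long ops;             /* reads this thread issues */
	unsigned int seed;    /* private PRNG state for rand_r() */
	long done;            /* reads that returned a full block */
};

static void *rr_worker(void *arg)
{
	struct rr_job *j = arg;
	char *buf;

	assert(j->nblocks > 0 && j->blksz > 0);
	if ((buf = malloc(j->blksz)) == NULL)
		return NULL;
	for (long i = 0; i < j->ops; i++) {
		off_t blk = (off_t)rand_r(&j->seed) % j->nblocks;
		ssize_t n = pread(j->fd, buf, j->blksz, blk * (off_t)j->blksz);
		if (n == (ssize_t)j->blksz)
			j->done++;
	}
	free(buf);
	return NULL;
}

/*
 * Run nthreads concurrent random readers on path, each issuing ops reads
 * of blksz bytes.  Returns the number of full-block reads that succeeded
 * (-1 on bad arguments or open failure) and stores the rate in *iops.
 */
long rr_run(const char *path, int nthreads, long ops, size_t blksz,
    off_t nblocks, double *iops)
{
	pthread_t tid[RR_MAX_THREADS];
	struct rr_job job[RR_MAX_THREADS];
	struct timespec t0, t1;
	int fd, started = 0;
	long total = 0;

	if (nthreads <= 0 || nthreads > RR_MAX_THREADS || blksz == 0 || nblocks <= 0)
		return -1;
	if ((fd = open(path, O_RDONLY)) < 0)   /* read-only: never writes */
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nthreads; i++) {
		job[i] = (struct rr_job){ fd, nblocks, blksz, ops, (unsigned int)i + 1, 0 };
		if (pthread_create(&tid[i], NULL, rr_worker, &job[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		total += job[i].done;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	if (iops != NULL) {
		double secs = (double)(t1.tv_sec - t0.tv_sec) +
		    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
		*iops = secs > 0 ? (double)total / secs : 0.0;
	}
	return total;
}
```

Build with `cc -pthread`. A call such as `rr_run("/dev/rld0d", 8, 100000, 4096, nblocks, &iops)` would issue 800,000 random 4 KiB reads from 8 threads and store the achieved rate in `iops`. Raising the thread count is a simple way to increase the parallel load, and keeping `blksz` a multiple of the sector size matters because raw-device reads generally need sector-aligned offsets and lengths.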

Let me know the results; I'd like to update the driver manpage to list
known working hardware.

In any reports, please include the attachment fragment from dmesg, as there
is quite a significant difference between attachment via apic/INTx and
MSI/MSI-X. Output from intrctl(8) would also be useful, to confirm that
interrupt handlers are dispatched properly to the individual available CPUs.

Thank you.

Jaromir