Re: WANTED: nvme(4) driver testing on MP systems on -current
VirtualBox

On Fri, 21 Oct 2016, 04:17 Thor Lancelot Simon wrote:
> On Thu, Oct 20, 2016 at 11:09:02PM +0200, Jaromír Doleček wrote:
> > I've now committed my fixes for the NVMe driver; it should be more
> > stable now, give it a try.
> >
> > With those fixes, the driver works without any problem, even under
> > fairly heavy I/O load, when nvme.c and ld_nvme.c are compiled with
> > -O0, on both virtual and real MP machines. An -O2 kernel also works
> > on the virtual machine, but I've had an I/O lockup on the real
> > hardware with an -O2 kernel. It may have been unrelated, I'm still
> > investigating.
>
> I keep forgetting to ask -- what kind of virtual machine has NVMe
> as an emulated device?
>
> Thor
Re: WANTED: nvme(4) driver testing on MP systems on -current
I've now committed my fixes for the NVMe driver; it should be more
stable now, give it a try.

With those fixes, the driver works without any problem, even under
fairly heavy I/O load, when nvme.c and ld_nvme.c are compiled with -O0,
on both virtual and real MP machines. An -O2 kernel also works on the
virtual machine, but I've had an I/O lockup on the real hardware with an
-O2 kernel. It may have been unrelated, I'm still investigating.

Jaromir

2016-10-18 22:01 GMT+02:00 Jaromír Doleček:
> Hey,
>
> thank you. This iostat_unbusy panic is a typical symptom of the current
> MP issues: the command completion queue gets corrupted, and
> nvme_q_complete() delivers some commands twice. It causes either this
> panic (due to a duplicate lddone() for a stale buf), or a random kernel
> crash.
>
> I've been working on debugging this for the past two weeks or so. I
> have some local changes (mainly some volatile qualifiers) which seem to
> fix this issue at least for my MP VirtualBox test machine. But these
> changes still do not fix the issue completely on another real system I
> have access to. I guess it would be useful to share the ongoing work
> at least. I'll polish and commit what I have, today or tomorrow.
>
> Jaromir
>
> 2016-10-18 10:40 GMT+02:00 Masanobu SAITOH :
>> On 2016/09/22 5:54, Jaromír Doleček wrote:
>>>
>>> Hello,
>>>
>>> The NVMe driver in NetBSD-current was recently tweaked to fix several MP
>>> and locking issues, and the driver is now marked as MPSAFE by default.
>>>
>>> Most of this work was done on emulators since I lack the hardware, so
>>> it's not clear if everything would work properly on real systems too.
>>>
>>> If you have the hardware, I'd appreciate it if you could check the
>>> driver out, try to punish the drive with some heavy I/O test, with
>>> parallel load if possible, and report the results.
>>>
>>> The driver should work on i386 and amd64, and is enabled in the
>>> INSTALL/GENERIC kernels there, so you could just boot the install ISO
>>> from the NetBSD daily builds, and send-pr any issues.
>>>
>>> I'd especially welcome it if someone with a sparc64 system could test
>>> the driver out, too. The driver originates from OpenBSD, where nvme(4)
>>> is enabled in the GENERIC sparc64 kernel, so it should work, but this
>>> has not been confirmed yet on NetBSD/sparc64. Note you might need a
>>> fairly modern system: at least some Intel NVMe cards require PCIe
>>> Generation 3 to actually work, which rules out e.g. T1s.
>>>
>>> I'd also very much welcome any benchmark results; it would be very
>>> interesting to share some IOPS figures.
>>>
>>> Let me know the results, I'd like to update the driver manpage to list
>>> known working hardware.
>>>
>>> In any reports, please include the attachment fragment from dmesg, as
>>> there is quite a significant difference between attachment via
>>> apic/INTx and MSI/MSI-X. Also useful would be intrctl(8) output, to
>>> confirm interrupt handlers are dispatched properly to the individual
>>> available CPUs.
>>>
>>> Thank you.
>>>
>>> Jaromir
>>>
>>
>> With nvme.c rev. 1.16:
>>
>>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts SLPOyLW E RN
>>
>> and,
>>
>>> five# crash -M netbsd.36.core -N /netbsd
>>> Crash version 7.99.39, image version 7.99.39.
>>> System panicked: iostat_unbusy
>>> Backtrace from time of crash is available.
>>> crash> trace
>>> _KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
>>> ?() at 80008f0e5240
>>> vpanic() at vpanic+0x149
>>> snprintf() at snprintf
>>> iostat_isbusy() at iostat_isbusy
>>> dk_done1() at dk_done1+0xab
>>> lddone() at lddone+0xf
>>> nvme_q_complete() at nvme_q_complete+0xc6
>>> softint_dispatch() at softint_dispatch+0xd3
>>> DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e919ff0
>>> Xsoftintr() at Xsoftintr+0x4f
>>> --- interrupt ---
>>> 0:
>>
>> Again, the panic message was:
>>
>>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts SLPOyLW E RN
>>
>> -> panic: iostat_unbusy
>> -> WARNINWG:A RSNPILN GNO:T SLPOLW E RN
>>
>> -> WARNING: SPL NOT LOWER
>> -> WARNING: SPL N
>>
>> The full dmesg is at:
>>
>> http://www.netbsd.org/~msaitoh/nvme-20161018-0.log
>>
>> Any test code is welcome!
>>
>> --
>> ---
>> SAITOH Masanobu (msai...@execsw.org
>> msai...@netbsd.org)
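
The double delivery described in this message (nvme_q_complete() handing the same command to lddone() twice) can be illustrated with a small user-space model of an NVMe-style completion queue. NVMe completion entries carry a phase tag bit that the controller inverts on each pass through the ring, and the host consumes an entry only when the tag matches the phase it currently expects. The sketch below is not the code in nvme.c: the names `cq`, `cq_entry`, `cq_init`, `cq_device_post` and `cq_drain` are hypothetical, the "device" is simulated by an ordinary function, and a real driver also needs DMA synchronisation and memory barriers, which `volatile` alone does not provide.

```c
#include <assert.h>
#include <stdint.h>

#define CQ_ENTRIES 4u

/* Hypothetical, simplified completion entry: command id plus phase tag. */
struct cq_entry {
    uint16_t cid;    /* identifier of the completed command */
    uint16_t flags;  /* bit 0: phase tag */
};

struct cq {
    volatile struct cq_entry ring[CQ_ENTRIES]; /* written by the "device" */
    uint32_t head;   /* next slot the host will inspect */
    uint16_t phase;  /* phase value that marks a fresh entry */
};

void cq_init(struct cq *q)
{
    for (uint32_t i = 0; i < CQ_ENTRIES; i++) {
        q->ring[i].cid = 0;
        q->ring[i].flags = 0;
    }
    q->head = 0;
    q->phase = 1;    /* ring starts zeroed, so the first pass uses phase 1 */
}

/* Simulated device side: write the cid first, then the phase tag. */
void cq_device_post(struct cq *q, uint32_t slot, uint16_t cid, uint16_t phase)
{
    assert(slot < CQ_ENTRIES);
    q->ring[slot].cid = cid;
    q->ring[slot].flags = (uint16_t)(phase & 1u);
}

/*
 * Host side: collect the cids of all fresh entries into out[] and return
 * how many were found. Each slot is read through a volatile pointer so the
 * compiler cannot reuse a value cached from an earlier look at the ring.
 */
unsigned cq_drain(struct cq *q, uint16_t *out, unsigned max)
{
    unsigned n = 0;

    while (n < max) {
        volatile struct cq_entry *e = &q->ring[q->head];
        uint16_t flags = e->flags;

        if ((flags & 1u) != q->phase)
            break;                    /* stale entry from the previous pass */
        out[n++] = e->cid;
        if (++q->head == CQ_ENTRIES) {
            q->head = 0;
            q->phase ^= 1u;           /* expect the inverted tag next pass */
        }
    }
    return n;
}
```

Calling `cq_drain()` a second time without new posts returns 0, which is the property that breaks when head, phase, or the entry contents are read inconsistently: a stale entry is then mistaken for a fresh one and its command is completed twice.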
Re: WANTED: nvme(4) driver testing on MP systems on -current
On 2016/09/22 5:54, Jaromír Doleček wrote:
>
> Hello,
>
> The NVMe driver in NetBSD-current was recently tweaked to fix several MP
> and locking issues, and the driver is now marked as MPSAFE by default.
>
> Most of this work was done on emulators since I lack the hardware, so
> it's not clear if everything would work properly on real systems too.
>
> If you have the hardware, I'd appreciate it if you could check the
> driver out, try to punish the drive with some heavy I/O test, with
> parallel load if possible, and report the results.
>
> The driver should work on i386 and amd64, and is enabled in the
> INSTALL/GENERIC kernels there, so you could just boot the install ISO
> from the NetBSD daily builds, and send-pr any issues.
>
> I'd especially welcome it if someone with a sparc64 system could test
> the driver out, too. The driver originates from OpenBSD, where nvme(4)
> is enabled in the GENERIC sparc64 kernel, so it should work, but this
> has not been confirmed yet on NetBSD/sparc64. Note you might need a
> fairly modern system: at least some Intel NVMe cards require PCIe
> Generation 3 to actually work, which rules out e.g. T1s.
>
> I'd also very much welcome any benchmark results; it would be very
> interesting to share some IOPS figures.
>
> Let me know the results, I'd like to update the driver manpage to list
> known working hardware.
>
> In any reports, please include the attachment fragment from dmesg, as
> there is quite a significant difference between attachment via
> apic/INTx and MSI/MSI-X. Also useful would be intrctl(8) output, to
> confirm interrupt handlers are dispatched properly to the individual
> available CPUs.
>
> Thank you.
>
> Jaromir
>

With nvme.c rev. 1.16:

> Oct 18 17:14:02 five savecore: reboot after panic: panic:
> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts SLPOyLW E RN

and,

> five# crash -M netbsd.36.core -N /netbsd
> Crash version 7.99.39, image version 7.99.39.
> System panicked: iostat_unbusy
> Backtrace from time of crash is available.
> crash> trace
> _KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
> ?() at 80008f0e5240
> vpanic() at vpanic+0x149
> snprintf() at snprintf
> iostat_isbusy() at iostat_isbusy
> dk_done1() at dk_done1+0xab
> lddone() at lddone+0xf
> nvme_q_complete() at nvme_q_complete+0xc6
> softint_dispatch() at softint_dispatch+0xd3
> DDB lost frame for Xsoftintr+0x4f, trying 0xfe810e919ff0
> Xsoftintr() at Xsoftintr+0x4f
> --- interrupt ---
> 0:

Again, the panic message was:

> Oct 18 17:14:02 five savecore: reboot after panic: panic:
> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts SLPOyLW E RN

 -> panic: iostat_unbusy
 -> WARNINWG:A RSNPILN GNO:T SLPOLW E RN

 -> WARNING: SPL NOT LOWER
 -> WARNING: SPL N

The full dmesg is at:

 http://www.netbsd.org/~msaitoh/nvme-20161018-0.log

Any test code is welcome!

--
---
SAITOH Masanobu (msai...@execsw.org
msai...@netbsd.org)
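
The manual decoding of the garbled panic line above can be reproduced mechanically. The sketch below assumes, as the decoding in this message suggests, that the panic string is the lowercase stream (letters and underscore) and that everything else belongs to the interleaved warning output. That split rule is only a heuristic that happens to fit this particular log line, and `split_by_case` is a hypothetical helper, not part of any NetBSD tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Split an interleaved console line into two streams: lowercase letters
 * and '_' go to `lower`, every other character goes to `other`.
 * Both output buffers must have room for `cap` bytes, and the input
 * (including its terminating NUL) must fit in `cap`.
 */
void split_by_case(const char *in, char *lower, char *other, size_t cap)
{
    size_t l = 0, o = 0;

    assert(in != NULL && lower != NULL && other != NULL);
    assert(strlen(in) < cap);

    for (size_t i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];

        if (islower(c) || c == '_')
            lower[l++] = (char)c;
        else
            other[o++] = (char)c;
    }
    lower[l] = '\0';
    other[o] = '\0';
}
```

Applied to the line from the savecore message, the lowercase stream comes out as `iostat_unbusy` and the remainder as `WARNINWG:A RSNPILN GNO:T SLPOLW E RN`, which itself looks like two copies of the SPL warning interleaved with each other.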
WANTED: nvme(4) driver testing on MP systems on -current
Hello,

The NVMe driver in NetBSD-current was recently tweaked to fix several MP
and locking issues, and the driver is now marked as MPSAFE by default.

Most of this work was done on emulators since I lack the hardware, so
it's not clear if everything would work properly on real systems too.

If you have the hardware, I'd appreciate it if you could check the
driver out, try to punish the drive with some heavy I/O test, with
parallel load if possible, and report the results.

The driver should work on i386 and amd64, and is enabled in the
INSTALL/GENERIC kernels there, so you could just boot the install ISO
from the NetBSD daily builds, and send-pr any issues.

I'd especially welcome it if someone with a sparc64 system could test
the driver out, too. The driver originates from OpenBSD, where nvme(4)
is enabled in the GENERIC sparc64 kernel, so it should work, but this
has not been confirmed yet on NetBSD/sparc64. Note you might need a
fairly modern system: at least some Intel NVMe cards require PCIe
Generation 3 to actually work, which rules out e.g. T1s.

I'd also very much welcome any benchmark results; it would be very
interesting to share some IOPS figures.

Let me know the results, I'd like to update the driver manpage to list
known working hardware.

In any reports, please include the attachment fragment from dmesg, as
there is quite a significant difference between attachment via
apic/INTx and MSI/MSI-X. Also useful would be intrctl(8) output, to
confirm interrupt handlers are dispatched properly to the individual
available CPUs.

Thank you.

Jaromir
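
For anyone wanting a starting point for the "heavy I/O test with parallel load" requested above, here is a minimal sketch, assuming a POSIX system and a large existing file on a filesystem that lives on the NVMe drive. It forks several worker processes that each issue block-sized reads at pseudo-random offsets. The names `read_worker` and `run_parallel_readers` and all parameters are hypothetical; the load is read-only, and this is not a benchmark, only a way to keep several CPUs submitting I/O at once.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* One worker: nreads reads of `block` bytes at pseudo-random block-aligned
 * offsets. Returns 0 on success, 1 on any error. */
static int read_worker(const char *path, size_t block, unsigned nreads,
                       unsigned seed)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 1;

    off_t size = lseek(fd, 0, SEEK_END);
    char *buf = block > 0 ? malloc(block) : NULL;
    if (buf == NULL || size < (off_t)block) {
        free(buf);
        close(fd);
        return 1;
    }

    off_t nblocks = size / (off_t)block;
    unsigned state = seed;
    int rc = 0;

    for (unsigned i = 0; i < nreads; i++) {
        /* Small linear congruential generator, good enough to scatter reads. */
        state = state * 1103515245u + 12345u;
        off_t off = (off_t)((state >> 8) % (unsigned long)nblocks) * (off_t)block;
        if (pread(fd, buf, block, off) != (ssize_t)block) {
            rc = 1;
            break;
        }
    }
    free(buf);
    close(fd);
    return rc;
}

/* Fork nworkers readers and wait for all of them. Returns the number of
 * workers that failed, or -1 if a fork() call itself failed. */
int run_parallel_readers(const char *path, unsigned nworkers, size_t block,
                         unsigned nreads)
{
    unsigned started = 0;
    int fork_failed = 0, bad = 0;

    assert(path != NULL);
    for (unsigned w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;
            break;
        }
        if (pid == 0)
            _exit(read_worker(path, block, nreads, w + 1));
        started++;
    }
    for (unsigned w = 0; w < started; w++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            bad++;
    }
    return fork_failed ? -1 : bad;
}
```

A call such as `run_parallel_readers(path, 8, 4096, 100000)` keeps eight processes reading concurrently. Running it alongside intrctl(8) output, as requested in the message, would show whether the interrupts are spread across CPUs.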