Re: Intel NVMe troubles?
On Monday, September 12, 2016, Borja Marcos <bor...@sarenet.es> wrote: > > > On 12 Sep 2016, at 17:23, Jim Harris <jim.har...@gmail.com> wrote: > > > > There is an updated DCT 3.0.2 at: https://downloadcenter.intel.com/download/26221/Intel-SSD-Data-Center-Tool which has a fix for this > > issue. > > > > Borja has already downloaded this update and confirmed it looks good so > > far. Posting the update and results here so it is archived on the STABLE > > mailing list. > > Is it just my imagination or has trim performance improved dramatically? > I’ve been unable to replicate > the I/O stalls that I observed after running some simultaneous Bonnie++ > benchmarks with large files. Yes - Trim performance should be significantly faster with the FW included in the 3.0.2 DCT release. Jim > > Thanks, > > > > Borja. > > > ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Intel NVMe troubles?
On Mon, Aug 1, 2016 at 11:49 AM, Jim Harris <jim.har...@gmail.com> wrote: > > > On Mon, Aug 1, 2016 at 7:38 AM, Borja Marcos <bor...@sarenet.es> wrote: > >> >> > >> > It looks like all of the TRIM commands are formatted properly. The >> failures do not happen until about 10 seconds after the last TRIM to each >> drive was submitted, and immediately before TRIMs start to the next drive, >> so I'm assuming the failures are for the last few TRIM commands but >> cannot say for sure. Could you apply patch v2 (attached) which will dump >> the TRIM payload contents inline with the failure messages? >> >> Sure, this is the complete /var/log/messages starting with the system >> boot. Before booting I destroyed the pool >> so that you could capture what happens when booting, zpool create, etc. >> >> Remember that the drives are in LBA format #3 (4 KB blocks). As far as I >> know that’s preferred to the old 512 byte blocks. >> >> Thank you very much and sorry about the belated response. > > > Hi Borja, > > Thanks for the additional testing. This has all of the detail that I need > for now. > > -Jim > > > There is an updated DCT 3.0.2 at: https://downloadcenter.intel.com/download/26221/Intel-SSD-Data-Center-Tool which has a fix for this issue. Borja has already downloaded this update and confirmed it looks good so far. Posting the update and results here so it is archived on the STABLE mailing list. Thanks, -Jim
Re: Intel NVMe troubles?
On Mon, Aug 1, 2016 at 7:38 AM, Borja Marcos <bor...@sarenet.es> wrote: > > > On 29 Jul 2016, at 17:44, Jim Harris <jim.har...@gmail.com> wrote: > > > > > > > > On Fri, Jul 29, 2016 at 1:10 AM, Borja Marcos <bor...@sarenet.es> wrote: > > > > > On 28 Jul 2016, at 19:25, Jim Harris <jim.har...@gmail.com> wrote: > > > > > > Yes, you should worry. > > > > > > Normally we could use the dump_debug sysctls to help debug this - these > > > sysctls will dump the NVMe I/O submission and completion queues. But > in > > > this case the LBA data is in the payload, not the NVMe submission > entries, > > > so dump_debug will not help as much as dumping the NVMe DSM payload > > > directly. > > > > > > Could you try the attached patch and send output after recreating your > pool? > > > > Just in case the evil anti-spam ate my answer, sent the results to your > Gmail account. > > > > > > Thanks Borja. > > > > It looks like all of the TRIM commands are formatted properly. The > failures do not happen until about 10 seconds after the last TRIM to each > drive was submitted, and immediately before TRIMs start to the next drive, > so I'm assuming the failures are for the last few TRIM commands but > cannot say for sure. Could you apply patch v2 (attached) which will dump > the TRIM payload contents inline with the failure messages? > > Sure, this is the complete /var/log/messages starting with the system > boot. Before booting I destroyed the pool > so that you could capture what happens when booting, zpool create, etc. > > Remember that the drives are in LBA format #3 (4 KB blocks). As far as I > know that’s preferred to the old 512 byte blocks. > > Thank you very much and sorry about the belated response. Hi Borja, Thanks for the additional testing. This has all of the detail that I need for now. -Jim
Re: Intel NVMe troubles?
On Fri, Jul 29, 2016 at 1:10 AM, Borja Marcos <bor...@sarenet.es> wrote: > > > On 28 Jul 2016, at 19:25, Jim Harris <jim.har...@gmail.com> wrote: > > > > Yes, you should worry. > > > > Normally we could use the dump_debug sysctls to help debug this - these > > sysctls will dump the NVMe I/O submission and completion queues. But in > > this case the LBA data is in the payload, not the NVMe submission > entries, > > so dump_debug will not help as much as dumping the NVMe DSM payload > > directly. > > > > Could you try the attached patch and send output after recreating your > pool? > > Just in case the evil anti-spam ate my answer, sent the results to your > Gmail account. > > Thanks Borja. It looks like all of the TRIM commands are formatted properly. The failures do not happen until about 10 seconds after the last TRIM to each drive was submitted, and immediately before TRIMs start to the next drive, so I'm assuming the failures are for the last few TRIM commands but cannot say for sure. Could you apply patch v2 (attached) which will dump the TRIM payload contents inline with the failure messages? Thanks, -Jim [Attachment: delete_debug_v2.patch]
Re: Intel NVMe troubles?
On Thu, Jul 28, 2016 at 3:29 AM, Borja Marcos wrote: > Hi :) > > Still experimenting with NVMe drives and FreeBSD, and I have run into > problems, I think. > > I’ve got a server with 10 Intel DC P3500 NVMe drives. Right now, running > 11-BETA2. > > I have updated the firmware in the drives to the latest version (8DV10174) > using the Data Center Tools. > And I’ve formatted them for 4 KB blocks (LBA format #3) > > nvmecontrol identify nvme0ns1 > Size (in LBAs): 488378646 (465M) > Capacity (in LBAs): 488378646 (465M) > Utilization (in LBAs): 488378646 (465M) > Thin Provisioning: Not Supported > Number of LBA Formats: 7 > Current LBA Format: LBA Format #03 > LBA Format #00: Data Size: 512 Metadata Size: 0 > LBA Format #01: Data Size: 512 Metadata Size: 8 > LBA Format #02: Data Size: 512 Metadata Size:16 > LBA Format #03: Data Size: 4096 Metadata Size: 0 > LBA Format #04: Data Size: 4096 Metadata Size: 8 > LBA Format #05: Data Size: 4096 Metadata Size:64 > LBA Format #06: Data Size: 4096 Metadata Size: 128 > > > ZFS properly detects the 4 KB block size and sets the correct ashift (12).
> But I’ve found these error messages > generated while I created a pool (zpool create tank raidz2 /dev/nvd[0-8] > spare /dev/nvd9) > > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:63 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:63 cdw0:0 > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:62 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:62 cdw0:0 > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:61 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:61 cdw0:0 > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:60 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:60 cdw0:0 > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:59 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:59 cdw0:0 > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:58 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:58 cdw0:0 > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:57 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:57 cdw0:0 > Jul 28 13:16:11 nvme2 kernel: nvme0: DATASET MANAGEMENT sqid:6 cid:56 > nsid:1 > Jul 28 13:16:11 nvme2 kernel: nvme0: LBA OUT OF RANGE (00/80) sqid:6 > cid:56 cdw0:0 > > And the same for the rest of the drives [0-9]. > > Should I worry? > Yes, you should worry. Normally we could use the dump_debug sysctls to help debug this - these sysctls will dump the NVMe I/O submission and completion queues. But in this case the LBA data is in the payload, not the NVMe submission entries, so dump_debug will not help as much as dumping the NVMe DSM payload directly. Could you try the attached patch and send output after recreating your pool? -Jim Thanks! > > > > > Borja. 
[Attachment: delete_debug.patch]
Re: FreeBSD 10.3 - nvme regression
On Mon, Mar 7, 2016 at 5:33 AM, Borja Marcos wrote: > > Hello, > > I am trying a SuperMicro server with NVME disks. The system boots FreeBSD > 10.2, panics when booting FreeBSD 10.3. > > It was compiled on March 7th and Revision 296191 is included. > > On 10.3 it’s crashing right after this line: > > nvme9: mem 0xfba1-0xfba13fff irq 59 at > device 0.0 on pci134 > > with a panic. > > panic: couldn’t find an APIC vector for IRQ 59. > > cpuid = 0 > The backtrace is (sorry, copying from a screen video) > > #0 kdb_backtrace+0x60 > #1 vpanic+0x126 > #2 panic+0x43 > #3 ioapic_disable_intr+0 > #4 intr_add_handler+0xfb > #5 nexus_setup_intr+0x8a > #6 pci_setup_intr+0x33 > #7 pci_setup_intr+0x33 > #8 bus_setup_intr+0xac > #9 nvme_ctrlr_configure_intx+0x88 > #10 nvme_ctrlr_construct+0x407 > #11 nvme_attach+0x20 > #12 device_attach+0x43d > #13 bus_generic_attach+0x2d > #14 acpi_pci_attach+0x15c > #15 device_attach+0x43d > #16 bus_generic_attach+0x2d > #17 acpi_pcib_attach+0x22c > > It said “Uptime 1s” and did a cold reboot. > Hi, (Moving to freebsd-stable. NVMe is not associated with the SCSI stack at all.) Can you please file a bug report on this? Also, can you try setting the following loader variable before install? hw.nvme.min_cpus_per_ioq=4 I am fairly certain you are hitting bug 199321, and since you have so many devices in your system (NVMe + NICs) allocating per-CPU MSIx vectors, this last NVMe device cannot even allocate one APIC vector entry for an INTx interrupt. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321 -Jim > > > > > dmesg.boot from 10.2 (the system is installed on a memory stick). > > root@ssd9:/usr/src/sys/dev/nvme # cat /var/run/dmesg.boot > Copyright (c) 1992-2015 The FreeBSD Project. > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 > The Regents of the University of California. All rights reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 10.2-RELEASE #0 r28: Wed Aug 12 15:26:37 UTC 2015 > r...@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 > FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512 > CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (2400.04-MHz K8-class CPU) > Origin="GenuineIntel" Id=0x306f2 Family=0x6 Model=0x3f Stepping=2 > > Features=0xbfebfbff > > Features2=0x7ffefbff > AMD Features=0x2c100800 > AMD Features2=0x21 > Structured Extended > Features=0x37ab > XSAVE Features=0x1 > VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr > TSC: P-state invariant, performance statistics > real memory = 137438953472 (131072 MB) > avail memory = 133409718272 (127229 MB) > Event timer "LAPIC" quality 600 > ACPI APIC Table: > FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs > FreeBSD/SMP: 2 package(s) x 8 core(s) x 2 SMT threads > cpu0 (BSP): APIC ID: 0 > cpu1 (AP): APIC ID: 1 > cpu2 (AP): APIC ID: 2 > cpu3 (AP): APIC ID: 3 > cpu4 (AP): APIC ID: 4 > cpu5 (AP): APIC ID: 5 > cpu6 (AP): APIC ID: 6 > cpu7 (AP): APIC ID: 7 > cpu8 (AP): APIC ID: 8 > cpu9 (AP): APIC ID: 9 > cpu10 (AP): APIC ID: 10 > cpu11 (AP): APIC ID: 11 > cpu12 (AP): APIC ID: 12 > cpu13 (AP): APIC ID: 13 > cpu14 (AP): APIC ID: 14 > cpu15 (AP): APIC ID: 15 > cpu16 (AP): APIC ID: 16 > cpu17 (AP): APIC ID: 17 > cpu18 (AP): APIC ID: 18 > cpu19 (AP): APIC ID: 19 > cpu20 (AP): APIC ID: 20 > cpu21 (AP): APIC ID: 21 > cpu22 (AP): APIC ID: 22 > cpu23 (AP): APIC ID: 23 > cpu24 (AP): APIC ID: 24 > cpu25 (AP): APIC ID: 25 > cpu26 (AP): APIC ID: 26 > cpu27 (AP): APIC ID: 27 > cpu28 (AP): APIC ID: 28 > cpu29 (AP): APIC ID: 29 > cpu30 (AP): APIC ID: 30 > cpu31 (AP): APIC ID: 31 > ioapic0 irqs 0-23 on motherboard > ioapic1 irqs 24-47 on motherboard > ioapic2 irqs 48-71 on motherboard > random: initialized > module_register_init: MOD_LOAD (vesa, 0x80db8eb0, 0) error 19 > kbd1 at kbdmux0 > acpi0: on motherboard > acpi0: Power Button (fixed) > cpu0: on acpi0 > cpu1: on acpi0 > cpu2: on acpi0 > cpu3: on acpi0 > 
cpu4: on acpi0 > cpu5: on acpi0 > cpu6: on acpi0 > cpu7: on acpi0 > cpu8: on acpi0 > cpu9: on acpi0 > cpu10: on acpi0 > cpu11: on acpi0 > cpu12: on acpi0 > cpu13: on acpi0 > cpu14: on acpi0 > cpu15: on acpi0 > cpu16: on acpi0 > cpu17: on acpi0 > cpu18: on acpi0 > cpu19: on acpi0 > cpu20: on acpi0 > cpu21: on acpi0 > cpu22: on acpi0 > cpu23: on acpi0 > cpu24: on acpi0 > cpu25: on acpi0 > cpu26: on acpi0 > cpu27: on acpi0 > cpu28: on acpi0 > cpu29: on acpi0 > cpu30: on acpi0 > cpu31: on
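For anyone landing on this thread later: the tunable Jim suggests above is set from the loader, i.e. in /boot/loader.conf before boot (the tunable name is taken from his message; the value 4 is his suggested starting point):

```
hw.nvme.min_cpus_per_ioq=4
```

This reduces the number of per-CPU I/O queues (and hence MSI-X vectors) each NVMe controller requests, leaving APIC vector entries free for the remaining devices.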
Re: Dell NVMe issues
On Tue, Oct 6, 2015 at 9:42 AM, Steven Hartland <kill...@multiplay.co.uk> wrote: > Also looks like nvme exposes a timeout_period sysctl you could try > increasing that as it could be too small for a full disk TRIM. > > Under CAM SCSI da support we have a delete_max which limits the max single > request size for a delete it may be we need something similar for nvme as > well to prevent this as it should still be chunking the deletes to ensure > this sort of thing doesn't happen. See attached. Sean - can you try this patch with TRIM re-enabled in ZFS? I would be curious if TRIM passes without this patch if you increase the timeout_period as suggested. -Jim > > > On 06/10/2015 16:18, Sean Kelly wrote: > >> Back in May, I posted about issues I was having with a Dell PE R630 with >> 4x800GB NVMe SSDs. I would get kernel panics due to the inability to assign >> all the interrupts because of >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321. Jim Harris >> helped fix this issue so I bought several more of these servers, including >> ones with 4x1.6TB drives… >> >> while the new servers with 4x800GB drives still work, the ones with >> 4x1.6TB drives do not. When I do a >> zpool create tank mirror nvd0 nvd1 mirror nvd2 nvd3 >> the command never returns and the kernel logs: >> nvme0: resetting controller >> nvme0: controller ready did not become 0 within 2000 ms >> >> I’ve tried several different things trying to understand where the actual >> problem is.
>> WORKS: dd if=/dev/nvd0 of=/dev/null bs=1m >> WORKS: dd if=/dev/zero of=/dev/nvd0 bs=1m >> WORKS: newfs /dev/nvd0 >> FAILS: zpool create tank mirror nvd[01] >> FAILS: gpart add -t freebsd-zfs nvd[01] && zpool create tank mirror >> nvd[01]p1 >> FAILS: gpart add -t freebsd-zfs -s 1400g nvd[01] && zpool create tank >> nvd[01]p1 >> WORKS: gpart add -t freebsd-zfs -s 800g nvd[01] && zpool create tank >> nvd[01]p1 >> >> NOTE: The above commands are more about getting the point across, not >> validity. I wiped the disk clean between gpart attempts and used GPT. >> >> So it seems like zpool works if I don’t cross past ~800GB. But other >> things like dd and newfs work. >> >> When I get the kernel messages about the controller resetting and then >> not responding, the NVMe subsystem hangs entirely. Since my boot disks are >> not NVMe, the system continues to work but no more NVMe stuff can be done. >> Further, attempting to reboot hangs and I have to do a power cycle. >> >> Any thoughts on what the deal may be here? >> >> 10.2-RELEASE-p5 >> >> nvme0@pci0:132:0:0: class=0x010802 card=0x1f971028 chip=0xa820144d >> rev=0x03 hdr=0x00 >> vendor = 'Samsung Electronics Co Ltd' >> class = mass storage >> subclass = NVM [Attachment: nvd.patch]
Re: Dell NVMe issues
On Tue, Oct 6, 2015 at 11:46 AM, Steven Hartland <kill...@multiplay.co.uk> wrote: > On 06/10/2015 19:03, Jim Harris wrote: > > > > On Tue, Oct 6, 2015 at 9:42 AM, Steven Hartland <kill...@multiplay.co.uk> wrote: > >> Also looks like nvme exposes a timeout_period sysctl you could try >> increasing that as it could be too small for a full disk TRIM. >> > >> Under CAM SCSI da support we have a delete_max which limits the max >> single request size for a delete it may be we need something similar for >> nvme as well to prevent this as it should still be chunking the deletes to >> ensure this sort of thing doesn't happen. > > > See attached. Sean - can you try this patch with TRIM re-enabled in ZFS? > > I would be curious if TRIM passes without this patch if you increase the > timeout_period as suggested. > > -Jim > > > Interesting, does the nvme spec not provide information from the device as > to what its optimal / max deallocate request size should be like the ATA > spec exposes? > Correct - there is no way for devices to specify a max/optimal deallocate size in NVMe. There is an implicit limit from the 32-bit LBA length in the DSM Range data structure defined by the spec. So this patch is needed anyways to make sure we don't overflow the 32-bit LBA length. Sean's drives are 1.6TB which fits in an unsigned 32-bit value on a 512-byte sector formatted controller, so I don't think that is the problem in Sean's case. -Jim > Regards > Steve
Re: ISCI bus_alloc_resource failed
On Mon, Sep 7, 2015 at 10:37 PM, Bradley W. Dutton < brad-fbsd-sta...@duttonbros.com> wrote: > Quoting Jim Harris <jim.har...@gmail.com>: > > On Mon, Sep 7, 2015 at 7:29 PM, Bradley W. Dutton < >> brad-fbsd-sta...@duttonbros.com> wrote: >> >> There are 2 devices in the same group so I passed both of them: >>> http://duttonbrosllc.com/misc/vmware_esxi_passthrough_config.png >>> >>> At the time I wasn't sure if this was necessary but I just tried the >>> Centos 7 VM and it worked without the SMBus device being passed through. >>> I >>> then tried the FreeBSD VM without SMBus and saw the same allocation error >>> as before. Looks like the SMBus device is a red herring? >>> >>> >>> Looks like on ESXi we are using Xen HVM init ops, which do not enable >> MSI. >> And the isci driver is not reverting to INTx resource allocation when MSIx >> vector allocation fails. I've added reverting to INTx in the attached >> patch - can you try once more? >> >> Thanks, >> >> -Jim >> > > That patch worked. No allocation errors and the drives work as expected. > > Thanks again, > Brad > > Thanks Brad. Committed as r287563 (pci_enable_busmaster) and r287564 (pci_alloc_msix check).
Re: ISCI bus_alloc_resource failed
On Mon, Sep 7, 2015 at 7:29 PM, Bradley W. Dutton < brad-fbsd-sta...@duttonbros.com> wrote: > There are 2 devices in the same group so I passed both of them: > http://duttonbrosllc.com/misc/vmware_esxi_passthrough_config.png > > At the time I wasn't sure if this was necessary but I just tried the > Centos 7 VM and it worked without the SMBus device being passed through. I > then tried the FreeBSD VM without SMBus and saw the same allocation error > as before. Looks like the SMBus device is a red herring? > > Looks like on ESXi we are using Xen HVM init ops, which do not enable MSI. And the isci driver is not reverting to INTx resource allocation when MSIx vector allocation fails. I've added reverting to INTx in the attached patch - can you try once more? Thanks, -Jim [Attachment: isci.patch]
Re: ISCI bus_alloc_resource failed
On Mon, Sep 7, 2015 at 10:34 AM, Bradley W. Dutton < brad-fbsd-sta...@duttonbros.com> wrote: > Hi, > > I'm having trouble with the isci driver in both stable and current. I see > the following dmesg in stable: > > isci0:port > 0x5000-0x50ff mem 0xe7afc000-0xe7af,0xe740-0xe77f irq 19 at > device 0.0 on pci11 > isci: 1:51 ISCI bus_alloc_resource failed > > > I'm running FreeBSD on VMWare ESXi 6 with vt-d passthrough of the isci > devices, here is the relevant pciconf output: > > none2@pci0:3:0:0: class=0x0c0500 card=0x062815d9 chip=0x1d708086 > rev=0x06 hdr=0x00 > vendor = 'Intel Corporation' > device = 'C600/X79 series chipset SMBus Controller 0' > class = serial bus > subclass = SMBus > cap 10[90] = PCI-Express 2 endpoint max data 128(128) link x32(x32) > speed 5.0(5.0) ASPM disabled(L0s) > cap 01[cc] = powerspec 3 supports D0 D3 current D0 > cap 05[d4] = MSI supports 1 message > ecap 000e[100] = ARI 1 > isci0@pci0:11:0:0: class=0x010700 card=0x062815d9 chip=0x1d6b8086 > rev=0x06 hdr=0x00 > vendor = 'Intel Corporation' > device = 'C602 chipset 4-Port SATA Storage Control Unit' > class = mass storage > subclass = SAS > cap 01[98] = powerspec 3 supports D0 D3 current D0 > cap 10[c4] = PCI-Express 2 endpoint max data 128(128) link x32(x32) > speed 5.0(5.0) ASPM disabled(L0s) > cap 11[a0] = MSI-X supports 2 messages > Table in map 0x10[0x2000], PBA in map 0x10[0x3000] > ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected > ecap 000e[138] = ARI 1 > ecap 0017[180] = TPH Requester 1 > ecap 0010[140] = SRIOV 1 > > > I haven't tried booting on bare metal but running a linux distro (centos > 7) in the same VM works without issue. Is it possible the SRIOV option is > causing trouble? I don't see a BIOS option to disable that setting on this > server like I have on some others. Any other ideas to get this working? > I do not think the SRIOV is the problem here. I do notice that isci(4) does not explicitly enable PCI busmaster, which will cause problems with PCI passthrough.
I've attached a patch that rectifies that issue. I'm not certain that is the root cause of the interrupt resource allocation failure though. Could you: 1) Apply the attached patch and retest. 2) If you still see the resource allocation failure, reboot in verbose mode and provide the resulting dmesg output. Thanks, -Jim > Thanks, > Brad [Attachment: isci_busmaster.patch]
Re: Problems adding Intel 750 to zfs pool
On Fri, Jul 17, 2015 at 9:35 PM, dy...@techtangents.com wrote: Hi, I've installed an Intel 750 400GB NVMe PCIe SSD in a Dell R320 running FreeBSD 10.2-beta-1... not STABLE, but not far behind, I think. Apologies if this is the wrong mailing list, or if this has been fixed in STABLE since the beta. Anyway, I've gparted it into 2 partitions - 16GB for slog/zil and 357GB for l2arc. Adding the slog partition to the pool takes about 2 minutes - machine seems hung during that time. Ping works, but I can't open another ssh session. Adding the l2arc doesn't seem to complete - it's been going 10 minutes now and nothing. Ping works, but I can't log in to the local console or another ssh session. I'm adding the partitions using their gpt names. i.e. zpool add zroot log gpt/slog zpool add zroot cache gpt/l2arc The system BIOS is up-to-date. The OS was a fresh 10.1 install, then freebsd-update to 10.2-beta2. 10.1 exhibited the same symptoms. Root is on zfs. Device was tested to be working on Windows 8.1 on a Dell T1700 workstation. Any ideas? Hi Dylan, I just committed SVN r285767 which should fix this issue. I will request MFC to stable/10 after the 3 day waiting period. Thanks, -Jim Cheers, Dylan Just
Re: 10.1 NVMe kernel panic
On Thu, May 21, 2015 at 8:33 AM, Sean Kelly smke...@smkelly.org wrote: Greetings. I have a Dell R630 server with four of Dell’s 800GB NVMe SSDs running FreeBSD 10.1-p10. According to the PCI vendor, they are some sort of rebranded Samsung drive. If I boot the system and then load nvme.ko and nvd.ko from a command line, the drives show up okay. If I put nvme_load=“YES” nvd_load=“YES” in /boot/loader.conf, the box panics on boot: panic: nexus_setup_intr: NULL irq resource! If I boot the system with “Safe Mode: ON” from the loader menu, it also boots successfully and the drives show up. You can see a full ‘boot -v’ here: http://smkelly.org/stuff/nvme-panic.txt Anyone have any insight into what the issue may be here? Ideally I need to get this working in the next few days or return this thing to Dell. Hi Sean, Can you try adding hw.nvme.force_intx=1 to /boot/loader.conf? I suspect you are able to load the drivers successfully after boot because interrupt assignments are not restricted to CPU0 at that point - see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321 for a related issue. Your logs clearly show that vectors were allocated for the first 2 NVMe SSDs, but the third could not get its full allocation. There is a bug in the INTx fallback code that needs to be fixed - you do not hit this bug when loading after boot because bug #199321 only affects interrupt allocation during boot. If the force_intx test works, would you be able to upgrade your nvme drivers to the latest on stable/10? There are several patches (one related to interrupt vector allocation) that have been pushed to stable/10 since 10.1 was released, and I will be pushing another patch for the issue you have reported shortly. Thanks, -Jim Thanks!
-- Sean Kelly smke...@smkelly.org http://smkelly.org
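As elsewhere in these threads, the suggested workaround is a loader tunable; a minimal /boot/loader.conf fragment (tunable name taken from Jim's message) would be:

```
hw.nvme.force_intx=1
```

This forces the nvme driver to use a single legacy INTx interrupt instead of per-CPU MSI-X vectors, sidestepping the boot-time vector exhaustion until the fallback bug is fixed.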
Re: AHCI Patsburg SATA controller and slow transfer speed
On Thu, Jun 27, 2013 at 6:38 PM, Jeremy Chadwick j...@koitsu.org wrote: Intel Patsburg is otherwise known as Intel X79. The X79 chipset/southbridge offers 6 SATA ports, 2 of which are SATA600, and the remaining 4 are SATA300: http://en.wikipedia.org/wiki/Intel_X79 While Wikipedia correctly says Intel X79 is codenamed Patsburg, most Patsburgs are actually known as the C60x family of chipsets. X79 pairs with a Core i7 while the C60x pairs with a Xeon E5. ark.intel.com is a very good decoder ring for all of the Intel code names. http://ark.intel.com/products/codename/29968/Patsburg -Jim
Re: Strange CAM errors
On Mon, Dec 17, 2012 at 9:26 AM, Willem Jan Withagen w...@digiware.nl wrote: On 2012-12-17 15:38, Steven Hartland wrote: Check the smart results of each disk in the array, you may have a failing disk. - Original Message - From: Willem Jan Withagen w...@digiware.nl To: FreeBSD Stable Users freebsd-stable@freebsd.org Sent: Monday, December 17, 2012 10:58 AM Subject: Strange CAM errors Hi, I have not noticed this before, but my system rebooted this morning and in the following security report I found a lot of messages in the dmesg-part like: +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0 +(probe0:arcmsr0:0:16:1): CAM status: Command timeout +(probe0:arcmsr0:0:16:1): Retrying command +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0 +(probe0:arcmsr0:0:16:1): CAM status: Command timeout +(probe0:arcmsr0:0:16:1): Retrying command And it seems that bus 16 is: +pass6 at arcmsr0 bus 0 scbus0 target 16 lun 0 +pass6: Areca RAID controller R001 Fixed Processor SCSI-0 device The system has been running FreeBSD zfs.digiware.nl 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Wed Nov 14 13:25:55 CET 2012 r...@zfs.digiware.nl:/usr/obj/usr/srcs/src9/src/sys/ZFS amd64 for already a while. Any suggestions as to why I have these messages? They are during the boot sequence, so no smartd talking to the disks at that moment. --WjW ps: dmesg, config, etc at: http://www.tegenbosch28.nl/FreeBSD/Systems/ZFS ps2: upgrading to the most recent 9.1 'mmm, Smartd seems to think otherwise... 'camcontrol rescan all' actually delivers the same pack of errors. --WjW The timeouts are occurring on inquiry commands to non-zero LUNs. arcmsr(4) is returning CAM_SEL_TIMEOUT instead of CAM_DEV_NOT_THERE for inquiry commands to this device and LUN 0. CAM_DEV_NOT_THERE is preferred to remove these types of warnings, and similar patches have gone in for other SCSI drivers recently. Can you try this patch?
Index: sys/dev/arcmsr/arcmsr.c
===================================================================
--- sys/dev/arcmsr/arcmsr.c (revision 244190)
+++ sys/dev/arcmsr/arcmsr.c (working copy)
@@ -2439,7 +2439,7 @@
 	char *buffer=pccb->csio.data_ptr;
 	if (pccb->ccb_h.target_lun) {
-		pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
+		pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
 		xpt_done(pccb);
 		return;
 	}
Re: Strange CAM errors
On Mon, Dec 17, 2012 at 2:45 PM, Willem Jan Withagen w...@digiware.nl wrote: On 17-12-2012 20:16, Jim Harris wrote: The timeouts are occurring on inquiry commands to non-zero LUNs. arcmsr(4) is returning CAM_SEL_TIMEOUT instead of CAM_DEV_NOT_THERE for inquiry commands to this device and LUN 0. CAM_DEV_NOT_THERE is preferred to remove these types of warnings, and similar patches have gone in for other SCSI drivers recently. Can you try this patch? Index: sys/dev/arcmsr/arcmsr.c === --- sys/dev/arcmsr/arcmsr.c (revision 244190) +++ sys/dev/arcmsr/arcmsr.c (working copy) @@ -2439,7 +2439,7 @@ char *buffer=pccb->csio.data_ptr; if (pccb->ccb_h.target_lun) { - pccb->ccb_h.status |= CAM_SEL_TIMEOUT; + pccb->ccb_h.status |= CAM_DEV_NOT_THERE; xpt_done(pccb); return; } Hi Jim, The noise has gone down by a factor of 5, now I get: (probe6:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0 (probe6:arcmsr0:0:16:1): CAM status: Unable to terminate I/O CCB request (probe6:arcmsr0:0:16:1): Error 5, Unretryable error (probe6:arcmsr0:0:16:2): INQUIRY. CDB: 12 40 0 0 24 0 Which is defined in sys/cam/cam.c as CAM_UA_TERMIO, but that error is nowhere set in the arcmsr code. There is something out of sync on your system. I just noticed this, but your original error messages were showing Command timeout (CAM_CMD_TIMEOUT) even though the driver was returning CAM_SEL_TIMEOUT. Now in this case, driver is returning CAM_DEV_NOT_THERE, but CAM is printing error message for CAM_UA_TERMIO. In both cases, driver is returning value X, but cam is interpreting it as X+1. So CAM and arcmsr(4) seem to have a different idea of the values of the cam_status enumeration. Can you provide details on your build environment? Are you building arcmsr as a loadable module or do you specify device arcmsr in your kernel config to link it statically? I'm suspecting loadable module, although I have no idea how these values would get out of sync since this enumeration hasn't changed in probably 10+ years.
-Jim
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Strange CAM errors
On Mon, Dec 17, 2012 at 3:21 PM, Willem Jan Withagen <w...@digiware.nl> wrote:

> On 17-12-2012 23:10, Jim Harris wrote:
>> [snip - patch and resulting CAM_UA_TERMIO errors, quoted in full in the
>> previous message]
>>
>> Can you provide details on your build environment? Are you building
>> arcmsr as a loadable module or do you specify "device arcmsr" in your
>> kernel config to link it statically?
>
> arcmsr is built into the kernel:
>
> [/usr/src] w...@zfs.digiware.nl> kldstat
> Id Refs Address    Size   Name
>  1   28 0x8020     b55be0 kernel
>  2    1 0x80d56000 6138   nullfs.ko
>  3    1 0x80d5d000 2153b0 zfs.ko
>  4    2 0x80f73000 5e38   opensolaris.ko
>  5    1 0x80f79000 f510   aio.ko
>  6    1 0x80f89000 2a20   coretemp.ko
>  7    1 0x81012000 316d4  nfscl.ko
>  8    2 0x81044000 10827  nfscommon.ko
>
> And I just refetched 9.1-PRERELEASE this afternoon over svn. Could this
> have something to do with Clang vs. gcc? Not that I did anything to change
> this. Note that I have changed nothing other than the KERNEL CONFIG file,
> and both kernel and world were built at the same time this afternoon. With
> your patch I rebuilt only the kernel and modules.

Never mind my earlier comment on out-of-sync. It's another bug in arcmsr(4) - CAM_REQ_CMP == 0x1, and in the non-zero LUN case here it ORs the status values together, causing the off-by-one issue we were seeing. Please try the following patch instead (reverting the earlier patch):

Index: sys/dev/arcmsr/arcmsr.c
===================================================================
--- sys/dev/arcmsr/arcmsr.c (revision 244190)
+++ sys/dev/arcmsr/arcmsr.c (working copy)
@@ -2432,14 +2432,13 @@
 static void arcmsr_handle_virtual_command(struct AdapterControlBlock *acb,
 	union ccb *pccb)
 {
-	pccb->ccb_h.status |= CAM_REQ_CMP;
 	switch (pccb->csio.cdb_io.cdb_bytes[0]) {
 	case INQUIRY: {
 		unsigned char inqdata[36];
 		char *buffer = pccb->csio.data_ptr;
 		if (pccb->ccb_h.target_lun) {
-			pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
+			pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
 			xpt_done(pccb);
 			return;
 		}
@@ -2455,6 +2454,7 @@
 		strncpy(&inqdata[16], "RAID controller ", 16); /* Product Identification */
 		strncpy(&inqdata[32], "R001", 4); /* Product Revision */
 		memcpy(buffer, inqdata, sizeof(inqdata));
+		pccb->ccb_h.status |= CAM_REQ_CMP;
 		xpt_done(pccb);
 	}
 	break;
@@ -2464,10 +2464,12 @@
 		pccb->ccb_h.status |= CAM_SCSI_STATUS_ERROR
Re: Strange CAM errors
On Mon, Dec 17, 2012 at 4:52 PM, Willem Jan Withagen <w...@digiware.nl> wrote:

> Right, that did the trick. Thanx for the code.
>
> --WjW

Patch committed as r244369. It will get MFC'd, but obviously won't be in 9.1.

Thanks,

-Jim
Re: tws bug ? (LSI SAS 9750)
On Fri, Oct 26, 2012 at 1:18 PM, John-Mark Gurney <j...@funkthat.com> wrote:

> I'm seeing similar stuff on the hpt27xx driver:
>
> (probe18:hpt27xx0:0:18:0): INQUIRY. CDB: 12 0 0 0 24 0
> (probe18:hpt27xx0:0:18:0): CAM status: Invalid Target ID
> (probe18:hpt27xx0:0:18:0): Error 22, Unretryable error
>
> Should I make a similar change in sys/dev/hpt27xx/osm_bsd.c? Looks like
> there are two CAM_TID_INVALID lines, but from reading the comments, only
> the second one should change... Correct? If so, I'll try making the change
> and make sure everything works well.

Yes - I agree that a similar change is needed, and only to the second one in that file.
Re: tws bug ? (LSI SAS 9750)
On Sat, Sep 22, 2012 at 1:32 AM, Thomas Mueller <muelle...@insightbb.com> wrote:

> The specific subject of this thread is not my issue, but I did notice
> problems apparently related to CAM on a SATA hard drive.

I would suggest starting a new thread if you have a different issue.

> I use one UFS partition, with FreeBSD 9.0-BETA1 installed (subsequently
> updated on another partition, using GPT as opposed to MBR), for the ports
> tree and also NetBSD pkgsrc and NetBSD source code. I built NetBSD
> 5.1_STABLE i386 from FreeBSD and also built xorg-modular on the new NetBSD
> installation from pkgsrc. Going into and out of the newly installed Xorg
> resulted in some crashes with the FreeBSD 9.0-BETA1 partition mounted and
> not cleanly unmounted. The file system was damaged, and FreeBSD fsck_ffs
> wouldn't fix it; it went into a loop:
>
> Script started on Wed Sep 19 04:15:02 2012
> fsck_ffs /dev/ada0p9
> ** /dev/ada0p9
> ** Last Mounted on /BETA1
> ** Phase 1 - Check Blocks and Sizes
> CANNOT READ BLK: 7584192
> CONTINUE? [yn] y
> THE FOLLOWING DISK SECTORS COULD NOT BE READ: 7584318, 7584319,
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 1475900 files, 4638292 used, 21162419 free (61643 frags, 2637597 blocks,
> 0.2% fragmentation)
> * FILE SYSTEM STILL DIRTY *
> * PLEASE RERUN FSCK *
> Script done on Wed Sep 19 04:17:27 2012
>
> This happened repeatedly, meaning an impasse. I didn't get to record the
> preceding error messages relating to ATA and CAM but, seeing this last
> message, wonder if there are some bugs in CAM. I booted that new NetBSD
> 5.1_STABLE i386 installation, on a USB stick, and was able to mount that
> partition and see it wasn't trashed, though there was a message about the
> dirty flag. I then umounted and ran NetBSD fsck_ffs successfully; just a
> few files were lost, and FreeBSD can access that partition again.
>
> I still intend to be more cautious when in NetBSD, not mounting a FreeBSD
> partition unnecessarily when doing something crash-prone on my system in
> NetBSD, such as going into and out of X.
>
> Tom
Re: tws bug ? (LSI SAS 9750)
On Fri, Sep 21, 2012 at 1:07 PM, Mike Tancsa <m...@sentex.net> wrote:

> Hi,
> I have been trying out a nice new tws controller and decided to enable
> debugging in the kernel and run some stress tests. With a regular GENERIC
> kernel, it boots up fine. But with debugging, it panics on boot. Anyone
> know whats up? Is this something that should be sent directly to LSI?

Through a code inspection, this mutex is being recursed whether or not debugging is enabled. There is no code path here specific to INVARIANTS, and the main I/O path in this driver is always recursing on this lock - it is not specific to the initialization callstack you listed below. The best course of action seems to be initializing the lock with MTX_RECURSE, since the driver seems to expect to be able to recurse on the io_lock. Can you try the following patch?

diff --git a/sys/dev/tws/tws.c b/sys/dev/tws/tws.c
index b1615db..d156d40 100644
--- a/sys/dev/tws/tws.c
+++ b/sys/dev/tws/tws.c
@@ -197,7 +197,7 @@ tws_attach(device_t dev)
     mtx_init(&sc->q_lock, "tws_q_lock", NULL, MTX_DEF);
     mtx_init(&sc->sim_lock, "tws_sim_lock", NULL, MTX_DEF);
     mtx_init(&sc->gen_lock, "tws_gen_lock", NULL, MTX_DEF);
-    mtx_init(&sc->io_lock, "tws_io_lock", NULL, MTX_DEF);
+    mtx_init(&sc->io_lock, "tws_io_lock", NULL, MTX_DEF | MTX_RECURSE);
     if ( tws_init_trace_q(sc) == FAILURE )
         printf("trace init failure\n");

> pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
> pci0: ACPI PCI bus on pcib0
> pcib1: ACPI PCI-PCI bridge irq 16 at device 1.0 on pci0
> pci1: ACPI PCI bus on pcib1
> pcib2: ACPI PCI-PCI bridge irq 17 at device 1.1 on pci0
> pci2: ACPI PCI bus on pcib2
> LSI 3ware device driver for SAS/SATA storage controllers, version:
> 10.80.00.003
> tws0: LSI 3ware SAS/SATA Storage Controller port 0x4000-0x40ff mem
> 0xc246-0xc2463fff,0xc240-0xc243 irq 17 at device 0.0 on pci2
> tws0: Using legacy INTx
> panic: _mtx_lock_sleep: recursed on non-recursive mutex tws_io_lock @
> /usr/HEAD/src/sys/dev/tws/tws_hdm.c:287
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x1d8
> _mtx_lock_sleep() at _mtx_lock_sleep+0x27f
> _mtx_lock_flags() at _mtx_lock_flags+0xf1
> tws_submit_command() at tws_submit_command+0x3f
> tws_dmamap_data_load_cbfn() at tws_dmamap_data_load_cbfn+0xb7
> bus_dmamap_load() at bus_dmamap_load+0x16c
> tws_map_request() at tws_map_request+0x78
> tws_get_param() at tws_get_param+0xe1
> tws_display_ctlr_info() at tws_display_ctlr_info+0x4c
> tws_init_ctlr() at tws_init_ctlr+0x6d
> tws_attach() at tws_attach+0x68c
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pci_attach() at acpi_pci_attach+0x164
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pcib_attach() at acpi_pcib_attach+0x1a7
> acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0x9b
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pci_attach() at acpi_pci_attach+0x164
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pcib_attach() at acpi_pcib_attach+0x1a7
> acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x1f6
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_attach() at acpi_attach+0xbc1
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> nexus_acpi_attach() at nexus_acpi_attach+0x69
> device_attach() at device_attach+0x72
> bus_generic_new_pass() at bus_generic_new_pass+0xd6
> bus_set_pass() at bus_set_pass+0x7a
> configure() at configure+0xa
> mi_startup() at mi_startup+0x77
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 10 ]
> Stopped at kdb_enter+0x3b: movq $0,0x993262(%rip)
> db>
>
> int
> tws_submit_command(struct tws_softc *sc, struct tws_request *req)
> {
>     u_int32_t regl, regh;
>     u_int64_t mfa=0;
>
>     /*
>      * mfa register read and write must be in order.
>      * Get the io_lock to protect against simultinous
>      * passthru calls
>      */
>     mtx_lock(&sc->io_lock);
>     if ( sc->obfl_q_overrun ) {
>         tws_init_obfl_q(sc);
>     }
>
> With no debugging in the kernel, it boots up fine:
>
> pcib2: ACPI PCI-PCI bridge irq 17 at device 1.1 on pci0
> pci2: ACPI PCI bus on pcib2
> LSI 3ware device driver for SAS/SATA storage controllers, version:
> 10.80.00.003
> tws0: LSI 3ware SAS/SATA Storage Controller port 0x4000-0x40ff mem
> 0xc246-0xc2463fff,0xc240-0xc243 irq 17 at device 0.0 on pci2
> tws0: Using legacy INTx
> tws0: Controller details: Model 9750-4i, 8 Phys, Firmware FH9X
> 5.12.00.007, BIOS BE9X 5.11.00.006
> em0: Intel(R) PRO/1000 Network Connection 7.3.2 port 0x5040-0x505f mem
> 0xc250-0xc251,0xc257-0xc2570fff irq 19 at device 25.0 on pci0
> em0: Using an MSI interrupt
> em0: Ethernet address: 00:1e:67:45:b6:29
> ehci0: EHCI (generic) USB 2.0 controller mem
Re: tws bug ? (LSI SAS 9750)
On Fri, Sep 21, 2012 at 3:11 PM, Mike Tancsa <m...@sentex.net> wrote:

> On 9/21/2012 4:59 PM, Jim Harris wrote:
> [snip]
>
> Thanks, that allows it to boot up now!
>
> pci2: ACPI PCI bus on pcib2
> LSI 3ware device driver for SAS/SATA storage controllers, version:
> 10.80.00.003
> tws0: LSI 3ware SAS/SATA Storage Controller port 0x4000-0x40ff mem
> 0xc246-0xc2463fff,0xc240-0xc243 irq 17 at device 0.0 on pci2
> tws0: Using MSI
> tws0: Controller details: Model 9750-4i, 8 Phys, Firmware FH9X
> 5.12.00.007, BIOS BE9X 5.11.00.006
> em0: Intel(R) PRO/1000 Network Connection 7.3.2 port 0x5040-0x505f mem
> 0xc250-0xc251,0xc257-0xc2570fff irq 19 at device 25.0 on pci0
>
> ... then a lot of ...
>
> (probe65:tws0:0:65:0): INQUIRY. CDB: 12 0 0 0 24 0
> (probe65:tws0:0:65:0): CAM status: Invalid Target ID
> (probe65:tws0:0:65:0): Error 22, Unretryable error
> (probe1:tws0:0:1:0): INQUIRY. CDB: 12 0 0 0 24 0
> (probe1:tws0:0:1:0): CAM status: Invalid Target ID
> (probe1:tws0:0:1:0): Error 22, Unretryable error
> (probe2:tws0:0:2:0): INQUIRY. CDB: 12 0 0 0 24 0
> (probe2:tws0:0:2:0): CAM status: Invalid Target ID
> ...
> (probe63:tws0:0:63:0): INQUIRY. CDB: 12 0 0 0 24 0
> (probe63:tws0:0:63:0): CAM status: Invalid Target ID
> (probe63:tws0:0:63:0): Error 22, Unretryable error
> (probe64:tws0:0:64:0): INQUIRY. CDB: 12 0 0 0 24 0
> (probe64:tws0:0:64:0): CAM status: Invalid Target ID
> (probe64:tws0:0:64:0): Error 22, Unretryable error

These can be ignored. CAM is just telling you that there are no devices attached at these target IDs.

> da0 at tws0 bus 0 scbus0 target 0 lun 0
> da0: LSI 9750-4i DISK 5.12 Fixed Direct Access SCSI-5 device
> da0: 6000.000MB/s transfers
> da0: 953654MB (1953083392 512 byte sectors: 255H 63S/T 121573C)
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #4 Launched!
> [snip]
>
> Any thoughts on MSI vs no MSI? Time to run some stress tests. It's
> certainly a fast little controller for the money!

Typically MSI is preferred to INTx for performance reasons. I can't speak for why the original author made INTx the default though.
Regards,

-Jim
Re: tws bug ? (LSI SAS 9750)
On Fri, Sep 21, 2012 at 5:37 PM, Mike Tancsa <m...@sentex.net> wrote:

> On 9/21/2012 8:03 PM, Jim Harris wrote:
>> [snip - the probe "Invalid Target ID" spew quoted in the previous
>> message, and my note that it can be ignored]
>
> What about a change similar to what Alexander Motin did in
> http://lists.freebsd.org/pipermail/svn-src-head/2012-June/038196.html

Ah, yes. I was thinking you had CAM_DEBUG enabled, which is why you were seeing this spew - but that's not the case. This indeed should be fixed and not just ignored. Seeing the attributions on Alexander's commit, you certainly seem to have a monopoly on controllers that exhibit this problem on FreeBSD. :)

I believe the CAM_LUN_INVALID here should be fixed as well, similar to the twa commit. If you send me a revised patch I will commit it.

Thanks,

-Jim

> 0(ich10)# diff -u tws_cam.c.orig tws_cam.c
> --- tws_cam.c.orig	2012-09-21 20:10:43.0 -0400
> +++ tws_cam.c	2012-09-21 20:11:11.0 -0400
> @@ -532,7 +532,7 @@
>          ccb->ccb_h.status |= CAM_LUN_INVALID;
>      } else {
>          TWS_TRACE_DEBUG(sc, "invalid target error",0,0);
> -        ccb->ccb_h.status |= CAM_TID_INVALID;
> +        ccb->ccb_h.status |= CAM_SEL_TIMEOUT;
>      }
>  } else {
> 1(ich10)#
>
> ---Mike
>
> --
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, m...@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
Re: geom_virstor with kernel panic in FreeBSD 9.x
On Fri, Aug 3, 2012 at 5:06 AM, Marcelo Gondim <gon...@bsdinfo.com.br> wrote:

> Hi all,
>
> I sent a PR [1] but I decided to also send the problem here. If you try to
> destroy a geom_virstor that does not exist, this causes a kernel panic
> immediately. Just try:
>
> gvirstor load
> gvirstor destroy tatata
>
> # uname -a
> FreeBSD zeus..xxx.br 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #27: Mon Jul 16
> 01:41:24 BRT 2012 r...@zeus..xxx.br:/usr/obj/usr/src/sys/GONDIM amd64
>
> [1] http://www.freebsd.org/cgi/query-pr.cgi?pr=170199
>
> Best regards,
> Gondim

Hi Gondim,

Can you test the following patch?

Index: sys/geom/virstor/g_virstor.c
===================================================================
--- sys/geom/virstor/g_virstor.c	(revision 238909)
+++ sys/geom/virstor/g_virstor.c	(working copy)
@@ -235,6 +235,12 @@
 		return;
 	}
 	sc = virstor_find_geom(cp, name);
+	if (sc == NULL) {
+		gctl_error(req, "Don't know anything about '%s'", name);
+		g_topology_unlock();
+		return;
+	}
+
 	LOG_MSG(LVL_INFO, "Stopping %s by the userland command",
 	    sc->geom->name);
 	update_metadata(sc);

Thanks,

-Jim
Re: geom_virstor with kernel panic in FreeBSD 9.x
On Fri, Aug 3, 2012 at 10:04 AM, Marcelo Gondim <gon...@bsdinfo.com.br> wrote:

> Hi Jim,
>
> When I applied the patch it gave this error:
>
> # patch < /root/patch.diff
> Hmm... Looks like a unified diff to me...
> The text leading up to this was:
> --------------------------
> |--- sys/geom/virstor/g_virstor.c	(revision 238909)
> |+++ sys/geom/virstor/g_virstor.c	(working copy)
> --------------------------
> Patching file sys/geom/virstor/g_virstor.c using Plan A...
> Hunk #1 failed at 235.
> 1 out of 1 hunks failed--saving rejects to sys/geom/virstor/g_virstor.c.rej
> done

Strange. It applies with no issues on my checkout. Let's try an attachment instead. If this doesn't work, could you kindly apply the patch by hand?

Thanks,

-Jim

virstor.diff
Description: Binary data
Re: geom_virstor with kernel panic in FreeBSD 9.x
On Fri, Aug 3, 2012 at 1:12 PM, Marcelo Gondim <gon...@bsdinfo.com.br> wrote:

> Hi Jim,
>
> Perfect!!!
>
> # gvirstor destroy tudo
> gvirstor: Don't know anything about 'tudo'

Patch applied to head as r239021. I have requested approval from re@ to merge to stable/9. Thank you for confirming the patch on your end.

Regards,

-Jim