[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|b...@freebsd.org            |bugmeis...@freebsd.org
         Resolution|---                         |Overcome By Events
             Status|New                         |Closed

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #57 from Michel Le Cocq ---
No more trouble since upgrading to 14.0.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Michel Le Cocq changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |no...@neuronfarm.net

--- Comment #56 from Michel Le Cocq ---
(In reply to Allan Jude from comment #55)

Hi, has this patch been included in 13.2-RELEASE?

I'm on 13.2-RELEASE-p9 and have the same issue with two different 9207-8i cards:

Jan 5 09:38:49 gobt kernel: mps0: IOC Fault 0x4d04, Resetting
Jan 5 09:38:49 gobt kernel: mps0: Reinitializing controller
Jan 5 09:38:49 gobt kernel: mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd

Controller information
Controller type : SAS2308_2
BIOS version: 7.39.02.00
Firmware version: 20.00.07.00

mps0@pci0:4:0:0: class=0x010700 rev=0x05 hdr=0x00 vendor=0x1000 device=0x0087 subvendor=0x1000 subdevice=0x3020
    vendor   = 'Broadcom / LSI'
    device   = 'SAS2308 PCI-Express Fusion-MPT SAS-2'
    class    = mass storage
    subclass = SAS
pcib8@pci0:6:0:0: clas
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Allan Jude changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |allanj...@freebsd.org

--- Comment #55 from Allan Jude ---
If you see the error message:

"kernel: mps0: IOC Fault 0x4d04, Resetting"

you need to update to get this fix:
https://cgit.freebsd.org/src/commit/?id=e30fceb89b7eb51825bdd65f9cc4fbadf107d763

If your errors don't include that code, that is a different problem.
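For anyone unsure which of the two cases in comment #55 applies to them, here is a minimal Python sketch that scans kernel log lines for the fault message quoted there. The function names are made up for illustration, and the pattern assumes the message looks exactly like the mps0 lines pasted in this PR (it also accepts an mprN prefix, on the unverified assumption that mpr prints the same text).

```python
import re

# Matches lines such as "mps0: IOC Fault 0x4d04, Resetting".
# Group 1 is the adapter name, group 2 the fault code.
IOC_FAULT_RE = re.compile(r"\b(mp[rs]\d+): IOC Fault (0x[0-9a-fA-F]+), Resetting")


def find_ioc_faults(lines):
    """Return a list of (adapter, fault_code) tuples found in the log lines."""
    faults = []
    for line in lines:
        match = IOC_FAULT_RE.search(line)
        if match:
            faults.append((match.group(1), match.group(2).lower()))
    return faults


def has_fault(lines, code="0x4d04"):
    """True if any line reports an IOC fault with the given code."""
    return any(found == code.lower() for _, found in find_ioc_faults(lines))
```

Feeding it the contents of /var/log/messages line by line is enough; if `has_fault` returns False, the errors are probably the "different problem" mentioned above.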
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

JerRatt IT changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cont...@jerratt.com

--- Comment #54 from JerRatt IT ---
I'm reporting either the same or a similar issue. Here are my findings; please let me know if my plans sound correct.

Setup:
TrueNAS Scale 22.02.0.1
AMD Threadripper 1920X
ASRock X399 Taichi
128GB (8x16GB) Crucial CT8G4WFD824A Unbuffered ECC
AVAGO/LSI 9400-8i SAS3408 12Gbps HBA Adapter
Supermicro BPN-SAS3-743A 8-Port SAS3/SAS2/SATA 12Gbps Backplane
8 x Seagate Exos X18 18TB HDD ST18000NM004J SAS 12Gbps 512e/4Kn
2 x Crucial 120GB SSD
2 x Crucial 1TB SSD
2 x Western Digital 960GB NVME
Supermicro 4U case w/2000watt Redundant Power Supply

The system is connected to a large APC data-center battery system and conditioner, in an HVAC-controlled area. All hard drives have the newest firmware and use 4k sectors, both logical and native. The controller has the newest firmware, both regular and legacy ROMs, and has the SATA/SAS-only mode flashed (dropping the NVMe multi/tri-mode option that the new 9400 series cards support).

Running any kind of heavy I/O onto the 18TB drives that are connected to the BPN-SAS3-743A backplane and through to the LSI 9400-8i HBA eventually results in the drive resetting. This happens even without the drives assigned to any kind of ZFS pool. It also happens whether running from the shell within the GUI or from the shell itself. It happens on all drives, which are connected through two separate SFF8643 cables to a backplane that has two separate SFF8643 ports.

To cause this to happen, I can either run badblocks on each drive (using: badblocks -c 1024 -w -s -v -e 1 -b 65536 /dev/sdX) or just run a SMART extended/long test. Eventually, all or nearly all drives will reset, even spin down (according to the shell logs). Sometimes they reset in batches, while others continue chugging along. This has made it impossible to complete any kind of SMART extended test.

Badblocks will fail out, reporting too many bad blocks, on multiple hard drives at nearly the exact same moment, yet consecutive badblocks scans won't report bad blocks in the same areas. The SMART test will just show "aborted, drive reset?" as the result.

My plan was to replace the HBA with an older LSI 9305-16i, replace the two SFF8643-SFF8643 cables going from the HBA to the backplane just for good measure, install two different SFF8643-SFF8482 cables that bypass the backplane fully, then take four of the existing Seagate 18TB drives and put them on the SFF8643-SFF8482 connections that bypass the backplane, as well as install four new WD Ultrastar DC HC550 (WUH721818AL5204) drives into the mix (some using the backplane, some not). That should reveal whether this is a compatibility/bug issue with all large drives or certain large drives on an LSI controller, the mpr driver, and/or this backplane.

If none of that works, or it doesn't eliminate all the potential points of failure, I'm left with nothing but subpar workarounds, such as just using the onboard SATA ports, disabling NCQ in the LSI controller, or setting up an L2ARC cache (or I might try a metadata cache to see if that circumvents the issue as well).
Condensed logs when one drive errors out:

sd 0:0:0:0: device_unblock and setting to running, handle(0x000d)
mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
~
~
~
~
sd 0:0:0:0: Power-on or device reset occurred
...ready
sd 0:0:6:0: device_block, handle(0x000f)
sd 0:0:9:0: device_block, handle(0x0012)
sd 0:0:10:0: device_block, handle(0x0014)
mpt3sas_cm0: log_info(0x3112010c): originator(PL), code(0x12), sub_code(0x010c)
sd 0:0:9:0: device_unblock and setting to running, handle(0x0012)
sd 0:0:6:0: device_unblock and setting to running, handle(0x000f)
sd 0:0:10:0: device_unblock and setting to running, handle(0x0014)
sd 0:0:9:0: Power-on or device reset occurred
sd 0:0:6:0: Power-on or device reset occurred
sd 0:0:10:0: Power-on or device reset occurred
scsi_io_completion_action: 5 callbacks suppressed
sd 0:0:10:0: [sdd] tag#5532 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
sd 0:0:10:0: [sdd] tag#5532 Sense Key : Not Ready [current] [descriptor]
sd 0:0:10:0: [sdd] tag#5532 Add. Sense: Logical unit not ready, additional power granted
sd 0:0:10:0: [sdd] tag#5532 CDB: Write(16) 8a 00 00 00 00 00 5c 75 7a 12 00 00 01 40 00 00
print_req_error: 5 callbacks suppressed
blk_update_request: I/O error, dev sdd, sector 12409622672 op 0x1:(WRITE) flags 0xc800 phys_seg 1 prio class 0
sd 0:0:10:0: [sdd] tag#5533 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
sd 0:0:10:0: [sdd] tag#5533 Sense Key : Not Ready [current] [descriptor]
sd 0:0:10:0: [sdd] tag#5533 Add. Sense: Logical u
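The log_info values in the mpt3sas lines above pack several fields into one 32-bit word. A small Python sketch of the split, for readers comparing codes across reports: the field layout is inferred from the two values the driver already decoded in this log (0x31110e05 and 0x3112010c), and the originator names other than PL are an assumption about how the Linux driver labels them. The function name is hypothetical.

```python
# Originator labels; "PL" is confirmed by the log above, the other two
# are assumed to follow the same naming in the Linux mpt3sas driver.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(value):
    """Split a 32-bit log_info word into its bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,      # top nibble
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,         # one byte
        "sub_code": value & 0xFFFF,           # low 16 bits
    }
```

For 0x31110e05 this yields originator PL, code 0x11 and sub_code 0x0e05, which agrees with what the driver printed; what those codes mean is firmware-specific and not covered here.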
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

leopoldo.20b75...@mailbeaver.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |leopoldo.20b7513b@mailbeave
                   |                            |r.net

--- Comment #53 from leopoldo.20b75...@mailbeaver.net ---
I just started experiencing this issue with my setup with 3 IBM M1015 HBAs and ST1NM002G SAS drives.

Has anyone tested their setup with TrueNAS Scale? Since the platform is based on Linux I was hoping this bug was not present. I may try switching when Scale is out of RC status later this month.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #52 from Daniel Austin ---
(In reply to Alexander Motin from comment #51)

If it helps, it seems to be enclosure related... I can use NCQ on my disks and controller in another enclosure (e.g. I have an Areca 8 bay that's fine)... but when using my QNAP JBOD enclosure (TL-D1600S) I get errors when NCQ is enabled.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Alexander Motin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m...@freebsd.org

--- Comment #51 from Alexander Motin ---
(In reply to Daniel Austin from comment #48)

I see the mpsutil/mprutil tools recently got support for a `set ncq` subcommand, which should allow disabling NCQ. I've merged that into TrueNAS 12.0-U7. I haven't tried it myself, and it makes me shiver inside from its inefficiency, but I can believe that it may reduce maximum command latency in some scenarios, or maybe even avoid command timeouts in situations where the disks or the HBAs can't schedule or process the NCQ commands reasonably.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #50 from Conall O'Brien ---
I've been experiencing these issues with WD Red 6TB disks. I had frequent, unexplained reboots with 12.1-RELEASE and 12.2-RELEASE.

mprutil show all
Adapter:
mpr0 Adapter:
       Board Name: LSI3008-IR
   Board Assembly:
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 12.00.00.00
Firmware Revision: 10.00.00.00
  Integrated RAID: yes

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     3.0    12     SAS Initiator
1       0002        000a       N         6.0     3.0    12     SAS Initiator
2       0003        000b       N         6.0     3.0    12     SAS Initiator
3       0004        000c       N         6.0     3.0    12     SAS Initiator
4       0005        000d       N         6.0     3.0    12     SAS Initiator
5       0006        000e       N         6.0     3.0    12     SAS Initiator
6       0007        000f       N         6.0     3.0    12     SAS Initiator
7       0008        0010       N         6.0     3.0    12     SAS Initiator

Since upgrading to 13.0-RELEASE I am no longer experiencing reboots, but I do continue to see CAM errors:

(da7:mpr0:0:9:0): Info: 0x8024c3a0
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 80 24 c3 e8 00 00 08 00
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x8024c3e8
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 3a 80 4d 20 00 00 10 00
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x3a804d20
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 31 c4 f0 90 00 00 10 00
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x31c4f090
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 31 c4 f8 b8 00 06 88 00
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x31c4f8b8
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 31 c3 c0 a0 00 01 60 00
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x31c3c0a0
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 10 00 00 00 10 00 00
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x2baa0f010
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 02 14 4b b4 c0 00 00 00 08 00 00
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x2144bb4c0
(da7:mpr0:0:9:0): Error 22, Unretryable error

I have upgraded to 13.0-RELEASE-p1, on account of FreeBSD-EN-21:13.mpt. Could the issues from that errata notice also be an issue for mps?
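When reading CAM errors like the ones in this comment, it can help to decode the CDB bytes by hand. Below is a short Python sketch, assuming the standard 10-byte and 16-byte READ/WRITE layouts (opcodes 0x28/0x2a and 0x88/0x8a); the function name is made up for illustration. It returns the starting LBA and the transfer length in blocks.

```python
def decode_rw_cdb(cdb_hex):
    """Return (opcode, lba, blocks) for a READ/WRITE(10) or READ/WRITE(16) CDB.

    cdb_hex is the space-separated hex string printed after "CDB:".
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    opcode = cdb[0]
    if opcode in (0x28, 0x2A) and len(cdb) == 10:
        # 10-byte form: 4-byte LBA in bytes 2..5, 2-byte length in bytes 7..8
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    elif opcode in (0x88, 0x8A) and len(cdb) == 16:
        # 16-byte form: 8-byte LBA in bytes 2..9, 4-byte length in bytes 10..13
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
    else:
        raise ValueError("not a READ/WRITE(10) or READ/WRITE(16) CDB")
    return opcode, lba, blocks
```

For the first WRITE(10) above, the decoded LBA is 0x8024c3e8, the same value the kernel prints on the following "Info:" line, so the Info field in these reports appears to be the LBA the drive rejected.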
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Daniel Austin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |freebsd-po...@dan.me.uk

--- Comment #48 from Daniel Austin ---
I have this same issue too...

My setup is an LSI 9206-16e PCIe card (which is really just 2 x LSISAS2308 cards with a PCIe switch) and FreeBSD 12 and 13. I have this connected to a QNAP TL-D1600S 16-bay SATA chassis with 12 x 8TB Toshiba SATA (non-SMR) disks.

I tried a workaround I found online (camcontrol tags daX -N 1) to disable NCQ per drive, and this is fine *once the server is booted*... however, whenever I rebooted, I had a massive scroll of CAM errors as the kernel was trying to import a ZFS pool before the camcontrol script had run... This even leads to ZFS reporting too many errors and taking the pool offline (note: I'm not booting from this pool).

I don't have the luxury of a firmware update as there haven't been any updates to the SAS2308 in a long, long time! I am running 20.00.07.00.

I've also tried 3 different 9206-16e cards, so I'm happy it's not just a faulty card. I also tried in 2 different servers with the same results, so I'm happy it's probably not a hardware issue at all.

My final solution, which is really a kludge but does fix the issue permanently, was to boot into a live Ubuntu environment and use lsiutil 1.72 to disable NCQ on the card itself. This is saved in the card's EEPROM.

Now when I boot BSD, I get zero CAM errors from any disks, the ZFS pool imports straight away, and I can still max out the bandwidth to the drives. Yay.

I also have an LSI 9207-8i card in a different machine running the same firmware which has no issues at all, even with NCQ enabled (but these drives are directly connected to the card via SATA cables with no enclosure as such), so I do think this is just some kind of incompatibility between driver<-->card<-->enclosure.

It would be lovely if lsiutil could be ported to BSD... I did look at it briefly but it's beyond my capabilities... the source code is online if anyone wanted to try.

Hope that helps anyone else stumbling upon this PR. I appreciate it may only fix some, not all, of the cases reported here so far... but some is better than none :-)

___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #47 from elij ---
I have a SAS9305-16i, and was seeing errors similar to those of some of the folks here. The comment about NCQ got me thinking, and sure enough, it looks like Broadcom had a new firmware out (I had 16.00.11.00 when I saw the errors).

16.00.12.00 has these two as the fixed "defect list":

ID: DCSG00398894
Headline: SATA only: WRITE SAME NCQ encapsulation assumes NonData NCQ is supported if Zero EXT is Supported
Description Of Change: Disable NCQ encapsulation if Zero EXT is supported but Non Data NCQ is not supported
Issue Description: In case if Zero EXT is supported but Non Data NCQ is not supported by drive, WRITE SAME NCQ encapsulation would send Non Data NCQ command to the drive. Drive would fail the command as Non Data NCQ is not supported by drive. This will cause command failure to host.
Steps To Reproduce: IO errors are observed when mkfs.ext4 operation is done on drives that support Zero EXT but do not support Non Data NCQ.

ID: DCSG00411882 (Port Of Defect DCSG00139294)
Headline: SATA only : Controller hang is observed due to recursive function call.
Description Of Change: Avoid starting of pended commands to SATA drive from completion functions of a command to avoid recursion.
Issue Description: Controller hang is observed with ATA pass through command followed by a pended command that fails. At the completion of pass through command, PL starts the pended IOs if any. If the pended IO is failed due to invalid CDB, then immediately completion function is called causing recursion. This causes controller hang.
Steps To Reproduce: Install the FreeNAS and try to create ZFS pool(Storage -> Pools -> Add) of the direct attached SSDs.
- ATA command completes when there are IOs in pendlist.
- The pended IO has invalid CDB

I have installed the firmware, and am keeping an eye on it. My system is lightly loaded though, so my sightings of the issue have been rather sporadic.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Wayne Willcox changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wwillc...@gmail.com

--- Comment #46 from Wayne Willcox ---
So I have been getting the same errors in 12.1 and now 12.2.

at scbus0 target 8 lun 0 (pass0,da0)
at scbus0 target 9 lun 0 (pass1,da1)
at scbus0 target 10 lun 0 (pass2,da2)
at scbus0 target 11 lun 0 (pass3,da3)
at scbus0 target 12 lun 0 (pass4,da4)
at scbus0 target 13 lun 0 (pass5,da5)
at scbus0 target 14 lun 0 (pass6,da6)
at scbus0 target 15 lun 0 (pass7,da7)
at scbus0 target 16 lun 0 (pass8,da8)
at scbus0 target 17 lun 0 (pass9,da9)
at scbus0 target 18 lun 0 (pass10,da10)
at scbus0 target 19 lun 0 (pass11,da11)
at scbus0 target 20 lun 0 (ses0,pass12)
at scbus0 target 21 lun 0 (ses1,pass13)
at scbus0 target 22 lun 0 (pass14,da12)
at scbus0 target 23 lun 0 (pass15,da13)
at scbus0 target 24 lun 0 (pass16,da14)

Adapter:
mps0 Adapter:
       Board Name: SAS9211-8i
   Board Assembly: H3-25250-02B
        Chip Name: LSISAS2008
    Chip Revision: ALL:
    BIOS Revision: 7.39.00.00
Firmware Revision: 20.00.06.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0                              N                 1.5    6.0    SAS Initiator
1                              N                 1.5    6.0    SAS Initiator
2                              N                 1.5    6.0    SAS Initiator
3                              N                 1.5    6.0    SAS Initiator
4       0001        0009       N         6.0     1.5    6.0    SAS Initiator
5       0001        0009       N         6.0     1.5    6.0    SAS Initiator
6       0001        0009       N         6.0     1.5    6.0    SAS Initiator
7       0001        0009       N         6.0     1.5    6.0    SAS Initiator

hawthorn kernel log messages:
+ (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 1732 Command timeout on target 24(0x001a) 6 set, 60.96681849 elapsed
+mps0: Sending abort to target 24 for SMID 1732
+ (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 1732 Aborting command 0xfe00daa55760
+ (da4:mps0:0:12:0): WRITE(10). CDB: 2a 00 3b c2 f3 d8 00 00 30 00 length 24576 SMID 1089 Command timeout on target 12(0x000f) 6 set, 60.97158463 elapsed
+mps0: Sending abort to target 12 for SMID 1089
+ (da4:mps0:0:12:0): WRITE(10). CDB: 2a 00 3b c2 f3 d8 00 00 30 00 length 24576 SMID 1089 Aborting command 0xfe00daa1f758
+ (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 49 79 f9 f8 00 00 08 00 length 4096 SMID 1230 Command timeout on target 24(0x001a) 6 set, 60.97561590 elapsed
+ (da13:mps0:0:23:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 78 Command timeout on target 23(0x0019) 6 set, 60.9852 elapsed
+mps0: Sending abort to target 23 for SMID 78
+ (da13:mps0:0:23:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 78 Aborting command 0xfe00da9ca8d0
+ (da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 53 42 14 48 00 00 10 00 length 8192 SMID 217 Command timeout on target 13(0x0010) 6 set, 60.98193577 elapsed
+mps0: Sending abort to target 13 for SMID 217
+ (da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 53 42 14 48 00 00 10 00 length 8192 SMID 217 Aborting command 0xfe00da9d6398
+ (da8:mps0:0:16:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 670 Command timeout on target 16(0x0013) 6 set, 60.98762575 elapsed
+mps0: Sending abort to target 16 for SMID 670
+ (da8:mps0:0:16:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 670 Aborting command 0xfe00da9fc450
+ (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 91 Command timeout on target 10(0x000d) 6 set, 60.99148905 elapsed
+mps0: Sending abort to target 10 for SMID 91
+ (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 91 Aborting command 0xfe00da9cba48
+ (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 1b 8d ae b0 00 00 08 00 length 4096 SMID 1064 Command timeout on target 10(0x000d) 6 set, 60.99537335 elapsed
+ (da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 10 00 length 8192 SMID 982 Command timeout on target 17(0x0014) 6 set, 60.99738899 elapsed
+mps0: Sending abort to target 17 for SMID 982
+ (da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 10 00 length 8192 SMID 982 Aborting command 0xfe00daa16790
+ (da12:mps0:0:22:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 10 00 length 8192 SMID 1677 Command timeout on target 22(0x0018) 6 set, 60.100129429 elapsed
+mps0: Sending abort to target 2
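With this many timeout lines it is hard to see at a glance whether one disk or all of them are affected. A rough Python sketch that tallies "Command timeout" messages per da device follows; it assumes each log entry sits on a single line in the format pasted above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Captures the da device name from lines such as
# "(da14:mps0:0:24:0): WRITE(10). CDB: ... Command timeout on target 24(0x001a) ..."
TIMEOUT_RE = re.compile(
    r"\((da\d+):mp[rs]\d+:\d+:\d+:\d+\):.*Command timeout on target \d+\("
)


def count_timeouts(lines):
    """Return a dict mapping da device name to its number of command timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)
```

A tally spread evenly across many devices, as in the excerpt above, points away from a single bad disk; a tally concentrated on one device would suggest the opposite.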
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #45 from Matthias Pfaller ---
(In reply to Sharad Ahlawat from comment #44)

[root@nyx ~]# sysctl kern.cam.da.retry_count
kern.cam.da.retry_count: 4

Previously we had the drives connected to the onboard SATA/SAS ports (hoping the mps problems would get solved). The machine was running for months without showing any disk-related problems. At the moment we are looking for a replacement for the SAS2008 (for other machines without that many onboard SATA/SAS channels) and are testing the following config:

ahci0: port 0x3050-0x3057,0x3040-0x3043,0x3030-0x3037,0x3020-0x3023,0x3000-0x301f mem 0xdf94-0xdf9407ff irq 32 at device 0.0 on pci5
ahci0: AHCI v1.00 with 4 6Gbps ports, Port Multiplier supported with FBS
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahci1: AHCI v1.00 with 4 6Gbps ports, Port Multiplier supported with FBS
ahcich4: at channel 0 on ahci1
ahcich5: at channel 1 on ahci1
ahcich6: at channel 2 on ahci1
ahcich7: at channel 3 on ahci1
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ACS-3 ATA SATA 3.x device
ada0: Serial Number ZJV3ZYSX
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 11444224MB (23437770752 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ACS-3 ATA SATA 3.x device
ada1: Serial Number ZJV1VVWX
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 11444224MB (23437770752 512 byte sectors)
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: ACS-3 ATA SATA 3.x device
ada2: Serial Number ZJV1WLXM
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 11444224MB (23437770752 512 byte sectors)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: ACS-3 ATA SATA 3.x device
ada3: Serial Number ZJV2YY9A
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 11444224MB (23437770752 512 byte sectors)
ada4 at ahcich4 bus 0 scbus4 target 0 lun 0
ada4: ACS-3 ATA SATA 3.x device
ada4: Serial Number ZJV3ZJNA
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 11444224MB (23437770752 512 byte sectors)
ada5 at ahcich5 bus 0 scbus5 target 0 lun 0
ada5: ACS-3 ATA SATA 3.x device
ada5: Serial Number ZJV3ZXN5
ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 11444224MB (23437770752 512 byte sectors)
ada6 at ahcich6 bus 0 scbus6 target 0 lun 0
ada6: ACS-3 ATA SATA 3.x device
ada6: Serial Number ZJV2MWZ0
ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 11444224MB (23437770752 512 byte sectors)
ada7 at ahcich7 bus 0 scbus7 target 0 lun 0
ada7: ACS-3 ATA SATA 3.x device
ada7: Serial Number ZCH0HHJ6
ada7: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 11444224MB (23437770752 512 byte sectors)
ada8: ACS-2 ATA SATA 3.x device
ada8: Serial Number S3F5NX0M400493
ada8: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 228936MB (468862128 512 byte sectors)
ada8: quirks=0x1<4K>
ada9: ACS-2 ATA SATA 3.x device
ada9: Serial Number S3F5NX0M400461
ada9: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada9: Command Queueing enabled
ada9: 228936MB (468862128 512 byte sectors)
ada9: quirks=0x1<4K>

I reverted the "camcontrol tags -N 1" and the changed sysctls. So with this setup the system runs with the same drives and without any changed sysctl or CAM settings. Up to now no problems have shown up. Operating conditions are the same as with the last mps test (a zpool scrub is running and our systems did a backup to the machine last night).

As the drives behave well with at least three different controllers

ahci0:
ahci1:
isci0:

I can't imagine that changing parameters on the CAM/ZFS side (NCQ TRIM, NCQ, timeouts) will help in our case. We will probably have to decommission the LSI9211 boards :-(

regards, Matthias
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #44 from Sharad Ahlawat ---
(In reply to Matthias Pfaller from comment #43)

The logs show the read events are still timing out, even with 90. There are also no retry messages; is "sysctl kern.cam.da.retry_count" set to 0?

You could try a few things to get to the root cause:

a: Also disable ZFS cache flush, even though your drives are not SMRs:
   sysctl vfs.zfs.cache_flush_disable=1

b: Experiment with larger timeout values. Also observe the "gstat" output and ensure the first column, L(q), is continually returning to zero and not getting stuck for any of the drives.

c: Try reducing the SCIC speed to 3.0 in the controller settings, just to eliminate some disk firmware speed compatibility issue.

Side note: not sure if this applies to your drives, but a few of mine don't support NCQ TRIM and are not properly blacklisted in the driver, so I had to set vfs.unmapped_buf_allowed=0 in loader.conf.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #43 from Matthias Pfaller ---
(In reply to Matthias Pfaller from comment #42)

Sorry, I messed up the names... This should have been Sharad Ahlawat and not Christoph Bubel.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #42 from Matthias Pfaller ---
We followed the advice of Christop Bubel, disabled NCQ and set the timeouts to 90s (I can't imagine a situation where this should be necessary, but still...).

Results:

May 8 17:36:00 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May 8 17:36:00 nyx kernel: mps0: Reinitializing controller
May 8 17:36:00 nyx kernel: mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd
May 8 17:36:00 nyx kernel: mps0: IOCCapabilities: 1285c
May 8 17:50:44 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May 8 17:50:44 nyx kernel: mps0: Reinitializing controller
May 8 17:50:44 nyx kernel: mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd
May 8 17:50:44 nyx kernel: mps0: IOCCapabilities: 1285c
May 8 18:10:06 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May 8 18:10:06 nyx kernel: mps0: Reinitializing controller
May 8 18:10:06 nyx kernel: mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd
May 8 18:10:06 nyx kernel: mps0: IOCCapabilities: 1285c
May 8 18:27:27 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May 8 18:27:27 nyx kernel: mps0: Reinitializing controller
May 8 18:27:27 nyx kernel: mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd

We upgraded the controller firmware to 20.00.07.00 and tried again:

[root@nyx ~]# mpsutil show all
Adapter:
mps0 Adapter:
       Board Name: SAS9211-8i
   Board Assembly:
        Chip Name: LSISAS2008
    Chip Revision: ALL
    BIOS Revision: 0.00.00.00
Firmware Revision: 20.00.07.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed  Min  Max  Device
0       0001        0009       N         6.0    1.5  6.0  SAS Initiator
1       0002        000a       N         6.0    1.5  6.0  SAS Initiator
2       0003        000b       N         6.0    1.5  6.0  SAS Initiator
3       0004        000c       N         6.0    1.5  6.0  SAS Initiator
4       0005        000d       N         6.0    1.5  6.0  SAS Initiator
5       0006        000e       N         6.0    1.5  6.0  SAS Initiator
6       0007        000f       N         6.0    1.5  6.0  SAS Initiator
7       0008        0010       N         6.0    1.5  6.0  SAS Initiator

Devices:
B   T   SAS Address   Handle  Parent  Device       Speed  Enc   Slot  Wdt
00  01  44332211      0009    0001    SATA Target  6.0    0001  03    1
00  04  443322110100  000a    0002    SATA Target  6.0    0001  02    1
00  03  443322110200  000b    0003    SATA Target  6.0    0001  01    1
00  00  443322110300  000c    0004    SATA Target  6.0    0001  00    1
00  02  443322110400  000d    0005    SATA Target  6.0    0001  07    1
00  05  443322110500  000e    0006    SATA Target  6.0    0001  06    1
00  06  443322110600  000f    0007    SATA Target  6.0    0001  05    1
00  07  443322110700  0010    0008    SATA Target  6.0    0001  04    1

Enclosures:
Slots  Logical ID        SEPHandle  EncHandle  Type
08     500605b001551a80             0001       Direct Attached SGPIO

Expanders:
NumPhys  SAS Address  DevHandle  Parent  EncHandle  SAS Level

[root@nyx ~]# for i in $(camcontrol devlist | grep "ST12000" | cut -d"," -f2 | cut -d")" -f1); do
> camcontrol tags $i
> done
(pass0:mps0:0:0:0): device openings: 1
(pass1:mps0:0:1:0): device openings: 1
(pass2:mps0:0:2:0): device openings: 1
(pass3:mps0:0:3:0): device openings: 1
(pass4:mps0:0:4:0): device openings: 1
(pass5:mps0:0:5:0): device openings: 1
(pass6:mps0:0:6:0): device openings: 1
(pass7:mps0:0:7:0): device openings: 1
[root@nyx ~]#

Results:

May 11 15:58:58 nyx kernel: (da7:mps0:0:7:0): READ(10). CDB: 28 00 ef 46 55 38 00 00 10 00 length 8192 SMID 64 Command timeout on target 7(0x0010) 9 set, 90.154685585 elapsed
May 11 15:58:58 nyx kernel: mps0: Sending abort to target 7 for SMID 64
May 11 15:58:58 nyx kernel: (da7:mps0:0:7:0): READ(10). CDB: 28 00 ef 46 55 38 00 00 10 00 length 8192 SMID 64 Aborting command 0xfe00f92e5600
May 11 15:58:58 nyx kernel: (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 04 e9 23 92 a0 00 00 00 20 00 00 length 16384 SMID 1564 Command timeout on target 2(0x000d) 9 set, 90.120064627 elapsed
May 11 15:58:58 nyx kernel: mps0: Sending abort to target 2 for SMID 1564
May 11 15:58:58 nyx kernel: (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 04 e9 23 92 a0 00 00 00 20 00 00 length 16384 SMID 1564 Aborting command 0xfe00f93635a0
May 11 15:58:58 nyx kernel: (da6:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00 04 e8 09 8e 30 00 00 00 30 00 00 length 24576 SMID 432 Command timeout on target 6(0x000f) 9 set, 90.97377933 elapsed
May 11 15:58:58 nyx kernel: mps0: Sending abort to target 6 for SMID 432
May 11 15:58:58 nyx kernel: (da6:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00 04 e8 09 8e 30 00 00 00 30 00 00 length 24576 SMID 432 Aborting command 0xfe00f9304480
May 11 15:58:58 nyx kern
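
When comparing runs like the ones in this comment, it can help to see which devices actually hit the timeout. The following is only a sketch, not something posted in the bug: the function name `timeouts_by_device` is made up for illustration, and it assumes the log lines look like the "Command timeout on target" lines quoted above.

```sh
#!/bin/sh
# Sketch: count "Command timeout on target" messages per CAM device.
# Reads syslog-style lines on stdin; the function name is hypothetical.
timeouts_by_device() {
    awk '/Command timeout on target/ {
             # The device name is the first "(daN:" style prefix on the line.
             if (match($0, /\([a-z]+[0-9]+:/)) {
                 dev = substr($0, RSTART + 1, RLENGTH - 2)
                 if (!(dev in count)) order[++n] = dev
                 count[dev]++
             }
         }
         END {
             # Print devices in order of first appearance.
             for (i = 1; i <= n; i++) print order[i], count[order[i]]
         }'
}

# Usage on a FreeBSD host (assumes the default log location):
#   timeouts_by_device < /var/log/messages
```

If every disk on the controller shows up with a similar count, that points away from a single bad drive, which matches what is reported here.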
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #41 from Sharad Ahlawat ---
(In reply to Sharad Ahlawat from comment #28)

Symptom: CAM retry and timeout messages leading to controller aborts and resets

Cause: slow disks or using SMR disks

Workaround: Increase the CAM timeout defaults

❯ sysctl kern.cam.da.default_timeout=90
kern.cam.da.default_timeout: 60 -> 90
❯ sysctl kern.cam.ada.default_timeout=60
kern.cam.ada.default_timeout: 30 -> 60

And disable NCQ on SMR Seagates:

❯ cat cam-tags.sh
#!/usr/local/bin/bash
# Shrink the Native Command Queue depth to 1, effectively disabling queuing.
for Disk in $(camcontrol devlist | grep "ST8000DM" | cut -d"," -f2 | cut -d")" -f1); do
    camcontrol tags "$Disk" -N 1
    # camcontrol tags "$Disk" -v
done

If you only have SMRs in your setup and use a UPS you could also:

❯ sysctl vfs.zfs.cache_flush_disable=1

Solution: don't use slow disks or SMR disks with ZFS

The long version:

I am obliged to post this update, given the driver downgrade workaround I previously posted on this thread before getting to the root cause for these logs in my messages file after upgrading to 12.x:

Jan 18 17:29:28 nas kernel: ahcich6: Timeout on slot 8 port 0
Jan 18 17:29:28 nas kernel: ahcich6: is cs 0100 ss rs 0100 tfd c0 serr cmd c817
Jan 18 17:29:28 nas kernel: (ada6:ahcich6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 18 17:29:28 nas kernel: (ada6:ahcich6:0:0:0): CAM status: Command timeout
Jan 18 17:29:28 nas kernel: (ada6:ahcich6:0:0:0): Retrying command, 0 more tries remain
Jan 18 17:30:00 nas kernel: ahcich6: AHCI reset: device not ready after 31000ms (tfd = 0080)
Jan 18 17:30:30 nas kernel: ahcich6: Timeout on slot 9 port 0
Jan 18 17:30:30 nas kernel: ahcich6: is cs 0200 ss rs 0200 tfd 80 serr cmd c917
Jan 18 17:30:30 nas kernel: (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jan 18 17:30:30 nas kernel: (aprobe0:ahcich6:0:0:0): CAM status: Command timeout
Jan 18 17:30:30 nas kernel: (aprobe0:ahcich6:0:0:0): Retrying command, 0 more tries remain

Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 1039 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 1357 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 1933 loginfo 3108
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): READ(16). CDB: 88 00 00 00 00 01 45 fb 37 c8 00 00 00 b0 00 00
Apr 25 22:28:12 nas kernel: mps0: (da4:mps0:0:11:0): CAM status: CCB request completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): READ(16). CDB: 88 00 00 00 00 01 45 fb 38 78 00 00 00 58 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00 01 b4 c0 a1 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries remain
Apr 25 22:28:12 nas kernel: Controller reported scsi ioc terminated tgt 11 SMID 621 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 476 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 321 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 1873 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 1852 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 1742 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 387 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt 11 SMID 2104 loginfo 3108
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00 01 b4 c0 a2 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00 01 b4 c0 a3 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00 01 b4 c0 a4 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request co
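
The `cut`-based pipeline in the workaround script above takes whatever name comes second inside the parentheses of each `camcontrol devlist` line. As a hedged alternative, the sketch below picks the `daN` name explicitly, wherever it appears in the pair. The function name `list_da_devices` is hypothetical, the model string is only an example, and the input format is assumed to be the usual `... (pass4,da4)` trailer.

```sh
#!/bin/sh
# Sketch: print the daN name for each devlist line matching a model string.
# Reads `camcontrol devlist` output on stdin; $1 is the model substring.
list_da_devices() {
    awk -v model="$1" '
        index($0, model) {
            # The peripheral names sit in the last parenthesised group.
            if (match($0, /\([^()]*\)[ \t]*$/)) {
                group = substr($0, RSTART + 1)
                sub(/\)[ \t]*$/, "", group)
                n = split(group, names, ",")
                for (i = 1; i <= n; i++)
                    if (names[i] ~ /^da[0-9]+$/) print names[i]
            }
        }'
}

# Usage on a FreeBSD host (dry run: prints the commands, does not run them):
#   camcontrol devlist | list_da_devices ST8000DM | while read -r disk; do
#       echo camcontrol tags "$disk" -N 1
#   done
```

Dropping the `echo` would apply the queue-depth change for real, so it seems prudent to review the printed list first.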
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #40 from Peter Eriksson ---
I've had some random problems when using the cards with IR firmware, so I always make sure they are running IT firmware. Might be worth testing. But that might not be relevant for your problems.

We are using them without any issues (so far) but we're using SAS drives (10TB HGST) behind SAS Expanders (and a couple of Intel SSD SATA drives) - so nothing similar to your situation...

# mprutil show all
Adapter:
mpr0 Adapter:
       Board Name: Dell HBA330 Mini
   Board Assembly:
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 18.00.00.00
Firmware Revision: 16.00.08.00
  Integrated RAID: no

--
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #39 from Paul Thornton ---
I don't have any expanders (note that this card has IR firmware, in case that is relevant - I wasn't actually aware of that, it is only being used as a dumb HBA):

[root@nas1b ~]# mprutil show all
Adapter:
mpr0 Adapter:
       Board Name: LSI3008-IR
   Board Assembly:
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 8.35.00.00
Firmware Revision: 15.00.03.00
  Integrated RAID: yes

PhyNum  CtlrHandle  DevHandle  Disabled  Speed  Min  Max  Device
0                              N                3.0  12   SAS Initiator
1                              N                3.0  12   SAS Initiator
2                              N                3.0  12   SAS Initiator
3                              N                3.0  12   SAS Initiator
4                              N                3.0  12   SAS Initiator
5                              N                3.0  12   SAS Initiator
6                              N                3.0  12   SAS Initiator
7                              N                3.0  12   SAS Initiator

Devices:
B   T   SAS Address   Handle  Parent  Device       Speed  Enc   Slot  Wdt

Enclosures:
Slots  Logical ID        SEPHandle  EncHandle  Type
08     500304801cf54a05             0001       Direct Attached SGPIO

Expanders:
NumPhys  SAS Address  DevHandle  Parent  EncHandle  SAS Level

[root@nas1b ~]#
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #38 from Matthias Pfaller ---
(In reply to Peter Eriksson from comment #37)

Not in our case:

[root@nyx ~]# mpsutil show all
Adapter:
mps0 Adapter:
       Board Name: SAS9211-8i
   Board Assembly:
        Chip Name: LSISAS2008
    Chip Revision: ALL
    BIOS Revision: 0.00.00.00
Firmware Revision: 19.00.00.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed  Min  Max  Device
0                              N                1.5  6.0  SAS Initiator
1                              N                1.5  6.0  SAS Initiator
2                              N                1.5  6.0  SAS Initiator
3                              N                1.5  6.0  SAS Initiator
4                              N                1.5  6.0  SAS Initiator
5                              N                1.5  6.0  SAS Initiator
6                              N                1.5  6.0  SAS Initiator
7                              N                1.5  6.0  SAS Initiator

Devices:
B   T   SAS Address   Handle  Parent  Device       Speed  Enc   Slot  Wdt

Enclosures:
Slots  Logical ID        SEPHandle  EncHandle  Type
08     500605b001551a80             0001       Direct Attached SGPIO

Expanders:
NumPhys  SAS Address  DevHandle  Parent  EncHandle  SAS Level

We did downgrade to firmware 19.00.00.00, but I hadn't the time to run some tests using this firmware level.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #37 from Peter Eriksson ---
Any SAS-expanders between the SAS HBA and the (SATA) disks? ("mprutil show expanders") - and if so - do they (the expanders) have up to date firmware?
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

lastb0i...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lastb0i...@gmail.com

--- Comment #36 from lastb0i...@gmail.com ---
Has anyone gotten further with this issue? Seems that it is affecting me every other hour now! And it all started after adding a WD DC Drive...
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #35 from Ștefan BĂLU ---
Looks like the problem persists with FreeNAS (FreeBSD 11.3-RELEASE) and Firmware Revision: 15.00.03.00.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #34 from Bane Ivosev ---
(In reply to Ștefan BĂLU from comment #33)

Before the first install (11.3) we updated the firmware and left it as is. So we didn't change the firmware later; we just tried an mps/mpr driver other than 18.03. Now, with 12.1-RELEASE, we have:

mpr0: Firmware: 16.00.01.00, Driver: 23.00.00.00-fbsd
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #33 from Ștefan BĂLU ---
(In reply to Bane Ivosev from comment #32)

So, should we keep the firmware in sync with those versions or go for a trial-and-error approach?

In the meantime, I'll let you know how it goes with the latest FreeNAS (FreeBSD 11.3-RELEASE) and firmware version 15.00.03.00.

I see that my corrected comment didn't go through... So, I downgraded from:

BIOS Revision: 18.00.00.00
Firmware Revision: 16.00.01.00

to:

BIOS Revision: 8.35.00.00
Firmware Revision: 15.00.03.00

The behaviour would trigger in a couple of days, so I'll keep you all posted.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #32 from Bane Ivosev ---
We have a problem with the 18.03 version of the mpr/mps driver. That's why I said the 12.1 and 11.1 releases are working for us.

12.1 -> #define MPR_DRIVER_VERSION "23.00.00.00-fbsd"
12.0 -> #define MPR_DRIVER_VERSION "18.03.00.00-fbsd"
11.3 -> #define MPR_DRIVER_VERSION "18.03.00.00-fbsd"
11.2 -> #define MPR_DRIVER_VERSION "18.03.00.00-fbsd"
11.1 -> #define MPR_DRIVER_VERSION "15.03.00.00-fbsd"
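
The release-to-driver mapping in this comment can be written down as a small lookup, which may be handy when checking a host against it. This is a sketch only: the function name `mpr_driver_for_release` is invented here, the table covers just the five releases listed above, and it applies to mpr (the mps driver reports different version strings, e.g. 21.02.00.00-fbsd on 12.1 elsewhere in this thread).

```sh
#!/bin/sh
# Sketch: map a FreeBSD release to the mpr driver version quoted in
# comment #32. Returns non-zero for releases the comment does not list.
mpr_driver_for_release() {
    case "$1" in
        12.1)           echo "23.00.00.00-fbsd" ;;
        12.0|11.3|11.2) echo "18.03.00.00-fbsd" ;;
        11.1)           echo "15.03.00.00-fbsd" ;;
        *)              return 1 ;;
    esac
}

# Usage on a FreeBSD host: compare with what the running driver reports.
# The sysctl name follows the dev.mpr.1.driver_version line in comment #15;
# unit 0 is an assumption.
#   sysctl -n dev.mpr.0.driver_version
#   mpr_driver_for_release 12.1
```

A mismatch between the two values would suggest a locally built or swapped-in driver, as some commenters in this thread have done.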
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #31 from Matthias Pfaller ---
mpsutil show all
Adapter:
mps0 Adapter:
       Board Name: SAS9211-8i
   Board Assembly:
        Chip Name: LSISAS2008
    Chip Revision: ALL
    BIOS Revision: 7.37.00.00
Firmware Revision: 20.00.07.00
  Integrated RAID: yes

uname -a
FreeBSD nyx 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC amd64

This machine is experiencing the problems. The machines without problems are running 11.1 and (thanks for the pointer)

       Board Name: SAS9211-8i
   Board Assembly: H3-25250-02J
        Chip Name: LSISAS2008
    Chip Revision: ALL
    BIOS Revision: 7.37.00.00
Firmware Revision: 19.00.00.00
  Integrated RAID: no

I'll try a downgrade to 19.00.00.00

regards, Matthias

(In reply to Ștefan BĂLU from comment #30)
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Ștefan BĂLU changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stefan.b...@ulab.ro

--- Comment #30 from Ștefan BĂLU ---
Guys, I've just experienced these issues on the latest FreeNAS (it's a FreeBSD 11.3-RELEASE) with an LSI3008 controller.

The issues you all experience are related to the BIOS Revision and Firmware Revision versions shown by:

mprutil show all | less
...
Adapter:
mpr0 Adapter:
       Board Name: LSI3008-IT
   Board Assembly:
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 17.00.00.00
Firmware Revision: 15.00.03.00
  Integrated RAID: no
...

It's definitely not related to the types/models of disks used or the BSD version.

Just make sure you are running the following firmware versions, as I already have these in production with FreeBSD 11.2-RELEASE:

Adapter:
mpr0 Adapter:
       Board Name: LSI3008-IT
   Board Assembly:
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 17.00.00.00
Firmware Revision: 15.00.03.00
  Integrated RAID: no
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #29 from Bane Ivosev ---
We have had success with standard installations of 11.1 and 12.1-RELEASE, with no compiling or mixing of driver versions. The problem was with 12.0 and 11.3-RELEASE.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #28 from free...@diyit.org ---
I finally tried 12.1 and it seemed to run fine for a few days till I started a large file transfer and ...

(da4:mps0:0:11:0): READ(10). CDB: 28 00 7c fa 4a a0 00 00 08 00
(da4:mps0:0:11:0): CAM status: SCSI Status Error
(da4:mps0:0:11:0): SCSI status: Check Condition
(da4:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
(da4:mps0:0:11:0): Retrying command (per sense data)

Currently running 12.1-p2 with the 11.3 release 357156 mps driver. I just delete the /usr/src/sys/dev/mps directory, copy it over from the 11.3 source, and compile the kernel.

/Sharad
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #27 from Matthias Pfaller ---
How does one change the affected version in the bug report? After all this is a >11.1 problem and not a 11.1-STABLE problem.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

wanpengq...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wanpengq...@gmail.com

--- Comment #26 from wanpengq...@gmail.com ---
I have experienced this issue multiple times after upgrading from 11.1 to 12.1. The server was rock solid before, and I didn't realize it was a driver issue.

Since my pool has already been upgraded to the latest version, I cannot downgrade to 11.1.

I am building an 11.1 mps.ko for the 12.1 kernel and will load it manually via loader.conf.

I will report the result in a few weeks.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #25 from Bane Ivosev ---
(In reply to Matthias Pfaller from comment #24)

We are still fine. No problem at all.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #24 from Matthias Pfaller ---
(In reply to Bane Ivosev from comment #23)

We just gave it another try. We hoped that the problem might have been caused by a defective disk when we last tried. The problem was triggered again during the next night's backup :-(

This is forcing us to keep running 11.1 on our backup machines. Any (bad/good) news from your machine?
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #23 from Bane Ivosev ---
Really sad news ... We have been running for 18 days now and everything is great with 12.1 for us, but it is still too early to draw conclusions, and your experience is not promising.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #22 from Matthias Pfaller ---
I'm running FreeBSD 12.1 now. Yesterday we reattached the drives to the LSI controller. Problems showed up during the next backup :-(

Kernel log:

Dec 4 10:13:30 nyx kernel: mps0: port 0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device 0.0 on pci3
Dec 4 10:13:30 nyx kernel: mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Dec 4 10:13:30 nyx kernel: mps0: IOCCapabilities: 185c
Dec 4 10:33:08 nyx kernel: mps0: port 0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device 0.0 on pci3
Dec 4 10:33:08 nyx kernel: mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Dec 4 10:33:08 nyx kernel: mps0: IOCCapabilities: 185c
Dec 4 10:33:08 nyx kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0
Dec 4 10:33:08 nyx kernel: da1 at mps0 bus 0 scbus0 target 3 lun 0
Dec 4 10:33:08 nyx kernel: da2 at mps0 bus 0 scbus0 target 4 lun 0
Dec 4 10:33:08 nyx kernel: da4 at mps0 bus 0 scbus0 target 6 lun 0
Dec 4 10:33:08 nyx kernel: da5 at mps0 bus 0 scbus0 target 7 lun 0
Dec 4 10:33:08 nyx kernel: da6 at mps0 bus 0 scbus0 target 8 lun 0
Dec 4 10:33:08 nyx kernel: da3 at mps0 bus 0 scbus0 target 5 lun 0
Dec 4 10:33:08 nyx kernel: da7 at mps0 bus 0 scbus0 target 9 lun 0
Dec 4 22:42:34 nyx kernel: (da4:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00 03 45 37 b3 48 00 00 00 08 00 00 length 4096 SMID 1583 Command timeout on target 6(0x0010) 6 set, 60.66782887 elapsed
Dec 4 22:42:34 nyx kernel: mps0: Sending abort to target 6 for SMID 1583
Dec 4 22:42:34 nyx kernel: (da4:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00 03 45 37 b3 48 00 00 00 08 00 00 length 4096 SMID 1583 Aborting command 0xfe00f9364f28
Dec 4 22:42:34 nyx kernel: (da7:mps0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 03 60 06 f6 c8 00 00 00 18 00 00 length 12288 SMID 925 Command timeout on target 9(0x000d) 6 set, 60.23115994 elapsed
Dec 4 22:42:34 nyx kernel: mps0: Sending abort to target 9 for SMID 925
Dec 4 22:42:34 nyx kernel: (da7:mps0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 03 60 06 f6 c8 00 00 00 18 00 00 length 12288 SMID 925 Aborting command 0xfe00f932daf8
Dec 4 22:42:34 nyx kernel: (da0:mps0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1033 Command timeout on target 2(0x000c) 6 set, 60.24094823 elapsed
Dec 4 22:42:34 nyx kernel: mps0: Sending abort to target 2 for SMID 1033
Dec 4 22:42:34 nyx kernel: (da0:mps0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1033 Aborting command 0xfe00f9336c18
Dec 4 22:42:34 nyx kernel: (da1:mps0:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1440 Command timeout on target 3(0x000b) 6 set, 60.24535090 elapsed
Dec 4 22:42:34 nyx kernel: mps0: Sending abort to target 3 for SMID 1440
Dec 4 22:42:34 nyx kernel: (da1:mps0:0:3:0): WRITE(16). CDB: 8a 00 00 00 00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1440 Aborting command 0xfe00f9358f00
Dec 4 22:42:34 nyx kernel: (da2:mps0:0:4:0): WRITE(10). CDB: 2a 00 24 36 61 98 00 00 18 00 length 12288 SMID 1472 Command timeout on target 4(0x000a) 6 set, 60.24982318 elapsed
Dec 4 22:42:34 nyx kernel: mps0: Sending abort to target 4 for SMID 1472
Dec 4 22:42:34 nyx kernel: (da2:mps0:0:4:0): WRITE(10). CDB: 2a 00 24 36 61 98 00 00 18 00 length 12288 SMID 1472 Aborting command 0xfe00f935ba00
Dec 4 22:42:34 nyx kernel: (da5:mps0:0:7:0): WRITE(10). CDB: 2a 00 24 36 61 10 00 00 a0 00 length 81920 SMID 507 Command timeout on target 7(0x000f) 6 set, 60.25666047 elapsed
Dec 4 22:42:34 nyx kernel: mps0: Sending abort to target 7 for SMID 507
Dec 4 22:42:34 nyx kernel: (da5:mps0:0:7:0): WRITE(10). CDB: 2a 00 24 36 61 10 00 00 a0 00 length 81920 SMID 507 Aborting command 0xfe00f930a948
Dec 4 22:42:40 nyx kernel: (xpt0:mps0:0:7:0): SMID 6 task mgmt 0xfe00f92e0810 timed out
Dec 4 22:42:40 nyx kernel: mps0: Reinitializing controller
Dec 4 22:42:40 nyx kernel: mps0: Unfreezing devq for target ID 6
Dec 4 22:42:40 nyx kernel: mps0: Unfreezing devq for target ID 9
Dec 4 22:42:40 nyx kernel: mps0: Unfreezing devq for target ID 2
... lots more

Just the reinitialization messages:

Dec 4 22:42:40 nyx kernel: mps0: Reinitializing controller
Dec 4 23:01:27 nyx kernel: mps0: Reinitializing controller
Dec 4 23:40:55 nyx kernel: mps0: Reinitializing controller
Dec 4 23:47:49 nyx kernel: mps0: Reinitializing controller
Dec 4 23:58:02 nyx kernel: mps0: Reinitializing controller
Dec 5 00:17:33 nyx kernel: mps0: Reinitializing controller
Dec 5 00:20:18 nyx kernel: mps0: Reinitializing controller
Dec 5 00:21:33 nyx kernel: mps0: Reinitializing controller
Dec 5 00:24:30 nyx kernel: mps0: Reinitializing controller
Dec 5 00:26:40 nyx kernel: mps0: Reinitializing controller
Dec 5 00:29:30 nyx kernel: mps0: Reini
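
The "just the reinitialization messages" list above can be produced and tallied from a log file rather than by hand. The sketch below is an illustration under assumptions: the function name `reinit_counts_by_day` is made up, and it expects syslog lines in the same shape as the ones quoted in this comment (month and day as the first two fields).

```sh
#!/bin/sh
# Sketch: count "Reinitializing controller" messages from mps/mpr per day.
# Reads syslog-style lines on stdin; the function name is hypothetical.
reinit_counts_by_day() {
    awk '/mp[rs][0-9]+: Reinitializing controller/ {
             day = $1 " " $2            # e.g. "Dec 4"
             if (!(day in count)) order[++n] = day
             count[day]++
         }
         END {
             # Print days in order of first appearance.
             for (i = 1; i <= n; i++) print order[i], count[order[i]]
         }'
}

# Usage on a FreeBSD host (assumes the default log location):
#   reinit_counts_by_day < /var/log/messages
```

Comparing these per-day counts before and after a firmware or driver change gives a rough way to judge whether the change helped.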
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #21 from Bane Ivosev ---
(In reply to freebsd from comment #20)

I have been running 12.1-RELEASE on the same hardware for 10 days now. Everything is OK. I'll report back in about a month. With success, I hope.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #20 from free...@diyit.org ---
(In reply to Paul Thornton from comment #19)

Thanks for sharing Paul. I took a slightly different approach. I have been running stable for a few months now with a FreeBSD 12.0-RELEASE kernel but using the FreeBSD 11.2 LSI mps driver.

12.1 brings in many updates to the 12.0 mps driver; hopefully those address these problems. I will test once it's released.

/Sharad
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #19 from Paul Thornton ---
An update for all. I downgraded our production servers from 12.0 to 11.1-RELEASE-p15 and for 5 weeks they have worked without any problems. Previously we saw a problem after 4 weeks, then another after 1 week - so whilst there is a chance I have not waited long enough, I think this definitely fixes my problem.

Paul.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #18 from Paul Thornton ---
(In reply to Bane Ivosev from comment #16)

Thanks for the confirmation. I'm in the process of downgrading the affected NAS machine (they are an identical pair) from 12.0 to 11.1. I need to test this in the lab first, but will report back once we have the production machine on the older release.

I need to do this before it next crashes and reboots, of course!
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #17 from Matthias Pfaller ---
(In reply to Francois Baillargeon from comment #14)

We have cache and log devices:

NAME                   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
k-zb1                 87.2T  35.5T  51.8T        -         -   20%  40%  1.91x  ONLINE  -
  raidz2              87.2T  35.5T  51.8T        -         -   20%  40%
    gpt/k-zb1-0           -      -      -        -         -     -    -
    gpt/k-zb1-1           -      -      -        -         -     -    -
    gpt/k-zb1-2           -      -      -        -         -     -    -
    gpt/k-zb1-3           -      -      -        -         -     -    -
    gpt/k-zb1-4           -      -      -        -         -     -    -
    gpt/k-zb1-5           -      -      -        -         -     -    -
    gpt/k-zb1-6           -      -      -        -         -     -    -
    gpt/k-zb1-7           -      -      -        -         -     -    -
log                       -      -      -        -         -     -
  mirror              11.5G      0  11.5G        -         -    0%   0%
    gpt/k-zb1-zil0        -      -      -        -         -     -    -
    gpt/k-zb1-zil1        -      -      -        -         -     -    -
cache                     -      -      -        -         -     -
  gpt/k-zb1-cache0    80.0G  29.9G  50.1G        -         -    0%  37%
  gpt/k-zb1-cache1    80.0G  30.0G  50.0G        -         -    0%  37%

The only solution for us was to use the onboard SATA ports instead of the LSI controller. We have to keep the other machines that need the LSI controllers (not enough ports on the mainboard) at 11.1 :-(
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #16 from Bane Ivosev ---
Hi Paul, I had exactly the same symptoms as you but with WD Red 4TB drives, and for me everything has worked flawlessly with 11.1 for more than 130 days now.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 Paul Thornton changed: What|Removed |Added CC||freebsd-bugzi...@prt.org --- Comment #15 from Paul Thornton ---

I too have run into this issue on a NAS box, once it started taking on any kind of load. Running 12.0-RELEASE p3.

The server contains 8x Seagate Ironwolf Pro 10 TB SATA drives on an Avago 3008 HBA - 8 of these basically:

da2 at mpr1 bus 0 scbus13 target 12 lun 0
da2: Fixed Direct Access SPC-4 SCSI device
da2: Serial Number ZA237AVY
da2: 1200.000MB/s transfers
da2: Command Queueing enabled
da2: 9537536MB (19532873728 512 byte sectors)

Driver versions:

dev.mpr.1.driver_version: 18.03.00.00-fbsd
dev.mpr.1.firmware_version: 15.00.03.00

These drives are configured in a ZFS RAID10 setup (in case that datapoint matters):

NAME          STATE   READ WRITE CKSUM
data0         ONLINE     0     0     0
  mirror-0    ONLINE     0     0     0
    da2.eli   ONLINE     0     0     0
    da3.eli   ONLINE     0     0     0
  mirror-1    ONLINE     0     0     0
    da4.eli   ONLINE     0     0     0
    da5.eli   ONLINE     0     0     0
  mirror-2    ONLINE     0     0     0
    da6.eli   ONLINE     0     0     0
    da7.eli   ONLINE     0     0     0
  mirror-3    ONLINE     0     0     0
    da8.eli   ONLINE     0     0     0
    da9.eli   ONLINE     0     0     0

I currently get about 25 days between reboots. The machine hangs and (I'm guessing here) kernel panics and restarts - I don't have the panic information, but log messages look very similar to what other people are seeing:

Jul 20 11:14:17 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00 length 24576 SMID 1484 Command timeout on target 12(0x000c), 6 set, 60.703976195 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector name ()
Jul 20 11:14:17 nas1a kernel: mpr1: Sending abort to target 12 for SMID 1484
Jul 20 11:14:17 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00 length 24576 SMID 1484 Aborting command 0xfe00bad0b540
Jul 20 11:14:17 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1792 Command timeout on target 12(0x000c), 6 set, 60.707504796 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector name ()
Jul 20 11:14:18 nas1a kernel: mpr1: Controller reported scsi ioc terminated tgt 12 SMID 1792 loginfo 3114
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: mpr1: Abort failed for target 12, sending logical unit reset
Jul 20 11:14:18 nas1a kernel: mpr1: (da2:mpr1:0:12:0): CAM status: CCB request aborted by the host
Jul 20 11:14:18 nas1a kernel: Sending logical unit reset to target 12 lun 0
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 3 more tries remain
Jul 20 11:14:18 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector name ()
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CCB request completed with an error
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 0 more tries remain
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Error 5, Retries exhausted
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 2 more tries remain
[reboot happens here]

And the most recent one, today:

Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Command timeout on target 16(0x0010), 6 set, 60.109683189 elapsed
Aug 13 08:58:55 nas1a kernel: mpr1: At enclosure level 0, slot 6, connector name ()
Aug 13 08:58:55 nas1a kernel: mpr1: Sending abort to target 16 for SMID 998
Aug 13 08:58:55 nas1a kernel: (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Aborting command 0xfe00bacdfaa0
Aug 13 08:58:55 nas1a kernel: mpr1: Abort failed for target 16, sending logical unit reset
Aug 13 08:58:55 nas1a kern
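
The `length` value the driver prints for the timed-out WRITE(10) on da2 in comment #15 can be cross-checked against the CDB bytes in the same log line. The following is a minimal Python sketch, assuming the standard READ/WRITE(10) and READ/WRITE(16) CDB layouts and the 512-byte sectors reported in the da2 probe output; the function name `decode_rw_cdb` is made up for illustration and is not part of any FreeBSD tool.

```python
def decode_rw_cdb(cdb_hex, sector_size=512):
    """Decode LBA and transfer size from a READ/WRITE(10) or (16) CDB.

    cdb_hex is the space-separated hex string printed after "CDB:" in the
    kernel log. sector_size=512 is an assumption taken from the
    "512 byte sectors" line in the da2 probe output.
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    opcode = cdb[0]
    if opcode in (0x28, 0x2A):        # READ(10) / WRITE(10)
        lba = int.from_bytes(cdb[2:6], "big")       # bytes 2..5
        blocks = int.from_bytes(cdb[7:9], "big")    # bytes 7..8
    elif opcode in (0x88, 0x8A):      # READ(16) / WRITE(16)
        lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9
        blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10..13
    else:
        raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    return lba, blocks, blocks * sector_size


# The WRITE(10) that timed out on da2 (logged with "length 24576").
lba, blocks, nbytes = decode_rw_cdb("2a 00 62 81 f9 d0 00 00 30 00")
print(hex(lba), blocks, nbytes)  # 0x6281f9d0 48 24576
```

The decoded byte count matches the logged length, which suggests the failing commands are ordinary small writes rather than unusually large transfers.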
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #14 from Francois Baillargeon --- We usually have a cache device on the affected pools, and sadly we still have issues when the L2ARC hits the disk (think long sequential reads).
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #13 from Christoph Bubel --- (In reply to Daniel Shafer from comment #10) I can confirm the workaround: no errors since I added an L2ARC to the pool.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #12 from n...@nmc.dev ---

Hi,

We are seeing the same issue. Here is more information on our setup:

FreeNAS-11.2-U5
FreeBSD 11.2-STABLE amd64
We use 2 x (6x 14 TB Seagate IronWolf drives)
We also have a 2 TB Crucial SSD for L2ARC
The issue always comes up after 10-14 hours of heavy I/O
Disk model: 14 TB Seagate ST14000VN0008

The drives are on 2 different LSI HBAs. The drives that fail are random across both HBAs.

Please let us know if you need more information on this; it is impacting our production load.

Thank you.

Log output for our latest errors:

> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 a8 00 00 00 10 00 00 length 8192 SMID 60 Aborting command 0xfe000171f640
> mpr1: Sending reset from mprsas_send_abort for target ID 20
> (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 f0 00 00 08 00 length 4096 SMID 332 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 e8 00 00 08 00 length 4096 SMID 703 terminated ioc 804b loginfo 3113
> sc(da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 f0 00 00 08 00 si 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 77 b8 00 00 01 00 00 00 length 131072 SMID 510 terminated ioc 804b(da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 e8 00 00 08 00
> loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 76 b8 00 00 01 00 00 00 length 131072 SMID 938 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 75 b8 00 00 01 00 00 00 length 131072 SMID 839 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 74 b8 00 00 01 00 00 00 length 131072 SMID 681 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 73 b8 00 00 01 00 00 00 length 131072 SMID 647 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 72 b8 00 00 01 00 00 00 length 131072 SMID 253 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 71 b8 00 00 01 00 00 00 length 131072 SMID 109 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 b8 00 00 01 00 00 00 length 131072 SMID 267 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 98 00 00 00 10 00 00 length 8192 SMID 506 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 88 00 00 00 10 00 00 length 8192 SMID 774 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 281 terminated ioc 804b loginfo 3114 scsi 0 state c xfer 0
> mpr1: Unfreezing devq for target ID 20
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 77 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 76 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 75 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 74 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 73 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 72 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 71 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 b8 00 00 01 00 00 00
> (da30:mpr1:0:20:
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 Francois Baillargeon changed: What|Removed |Added CC||francois.baillargeon@gearbo ||xsoftware.com --- Comment #11 from Francois Baillargeon --- Following up on what Daniel Shafer says, we have the same issues on a FreeNAS deployment we did. Everything is fine with our other pools, which use drives smaller than 10 TB, but one of our pools, which uses 14 TB drives, exhibits this exact behavior. For us this is a major show-stopper bug, since we can't use this pool reliably. Our vendor sent us a new HBA, a new server, etc. before I stumbled upon this bug listing.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 Daniel Shafer changed: What|Removed |Added CC||dan...@shafer.cc --- Comment #10 from Daniel Shafer --- So I came across this same issue. It was causing my server to reboot several times a day due to kernel panics. It happens with both SAS9200 and 9300 controllers. I have 8 x 10 TB Seagate IronWolf NAS drives. I wanted to mention that for me there was a resolution: I added an Intel Optane 900p 280 GB drive, set it up as cache/L2ARC, and the problem entirely disappeared. My server ran for 20 days before I rebooted it last night to perform an upgrade. So I believe a workaround is to add a cache drive to your ZFS pool. The Intel Optane 900p is a highly recommended cache drive for ZFS pools.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #9 from Matthias Pfaller --- (In reply to Bane Ivosev from comment #8) We have several other machines with SAS2008 controllers. All of them are running 11.1 and none of them shows these problems...
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #8 from Bane Ivosev --- And I don't think the problem is exclusive to Seagate 10 TB drives. We have WD Red 4 TB drives and have the same problem. We see the same situation with 11.2-RELEASE too, and because 11.2 and 12.0 have the same mpr/mps driver version, we decided to try 11.1.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #7 from Bane Ivosev --- Just to add to my previous post from March: with the same hardware and the same config, we reverted to 11.1-RELEASE and everything has been working flawlessly for more than two months now.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #6 from Matthias Pfaller --- We are using FreeBSD 12.0-RELEASE: FreeBSD nyx 12.0-RELEASE-p4 FreeBSD 12.0-RELEASE-p4 GENERIC amd64
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #5 from Matthias Pfaller --- Comment on attachment 205003 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205003 /var/log/messages during device resets

We just configured a backup server with eight Seagate IronWolf (ST12000VN0007-2GS116) 12 TB disks connected to a SAS2008:

Jun 12 08:51:35 nyx kernel: mps0: port 0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device 0.0 on pci3
Jun 12 08:51:35 nyx kernel: mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Jun 12 08:51:35 nyx kernel: mps0: IOCCapabilities: 185c

After writing ~200 GB to our pool, it started resetting. I did a

sysctl dev.mps.0.debug_level=0x$((0x1+0x2+0x4+0x10+0x20))

The resulting trace is attached.
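
One detail of the `sysctl` invocation quoted in this comment is worth checking. POSIX shell arithmetic expansion prints its result in decimal, so `0x$((0x1+0x2+0x4+0x10+0x20))` expands to the text `0x55` rather than `0x37`. The Python sketch below only reproduces that arithmetic. It assumes the `0x`-prefixed value is then read as hexadecimal, makes no claim about what each mps debug bit means, and uses a hypothetical helper name, `bits_set`.

```python
def bits_set(mask):
    """Return the individual bit values that are set in mask, lowest first."""
    return [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]


# The bits the command adds up.
requested = 0x1 + 0x2 + 0x4 + 0x10 + 0x20

# $(( ... )) expands to the decimal string "55"; the literal "0x" prefix
# turns that into the text "0x55". Assumption: the value is then parsed
# as hexadecimal.
as_written = int("0x" + str(requested), 16)

print(hex(requested), [hex(b) for b in bits_set(requested)])
# 0x37 ['0x1', '0x2', '0x4', '0x10', '0x20']
print(hex(as_written), [hex(b) for b in bits_set(as_written)])
# 0x55 ['0x1', '0x4', '0x10', '0x40']
```

Under that assumption, the attached trace may have been captured with bits 0x2 and 0x20 off and bit 0x40 on. If the intended mask was 0x37, passing the decimal result without the `0x` prefix would likely avoid the mismatch.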
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #4 from Matthias Pfaller ---

We just configured a backup server with eight Seagate IronWolf (ST12000VN0007-2GS116) 12 TB disks connected to a SAS2008:

Jun 12 08:51:35 nyx kernel: mps0: port 0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device 0.0 on pci3
Jun 12 08:51:35 nyx kernel: mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Jun 12 08:51:35 nyx kernel: mps0: IOCCapabilities: 185c

After writing ~200 GB to our pool, it started resetting. I did a

sysctl dev.mps.0.debug_level=0x$((0x1+0x2+0x4+0x10+0x20))

The resulting trace is attached.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 Matthias Pfaller changed: What|Removed |Added CC||matthias.pfaller@familie-pf ||aller.de --- Comment #3 from Matthias Pfaller --- Created attachment 205003 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205003&action=edit /var/log/messages during device resets
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 --- Comment #2 from Bane Ivosev --- Forgot to say, it's FreeBSD 12-RELEASE.
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 Bane Ivosev changed: What|Removed |Added CC||bane.ivo...@pmf.uns.ac.rs --- Comment #1 from Bane Ivosev ---

We have the very same problem, but with WD Red disks. The system randomly reboots, sometimes after 20 days of working. It is a different disk every time. It's our production NFS server, and now it's very frustrating.

Supermicro 5049p
64 GB ECC RAM
LSI 3008 IT mode
18x WD Red 4 TB

Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 Command timeout on target 25(0x001c), 6 set, 60.107418057 elapsed
Mar 23 07:39:46 fap kernel: mpr0: At enclosure level 0, slot 17, connector name ()
Mar 23 07:39:46 fap kernel: mpr0: Sending abort to target 25 for SMID 357
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 Aborting command 0xfe00b7aa6130
Mar 23 07:39:46 fap kernel: (pass19:mpr0:0:25:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 1182 Command timeout on target 25(0x001c), 6 set, 60.217681679 elampr0: At enclosure level 0, slot 17, connector name ()
Mar 23 07:39:46 fap kernel: mpr0: Controller reported scsi ioc terminated tgt 25 SMID 1182 loginfo 3113
Mar 23 07:39:46 fap kernel: mpr0: Abort failed for target 25, sending logical unit reset
Mar 23 07:39:46 fap kernel: mpr0: Sending logical unit reset to target 25 lun 0
Mar 23 07:39:46 fap kernel: mpr0: At enclosure level 0, slot 17, connector name ()
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CCB request aborted by the host
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Retrying command, 0 more tries remain
Mar 23 07:39:46 fap kernel: mpr0: mprsas_action_scsiio: Freezing devq for target ID 25
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CAM subsystem is busy
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Error 5, Retries exhausted
Mar 23 07:39:46 fap smartd[95746]: Device: /dev/da17 [SAT], failed to read SMART Attribute Data
Mar 23 07:39:46 fap kernel: mpr0: mprsas_action_scsiio: Freezing devq for target ID 25
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): WRITE(10). CDB: 2a 00 09 4a 32 a8 00 00 08 00
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CAM subsystem is busy
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Retrying command, 3 more tries remain
Mar 23 07:43:19 fap syslogd: kernel boot file is /boot/kernel/kernel
Mar 23 07:43:19 fap kernel: ---<>---
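
Since the failing disk is reported to differ every time, it can help to tally the `Command timeout on target` lines across a longer stretch of /var/log/messages. The following is a minimal Python sketch, assuming the mpr message layout seen in the excerpt from comment #1 (other driver versions word the timeout differently); `TIMEOUT_RE` and `count_timeouts` are illustrative names, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: ... SMID 357
#   Command timeout on target 25(0x001c), 6 set, 60.107418057 elapsed
TIMEOUT_RE = re.compile(
    r"\((?P<periph>\w+):(?P<ctrl>\w+):\d+:\d+:\d+\): "
    r"(?P<cmd>[A-Z ]+\(\d+\))\. CDB: .*? SMID (?P<smid>\d+) "
    r"Command timeout on target (?P<target>\d+)"
)


def count_timeouts(lines):
    """Count timed-out commands per (peripheral device, command name)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m.group("periph"), m.group("cmd"))] += 1
    return counts


# Three lines copied from the log excerpt in comment #1.
sample = [
    "Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). "
    "CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 "
    "Command timeout on target 25(0x001c), 6 set, 60.107418057 elapsed",
    "Mar 23 07:39:46 fap kernel: mpr0: Sending abort to target 25 for SMID 357",
    "Mar 23 07:39:46 fap kernel: (pass19:mpr0:0:25:0): ATA COMMAND PASS THROUGH(16). "
    "CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 1182 "
    "Command timeout on target 25(0x001c), 6 set, 60.217681679 ela",
]

for key, n in count_timeouts(sample).items():
    print(key, n)
```

Keying on the command name as well as the device makes it easy to see whether SYNCHRONIZE CACHE dominates the timeouts, as it does in most of the excerpts in this thread.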
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496 Christoph Bubel changed: What|Removed |Added Hardware|Any |amd64 Severity|Affects Only Me |Affects Some People
[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Bug ID: 224496
Summary: mpr and mps drivers seems to have issues with large seagate drives
Product: Base System
Version: 11.1-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs@FreeBSD.org
Reporter: cbu...@mailbox.org

Over on the FreeNAS forums, several people reported issues with large (10 TB) Seagate drives (ST1NM0016 and ST1VN0004) and LSI controllers. Links to the threads:

https://forums.freenas.org/index.php?threads/lsi-avago-9207-8i-with-seagate-10tb-enterprise-st1nm0016.58251/
https://forums.freenas.org/index.php?threads/synchronize-cache-command-timeout-error.55067/

I am using the ST1NM0016 drives and I am getting the following errors on an LSI SAS2308 (mps driver) and on an LSI SAS3008 (mpr driver). This happens about once every one or two weeks in low-load situations.

Here are the logs:

(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1010 command timeout cm 0xfef7cda0 ccb 0xf8018198d800
(noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xfef7cda0
mps0: Sending reset from mpssas_send_abort for target ID 1
(da2:mps0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 e1 76 9f 88 00 00 00 08 00 00 length 4096 SMID 959 terminated ioc 804b scsi 0 state c xfer 0
mps0: Unfreezing devq for target ID 1
(da2:mps0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 e1 76 9f 88 00 00 00 08 00 00
(da2:mps0:0:1:0): CAM status: CCB request completed with an error
(da2:mps0:0:1:0): Retrying command
(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da2:mps0:0:1:0): CAM status: Command timeout
(da2:mps0:0:1:0): Retrying command
(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da2:mps0:0:1:0): CAM status: SCSI Status Error
(da2:mps0:0:1:0): SCSI status: Check Condition
(da2:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da2:mps0:0:1:0): Error 6, Retries exhausted
(da2:mps0:0:1:0): Invalidating pack

---

(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 319 Aborting command 0xfef54a90
mpr0: Sending reset from mprsas_send_abort for target ID 4
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00 00 length 20480 SMID 320 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
mpr0: Unfreezing devq for target ID 4
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00 00
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da1:mpr0:0:4:0): CAM status: Command timeout
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00 00
(da1:mpr0:0:4:0): CAM status: SCSI Status Error
(da1:mpr0:0:4:0): SCSI status: Check Condition
(da1:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mpr0:0:4:0): Retrying command (per sense data)
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 ba 80 00 00 00 20 00 00 length 16384 SMID 653 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 711 terminated ioc 804b loginfo 311(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 ba 80 00 00 00 20 00 00
10e03 scsi 0 state c xfer 0
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da1:mpr0:0:4:0): CAM status: SCSI Status Error
(da1:mpr0:0:4:0): SCSI status: Check Condition
(da1:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mpr0:0:4:0): Error 6, Retries exhausted
(da1:mpr0:0:4:0): Invalidating pack
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00 length 0 SMID 797 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 753 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00 06 00 4f 00 c2 00 b0 00 length 512 SMID 846 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00