[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2024-01-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Mark Linimon  changed:

   What|Removed |Added

   Assignee|b...@freebsd.org|bugmeis...@freebsd.org
 Resolution|--- |Overcome By Events
 Status|New |Closed

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2024-01-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #57 from Michel Le Cocq  ---
no more trouble since upgrade to 14.0.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2024-01-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Michel Le Cocq  changed:

   What|Removed |Added

 CC||no...@neuronfarm.net

--- Comment #56 from Michel Le Cocq  ---
(In reply to Allan Jude from comment #55)

Hi, has this patch been included in 13.2-RELEASE?

I'm on 13.2-RELEASE-p9 and have the same issue with two different 9207-8i cards:

Jan  5 09:38:49 gobt kernel: mps0: IOC Fault 0x4d04, Resetting
Jan  5 09:38:49 gobt kernel: mps0: Reinitializing controller
Jan  5 09:38:49 gobt kernel: mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd


Controller information

  Controller type : SAS2308_2
  BIOS version: 7.39.02.00
  Firmware version: 20.00.07.00

mps0@pci0:4:0:0: class=0x010700 rev=0x05 hdr=0x00 vendor=0x1000 device=0x0087 subvendor=0x1000 subdevice=0x3020
    vendor     = 'Broadcom / LSI'
    device     = 'SAS2308 PCI-Express Fusion-MPT SAS-2'
    class      = mass storage
    subclass   = SAS
pcib8@pci0:6:0:0:   clas



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2022-05-28 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Allan Jude  changed:

   What|Removed |Added

 CC||allanj...@freebsd.org

--- Comment #55 from Allan Jude  ---
If you see the error message: "kernel: mps0: IOC Fault 0x4d04, Resetting"

You need to update to get this fix:
https://cgit.freebsd.org/src/commit/?id=e30fceb89b7eb51825bdd65f9cc4fbadf107d763

If your errors don't include that code, that is a different problem.
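
For anyone triaging a large log, here is a minimal Python sketch of that check. It is only an illustration: the function names are hypothetical, the message format is copied from the logs quoted in this PR, and the fault code is matched exactly as it is printed in this thread (adjust the constant if your kernel prints it with a different width).

```python
import re

# Matches messages such as "mps0: IOC Fault 0x4d04, Resetting"
# (format copied from the logs in this PR; mpr is assumed to print the same).
FAULT_RE = re.compile(r"\b(mp[sr]\d+): IOC Fault (0x[0-9a-fA-F]+), Resetting")

# Fault code named above as the one addressed by the linked commit.
KNOWN_FAULT = 0x4d04


def ioc_faults(lines):
    """Return a (controller, fault_code) pair for every IOC Fault line."""
    hits = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2), 16)))
    return hits


def has_known_fault(lines):
    """True if any IOC Fault line carries the fault code named above."""
    return any(code == KNOWN_FAULT for _, code in ioc_faults(lines))


sample = [
    "Jan  5 09:38:49 gobt kernel: mps0: IOC Fault 0x4d04, Resetting",
    "Jan  5 09:38:49 gobt kernel: mps0: Reinitializing controller",
]
print(has_known_fault(sample))
```

If this prints False for your own /var/log/messages, your errors are most likely the "different problem" case.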



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2022-05-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

JerRatt IT  changed:

   What|Removed |Added

 CC||cont...@jerratt.com

--- Comment #54 from JerRatt IT  ---
I'm reporting either the same or a similar issue. Here are my findings;
please let me know if my plans sound correct:

Setup:
TrueNAS Scale 22.02.0.1
AMD Threadripper 1920X
ASRock X399 Taichi
128GB (8x16GB) Crucial CT8G4WFD824A Unbuffered ECC
AVAGO/LSI 9400-8i SAS3408 12Gbps HBA Adapter
Supermicro BPN-SAS3-743A 8-Port SAS3/SAS2/SATA 12Gbps Backplane
8 x Seagate Exos X18 18TB HDD ST18000NM004J SAS 12Gbps 512e/4Kn
2 x Crucial 120GB SSD
2 x Crucial 1TB SSD
2 x Western Digital 960GB NVME
Supermicro 4U case w/2000watt Redundant Power Supply

The system is connected to a large APC data-center battery system and
conditioner, in an HVAC-controlled area. All hard drives have the newest
firmware and use 4K sectors, both logical and native. The controller has the
newest firmware, both regular and legacy ROMs, with the SATA/SAS-only mode
flashed (dropping the NVMe multi/tri-mode option that the new 9400 series cards
support).

Running any kind of heavy I/O on the 18TB drives that are connected to the
BPN-SAS3-743A backplane and through to the LSI 9400-8i HBA eventually results
in the drives resetting. This happens even without the drives assigned to any
kind of ZFS pool. It also happens whether running from the shell within the
GUI or from the shell itself. It happens on all drives, which are connected by
two separate SFF8643 cables to a backplane that has two separate SFF8643 ports.

To trigger it, I can either run badblocks on each drive (using:
badblocks -c 1024 -w -s -v -e 1 -b 65536 /dev/sdX) or just run a
SMART extended/long test.

Eventually, all or nearly all drives will reset, and even spin down (according
to the shell logs). Sometimes they reset in batches, while others continue
chugging along. This has made completing any kind of SMART extended test
impossible. Badblocks will fail out, reporting too many bad blocks, on multiple
hard drives at nearly the exact same moment, yet consecutive badblocks scans
don't report bad blocks in the same areas. The SMART test just shows "aborted,
drive reset?" as the result.

My plan was to replace the HBA with an older LSI 9305-16i, replace the two
SFF8643-SFF8643 cables going from the HBA to the backplane just for good
measure, install two different SFF8643-SFF8482 cables that bypass the backplane
fully, then take four of the existing Seagate 18TB drives and put them on the
SFF8643-SFF8482 connections that bypass the backplane, and also install four
new WD Ultrastar DC HC550 (WUH721818AL5204) drives into the mix (some using the
backplane, some not). That should reveal whether this is a compatibility/bug
issue with all large drives or only certain large drives on an LSI controller,
the mpr driver, and/or this backplane.

If none of that works, or it doesn't eliminate all the potential points of
failure, I'm left with nothing but subpar workarounds, such as just using the
onboard SATA ports, disabling NCQ in the LSI controller, or setting up an L2ARC
cache (or I might try a metadata cache to see if that circumvents the issue as
well).




Condensed logs when one drive errors out:

sd 0:0:0:0: device_unblock and setting to running, handle(0x000d)
mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
~
~
~
~
sd 0:0:0:0: Power-on or device reset occurred
...ready
sd 0:0:6:0: device_block, handle(0x000f)
sd 0:0:9:0: device_block, handle(0x0012)
sd 0:0:10:0: device_block, handle(0x0014)
mpt3sas_cm0: log_info(0x3112010c): originator(PL), code(0x12), sub_code(0x010c)
sd 0:0:9:0: device_unblock and setting to running, handle(0x0012)
sd 0:0:6:0: device_unblock and setting to running, handle(0x000f)
sd 0:0:10:0: device_unblock and setting to running, handle(0x0014)
sd 0:0:9:0: Power-on or device reset occurred
sd 0:0:6:0: Power-on or device reset occurred
sd 0:0:10:0: Power-on or device reset occurred
scsi_io_completion_action: 5 callbacks suppressed
sd 0:0:10:0: [sdd] tag#5532 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
sd 0:0:10:0: [sdd] tag#5532 Sense Key : Not Ready [current] [descriptor] 
sd 0:0:10:0: [sdd] tag#5532 Add. Sense: Logical unit not ready, additional power granted
sd 0:0:10:0: [sdd] tag#5532 CDB: Write(16) 8a 00 00 00 00 00 5c 75 7a 12 00 00 01 40 00 00
print_req_error: 5 callbacks suppressed
blk_update_request: I/O error, dev sdd, sector 12409622672 op 0x1:(WRITE) flags 0xc800 phys_seg 1 prio class 0
sd 0:0:10:0: [sdd] tag#5533 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
sd 0:0:10:0: [sdd] tag#5533 Sense Key : Not Ready [current] [descriptor] 
sd 0:0:10:0: [sdd] tag#5533 Add. Sense: Logical 
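
The log_info words in the log above can be split into the fields the driver prints next to them. The following Python sketch is an assumption-laden illustration, not the driver's code: the bit layout (4 bits bus type, 4 bits originator, 8 bits code, 16 bits sub-code) is inferred from the two logged values, and only the "PL" originator actually appears in the log; the other two names are my recollection of the Linux mpt3sas decoder.

```python
from typing import NamedTuple

# Only "PL" is confirmed by the log above; "IOP" and "IR" are assumptions.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


class LogInfo(NamedTuple):
    bus_type: int
    originator: str
    code: int
    sub_code: int


def decode_log_info(value: int) -> LogInfo:
    """Split a 32-bit log_info word into its four fields."""
    return LogInfo(
        bus_type=(value >> 28) & 0xF,
        originator=ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        code=(value >> 16) & 0xFF,
        sub_code=value & 0xFFFF,
    )


for word in (0x31110E05, 0x3112010C):
    info = decode_log_info(word)
    print(f"{word:#010x}: originator({info.originator}), "
          f"code({info.code:#04x}), sub_code({info.sub_code:#06x})")
```

Both logged values decode to the same originator, code and sub_code that the driver printed, which is the only validation this sketch has.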

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2022-02-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

leopoldo.20b75...@mailbeaver.net changed:

   What|Removed |Added

 CC||leopoldo.20b7513b@mailbeaver.net

--- Comment #53 from leopoldo.20b75...@mailbeaver.net ---
I just started experiencing this issue with my setup with 3 IBM M1015 HBAs and
ST1NM002G SAS drives.

Has anyone tested their setup with TrueNAS Scale? Since the platform is based
on Linux I was hoping this bug was not present. I may try switching when Scale
is out of RC status later this month.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2022-01-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #52 from Daniel Austin  ---
(In reply to Alexander Motin from comment #51)
If it helps, it seems to be enclosure related...

I can use NCQ on my disks and controller in another enclosure (e.g. I have an
Areca 8 bay that's fine)... but when using my QNAP jbod enclosure (TL-D1600S) I
get errors when NCQ is enabled.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2022-01-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Alexander Motin  changed:

   What|Removed |Added

 CC||m...@freebsd.org

--- Comment #51 from Alexander Motin  ---
(In reply to Daniel Austin from comment #48)
I see the mpsutil/mprutil tools recently got support for a `set ncq` subcommand,
which should allow disabling NCQ.  I've merged that into TrueNAS 12.0-U7.  I
haven't tried it myself, and it makes me shiver inside from its inefficiency,
but I can believe that it may reduce maximum command latency in some scenarios,
or maybe even avoid command timeouts in situations where the disks or the HBAs
can't schedule or process the NCQ commands reasonably.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2021-05-31 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #50 from Conall O'Brien  ---
I've been experiencing these issues with WD Red 6TB disks. I had frequent,
unexplained reboots with 12.1-RELEASE and 12.2-RELEASE. 
mprutil show all
Adapter:
mpr0 Adapter:
       Board Name: LSI3008-IR
   Board Assembly: 
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 12.00.00.00
Firmware Revision: 10.00.00.00
  Integrated RAID: yes

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     3.0    12     SAS Initiator 
1       0002        000a       N         6.0     3.0    12     SAS Initiator 
2       0003        000b       N         6.0     3.0    12     SAS Initiator 
3       0004        000c       N         6.0     3.0    12     SAS Initiator 
4       0005        000d       N         6.0     3.0    12     SAS Initiator 
5       0006        000e       N         6.0     3.0    12     SAS Initiator 
6       0007        000f       N         6.0     3.0    12     SAS Initiator 
7       0008        0010       N         6.0     3.0    12     SAS Initiator 


Since upgrading to 13.0-RELEASE I am no longer experiencing reboots, but I do
continue to see CAM errors


(da7:mpr0:0:9:0): Info: 0x8024c3a0
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 80 24 c3 e8 00 00 08 00 
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x8024c3e8
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 3a 80 4d 20 00 00 10 00 
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x3a804d20
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 31 c4 f0 90 00 00 10 00 
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x31c4f090
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 31 c4 f8 b8 00 06 88 00 
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x31c4f8b8
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(10). CDB: 2a 00 31 c3 c0 a0 00 01 60 00 
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x31c3c0a0
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 10 00 00 00 10 00 00 
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x2baa0f010
(da7:mpr0:0:9:0): Error 22, Unretryable error
(da7:mpr0:0:9:0): WRITE(16). CDB: 8a 00 00 00 00 02 14 4b b4 c0 00 00 00 08 00 00 
(da7:mpr0:0:9:0): CAM status: SCSI Status Error
(da7:mpr0:0:9:0): SCSI status: Check Condition
(da7:mpr0:0:9:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
(da7:mpr0:0:9:0): Info: 0x2144bb4c0
(da7:mpr0:0:9:0): Error 22, Unretryable error


I have upgraded to 13.0-RELEASE-p1, on account of FreeBSD-EN-21:13.mpt. Could
the issues from that errata notice also be an issue for mps?
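
To make the CAM errors above easier to read, here is a small Python sketch that pulls the starting LBA and block count out of the WRITE(10) and WRITE(16) CDBs. The function name is hypothetical; the byte offsets are the standard SCSI layouts for these two commands.

```python
def decode_write_cdb(cdb_hex: str):
    """Return (lba, blocks) for a WRITE(10) or WRITE(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x2A and len(cdb) == 10:
        # WRITE(10): 4-byte LBA at bytes 2..5, 2-byte transfer length at 7..8.
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    elif opcode == 0x8A and len(cdb) == 16:
        # WRITE(16): 8-byte LBA at bytes 2..9, 4-byte transfer length at 10..13.
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
    else:
        raise ValueError(f"unsupported CDB: {cdb_hex!r}")
    return lba, blocks


for cdb in (
    "2a 00 80 24 c3 e8 00 00 08 00",
    "8a 00 00 00 00 02 ba a0 f0 10 00 00 00 10 00 00",
):
    lba, blocks = decode_write_cdb(cdb)
    print(hex(lba), blocks)
# prints:
# 0x8024c3e8 8
# 0x2baa0f010 16
```

The decoded LBA is the same value that the "Info:" line following each failed command reports.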



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2021-04-02 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Daniel Austin  changed:

   What|Removed |Added

 CC||freebsd-po...@dan.me.uk

--- Comment #48 from Daniel Austin  ---
I have this same issue too...

My setup is an LSI 9206-16e PCIe card (which is really just 2 x LSISAS2308 cards
with a PCIe switch) on FreeBSD 12 and 13.  I have this connected to a QNAP
TL-D1600S 16-bay SATA chassis with 12 x 8TB Toshiba SATA (non-SMR) disks.

I tried a workaround found online (camcontrol tags daX -N 1) to disable NCQ per
drive, and this is fine *once the server is booted*. However, whenever I
rebooted, I got a massive scroll of CAM errors as the kernel tried to import a
ZFS pool before the camcontrol script had run. This even led to ZFS reporting
too many errors and taking the pool offline (note: I'm not booting from this
pool).
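
For reference, the per-drive workaround amounts to one camcontrol call per disk. Here is a minimal Python sketch that only builds and prints those command lines; it assumes the disks enumerate as da0 through da11 (a hypothetical numbering for the 12 disks mentioned above) and the helper name is made up.

```python
def ncq_disable_commands(num_disks: int, tags: int = 1):
    """Build one `camcontrol tags daN -N <tags>` command line per da device.

    Device numbering da0..da(num_disks-1) is an assumption; check
    `camcontrol devlist` for the real names on your system.
    """
    return [f"camcontrol tags da{n} -N {tags}" for n in range(num_disks)]


# Print the commands instead of running them, so they can be reviewed first.
for cmd in ncq_disable_commands(12):
    print(cmd)
```

Anything run this way still executes after the kernel has attached the disks, so it does not close the boot-time window described above.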

I don't have the luxury of a firmware update, as there haven't been any updates
to the SAS2308 in a long, long time!  I am running 20.00.07.00.

I've also tried 3 different 9206-16e cards, so I'm happy it's not just a faulty
card.  I also tried 2 different servers with the same results, so I'm happy it's
probably not a hardware issue at all.

My final solution, which is really a kludge but does fix the issue permanently,
was to boot into a live Ubuntu environment and use lsiutil 1.72 to disable NCQ
on the card itself.  This is saved in the card's EEPROM.

Now when I boot FreeBSD, I get zero CAM errors from any disks, the ZFS pool
imports straight away, and I can still max out the bandwidth to the drives - yay.

I also have an LSI 9207-8i card in a different machine running the same
firmware which has no issues at all even with NCQ enabled (but these are
directly connected to the card via SATA cables with no enclosure as such), so I
do think this is just some kind of incompatibility between
driver<-->card<-->enclosure of some kind.

It would be lovely if lsiutil could be ported to FreeBSD... I did look at it
briefly but it's beyond my capabilities... the source code is online if anyone
wants to try.

Hope that helps anyone else stumbling upon this PR. I appreciate it may only
fix some, not all, of the cases reported here so far... but some is better than
none :-)

___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-12-29 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #47 from elij  ---
I have a SAS9305-16i, and was seeing similar errors to some of the folks here.
The comment about NCQ got me thinking, and sure enough, it looks like Broadcom
had a new firmware out (I had 16.00.11.00 when I saw the errors).

16.00.12.00 has these two as the fixed "defect list":

ID:
DCSG00398894
Headline: SATA only:
WRITE SAME NCQ encapsulation assumes NonData NCQ is supported if Zero
EXT is Supported
Description Of Change:
Disable NCQ encapsulation if Zero EXT is supported but Non Data NCQ is
not supported
Issue Description:
In case if Zero EXT is supported but Non Data NCQ is not supported by
drive, WRITE SAME NCQ encapsulation would send Non Data NCQ command to
the drive.
Drive would fail the command as Non Data NCQ is not supported by drive.
This will cause command failure to host.
Steps To Reproduce:
IO errors are observed when mkfs.ext4 operation is done on drives that
support Zero EXT but do not support Non Data NCQ.


ID: 
DCSG00411882 (Port Of Defect DCSG00139294)
Headline:
SATA only : Controller hang is observed due to recursive function call.
Description Of Change:
Avoid starting of pended commands to SATA drive from completion
functions of a command to avoid recursion.
Issue Description:
Controller hang is observed with ATA pass through command followed by a
pended command that fails.
At the completion of pass through command, PL starts the pended IOs if
any.
If the pended IO is failed due to invalid CDB, then immediately
completion function is called causing recursion. This causes controller
hang.
Steps To Reproduce:
Install the FreeNAS and try to create ZFS pool(Storage -> Pools -> Add)
of the direct attached SSDs.
- ATA command completes when there are IOs in pendlist.
- The pended IO has invalid CDB

I have installed the firmware, and am keeping an eye on it. My system is
lightly loaded though, so my sightings of the issue have been rather sporadic.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-11-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Wayne Willcox  changed:

   What|Removed |Added

 CC||wwillc...@gmail.com

--- Comment #46 from Wayne Willcox  ---
So I have been getting the same errors in 12.1 and now 12.2
at scbus0 target 8 lun 0 (pass0,da0)
at scbus0 target 9 lun 0 (pass1,da1)
at scbus0 target 10 lun 0 (pass2,da2)
at scbus0 target 11 lun 0 (pass3,da3)
at scbus0 target 12 lun 0 (pass4,da4)
at scbus0 target 13 lun 0 (pass5,da5)
at scbus0 target 14 lun 0 (pass6,da6)
at scbus0 target 15 lun 0 (pass7,da7)
at scbus0 target 16 lun 0 (pass8,da8)
at scbus0 target 17 lun 0 (pass9,da9)
at scbus0 target 18 lun 0 (pass10,da10)
at scbus0 target 19 lun 0 (pass11,da11)
at scbus0 target 20 lun 0 (ses0,pass12)
at scbus0 target 21 lun 0 (ses1,pass13)
at scbus0 target 22 lun 0 (pass14,da12)
at scbus0 target 23 lun 0 (pass15,da13)
at scbus0 target 24 lun 0 (pass16,da14)
Adapter:
mps0 Adapter:
       Board Name: SAS9211-8i
   Board Assembly: H3-25250-02B
        Chip Name: LSISAS2008
    Chip Revision: ALL:
    BIOS Revision: 7.39.00.00
Firmware Revision: 20.00.06.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0                              N                 1.5    6.0    SAS Initiator 
1                              N                 1.5    6.0    SAS Initiator 
2                              N                 1.5    6.0    SAS Initiator 
3                              N                 1.5    6.0    SAS Initiator 
4       0001        0009       N         6.0     1.5    6.0    SAS Initiator 
5       0001        0009       N         6.0     1.5    6.0    SAS Initiator 
6       0001        0009       N         6.0     1.5    6.0    SAS Initiator 
7       0001        0009       N         6.0     1.5    6.0    SAS Initiator 


hawthorn kernel log messages:
+   (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 1732 Command timeout on target 24(0x001a) 6 set, 60.96681849 elapsed
+mps0: Sending abort to target 24 for SMID 1732
+   (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 1732 Aborting command 0xfe00daa55760
+   (da4:mps0:0:12:0): WRITE(10). CDB: 2a 00 3b c2 f3 d8 00 00 30 00 length 24576 SMID 1089 Command timeout on target 12(0x000f) 6 set, 60.97158463 elapsed
+mps0: Sending abort to target 12 for SMID 1089
+   (da4:mps0:0:12:0): WRITE(10). CDB: 2a 00 3b c2 f3 d8 00 00 30 00 length 24576 SMID 1089 Aborting command 0xfe00daa1f758
+   (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 49 79 f9 f8 00 00 08 00 length 4096 SMID 1230 Command timeout on target 24(0x001a) 6 set, 60.97561590 elapsed
+   (da13:mps0:0:23:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 78 Command timeout on target 23(0x0019) 6 set, 60.9852 elapsed
+mps0: Sending abort to target 23 for SMID 78
+   (da13:mps0:0:23:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00 length 12288 SMID 78 Aborting command 0xfe00da9ca8d0
+   (da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 53 42 14 48 00 00 10 00 length 8192 SMID 217 Command timeout on target 13(0x0010) 6 set, 60.98193577 elapsed
+mps0: Sending abort to target 13 for SMID 217
+   (da5:mps0:0:13:0): WRITE(10). CDB: 2a 00 53 42 14 48 00 00 10 00 length 8192 SMID 217 Aborting command 0xfe00da9d6398
+   (da8:mps0:0:16:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 670 Command timeout on target 16(0x0013) 6 set, 60.98762575 elapsed
+mps0: Sending abort to target 16 for SMID 670
+   (da8:mps0:0:16:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 670 Aborting command 0xfe00da9fc450
+   (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 91 Command timeout on target 10(0x000d) 6 set, 60.99148905 elapsed
+mps0: Sending abort to target 10 for SMID 91
+   (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 18 00 length 12288 SMID 91 Aborting command 0xfe00da9cba48
+   (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 1b 8d ae b0 00 00 08 00 length 4096 SMID 1064 Command timeout on target 10(0x000d) 6 set, 60.99537335 elapsed
+   (da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 10 00 length 8192 SMID 982 Command timeout on target 17(0x0014) 6 set, 60.99738899 elapsed
+mps0: Sending abort to target 17 for SMID 982
+   (da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 10 00 length 8192 SMID 982 Aborting command 0xfe00daa16790
+   (da12:mps0:0:22:0): WRITE(10). CDB: 2a 00 53 42 14 40 00 00 10 00 length 8192 SMID 1677 Command timeout on target 22(0x0018) 6 set, 60.100129429 elapsed
+mps0: Sending abort to target 
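
With this many messages it helps to count timeouts per disk. The following Python sketch does that for text in the format shown above; the regular expression is an assumption derived only from these log lines, and the function name is hypothetical. Whitespace is collapsed first so that messages wrapped across lines by a mail client still match.

```python
import re
from collections import Counter

# "(daN:mpsX:b:t:l): ... SMID n Command timeout on target t", without letting
# one match run past a SMID field into the next message.
TIMEOUT_RE = re.compile(
    r"\((da\d+):mp[sr]\d+:\d+:\d+:\d+\): (?:(?!SMID).)*"
    r"SMID (\d+) Command timeout on target (\d+)"
)


def timeouts_per_device(log_text: str) -> Counter:
    """Count 'Command timeout' messages per da device."""
    flat = " ".join(log_text.split())  # undo line wrapping
    return Counter(m.group(1) for m in TIMEOUT_RE.finditer(flat))


sample = """
+   (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00
length 12288 SMID 1732 Command timeout on target 24(0x001a) 6 set,
60.96681849 elapsed
+mps0: Sending abort to target 24 for SMID 1732
+   (da14:mps0:0:24:0): WRITE(10). CDB: 2a 00 3b c2 f3 f0 00 00 18 00
length 12288 SMID 1732 Aborting command 0xfe00daa55760
+   (da4:mps0:0:12:0): WRITE(10). CDB: 2a 00 3b c2 f3 d8 00 00 30 00 length
24576 SMID 1089 Command timeout on target 12(0x000f) 6 set, 60.97158463
elapsed
"""
print(timeouts_per_device(sample))
```

Timeouts spread evenly across many disks at the same moment, as in the excerpt above, point away from a single bad drive.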

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-05-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #45 from Matthias Pfaller  ---
(In reply to Sharad Ahlawat from comment #44)
[root@nyx ~]# sysctl kern.cam.da.retry_count
kern.cam.da.retry_count: 4

Previously we had the drives connected to the onboard SATA/SAS ports (hoping
the mps problems would get solved). The machine was running for months without
showing any disk-related problems. At the moment we are looking for a
replacement for the SAS2008 (for other machines without that many onboard
SATA/SAS channels) and are testing the following config:

ahci0:  port 0x3050-0x3057,0x3040-0x3043,0x3030-0x3037,0x3020-0x3023,0x3000-0x301f mem 0xdf94-0xdf9407ff irq 32 at device 0.0 on pci5
ahci0: AHCI v1.00 with 4 6Gbps ports, Port Multiplier supported with FBS
ahcich0:  at channel 0 on ahci0
ahcich1:  at channel 1 on ahci0
ahcich2:  at channel 2 on ahci0
ahcich3:  at channel 3 on ahci0
ahci1: AHCI v1.00 with 4 6Gbps ports, Port Multiplier supported with FBS
ahcich4:  at channel 0 on ahci1
ahcich5:  at channel 1 on ahci1
ahcich6:  at channel 2 on ahci1
ahcich7:  at channel 3 on ahci1
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0:  ACS-3 ATA SATA 3.x device
ada0: Serial Number ZJV3ZYSX
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 11444224MB (23437770752 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1:  ACS-3 ATA SATA 3.x device
ada1: Serial Number ZJV1VVWX
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 11444224MB (23437770752 512 byte sectors)
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2:  ACS-3 ATA SATA 3.x device
ada2: Serial Number ZJV1WLXM
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 11444224MB (23437770752 512 byte sectors)
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3:  ACS-3 ATA SATA 3.x device
ada3: Serial Number ZJV2YY9A
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 11444224MB (23437770752 512 byte sectors)
ada4 at ahcich4 bus 0 scbus4 target 0 lun 0
ada4:  ACS-3 ATA SATA 3.x device
ada4: Serial Number ZJV3ZJNA
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 11444224MB (23437770752 512 byte sectors)
ada5 at ahcich5 bus 0 scbus5 target 0 lun 0
ada5:  ACS-3 ATA SATA 3.x device
ada5: Serial Number ZJV3ZXN5
ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 11444224MB (23437770752 512 byte sectors)
ada6 at ahcich6 bus 0 scbus6 target 0 lun 0
ada6:  ACS-3 ATA SATA 3.x device
ada6: Serial Number ZJV2MWZ0
ada6: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 11444224MB (23437770752 512 byte sectors)
ada7 at ahcich7 bus 0 scbus7 target 0 lun 0
ada7:  ACS-3 ATA SATA 3.x device
ada7: Serial Number ZCH0HHJ6
ada7: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada7: Command Queueing enabled
ada7: 11444224MB (23437770752 512 byte sectors)
ada8:  ACS-2 ATA SATA 3.x device
ada8: Serial Number S3F5NX0M400493
ada8: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada8: Command Queueing enabled
ada8: 228936MB (468862128 512 byte sectors)
ada8: quirks=0x1<4K>
ada9:  ACS-2 ATA SATA 3.x device
ada9: Serial Number S3F5NX0M400461
ada9: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada9: Command Queueing enabled
ada9: 228936MB (468862128 512 byte sectors)
ada9: quirks=0x1<4K>

I reverted the "camcontrol tags  -N 1" and the changed sysctls. So with
this setup the system runs with the same drives and without any changed sysctl
settings or cam settings. 

Up to now there are no problems showing up. Operating conditions are the same
as with the last mps test (a zpool scrub is running and our systems did a
backup to the machine last night).

As the drives behave well with at least three different controllers

  ahci0: 
  ahci1: 
  isci0: 

I can't imagine that changing parameters on the cam/zfs side (NCQ TRIM, NCQ,
timeouts) will help in our case.

Probably we will have to decommission the LSI9211 boards :-(


regards, Matthias



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-05-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #44 from Sharad Ahlawat  ---
(In reply to Matthias Pfaller from comment #43)

The logs show the read events are still timing out, even with 90s timeouts.

Also, there are no retry messages. Is "sysctl kern.cam.da.retry_count" set to 0?

you could try a few things to get to the root cause:

a: also disable ZFS cache flush
sysctl vfs.zfs.cache_flush_disable=1
even though your drives are not SMRs

b: experiment with larger timeout values
also observe "gstat" output and ensure the first column L(q) is continually
returning to zero and not getting stuck for any of the drives

c: try reducing the SCIC speed to 3.0 in the controller settings, just
to eliminate some disk firmware speed compatibility issue.

Side note, not sure if this applies to your drives but a few of mine don't
support NCQ TRIM and are not properly driver blacklisted, so I had to set
vfs.unmapped_buf_allowed=0 in loader.conf



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-05-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #43 from Matthias Pfaller  ---
(In reply to Matthias Pfaller from comment #42)
Sorry, I messed up the names... This should have been Sharad Ahlawat and not
Christoph Bubel.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-05-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #42 from Matthias Pfaller  ---
We followed the advice of Christop Bubel, disabled NCQ and set the timeouts to
90s (I can't imagine a situation where this should be necessary, but still...).

Results:
May  8 17:36:00 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May  8 17:36:00 nyx kernel: mps0: Reinitializing controller
May  8 17:36:00 nyx kernel: mps0: Firmware: 19.00.00.00, Driver:
21.02.00.00-fbsd
May  8 17:36:00 nyx kernel: mps0: IOCCapabilities:
1285c
May  8 17:50:44 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May  8 17:50:44 nyx kernel: mps0: Reinitializing controller
May  8 17:50:44 nyx kernel: mps0: Firmware: 19.00.00.00, Driver:
21.02.00.00-fbsd
May  8 17:50:44 nyx kernel: mps0: IOCCapabilities:
1285c
May  8 18:10:06 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May  8 18:10:06 nyx kernel: mps0: Reinitializing controller
May  8 18:10:06 nyx kernel: mps0: Firmware: 19.00.00.00, Driver:
21.02.00.00-fbsd
May  8 18:10:06 nyx kernel: mps0: IOCCapabilities:
1285c
May  8 18:27:27 nyx kernel: mps0: IOC Fault 0x4d04, Resetting
May  8 18:27:27 nyx kernel: mps0: Reinitializing controller
May  8 18:27:27 nyx kernel: mps0: Firmware: 19.00.00.00, Driver:
21.02.00.00-fbsd

We upgraded the controller firmware to 20.00.07.00 and tried again:
[root@nyx ~]# mpsutil show all
Adapter:
mps0 Adapter:
   Board Name: SAS9211-8i
   Board Assembly: 
Chip Name: LSISAS2008
Chip Revision: ALL
BIOS Revision: 0.00.00.00
Firmware Revision: 20.00.07.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     1.5    6.0    SAS Initiator
1       0002        000a       N         6.0     1.5    6.0    SAS Initiator
2       0003        000b       N         6.0     1.5    6.0    SAS Initiator
3       0004        000c       N         6.0     1.5    6.0    SAS Initiator
4       0005        000d       N         6.0     1.5    6.0    SAS Initiator
5       0006        000e       N         6.0     1.5    6.0    SAS Initiator
6       0007        000f       N         6.0     1.5    6.0    SAS Initiator
7       0008        0010       N         6.0     1.5    6.0    SAS Initiator

Devices:
B    T    SAS Address      Handle  Parent  Device        Speed Enc  Slot  Wdt
00   01   44332211         0009    0001    SATA Target   6.0   0001 03    1
00   04   443322110100     000a    0002    SATA Target   6.0   0001 02    1
00   03   443322110200     000b    0003    SATA Target   6.0   0001 01    1
00   00   443322110300     000c    0004    SATA Target   6.0   0001 00    1
00   02   443322110400     000d    0005    SATA Target   6.0   0001 07    1
00   05   443322110500     000e    0006    SATA Target   6.0   0001 06    1
00   06   443322110600     000f    0007    SATA Target   6.0   0001 05    1
00   07   443322110700     0010    0008    SATA Target   6.0   0001 04    1

Enclosures:
Slots  Logical ID        SEPHandle  EncHandle  Type
08     500605b001551a80             0001       Direct Attached SGPIO

Expanders:
NumPhys   SAS Address DevHandle   Parent  EncHandle  SAS Level

[root@nyx ~]# for i in $(camcontrol devlist | grep "ST12000" | cut -d"," -f2 |
cut -d")" -f1); do 
> camcontrol tags $i
> done
(pass0:mps0:0:0:0): device openings: 1
(pass1:mps0:0:1:0): device openings: 1
(pass2:mps0:0:2:0): device openings: 1
(pass3:mps0:0:3:0): device openings: 1
(pass4:mps0:0:4:0): device openings: 1
(pass5:mps0:0:5:0): device openings: 1
(pass6:mps0:0:6:0): device openings: 1
(pass7:mps0:0:7:0): device openings: 1
[root@nyx ~]# 
Results:

May 11 15:58:58 nyx kernel: (da7:mps0:0:7:0): READ(10). CDB: 28 00 ef 46 55
38 00 00 10 00 length 8192 SMID 64 Command timeout on target 7(0x0010) 9
set, 90.154685585 elapsed
May 11 15:58:58 nyx kernel: mps0: Sending abort to target 7 for SMID 64
May 11 15:58:58 nyx kernel: (da7:mps0:0:7:0): READ(10). CDB: 28 00 ef 46 55
38 00 00 10 00 length 8192 SMID 64 Aborting command 0xfe00f92e5600
May 11 15:58:58 nyx kernel: (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00
04 e9 23 92 a0 00 00 00 20 00 00 length 16384 SMID 1564 Command timeout on
target 2(0x000d) 9 set, 90.120064627 elapsed
May 11 15:58:58 nyx kernel: mps0: Sending abort to target 2 for SMID 1564
May 11 15:58:58 nyx kernel: (da2:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00
04 e9 23 92 a0 00 00 00 20 00 00 length 16384 SMID 1564 Aborting command
0xfe00f93635a0
May 11 15:58:58 nyx kernel: (da6:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00
04 e8 09 8e 30 00 00 00 30 00 00 length 24576 SMID 432 Command timeout on
target 6(0x000f) 9 set, 90.97377933 elapsed
May 11 15:58:58 nyx kernel: mps0: Sending abort to target 6 for SMID 432
May 11 15:58:58 nyx kernel: (da6:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00
04 e8 09 8e 30 00 00 00 30 00 00 length 24576 SMID 432 Aborting command
0xfe00f9304480
May 11 15:58:58 nyx 
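
The timeout lines above can be tallied per device to see whether one drive or
all of them are affected. A minimal sketch, assuming the kernel log lines are
unwrapped (one event per line, as in /var/log/messages rather than as wrapped
in this mail); the function name timeouts_per_device is made up for
illustration.

```shell
# Count "Command timeout" events per daN device from kernel log lines on stdin.
# Assumption: each event sits on a single line that starts its CAM path with
# "(daN:mpsM:..." or "(daN:mprM:...", as in the excerpts in this thread.
timeouts_per_device() {
    sed -n 's/.*(\(da[0-9][0-9]*\):mp[rs][0-9][0-9]*:[^)]*).*Command timeout on target.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   timeouts_per_device < /var/log/messages
```

Each output line is a device name followed by its number of timeouts.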

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-05-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #41 from Sharad Ahlawat  ---
(In reply to Sharad Ahlawat from comment #28)

Symptom:
CAM retry and timeout messages leading to controller aborts and resets

Cause:
slow disks or using SMR disks

Workaround:
Increase the CAM timeout defaults

❯ sysctl kern.cam.da.default_timeout=90
kern.cam.da.default_timeout: 60 -> 90
❯ sysctl kern.cam.ada.default_timeout=60
kern.cam.ada.default_timeout: 30 -> 60

And disable NCQ on SMR Seagates:
❯ cat cam-tags.sh
#!/usr/local/bin/bash
# Shrink the Native Command Queue depth to 1, effectively disabling queuing.
for Disk in $(camcontrol devlist | grep "ST8000DM" |
    cut -d"," -f2 | cut -d")" -f1); do
    camcontrol tags "$Disk" -N 1
    # camcontrol tags "$Disk" -v
done
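
The cut pipeline in cam-tags.sh takes whatever name comes second inside the
parentheses of each "camcontrol devlist" line. A slightly more explicit
variant, sketched here under assumptions, picks the daN name regardless of its
position. The function name da_devices_for_model is made up for illustration,
and the devlist line format is assumed to end in "(passN,daN)" or
"(daN,passN)".

```shell
# Print the daN peripheral name for every `camcontrol devlist` line (read on
# stdin) whose inquiry string contains the given model substring.
# adaN devices are deliberately not matched.
da_devices_for_model() {
    model=$1
    grep -F -- "$model" |
        sed -n 's/.*[(,]\(da[0-9][0-9]*\)[,)].*/\1/p'
}

# Usage on a live system (needs root):
#   camcontrol devlist | da_devices_for_model ST8000DM |
#       while read -r disk; do camcontrol tags "$disk" -N 1; done
```

Keeping the parsing in a function that reads stdin makes it easy to check
against a saved devlist before touching any drive.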

If you only have SMRs in your setup and use an UPS you could also:
❯ sysctl vfs.zfs.cache_flush_disable=1

Solution:
Don't use slow disks or SMR disks with ZFS


The long version:

I am obliged to post this update given the driver downgrade workaround I
previously posted on this thread before getting to the root cause for these
logs in my messages file after upgrading to 12.x

Jan 18 17:29:28 nas kernel: ahcich6: Timeout on slot 8 port 0
Jan 18 17:29:28 nas kernel: ahcich6: is  cs 0100 ss  rs
0100 tfd c0 serr  cmd c817
Jan 18 17:29:28 nas kernel: (ada6:ahcich6:0:0:0): FLUSHCACHE48. ACB: ea 00 00
00 00 40 00 00 00 00 00 00
Jan 18 17:29:28 nas kernel: (ada6:ahcich6:0:0:0): CAM status: Command timeout
Jan 18 17:29:28 nas kernel: (ada6:ahcich6:0:0:0): Retrying command, 0 more
tries remain
Jan 18 17:30:00 nas kernel: ahcich6: AHCI reset: device not ready after 31000ms
(tfd = 0080)
Jan 18 17:30:30 nas kernel: ahcich6: Timeout on slot 9 port 0
Jan 18 17:30:30 nas kernel: ahcich6: is  cs 0200 ss  rs
0200 tfd 80 serr  cmd c917
Jan 18 17:30:30 nas kernel: (aprobe0:ahcich6:0:0:0): ATA_IDENTIFY. ACB: ec 00
00 00 00 40 00 00 00 00 00 00
Jan 18 17:30:30 nas kernel: (aprobe0:ahcich6:0:0:0): CAM status: Command
timeout
Jan 18 17:30:30 nas kernel: (aprobe0:ahcich6:0:0:0): Retrying command, 0 more
tries remain

Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 1039 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 1357 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 1933 loginfo 3108
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): READ(16). CDB: 88 00 00 00 00 01
45 fb 37 c8 00 00 00 b0 00 00
Apr 25 22:28:12 nas kernel: mps0: (da4:mps0:0:11:0): CAM status: CCB request
completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries
remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): READ(16). CDB: 88 00 00 00 00 01
45 fb 38 78 00 00 00 58 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request
completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries
remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00
01 b4 c0 a1 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request
completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries
remain
Apr 25 22:28:12 nas kernel: Controller reported scsi ioc terminated tgt 11 SMID
621 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 476 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 321 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 1873 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 1852 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 1742 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 387 loginfo 3108
Apr 25 22:28:12 nas kernel: mps0: Controller reported scsi ioc terminated tgt
11 SMID 2104 loginfo 3108
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00
01 b4 c0 a2 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request
completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries
remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00
01 b4 c0 a3 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request
completed with an error
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): Retrying command, 3 more tries
remain
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): WRITE(16). CDB: 8a 00 00 00 00
01 b4 c0 a4 d8 00 00 01 00 00 00
Apr 25 22:28:12 nas kernel: (da4:mps0:0:11:0): CAM status: CCB request

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-03-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #40 from Peter Eriksson  ---
I've had some random problems when using the cards with IR firmware, so I
always make sure they are running IT firmware. Might be worth testing.


But that might not be relevant for your problems. We are using them without any
issues (so far) but we're using SAS drives (10TB HGST) behind SAS Expanders
(and a couple of Intel SSD SATA drives) - so nothing similar to your
situation...

# mprutil show all
Adapter:
mpr0 Adapter:
   Board Name: Dell HBA330 Mini
   Board Assembly: 
Chip Name: LSISAS3008
Chip Revision: ALL
BIOS Revision: 18.00.00.00
Firmware Revision: 16.00.08.00
  Integrated RAID: no



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-03-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #39 from Paul Thornton  ---
I don't have any expanders (note that this card has IR firmware, in case that
is relevant - I wasn't actually aware of that, it is only being used as a dumb
HBA):

[root@nas1b ~]# mprutil show all
Adapter:
mpr0 Adapter:
   Board Name: LSI3008-IR
   Board Assembly: 
Chip Name: LSISAS3008
Chip Revision: ALL
BIOS Revision: 8.35.00.00
Firmware Revision: 15.00.03.00
  Integrated RAID: yes

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0                              N                 3.0    12     SAS Initiator
1                              N                 3.0    12     SAS Initiator
2                              N                 3.0    12     SAS Initiator
3                              N                 3.0    12     SAS Initiator
4                              N                 3.0    12     SAS Initiator
5                              N                 3.0    12     SAS Initiator
6                              N                 3.0    12     SAS Initiator
7                              N                 3.0    12     SAS Initiator

Devices:
B    T    SAS Address      Handle  Parent  Device        Speed Enc  Slot  Wdt

Enclosures:
Slots  Logical ID        SEPHandle  EncHandle  Type
08     500304801cf54a05             0001       Direct Attached SGPIO

Expanders:
NumPhys   SAS Address DevHandle   Parent  EncHandle  SAS Level

[root@nas1b ~]#



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-03-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #38 from Matthias Pfaller  ---
(In reply to Peter Eriksson from comment #37)
Not in our case:
[root@nyx ~]# mpsutil show all
Adapter:
mps0 Adapter:
   Board Name: SAS9211-8i
   Board Assembly: 
Chip Name: LSISAS2008
Chip Revision: ALL
BIOS Revision: 0.00.00.00
Firmware Revision: 19.00.00.00
  Integrated RAID: no

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0                              N                 1.5    6.0    SAS Initiator
1                              N                 1.5    6.0    SAS Initiator
2                              N                 1.5    6.0    SAS Initiator
3                              N                 1.5    6.0    SAS Initiator
4                              N                 1.5    6.0    SAS Initiator
5                              N                 1.5    6.0    SAS Initiator
6                              N                 1.5    6.0    SAS Initiator
7                              N                 1.5    6.0    SAS Initiator

Devices:
B    T    SAS Address      Handle  Parent  Device        Speed Enc  Slot  Wdt

Enclosures:
Slots  Logical ID        SEPHandle  EncHandle  Type
08     500605b001551a80             0001       Direct Attached SGPIO

Expanders:
NumPhys   SAS Address DevHandle   Parent  EncHandle  SAS Level

We did downgrade to firmware 19.00.00.00, but I haven't had the time to run any
tests at this firmware level.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-03-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #37 from Peter Eriksson  ---
Any SAS-expanders between the SAS HBA and the (SATA) disks?
("mprutil show expanders") - and if so - do they (the expanders) have up to
date firmware?



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-02-25 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

lastb0i...@gmail.com changed:

   What|Removed |Added

 CC||lastb0i...@gmail.com

--- Comment #36 from lastb0i...@gmail.com ---
Has anyone gotten further with this issue?  Seems that it is affecting me every
other hour now! And it all started after adding a WD DC Drive...



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-02-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #35 from Ștefan BĂLU  ---
Looks like the problem persists with FreeNAS (FreeBSD 11.3-RELEASE) and
Firmware Revision: 15.00.03.00.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-02-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #34 from Bane Ivosev  ---
(In reply to Ștefan BĂLU from comment #33)
Before the first install (11.3) we updated the firmware and left it as is. So
later we didn't change the firmware; we just tried an mps/mpr driver version
other than 18.03.

Now, with 12.1-RELEASE, we have:
mpr0: Firmware: 16.00.01.00, Driver: 23.00.00.00-fbsd



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-02-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #33 from Ștefan BĂLU  ---
(In reply to Bane Ivosev from comment #32)
So, should we keep the firmware in sync with those versions or go for a
trial-and-error approach?

In the meantime, I'll let you know how it goes with the latest FreeNAS (FreeBSD
11.3-RELEASE) and firmware version 15.00.03.00.

I see that my corrected comment didn't go through... So, I downgraded from:

BIOS Revision: 18.00.00.00
Firmware Revision: 16.00.01.00

to:

BIOS Revision: 8.35.00.00
Firmware Revision: 15.00.03.00

The behaviour would trigger in a couple of days, so I'll keep you all posted.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-02-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #32 from Bane Ivosev  ---
We have a problem with the 18.03 version of the mpr/mps driver. That's why I
said the 12.1 and 11.1 releases are working for us.

12.1 -> #define MPR_DRIVER_VERSION  "23.00.00.00-fbsd"
12.0 -> #define MPR_DRIVER_VERSION  "18.03.00.00-fbsd"
11.3 -> #define MPR_DRIVER_VERSION  "18.03.00.00-fbsd"
11.2 -> #define MPR_DRIVER_VERSION  "18.03.00.00-fbsd"
11.1 -> #define MPR_DRIVER_VERSION  "15.03.00.00-fbsd"
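
To check which driver version a running box actually uses, the attach line
that mps/mpr prints ("mpr0: Firmware: ..., Driver: ...") can be parsed from
the kernel log. A minimal sketch follows; both function names are made up for
illustration, and is_reported_bad_driver encodes nothing more than the 18.03
observation in this comment, not a confirmed root cause.

```shell
# Print "<device> <firmware> <driver>" for each mps/mpr attach line on stdin,
# e.g. "mpr0: Firmware: 16.00.01.00, Driver: 23.00.00.00-fbsd".
# Assumption: the line is unwrapped, as it appears in dmesg output.
mpx_driver_versions() {
    sed -n 's/.*\(mp[rs][0-9][0-9]*\): Firmware: \([0-9.]*\), Driver: \([^ ]*\).*/\1 \2 \3/p' |
        sort -u
}

# Succeeds if the driver version string belongs to the 18.03 series that this
# comment reports as problematic.
is_reported_bad_driver() {
    case $1 in
        18.03.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   dmesg | mpx_driver_versions | while read -r dev fw drv; do
#       is_reported_bad_driver "$drv" && echo "$dev runs driver $drv"
#   done
```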



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-02-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #31 from Matthias Pfaller  ---
mpsutil show all
Adapter:
mps0 Adapter:
   Board Name: SAS9211-8i
   Board Assembly: 
Chip Name: LSISAS2008
Chip Revision: ALL
BIOS Revision: 7.37.00.00
Firmware Revision: 20.00.07.00
  Integrated RAID: yes

uname -a
FreeBSD nyx 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC  amd64

This machine is experiencing the problems.

The machines without problems are running 11.1 and (thanks for the pointer)
   Board Name: SAS9211-8i
   Board Assembly: H3-25250-02J
Chip Name: LSISAS2008
Chip Revision: ALL
BIOS Revision: 7.37.00.00
Firmware Revision: 19.00.00.00
  Integrated RAID: no

I'll try a downgrade to 19.00.00.00

regards, Matthias

(In reply to Ștefan BĂLU from comment #30)



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-02-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Ștefan BĂLU  changed:

   What|Removed |Added

 CC||stefan.b...@ulab.ro

--- Comment #30 from Ștefan BĂLU  ---
Guys, I've just experienced these issues on the latest FreeNAS (it's a FreeBSD
11.3-RELEASE) with an LSI3008 controller.

The issues you all experience are related to the BIOS Revision and Firmware
Revision versions shown by:

mprutil show all | less
...
Adapter:
mpr0 Adapter:
   Board Name: LSI3008-IT
   Board Assembly:
Chip Name: LSISAS3008
Chip Revision: ALL
BIOS Revision: 17.00.00.00
Firmware Revision: 15.00.03.00
  Integrated RAID: no
...


It's definitely not related to the types/models of disks used or the BSD
version. Just make sure you are running the following firmware versions, as I
already have these in production with FreeBSD 11.2-RELEASE:

Adapter:
mpr0 Adapter:
   Board Name: LSI3008-IT
   Board Assembly: 
Chip Name: LSISAS3008
Chip Revision: ALL
BIOS Revision: 17.00.00.00
Firmware Revision: 15.00.03.00
  Integrated RAID: no



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-01-29 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #29 from Bane Ivosev  ---
We have had success with standard installations of 11.1 and 12.1-RELEASE, with
no compiling or mixing of driver versions. The problem was with 12.0 and
11.3-RELEASE.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-01-29 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #28 from free...@diyit.org ---
I finally tried 12.1 and it seemed to run fine for a few days till I started a
large file transfer and ...

(da4:mps0:0:11:0): READ(10). CDB: 28 00 7c fa 4a a0 00 00 08 00
(da4:mps0:0:11:0): CAM status: SCSI Status Error
(da4:mps0:0:11:0): SCSI status: Check Condition
(da4:mps0:0:11:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC
error detected)
(da4:mps0:0:11:0): Retrying command (per sense data)

Currently running 12.1-p2 with 11.3 release 357156 mps driver.

I just delete the /usr/src/sys/dev/mps directory, copy it over from the 11.3
source, and compile the kernel.

/Sharad



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-01-29 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #27 from Matthias Pfaller  ---
How does one change the affected version in the bug report? After all, this is
a problem with releases newer than 11.1 (>11.1) and not an 11.1-STABLE problem.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-01-29 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

wanpengq...@gmail.com changed:

   What|Removed |Added

 CC||wanpengq...@gmail.com

--- Comment #26 from wanpengq...@gmail.com ---
I have experienced this issue multiple times after upgrading from 11.1 to 12.1.
The server was rock solid before, and I didn't realize it was a driver issue.

Since my pool has already been upgraded to the latest version, I cannot
downgrade to 11.1. I am building an 11.1 mps.ko for the 12.1 kernel and will
load it manually via loader.conf.

I will report the result in a few weeks.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-01-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #25 from Bane Ivosev  ---
(In reply to Matthias Pfaller from comment #24)
We are still fine. No problem at all.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2020-01-22 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #24 from Matthias Pfaller  ---
(In reply to Bane Ivosev from comment #23)
We just gave it another try. We had hoped that the problem might have been
caused by a defective disk the last time we tried. The problem was triggered
again during the next night's backup :-( This is forcing us to keep running
11.1 on our backup machines.

Any (bad/good) news from your machine?



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-12-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #23 from Bane Ivosev  ---
Really sad news ... We have been running for 18 days now and everything is
great with 12.1 for us, but it is still too early to draw conclusions, and your
experience is not promising.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-12-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #22 from Matthias Pfaller  ---
I'm running FreeBSD 12.1 now. Yesterday we reattached the drives to the LSI
controller. Problems showed up during the next backup :-(

Kernel log:
Dec  4 10:13:30 nyx kernel: mps0:  port
0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device
0.0 on pci3
Dec  4 10:13:30 nyx kernel: mps0: Firmware: 20.00.07.00, Driver:
21.02.00.00-fbsd
Dec  4 10:13:30 nyx kernel: mps0: IOCCapabilities:
185c
Dec  4 10:33:08 nyx kernel: mps0:  port
0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device
0.0 on pci3
Dec  4 10:33:08 nyx kernel: mps0: Firmware: 20.00.07.00, Driver:
21.02.00.00-fbsd
Dec  4 10:33:08 nyx kernel: mps0: IOCCapabilities:
185c
Dec  4 10:33:08 nyx kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0
Dec  4 10:33:08 nyx kernel: da1 at mps0 bus 0 scbus0 target 3 lun 0
Dec  4 10:33:08 nyx kernel: da2 at mps0 bus 0 scbus0 target 4 lun 0
Dec  4 10:33:08 nyx kernel: da4 at mps0 bus 0 scbus0 target 6 lun 0
Dec  4 10:33:08 nyx kernel: da5 at mps0 bus 0 scbus0 target 7 lun 0
Dec  4 10:33:08 nyx kernel: da6 at mps0 bus 0 scbus0 target 8 lun 0
Dec  4 10:33:08 nyx kernel: da3 at mps0 bus 0 scbus0 target 5 lun 0
Dec  4 10:33:08 nyx kernel: da7 at mps0 bus 0 scbus0 target 9 lun 0
Dec  4 22:42:34 nyx kernel: (da4:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00
03 45 37 b3 48 00 00 00 08 00 00 length 4096 SMID 1583 Command timeout on
target 6(0x0010) 6 set, 60.66782887 elapsed
Dec  4 22:42:34 nyx kernel: mps0: Sending abort to target 6 for SMID 1583
Dec  4 22:42:34 nyx kernel: (da4:mps0:0:6:0): READ(16). CDB: 88 00 00 00 00
03 45 37 b3 48 00 00 00 08 00 00 length 4096 SMID 1583 Aborting command
0xfe00f9364f28
Dec  4 22:42:34 nyx kernel: (da7:mps0:0:9:0): WRITE(16). CDB: 8a 00 00 00
00 03 60 06 f6 c8 00 00 00 18 00 00 length 12288 SMID 925 Command timeout on
target 9(0x000d) 6 set, 60.23115994 elapsed
Dec  4 22:42:34 nyx kernel: mps0: Sending abort to target 9 for SMID 925
Dec  4 22:42:34 nyx kernel: (da7:mps0:0:9:0): WRITE(16). CDB: 8a 00 00 00
00 03 60 06 f6 c8 00 00 00 18 00 00 length 12288 SMID 925 Aborting command
0xfe00f932daf8
Dec  4 22:42:34 nyx kernel: (da0:mps0:0:2:0): WRITE(16). CDB: 8a 00 00 00
00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1033 Command timeout on
target 2(0x000c) 6 set, 60.24094823 elapsed
Dec  4 22:42:34 nyx kernel: mps0: Sending abort to target 2 for SMID 1033
Dec  4 22:42:34 nyx kernel: (da0:mps0:0:2:0): WRITE(16). CDB: 8a 00 00 00
00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1033 Aborting command
0xfe00f9336c18
Dec  4 22:42:34 nyx kernel: (da1:mps0:0:3:0): WRITE(16). CDB: 8a 00 00 00
00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1440 Command timeout on
target 3(0x000b) 6 set, 60.24535090 elapsed
Dec  4 22:42:34 nyx kernel: mps0: Sending abort to target 3 for SMID 1440
Dec  4 22:42:34 nyx kernel: (da1:mps0:0:3:0): WRITE(16). CDB: 8a 00 00 00
00 04 dc 52 aa d8 00 00 00 30 00 00 length 24576 SMID 1440 Aborting command
0xfe00f9358f00
Dec  4 22:42:34 nyx kernel: (da2:mps0:0:4:0): WRITE(10). CDB: 2a 00 24 36
61 98 00 00 18 00 length 12288 SMID 1472 Command timeout on target 4(0x000a)
6 set, 60.24982318 elapsed
Dec  4 22:42:34 nyx kernel: mps0: Sending abort to target 4 for SMID 1472
Dec  4 22:42:34 nyx kernel: (da2:mps0:0:4:0): WRITE(10). CDB: 2a 00 24 36
61 98 00 00 18 00 length 12288 SMID 1472 Aborting command 0xfe00f935ba00
Dec  4 22:42:34 nyx kernel: (da5:mps0:0:7:0): WRITE(10). CDB: 2a 00 24 36
61 10 00 00 a0 00 length 81920 SMID 507 Command timeout on target 7(0x000f)
6 set, 60.25666047 elapsed
Dec  4 22:42:34 nyx kernel: mps0: Sending abort to target 7 for SMID 507
Dec  4 22:42:34 nyx kernel: (da5:mps0:0:7:0): WRITE(10). CDB: 2a 00 24 36
61 10 00 00 a0 00 length 81920 SMID 507 Aborting command 0xfe00f930a948
Dec  4 22:42:40 nyx kernel: (xpt0:mps0:0:7:0): SMID 6 task mgmt
0xfe00f92e0810 timed out
Dec  4 22:42:40 nyx kernel: mps0: Reinitializing controller
Dec  4 22:42:40 nyx kernel: mps0: Unfreezing devq for target ID 6
Dec  4 22:42:40 nyx kernel: mps0: Unfreezing devq for target ID 9
Dec  4 22:42:40 nyx kernel: mps0: Unfreezing devq for target ID 2

... lots more
Just the reinitialization messages:
Dec  4 22:42:40 nyx kernel: mps0: Reinitializing controller
Dec  4 23:01:27 nyx kernel: mps0: Reinitializing controller
Dec  4 23:40:55 nyx kernel: mps0: Reinitializing controller
Dec  4 23:47:49 nyx kernel: mps0: Reinitializing controller
Dec  4 23:58:02 nyx kernel: mps0: Reinitializing controller
Dec  5 00:17:33 nyx kernel: mps0: Reinitializing controller
Dec  5 00:20:18 nyx kernel: mps0: Reinitializing controller
Dec  5 00:21:33 nyx kernel: mps0: Reinitializing controller
Dec  5 00:24:30 nyx kernel: mps0: Reinitializing controller
Dec  5 00:26:40 nyx kernel: mps0: Reinitializing controller
Dec  5 00:29:30 nyx kernel: mps0: 
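
The reset frequency in a list like the one above can be summarised per hour. A
minimal sketch, assuming standard syslog lines (month, day, time in the first
three fields); the function name reinits_per_hour is made up for illustration.

```shell
# Count "Reinitializing controller" events per hour from syslog lines on
# stdin. Output is "<month> <day> <hour>:00 <count>", in order of first
# appearance in the log.
reinits_per_hour() {
    awk '
        /mp[rs][0-9]+: Reinitializing controller/ {
            key = $1 " " $2 " " substr($3, 1, 2) ":00"
            if (!(key in count)) order[++n] = key
            count[key]++
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i], count[order[i]]
        }
    '
}

# Usage:
#   reinits_per_hour < /var/log/messages
```

For the ten complete reinitialization lines listed above this gives 1 reset in
the 22:00 hour and 4 in the 23:00 hour of Dec 4, and 5 in the 00:00 hour of
Dec 5.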

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-11-28 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #21 from Bane Ivosev  ---
(In reply to freebsd from comment #20)

I've been running 12.1-RELEASE on the same hardware for 10 days now. Everything
is ok. I'll report back in about a month. Success, I hope.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-11-03 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #20 from free...@diyit.org ---
(In reply to Paul Thornton from comment #19)

Thanks for sharing, Paul. I took a slightly different approach. I have been
running stable for a few months now with a FreeBSD 12.0-RELEASE kernel but
using the FreeBSD 11.2 LSI mps driver. 12.1 brings in many updates to the 12.0
mps driver; hopefully those address these problems. I will test once it's
released.

/Sharad



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-10-02 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #19 from Paul Thornton  ---
An update for all.

I downgraded our production servers from 12.0 to 11.1-RELEASE-p15 and for 5
weeks they have worked without any problems.

Previously we saw a problem after 4 weeks, then another after 1 week - so
whilst there is a chance I have not waited long enough, I think this definitely
fixes my problem.

Paul.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-14 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #18 from Paul Thornton  ---
(In reply to Bane Ivosev from comment #16)

Thanks for the confirmation.  I'm in the process of downgrading the affected
NAS machines (they are an identical pair) from 12.0 to 11.1. I need to test this
in the lab first, but will report back once we have the production machine on
the older release.

I need to do this before it next crashes and reboots, of course!



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-14 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #17 from Matthias Pfaller  ---
(In reply to Francois Baillargeon from comment #14)
We have cache and log devices:
NAME                 SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
k-zb1               87.2T  35.5T  51.8T        -         -   20%  40%  1.91x  ONLINE  -
  raidz2            87.2T  35.5T  51.8T        -         -   20%  40%
    gpt/k-zb1-0         -      -      -        -         -     -    -
    gpt/k-zb1-1         -      -      -        -         -     -    -
    gpt/k-zb1-2         -      -      -        -         -     -    -
    gpt/k-zb1-3         -      -      -        -         -     -    -
    gpt/k-zb1-4         -      -      -        -         -     -    -
    gpt/k-zb1-5         -      -      -        -         -     -    -
    gpt/k-zb1-6         -      -      -        -         -     -    -
    gpt/k-zb1-7         -      -      -        -         -     -    -
log                     -      -      -        -         -     -
  mirror            11.5G      0  11.5G        -         -    0%   0%
    gpt/k-zb1-zil0      -      -      -        -         -     -    -
    gpt/k-zb1-zil1      -      -      -        -         -     -    -
cache                   -      -      -        -         -     -
  gpt/k-zb1-cache0  80.0G  29.9G  50.1G        -         -    0%  37%
  gpt/k-zb1-cache1  80.0G  30.0G  50.0G        -         -    0%  37%

The only solution for us was to use the onboard SATA ports instead of the
LSI controller. We have to keep the other machines that need the LSI controllers
(not enough ports on the mainboard) at 11.1 :-(



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #16 from Bane Ivosev  ---
Hi Paul, I had exactly the same symptoms as you, but with 4TB WD Reds. For me,
with 11.1 everything has worked flawlessly for more than 130 days now.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Paul Thornton  changed:

   What|Removed |Added

 CC||freebsd-bugzi...@prt.org

--- Comment #15 from Paul Thornton  ---
I too have run into this issue on a NAS box, once it started taking on any kind
of load.

Running 12.0-RELEASE p3

The server contains 8x Seagate IronWolf Pro 10TB SATA drives on an Avago 3008
HBA - 8 of these basically:

da2 at mpr1 bus 0 scbus13 target 12 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number ZA237AVY
da2: 1200.000MB/s transfers
da2: Command Queueing enabled
da2: 9537536MB (19532873728 512 byte sectors)

Driver versions:
dev.mpr.1.driver_version: 18.03.00.00-fbsd
dev.mpr.1.firmware_version: 15.00.03.00

These drives are configured in a ZFS RAID10 setup (in case that datapoint
matters):
NAME           STATE     READ WRITE CKSUM
data0          ONLINE       0     0     0
  mirror-0     ONLINE       0     0     0
    da2.eli    ONLINE       0     0     0
    da3.eli    ONLINE       0     0     0
  mirror-1     ONLINE       0     0     0
    da4.eli    ONLINE       0     0     0
    da5.eli    ONLINE       0     0     0
  mirror-2     ONLINE       0     0     0
    da6.eli    ONLINE       0     0     0
    da7.eli    ONLINE       0     0     0
  mirror-3     ONLINE       0     0     0
    da8.eli    ONLINE       0     0     0
    da9.eli    ONLINE       0     0     0

I currently get about 25 days between reboots.  The machine hangs and (I'm
guessing here) kernel panics and restarts - I don't have the panic information,
but log messages look very similar to what other people are seeing:

Jul 20 11:14:17 nas1a kernel:   (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81
f9 d0 00 00 30 00 length 24576 SMID 1484 Command timeout on
 target 12(0x000c), 6 set, 60.703976195 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector
name ()
Jul 20 11:14:17 nas1a kernel: mpr1: Sending abort to target 12 for SMID 1484
Jul 20 11:14:17 nas1a kernel:   (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81
f9 d0 00 00 30 00 length 24576 SMID 1484 Aborting command 0xfe00bad0b540
Jul 20 11:14:17 nas1a kernel:   (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB:
35 00 00 00 00 00 00 00 00 00 length 0 SMID 1792 Command timeout on target
12(0x000c), 6 set, 60.707504796 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector
name ()
Jul 20 11:14:18 nas1a kernel: mpr1: Controller reported scsi ioc terminated tgt
12 SMID 1792 loginfo 3114
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9
d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: mpr1: Abort failed for target 12, sending logical
unit reset
Jul 20 11:14:18 nas1a kernel: mpr1: (da2:mpr1:0:12:0): CAM status: CCB request
aborted by the host
Jul 20 11:14:18 nas1a kernel: Sending logical unit reset to target 12 lun 0
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 3 more tries
remain
Jul 20 11:14:18 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector
name ()
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CCB request
completed with an error
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 0 more tries
remain
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for
target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is
busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Error 5, Retries exhausted
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for
target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9
d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is
busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 2 more tries
remain

[reboot happens here]

And the most recent one, today:

Aug 13 08:58:55 nas1a kernel:   (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB:
35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Command timeout on target
16(0x0010), 6 set, 60.109683189 elapsed
Aug 13 08:58:55 nas1a kernel: mpr1: At enclosure level 0, slot 6, connector
name ()
Aug 13 08:58:55 nas1a kernel: mpr1: Sending abort to target 16 for SMID 998
Aug 13 08:58:55 nas1a kernel:   (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB:
35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Aborting command
0xfe00bacdfaa0
Aug 13 08:58:55 nas1a kernel: mpr1: Abort failed for target 16, sending logical
unit reset
Aug 13 08:58:55 nas1a 

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #14 from Francois Baillargeon  ---
We usually have a cache on the affected pools, and sadly we still have issues
when the L2ARC hits the disk; think long sequential reads.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #13 from Christoph Bubel  ---
(In reply to Daniel Shafer from comment #10)

I can confirm the workaround: no errors since I added an L2ARC to the pool.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-08 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #12 from n...@nmc.dev ---

Hi,

We are seeing the same issue. 

Here is more information on our setup :

FreeNAS-11.2-U5
FreeBSD 11.2-STABLE amd64

We use 2 x (6x 14TB Seagate IronWolf drives)
We also have a 2 TB Crucial SSD for L2ARC

The issue always comes up after 10-14 hours of heavy I/O

Disk model: 14 TB Seagate ST14000VN0008


The drives are on 2 different LSI HBAs. The drives that fail are random across
both HBAs.

Please let us know if you need more information on this, it is impacting our
production load.

Thank you.

Log output for our latest errors :

>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 a8 00 
> 00 00 10 00 00 length 8192 SMID 60 Aborting command 0xfe000171f640
> mpr1: Sending reset from mprsas_send_abort for target ID 20
>   (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 f0 00 00 08 00 length 
> 4096 SMID 332 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 e8 00 00 08 00 
> length 4096 SMID 703 terminated ioc 804b loginfo 3113 
> sc(da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 f0 00 00 08 00 si 0 state 
> c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 77 b8 00 
> 00 01 00 00 00 length 131072 SMID 510 terminated ioc 
> 804b(da30:mpr1:0:20:0): CAM status: CCB request completed with an 
> error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 e8 00 00 08 00  
> loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 76 b8 00 00 
> 01 00 00 00 length 131072 SMID 938 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 75 b8 00 00 
> 01 00 00 00 length 131072 SMID 839 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 74 b8 00 00 
> 01 00 00 00 length 131072 SMID 681 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 73 b8 00 00 
> 01 00 00 00 length 131072 SMID 647 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 72 b8 00 00 
> 01 00 00 00 length 131072 SMID 253 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 71 b8 00 00 
> 01 00 00 00 length 131072 SMID 109 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 b8 00 00 
> 01 00 00 00 length 131072 SMID 267 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 98 00 00 
> 00 10 00 00 length 8192 SMID 506 terminated ioc 804b loginfo 3113 scsi 0 
> state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 88 00 00 
> 00 10 00 00 length 8192 SMID 774 terminated ioc 804b loginfo 3113 scsi 0 
> state c xfer 0
>   (da30:mpr1:0:20:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 
> 00 00 00 length 0 SMID 281 terminated ioc 804b loginfo 3114 scsi 0 
> state c xfer 0
> mpr1: Unfreezing devq for target ID 20
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 77 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 76 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 75 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 74 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 73 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 72 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 71 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 b8 00 00 
> 01 00 00 00
> 

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-08 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Francois Baillargeon  changed:

   What|Removed |Added

 CC||francois.baillargeon@gearbo
   ||xsoftware.com

--- Comment #11 from Francois Baillargeon  ---
Following what Daniel Shafer says, we have the same issues on a FreeNAS
deployment we did.

Everything is fine with our other pools, which use drives smaller than 10TB.
But one of our pools, which uses 14TB drives, exhibits this exact behavior.

For us this is a major showstopper bug since we can't use this pool reliably.
Our vendor sent us a new HBA, a new server, etc. before I stumbled upon this bug
listing.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-07-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Daniel Shafer  changed:

   What|Removed |Added

 CC||dan...@shafer.cc

--- Comment #10 from Daniel Shafer  ---
So I came across this same issue.  It was causing my server to reboot several
times a day due to kernel panics.  It happens with both SAS9200 and 9300
controllers.  I have 8 x 10TB Seagate IronWolf NAS drives.

I wanted to mention that for me there was a resolution.  I added an Intel
Optane 900p 280GB drive and set that up for cache/L2ARC and the problem
entirely disappeared.  My server ran for 20 days before I rebooted it last
night to perform an upgrade.

So a workaround, I believe, would be to add a cache drive to your ZFS pool.

The Intel Optane 900p is a highly recommended cache drive for ZFS pools.
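
For readers who want to try the workaround, a cache (L2ARC) device is attached with `zpool add <pool> cache <device>`. The sketch below only builds and prints that command instead of running it. The pool name `tank`, the device name `nvd0` and the helper name are hypothetical placeholders, not values from this report, and other comments in this thread report mixed results with this workaround.

```python
import shlex


def l2arc_add_command(pool, device):
    """Build the argv for attaching `device` to `pool` as an L2ARC cache.

    Both arguments are placeholders chosen by the caller; nothing is executed.
    """
    return ["zpool", "add", pool, "cache", device]


if __name__ == "__main__":
    # Hypothetical names; substitute your own pool and cache device.
    argv = l2arc_add_command("tank", "nvd0")
    print(shlex.join(argv))
```

Printing the command first makes it easy to double-check the pool and device names before running it by hand as root.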



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #9 from Matthias Pfaller  ---
(In reply to Bane Ivosev from comment #8)
We have several other machines with SAS2008 controllers. All of them are
running 11.1 and none of them shows these problems...



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #8 from Bane Ivosev  ---
Also, I don't think the problem is exclusive to Seagate 10TB drives. We have
WD Red 4TB drives and see the same problem. We had the same situation with
11.2-RELEASE, and because 11.2 and 12.0 have the same mpr/mps driver version, we
decided to try 11.1.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #7 from Bane Ivosev  ---
Just to follow up on my previous post from March: same hardware and same config,
we reverted to 11.1-RELEASE and everything has been working flawlessly for more
than two months now.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #6 from Matthias Pfaller  ---
We are using FreeBSD 12.0-RELEASE:
FreeBSD nyx 12.0-RELEASE-p4 FreeBSD 12.0-RELEASE-p4 GENERIC  amd64



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #5 from Matthias Pfaller  ---
Comment on attachment 205003
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205003
/var/log/messages during device resets

We just configured a backup server with eight Seagate IronWolf
(ST12000VN0007-2GS116) 12TB disks connected to a SAS2008:

Jun 12 08:51:35 nyx kernel: mps0:  port
0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device
0.0 on pci3
Jun 12 08:51:35 nyx kernel: mps0: Firmware: 20.00.07.00, Driver:
21.02.00.00-fbsd
Jun 12 08:51:35 nyx kernel: mps0: IOCCapabilities:
185c

After writing ~200 GB to our pool it started resetting. I did a

sysctl dev.mps.0.debug_level=0x$((0x1+0x2+0x4+0x10+0x20))

The resulting trace is attached.
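
A note on the arithmetic in that command, assuming it was run in a POSIX-style shell where `$((...))` prints its result in decimal: the five flags sum to 55 decimal (0x37), so the `0x` prefix in front of the expansion produces the string `0x55`, which is a different bit mask. The Python sketch below reproduces the arithmetic only. It does not touch sysctl, and the helper name is made up for illustration.

```python
# Flags exactly as written in the sysctl command above.
FLAGS = [0x1, 0x2, 0x4, 0x10, 0x20]


def set_bits(value):
    """List the individual bits set in value, as hex strings."""
    return [hex(1 << i) for i in range(value.bit_length()) if (value >> i) & 1]


# What the author presumably intended: the sum of the five flags.
intended = sum(FLAGS)

# What "0x$((...))" expands to if the shell prints the sum in decimal.
expanded = "0x" + str(intended)

# The value a hex-aware parser would read from that string.
effective = int(expanded, 16)

if __name__ == "__main__":
    print("intended :", hex(intended), set_bits(intended))
    print("expanded :", expanded, set_bits(effective))
```

Under that assumption the level actually requested was 0x55 (bits 0x1, 0x4, 0x10 and 0x40) and not 0x37. Writing the sum directly, for example `0x37`, would avoid the ambiguity.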



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #4 from Matthias Pfaller  ---
We just configured a backup server with eight Seagate IronWolf
(ST12000VN0007-2GS116) 12TB disks connected to a SAS2008:

Jun 12 08:51:35 nyx kernel: mps0:  port
0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device
0.0 on pci3
Jun 12 08:51:35 nyx kernel: mps0: Firmware: 20.00.07.00, Driver:
21.02.00.00-fbsd
Jun 12 08:51:35 nyx kernel: mps0: IOCCapabilities:
185c

After writing ~200 GB to our pool it started resetting. I did a

sysctl dev.mps.0.debug_level=0x$((0x1+0x2+0x4+0x10+0x20))

The resulting trace is attached.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Matthias Pfaller  changed:

   What|Removed |Added

 CC||matthias.pfaller@familie-pf
   ||aller.de

--- Comment #3 from Matthias Pfaller  ---
Created attachment 205003
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205003=edit
/var/log/messages during device resets



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-03-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #2 from Bane Ivosev  ---
Forgot to say, it's FreeBSD 12-RELEASE.



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-03-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Bane Ivosev  changed:

   What|Removed |Added

 CC||bane.ivo...@pmf.uns.ac.rs

--- Comment #1 from Bane Ivosev  ---
We have the very same problem, but with WD Red disks. The system randomly
reboots, sometimes after 20 days of uptime. It is a different disk every time.
It's our production NFS server, and it is now very frustrating.

Supermicro 5049p
64 GB ECC RAM
LSI 3008 IT mode
18x WD Red 4 TB

Mar 23 07:39:46 fap kernel:   (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10).
CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 Command timeout on target
25(0x001c), 6 set, 60.107418057 elapsed
Mar 23 07:39:46 fap kernel: mpr0: At enclosure level 0, slot 17, connector name
()
Mar 23 07:39:46 fap kernel: mpr0: Sending abort to target 25 for SMID 357
Mar 23 07:39:46 fap kernel:   (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10).
CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 Aborting command
0xfe00b7aa6130
Mar 23 07:39:46 fap kernel:   (pass19:mpr0:0:25:0): ATA COMMAND PASS
THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512
SMID 1182 Command timeout on target 25(0x001c), 6 set, 60.217681679
elampr0: At enclosure level 0, slot 17, connector name ()
Mar 23 07:39:46 fap kernel: mpr0: Controller reported scsi ioc terminated tgt
25 SMID 1182 loginfo 3113
Mar 23 07:39:46 fap kernel: mpr0: Abort failed for target 25, sending logical
unit reset
Mar 23 07:39:46 fap kernel: mpr0: Sending logical unit reset to target 25 lun 0
Mar 23 07:39:46 fap kernel: mpr0: At enclosure level 0, slot 17, connector name
()
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00 
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CCB request aborted
by the host
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Retrying command, 0 more tries
remain
Mar 23 07:39:46 fap kernel: mpr0: mprsas_action_scsiio: Freezing devq for
target ID 25
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00 
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CAM subsystem is
busy
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Error 5, Retries exhausted
Mar 23 07:39:46 fap smartd[95746]: Device: /dev/da17 [SAT], failed to read
SMART Attribute Data
Mar 23 07:39:46 fap kernel: mpr0: mprsas_action_scsiio: Freezing devq for
target ID 25
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): WRITE(10). CDB: 2a 00 09 4a 32
a8 00 00 08 00 
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CAM subsystem is
busy
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Retrying command, 3 more tries
remain
Mar 23 07:43:19 fap syslogd: kernel boot file is /boot/kernel/kernel
Mar 23 07:43:19 fap kernel: ---<>---
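
Since several reports in this thread hinge on which SCSI command was in flight, it can help to tally commands per device from a log like the one above. The following is a rough sketch, not part of any FreeBSD tool: the regular expression and function name are my own, and it assumes each kernel message is on a single line (the excerpts here are wrapped by the mail archive, so they would need to be unwrapped first). The sample lines are taken from the log above.

```python
import re
from collections import Counter

# Matches the CAM prefix and command name, for example:
#   (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10).
CMD_RE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<ctrl>mp[rs]\d+):\d+:(?P<target>\d+):\d+\): "
    r"(?P<cmd>[A-Z][A-Z ]*\(\d+\))\."
)

# Unwrapped sample lines from the excerpt above.
SAMPLE = [
    "Mar 23 07:39:46 fap kernel:   (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). "
    "CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 Command timeout on "
    "target 25(0x001c), 6 set, 60.107418057 elapsed",
    "Mar 23 07:39:46 fap kernel: mpr0: Sending abort to target 25 for SMID 357",
    "Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). "
    "CDB: 35 00 00 00 00 00 00 00 00 00",
    "Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): WRITE(10). "
    "CDB: 2a 00 09 4a 32 a8 00 00 08 00",
]


def tally_commands(lines):
    """Count (device, SCSI command) pairs seen in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = CMD_RE.search(line)
        if match:
            counts[(match.group("periph"), match.group("cmd"))] += 1
    return counts


if __name__ == "__main__":
    for (periph, cmd), n in tally_commands(SAMPLE).most_common():
        print(f"{periph:8s} {cmd:24s} {n}")
```

Run over a full /var/log/messages, a tally like this makes it easier to see whether timeouts cluster on one disk or one command type, or are spread across the pool.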



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2017-12-21 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Christoph Bubel  changed:

   What|Removed |Added

   Hardware|Any |amd64
   Severity|Affects Only Me |Affects Some People



[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2017-12-21 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Bug ID: 224496
   Summary: mpr and mps drivers seems to have issues with large
seagate drives
   Product: Base System
   Version: 11.1-STABLE
  Hardware: Any
OS: Any
Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: kern
  Assignee: freebsd-bugs@FreeBSD.org
  Reporter: cbu...@mailbox.org

Over on the FreeNAS forums, several people reported issues with large (10TB)
Seagate drives (ST10000NM0016 and ST10000VN0004) and LSI controllers. Links to
the threads:
https://forums.freenas.org/index.php?threads/lsi-avago-9207-8i-with-seagate-10tb-enterprise-st1nm0016.58251/
https://forums.freenas.org/index.php?threads/synchronize-cache-command-timeout-error.55067/

I am using the ST10000NM0016 drives and I am getting the following errors on an
LSI SAS2308 (mps driver) and on an LSI SAS3008 (mpr driver). This happens about
once every one or two weeks in low-load situations.

Here are the logs:

(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
length 0 SMID 1010 command timeout cm 0xfef7cda0 ccb 0xf8018198d800
(noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xfef7cda0
mps0: Sending reset from mpssas_send_abort for target ID 1
(da2:mps0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 e1 76 9f 88 00 00 00 08 00
00 length 4096 SMID 959 terminated ioc 804b scsi 0 state c xfer 0
mps0: Unfreezing devq for target ID 1
(da2:mps0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 e1 76 9f 88 00 00 00 08 00
00 
(da2:mps0:0:1:0): CAM status: CCB request completed with an error
(da2:mps0:0:1:0): Retrying command
(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da2:mps0:0:1:0): CAM status: Command timeout
(da2:mps0:0:1:0): Retrying command
(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da2:mps0:0:1:0): CAM status: SCSI Status Error
(da2:mps0:0:1:0): SCSI status: Check Condition
(da2:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus
device reset occurred)
(da2:mps0:0:1:0): Error 6, Retries exhausted
(da2:mps0:0:1:0): Invalidating pack

---

(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
length 0 SMID 319 Aborting command 0xfef54a90
mpr0: Sending reset from mprsas_send_abort for target ID 4
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00
00 length 20480 SMID 320 terminated ioc 804b loginfo 3113 scsi 0 state c
xfer 0
mpr0: Unfreezing devq for target ID 4
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00
00 
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:mpr0:0:4:0): CAM status: Command timeout
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00
00 
(da1:mpr0:0:4:0): CAM status: SCSI Status Error
(da1:mpr0:0:4:0): SCSI status: Check Condition
(da1:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus
device reset occurred)
(da1:mpr0:0:4:0): Retrying command (per sense data)
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 ba 80 00 00 00 20 00
00 length 16384 SMID 653 terminated ioc 804b loginfo 31110e03 scsi 0 state c
xfer 0
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
length 0 SMID 711 terminated ioc 804b loginfo 311(da1:mpr0:0:4:0): WRITE(16).
CDB: 8a 00 00 00 00 03 35 b7 ba 80 00 00 00 20 00 00 
10e03 scsi 0 state c xfer 0
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:mpr0:0:4:0): CAM status: SCSI Status Error
(da1:mpr0:0:4:0): SCSI status: Check Condition
(da1:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus
device reset occurred)
(da1:mpr0:0:4:0): Error 6, Retries exhausted
(da1:mpr0:0:4:0): Invalidating pack
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 06 2c 00 da 00 00 00
00 00 4f 00 c2 00 b0 00 length 0 SMID 797 terminated ioc 804b loginfo 31110e03
scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00
00 00 4f 00 c2 00 b0 00 length 512 SMID 753 terminated ioc 804b loginfo
31110e03 scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00
06 00 4f 00 c2 00 b0 00 length 512 SMID 846 terminated ioc 804b loginfo
31110e03 scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01