[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-14 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #18 from Paul Thornton  ---
(In reply to Bane Ivosev from comment #16)

Thanks for the confirmation.  I'm in the process of downgrading the affected
NAS machine (they are an identical pair) from 12.0 to 11.1.  I need to test this
in the lab first, but I will report back once we have the production machine on
the older release.

I need to do this before it next crashes and reboots, of course!

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-14 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #17 from Matthias Pfaller  ---
(In reply to Francois Baillargeon from comment #14)
We have cache and log devices:
NAME                  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
k-zb1                87.2T  35.5T  51.8T        -         -    20%    40%  1.91x  ONLINE  -
  raidz2             87.2T  35.5T  51.8T        -         -    20%    40%
    gpt/k-zb1-0          -      -      -        -         -      -      -
    gpt/k-zb1-1          -      -      -        -         -      -      -
    gpt/k-zb1-2          -      -      -        -         -      -      -
    gpt/k-zb1-3          -      -      -        -         -      -      -
    gpt/k-zb1-4          -      -      -        -         -      -      -
    gpt/k-zb1-5          -      -      -        -         -      -      -
    gpt/k-zb1-6          -      -      -        -         -      -      -
    gpt/k-zb1-7          -      -      -        -         -      -      -
log                      -      -      -        -         -      -      -
  mirror             11.5G      0  11.5G        -         -     0%     0%
    gpt/k-zb1-zil0       -      -      -        -         -      -      -
    gpt/k-zb1-zil1       -      -      -        -         -      -      -
cache                    -      -      -        -         -      -      -
  gpt/k-zb1-cache0   80.0G  29.9G  50.1G        -         -     0%    37%
  gpt/k-zb1-cache1   80.0G  30.0G  50.0G        -         -     0%    37%
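
For context, the listing above appears to be zpool list -v output, i.e. produced by
something like (pool name taken from the listing itself):

zpool list -v k-zb1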

The only solution for us was to use the onboard SATA ports instead of the
LSI controller. We have to keep the other machines that need the LSI controllers
(not enough ports on the mainboard) at 11.1 :-(

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #16 from Bane Ivosev  ---
Hi Paul, I had the exact same symptoms as you, but with 4TB WD Reds, and for me
everything has worked flawlessly on 11.1 for more than 130 days now.

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Paul Thornton  changed:

   What|Removed |Added

 CC||freebsd-bugzi...@prt.org

--- Comment #15 from Paul Thornton  ---
I too have run into this issue on a NAS box, once it started taking on any kind
of load.

Running 12.0-RELEASE-p3

The server contains 8x Seagate IronWolf Pro 10TB SATA drives on an Avago 3008
HBA - 8 of these, basically:

da2 at mpr1 bus 0 scbus13 target 12 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number ZA237AVY
da2: 1200.000MB/s transfers
da2: Command Queueing enabled
da2: 9537536MB (19532873728 512 byte sectors)

Driver versions:
dev.mpr.1.driver_version: 18.03.00.00-fbsd
dev.mpr.1.firmware_version: 15.00.03.00
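
For anyone comparing their own setup, these two values come from the mpr driver's
sysctl tree and can be read back with:

sysctl dev.mpr.1.driver_version dev.mpr.1.firmware_version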

These drives are configured in a ZFS RAID10 setup (in case that datapoint
matters):
NAME         STATE     READ WRITE CKSUM
data0        ONLINE       0     0     0
  mirror-0   ONLINE       0     0     0
    da2.eli  ONLINE       0     0     0
    da3.eli  ONLINE       0     0     0
  mirror-1   ONLINE       0     0     0
    da4.eli  ONLINE       0     0     0
    da5.eli  ONLINE       0     0     0
  mirror-2   ONLINE       0     0     0
    da6.eli  ONLINE       0     0     0
    da7.eli  ONLINE       0     0     0
  mirror-3   ONLINE       0     0     0
    da8.eli  ONLINE       0     0     0
    da9.eli  ONLINE       0     0     0

I currently get about 25 days between reboots.  The machine hangs and (I'm
guessing here) kernel panics and restarts - I don't have the panic information,
but log messages look very similar to what other people are seeing:

Jul 20 11:14:17 nas1a kernel:   (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00 length 24576 SMID 1484 Command timeout on target 12(0x000c), 6 set, 60.703976195 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector name ()
Jul 20 11:14:17 nas1a kernel: mpr1: Sending abort to target 12 for SMID 1484
Jul 20 11:14:17 nas1a kernel:   (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9 d0 00 00 30 00 length 24576 SMID 1484 Aborting command 0xfe00bad0b540
Jul 20 11:14:17 nas1a kernel:   (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1792 Command timeout on target 12(0x000c), 6 set, 60.707504796 elapsed
Jul 20 11:14:17 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector
name ()
Jul 20 11:14:18 nas1a kernel: mpr1: Controller reported scsi ioc terminated tgt
12 SMID 1792 loginfo 3114
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9
d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: mpr1: Abort failed for target 12, sending logical
unit reset
Jul 20 11:14:18 nas1a kernel: mpr1: (da2:mpr1:0:12:0): CAM status: CCB request
aborted by the host
Jul 20 11:14:18 nas1a kernel: Sending logical unit reset to target 12 lun 0
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 3 more tries
remain
Jul 20 11:14:18 nas1a kernel: mpr1: At enclosure level 0, slot 2, connector
name ()
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CCB request
completed with an error
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 0 more tries
remain
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for
target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is
busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Error 5, Retries exhausted
Jul 20 11:14:18 nas1a kernel: mpr1: mprsas_action_scsiio: Freezing devq for
target ID 12
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): WRITE(10). CDB: 2a 00 62 81 f9
d0 00 00 30 00
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): CAM status: CAM subsystem is
busy
Jul 20 11:14:18 nas1a kernel: (da2:mpr1:0:12:0): Retrying command, 2 more tries
remain

[reboot happens here]

And the most recent one, today:

Aug 13 08:58:55 nas1a kernel:   (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Command timeout on target 16(0x0010), 6 set, 60.109683189 elapsed
Aug 13 08:58:55 nas1a kernel: mpr1: At enclosure level 0, slot 6, connector name ()
Aug 13 08:58:55 nas1a kernel: mpr1: Sending abort to target 16 for SMID 998
Aug 13 08:58:55 nas1a kernel:   (da6:mpr1:0:16:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 998 Aborting command 0xfe00bacdfaa0
Aug 13 08:58:55 nas1a kernel: mpr1: Abort failed for target 16, sending logical
unit reset
Aug 13 08:58:55 nas1a 

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #14 from Francois Baillargeon 
 ---
We usually have a cache on affected pools, and sadly we still have issues when
reads get past the L2ARC and hit the disks (think long sequential reads).

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #13 from Christoph Bubel  ---
(In reply to Daniel Shafer from comment #10)

I can confirm the workaround: no errors since I added an L2ARC device to the pool.
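
For anyone else trying this workaround, the cache device and the traffic it absorbs
can be watched with zpool iostat; the pool name below is only a placeholder:

zpool iostat -v tank 5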

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-08 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #12 from n...@nmc.dev ---

Hi,

We are seeing the same issue.

Here is more information on our setup:

FreeNAS-11.2-U5
FreeBSD 11.2-STABLE amd64

We use 2 x (6 x 14TB Seagate IronWolf drives).
We also have a 2TB Crucial SSD for L2ARC.

The issue always comes up after 10-14 hours of heavy I/O.

Disk model: 14TB Seagate ST14000VN0008


The drives are on 2 different LSI HBAs. The drive that fails is random across both
HBAs.

Please let us know if you need more information on this; it is impacting our
production load.

Thank you.

Log output for our latest errors:

>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 a8 00 
> 00 00 10 00 00 length 8192 SMID 60 Aborting command 0xfe000171f640
> mpr1: Sending reset from mprsas_send_abort for target ID 20
>   (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 f0 00 00 08 00 length 
> 4096 SMID 332 terminated ioc 804b loginfo 3113 scsi 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 e8 00 00 08 00 
> length 4096 SMID 703 terminated ioc 804b loginfo 3113 
> sc(da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 f0 00 00 08 00 si 0 state 
> c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 77 b8 00 
> 00 01 00 00 00 length 131072 SMID 510 terminated ioc 
> 804b(da30:mpr1:0:20:0): CAM status: CCB request completed with an 
> error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(10). CDB: 28 00 b7 81 49 e8 00 00 08 00  
> loginfo 3113 scsi 0 state c xfer 0
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 76 b8 00 00 
> 01 00 00 00 length 131072 SMID 938 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 75 b8 00 00 
> 01 00 00 00 length 131072 SMID 839 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 74 b8 00 00 
> 01 00 00 00 length 131072 SMID 681 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 73 b8 00 00 
> 01 00 00 00 length 131072 SMID 647 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 72 b8 00 00 
> 01 00 00 00 length 131072 SMID 253 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 71 b8 00 00 
> 01 00 00 00 length 131072 SMID 109 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 b8 00 00 
> 01 00 00 00 length 131072 SMID 267 terminated ioc 804b loginfo 3113 scsi 
> 0 state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 98 00 00 
> 00 10 00 00 length 8192 SMID 506 terminated ioc 804b loginfo 3113 scsi 0 
> state c xfer 0
>   (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 88 00 00 
> 00 10 00 00 length 8192 SMID 774 terminated ioc 804b loginfo 3113 scsi 0 
> state c xfer 0
>   (da30:mpr1:0:20:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 
> 00 00 00 length 0 SMID 281 terminated ioc 804b loginfo 3114 scsi 0 
> state c xfer 0
> mpr1: Unfreezing devq for target ID 20
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 77 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 76 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 75 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 74 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 73 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 72 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 71 b8 00 00 
> 01 00 00 00
> (da30:mpr1:0:20:0): CAM status: CCB request completed with an error
> (da30:mpr1:0:20:0): Retrying command
> (da30:mpr1:0:20:0): READ(16). CDB: 88 00 00 00 00 01 05 1a 70 b8 00 00 
> 01 00 00 00
> 

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-08-08 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Francois Baillargeon  changed:

   What|Removed |Added

 CC||francois.baillargeon@gearboxsoftware.com

--- Comment #11 from Francois Baillargeon 
 ---
Following what Daniel Shafer says, we have the same issues on a FreeNAS
deployment we did.

Everything is fine with our other pools that use drives smaller than 10TB, but one
of our pools using 14TB drives exhibits this exact behavior.

For us this is a major show-stopper bug, since we can't use this pool reliably.
Our vendor sent us a new HBA, a new server, etc., before I stumbled upon this bug
report.

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-07-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Daniel Shafer  changed:

   What|Removed |Added

 CC||dan...@shafer.cc

--- Comment #10 from Daniel Shafer  ---
So I came across this same issue.  It was causing my server to reboot several
times a day due to kernel panics.  It happens with both SAS9200 and 9300
controllers.  I have 8 x 10TB Seagate IronWolf NAS drives.

I wanted to mention that for me there was a resolution.  I added an Intel
Optane 900p 280GB drive, set it up as a cache/L2ARC device, and the problem
entirely disappeared.  My server ran for 20 days before I rebooted it last
night to perform an upgrade.

So, I believe a workaround would be to add a cache drive to your ZFS pool.

The Intel Optane 900p is a highly recommended cache drive for ZFS pools.
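
In case it saves someone a search, attaching an L2ARC device to an existing pool is
a single command; the pool and device names below (tank, nvd0) are placeholders
rather than anything taken from this report:

zpool add tank cache nvd0
zpool status tank

The new device then shows up under a cache section in zpool status and
zpool list -v output.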

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #9 from Matthias Pfaller  ---
(In reply to Bane Ivosev from comment #8)
We have several other machines with SAS2008 controllers. All of them are
running 11.1 and none of them shows these problems...

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #8 from Bane Ivosev  ---
And I don't think the problem is exclusive to Seagate 10TB drives. We have
WD Red 4TB drives and see the same problem. We had the same situation with
11.2-RELEASE as well, and because 11.2 and 12.0 have the same mpr/mps driver
version we decided to try 11.1.

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #7 from Bane Ivosev  ---
Just to append to my previous post from March: same hardware and same config, we
reverted back to 11.1-RELEASE and everything has been working flawlessly for more
than two months now.

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #6 from Matthias Pfaller  ---
We are using FreeBSD 12.0-RELEASE:
FreeBSD nyx 12.0-RELEASE-p4 FreeBSD 12.0-RELEASE-p4 GENERIC  amd64

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #5 from Matthias Pfaller  ---
Comment on attachment 205003
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205003
/var/log/messages during device resets

We just configured a backup server with eight Seagate IronWolf
(ST12000VN0007-2GS116) 12TB disks connected to a SAS2008:

Jun 12 08:51:35 nyx kernel: mps0:  port
0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device
0.0 on pci3
Jun 12 08:51:35 nyx kernel: mps0: Firmware: 20.00.07.00, Driver:
21.02.00.00-fbsd
Jun 12 08:51:35 nyx kernel: mps0: IOCCapabilities:
185c

After writing ~200GB to our pool it started resetting. I did a

sysctl dev.mps.0.debug_level=0x$((0x1+0x2+0x4+0x10+0x20))

The resulting trace is attached.
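
One note on that debug_level line for anyone copying it: in a POSIX shell the
arithmetic expansion $((0x1+0x2+0x4+0x10+0x20)) prints the decimal string 55, so the
value actually passed to sysctl is 0x55, not the presumably intended mask 0x37
(= 0x1+0x2+0x4+0x10+0x20). Writing the mask out explicitly avoids the ambiguity:

sysctl dev.mps.0.debug_level=0x37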

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #4 from Matthias Pfaller  ---
We just configured a backup server with eight Seagate IronWolf
(ST12000VN0007-2GS116) 12TB disks connected to a SAS2008:

Jun 12 08:51:35 nyx kernel: mps0:  port
0x8000-0x80ff mem 0xdf70-0xdf703fff,0xdf68-0xdf6b irq 32 at device
0.0 on pci3
Jun 12 08:51:35 nyx kernel: mps0: Firmware: 20.00.07.00, Driver:
21.02.00.00-fbsd
Jun 12 08:51:35 nyx kernel: mps0: IOCCapabilities:
185c

After writing ~200GB to our pool it started resetting. I did a

sysctl dev.mps.0.debug_level=0x$((0x1+0x2+0x4+0x10+0x20))

The resulting trace is attached.

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-06-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Matthias Pfaller  changed:

   What|Removed |Added

 CC||matthias.pfaller@familie-pfaller.de

--- Comment #3 from Matthias Pfaller  ---
Created attachment 205003
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=205003&action=edit
/var/log/messages during device resets

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-03-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

--- Comment #2 from Bane Ivosev  ---
Forgot to say, it's FreeBSD 12-RELEASE.

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2019-03-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Bane Ivosev  changed:

   What|Removed |Added

 CC||bane.ivo...@pmf.uns.ac.rs

--- Comment #1 from Bane Ivosev  ---
We have the very same problem, but with WD Red disks. The system randomly reboots,
sometimes after 20 days of uptime, with a different disk every time. It's our
production NFS server, and this is very frustrating.

Supermicro 5049p
64 GB ECC RAM
LSI 3008 IT mode
18x WD Red 4 TB

Mar 23 07:39:46 fap kernel:   (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10).
CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 Command timeout on target
25(0x001c), 6 set, 60.107418057 elapsed
Mar 23 07:39:46 fap kernel: mpr0: At enclosure level 0, slot 17, connector name
()
Mar 23 07:39:46 fap kernel: mpr0: Sending abort to target 25 for SMID 357
Mar 23 07:39:46 fap kernel:   (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10).
CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 357 Aborting command
0xfe00b7aa6130
Mar 23 07:39:46 fap kernel:   (pass19:mpr0:0:25:0): ATA COMMAND PASS
THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512
SMID 1182 Command timeout on target 25(0x001c), 6 set, 60.217681679
elampr0: At enclosure level 0, slot 17, connector name ()
Mar 23 07:39:46 fap kernel: mpr0: Controller reported scsi ioc terminated tgt
25 SMID 1182 loginfo 3113
Mar 23 07:39:46 fap kernel: mpr0: Abort failed for target 25, sending logical
unit reset
Mar 23 07:39:46 fap kernel: mpr0: Sending logical unit reset to target 25 lun 0
Mar 23 07:39:46 fap kernel: mpr0: At enclosure level 0, slot 17, connector name
()
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00 
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CCB request aborted
by the host
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Retrying command, 0 more tries
remain
Mar 23 07:39:46 fap kernel: mpr0: mprsas_action_scsiio: Freezing devq for
target ID 25
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): SYNCHRONIZE CACHE(10). CDB: 35
00 00 00 00 00 00 00 00 00 
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CAM subsystem is
busy
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Error 5, Retries exhausted
Mar 23 07:39:46 fap smartd[95746]: Device: /dev/da17 [SAT], failed to read
SMART Attribute Data
Mar 23 07:39:46 fap kernel: mpr0: mprsas_action_scsiio: Freezing devq for
target ID 25
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): WRITE(10). CDB: 2a 00 09 4a 32
a8 00 00 08 00 
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): CAM status: CAM subsystem is
busy
Mar 23 07:39:46 fap kernel: (da17:mpr0:0:25:0): Retrying command, 3 more tries
remain
Mar 23 07:43:19 fap syslogd: kernel boot file is /boot/kernel/kernel
Mar 23 07:43:19 fap kernel: ---<>---

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2017-12-21 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Christoph Bubel  changed:

   What|Removed |Added

   Hardware|Any |amd64
   Severity|Affects Only Me |Affects Some People

[Bug 224496] mpr and mps drivers seems to have issues with large seagate drives

2017-12-21 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224496

Bug ID: 224496
   Summary: mpr and mps drivers seems to have issues with large
seagate drives
   Product: Base System
   Version: 11.1-STABLE
  Hardware: Any
OS: Any
Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: kern
  Assignee: freebsd-bugs@FreeBSD.org
  Reporter: cbu...@mailbox.org

Over on the FreeNAS forums several people have reported issues with large (10TB)
Seagate drives (ST1NM0016 and ST1VN0004) and LSI controllers. Links to
the threads:
https://forums.freenas.org/index.php?threads/lsi-avago-9207-8i-with-seagate-10tb-enterprise-st1nm0016.58251/
https://forums.freenas.org/index.php?threads/synchronize-cache-command-timeout-error.55067/

I am using the ST1NM0016 drives and I am getting the following errors on an LSI
SAS2308 (mps driver) and on an LSI SAS3008 (mpr driver). This happens about
once every one or two weeks, in low-load situations.

Here are the logs:

(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
length 0 SMID 1010 command timeout cm 0xfef7cda0 ccb 0xf8018198d800
(noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xfef7cda0
mps0: Sending reset from mpssas_send_abort for target ID 1
(da2:mps0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 e1 76 9f 88 00 00 00 08 00
00 length 4096 SMID 959 terminated ioc 804b scsi 0 state c xfer 0
mps0: Unfreezing devq for target ID 1
(da2:mps0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 e1 76 9f 88 00 00 00 08 00
00 
(da2:mps0:0:1:0): CAM status: CCB request completed with an error
(da2:mps0:0:1:0): Retrying command
(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da2:mps0:0:1:0): CAM status: Command timeout
(da2:mps0:0:1:0): Retrying command
(da2:mps0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da2:mps0:0:1:0): CAM status: SCSI Status Error
(da2:mps0:0:1:0): SCSI status: Check Condition
(da2:mps0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus
device reset occurred)
(da2:mps0:0:1:0): Error 6, Retries exhausted
(da2:mps0:0:1:0): Invalidating pack

---

(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
length 0 SMID 319 Aborting command 0xfef54a90
mpr0: Sending reset from mprsas_send_abort for target ID 4
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00
00 length 20480 SMID 320 terminated ioc 804b loginfo 3113 scsi 0 state c
xfer 0
mpr0: Unfreezing devq for target ID 4
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00
00 
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:mpr0:0:4:0): CAM status: Command timeout
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 b9 f0 00 00 00 28 00
00 
(da1:mpr0:0:4:0): CAM status: SCSI Status Error
(da1:mpr0:0:4:0): SCSI status: Check Condition
(da1:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus
device reset occurred)
(da1:mpr0:0:4:0): Retrying command (per sense data)
(da1:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 03 35 b7 ba 80 00 00 00 20 00
00 length 16384 SMID 653 terminated ioc 804b loginfo 31110e03 scsi 0 state c
xfer 0
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
length 0 SMID 711 terminated ioc 804b loginfo 311(da1:mpr0:0:4:0): WRITE(16).
CDB: 8a 00 00 00 00 03 35 b7 ba 80 00 00 00 20 00 00 
10e03 scsi 0 state c xfer 0
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:mpr0:0:4:0): CAM status: CCB request completed with an error
(da1:mpr0:0:4:0): Retrying command
(da1:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 
(da1:mpr0:0:4:0): CAM status: SCSI Status Error
(da1:mpr0:0:4:0): SCSI status: Check Condition
(da1:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus
device reset occurred)
(da1:mpr0:0:4:0): Error 6, Retries exhausted
(da1:mpr0:0:4:0): Invalidating pack
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 06 2c 00 da 00 00 00
00 00 4f 00 c2 00 b0 00 length 0 SMID 797 terminated ioc 804b loginfo 31110e03
scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00
00 00 4f 00 c2 00 b0 00 length 512 SMID 753 terminated ioc 804b loginfo
31110e03 scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00
06 00 4f 00 c2 00 b0 00 length 512 SMID 846 terminated ioc 804b loginfo
31110e03 scsi 0 state c xfer 0
(pass1:mpr0:0:4:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01