On 28/10/2011 04:14, Jan Mikkelsen wrote:
Hi,
There is a patch linked to from this PR, which seems very similar:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/140416
http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html
The symptoms are also consistent with running mfiutil clearing the problem.
I'm about to deploy mfi controllers in a similar configuration, so I'd be
very curious about whether the patch fixes the problem for you.
This looks promising; I'll give it a try when I get a moment.
Thanks,
Vince
Regards,
Jan Mikkelsen
On 28/10/2011, at 10:39 AM, Vincent Hoffman wrote:
On 28/10/2011 00:04, Jeremy Chadwick wrote:
On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
I've recently installed a new NAS at work which uses a rebranded LSI
MegaRAID SAS controller:
[root@banshee ~]# mfiutil show adapter
mfi0 Adapter:
Product Name: Supermicro SMC2108
Serial Number:
Firmware: 12.12.0-0047
RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
Battery Backup: present
NVRAM: 32K
Onboard Memory: 512M
Minimum Stripe: 8k
Maximum Stripe: 1M
I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB drives).
I'm seeing a lot of messages like:
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 60 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 90 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 120 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 150 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 180 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 210 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 240 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 271 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 301 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 331 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 361 SECONDS
mfi0: COMMAND 0xff8000b216c8 TIMEOUT AFTER 391 SECONDS
mfi0: COMMAND 0xff8000b21b08 TIMEOUT AFTER 55 SECONDS
mfi0: COMMAND 0xff8000b21b08 TIMEOUT AFTER 85 SECONDS
While these are being logged, I/O stalls on the array connected to the mfi
adapter. This can continue for 20 minutes or so before resuming, apparently
at random (or so it seems, although a little more on this later on).
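
To see how long each stall actually lasted, the timeout messages above can be boiled down to one line per command address. This is only a rough sketch, not anything from the driver or the linked patch: the function name summarise_mfi_timeouts is made up for illustration, and it assumes each message sits on a single unwrapped line in the format shown above.

```sh
#!/bin/sh
# Hypothetical helper (not part of mfiutil or the mfi driver).
# Reads log text on stdin and prints "<command address> <longest timeout>"
# for every mfi command that timed out, in order of first appearance.
# Assumption: each message is on one line, e.g.
#   mfi0: COMMAND 0x... TIMEOUT AFTER 60 SECONDS
summarise_mfi_timeouts() {
    awk '
        /mfi[0-9]+: COMMAND 0x[0-9a-f]+ TIMEOUT AFTER [0-9]+ SECONDS/ {
            # Pick out the address and the elapsed seconds by keyword, so a
            # syslog timestamp/hostname prefix does not matter.
            for (i = 1; i < NF; i++) {
                if ($i == "COMMAND") addr = $(i + 1)
                if ($i == "AFTER")   secs = $(i + 1) + 0
            }
            if (!(addr in max)) order[++n] = addr   # remember first sighting
            if (secs > max[addr]) max[addr] = secs  # keep the longest value
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %d\n", order[i], max[order[i]]
        }
    '
}
```

It could be fed from wherever the kernel messages end up, for example `summarise_mfi_timeouts < /var/log/messages` (that path is an assumption about the syslog setup).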
From pciconf -lv
mfi0@pci0:5:0:0: class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
class = mass storage
subclass = RAID
From dmesg
mfi0: LSI MegaSAS Gen2 port 0xe000-0xe0ff mem
0xfbd9c000-0xfbd9,0xfbdc-0xfbdf irq 32 at device 0.0 on pci5
mfi0: Megaraid SAS driver Ver 3.00
mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started
(PCI ID 0079/1000/0700/15d9)
mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
I found this thread after a bit of googling, but it doesn't end too
well.
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
Was this ever taken further?
One thing I have noticed is that the stall (and the timeout messages) seem
to go away if I query the card using mfiutil. I currently have a cron job
doing this every 2 minutes to see whether this has been a coincidence or not.
Any suggestions are welcome, and I'm happy to provide more info if I can, but
I don't have a duplicate machine to do much debugging on. I'm happy to try
patches though.
Is this worth filing a PR?
Can you please provide uname -a output? The version of FreeBSD you're
using matters greatly here.
Sure
FreeBSD banshee.foobar.net 8.2-STABLE FreeBSD 8.2-STABLE #2: Wed Oct 26
16:14:09 BST 2011
t...@banshee.foobar.net:/usr/obj/usr/src/sys/BANSHEE amd64
[root@banshee /usr/src]# svn info
Path: .
Working Copy Root Path: /usr/src
URL: http://svn.freebsd.org/base/stable/8
Repository Root: http://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 226708
Node Kind: directory
Schedule: normal
Last Changed Author: brueffer
Last Changed Rev: 226671
Last Changed Date: 2011-10-23 19:37:57 +0100 (Sun, 23 Oct 2011)
It's looking like the mfiutil query stopping the stall is not a coincidence:
the last 2 stalls have lasted less than the 2-minute interval that I set the
cron job to run at, much less than previously.
The cron job is a simple /usr/sbin/mfiutil show volumes | grep -v OPTIMAL
so I at least get an email if the volume breaks ;)
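
A slightly stricter variant of that filter, as a sketch only: it keeps the same idea (any output means cron sends mail) but prints only volume lines that are not OPTIMAL. The function name non_optimal_volumes is made up, and it assumes that every volume line in the "mfiutil show volumes" output starts with an mfidN device name, which I have not checked against every mfiutil version.

```sh
#!/bin/sh
# Hypothetical filter for the output of "mfiutil show volumes".
# Assumption: volume lines begin (after optional spaces) with an mfidN
# device name and contain the word OPTIMAL when the volume is healthy.
# Anything printed here ends up in the mail cron sends; no output means
# every volume line matched OPTIMAL.
non_optimal_volumes() {
    awk '/^ *mfid[0-9]/ && !/OPTIMAL/'
}

# Intended use from the cron job (running mfiutil is also what seems to
# clear the stall):
#   /usr/sbin/mfiutil show volumes | non_optimal_volumes
```

Using awk rather than a second grep means the pipeline exits 0 even when nothing is printed, which matters only if the job is run from a wrapper that checks exit status.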
Oct 28 00:01:06 banshee mfi0: COMMAND 0xff8000b22d18 TIMEOUT AFTER 59 SECONDS
Oct 28 00:01:36 banshee mfi0: COMMAND 0xff8000b22d18 TIMEOUT AFTER 89 SECONDS
Oct 28 00:13:09 banshee mfi0: COMMAND 0xff8000b205c8 TIMEOUT AFTER 50 SECONDS
Oct 28 00:13:39 banshee mfi0: COMMAND 0xff8000b205c8 TIMEOUT