[Bug 239801] mfi errors causing zfs checksum errors

2019-09-23 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

--- Comment #9 from Daniel Mafua  ---
I think I can safely say that switching to the mrsas driver has resolved my
problem. Thanks to everyone!

For reference I was experiencing problems with Dell servers of various models
using the PERC H330 controller. Firmware versions were 25.3.0.0016 and
25.5.2.0001.

A PERC H310 Adapter (firmware 20.13.0-0007) wasn't experiencing the problem and
continues to use the mfi driver.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 239801] mfi errors causing zfs checksum errors

2019-09-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

--- Comment #8 from Timur I. Bakeyev  ---
(In reply to Andriy Gapon from comment #7)

Yes. Basically, that bug states it correctly: for the PERC H730P, the mrsas
driver is the better (and proper) option, i.e. mfi shouldn't even be picked up
by the kernel by default for it.

After switching to mrsas we've seen performance and overall bandwidth double,
not to mention the disappearance of those weird I/O errors.



[Bug 239801] mfi errors causing zfs checksum errors

2019-09-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

--- Comment #7 from Andriy Gapon  ---
(In reply to Timur I. Bakeyev from comment #6)
Do you mean bug 230557 ?



[Bug 239801] mfi errors causing zfs checksum errors

2019-09-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

Timur I. Bakeyev  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|Affects Only Me             |Affects Some People

--- Comment #6 from Timur I. Bakeyev  ---
Bug #230557 is somewhat related to this issue.



[Bug 239801] mfi errors causing zfs checksum errors

2019-09-03 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

Hiroki Sato  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|Open                        |In Progress

--- Comment #5 from Hiroki Sato  ---
(In reply to Daniel Mafua from comment #4)

Thanks for your report.  I have several other systems which are suffering from
the same problem.  Here is the information I have gathered about it so far:

* mfi(4) can report I/O errors which are not related to an actual hardware
  failure after upgrading FreeBSD to 11.3 or 12.0.

* The error does not depend on ZFS, although ZFS is very likely to report
  checksum errors on the zpool.  On a system using UFS, the I/O error
  can cause a system panic, boot failure, or something equally fatal.

* It seems that the I/O error depends on a specific firmware version.  Some
  older firmware versions work fine even with mfi(4) on 11.3 and 12.0.

* If the device is also supported by mrsas(4), switching to it solves
  the error.  Note that this causes an incompatibility issue: mfi(4) uses
  /dev/mfi* device nodes for the attached drives and mfiutil(8) as the userland
  utility, whereas mrsas(4) uses /dev/da* and a vendor-supplied utility such as
  MegaCli instead.

I am investigating what caused this regression but a workaround is to use
mrsas(4) instead of mfi(4) by specifying hw.mfi.mrsas_enable="1" at boot time.
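
As a rough illustration of the workaround, the sketch below checks whether the
text of a boot configuration file (loader.conf is assumed here, following the
"loader.conf variable" remark in comment #1) sets hw.mfi.mrsas_enable to 1.
The function name and the idea of checking the file from Python are
assumptions for illustration only; they are not part of mfi(4) or mrsas(4).

```python
TUNABLE = "hw.mfi.mrsas_enable"  # tunable named in the comment above


def mrsas_enabled(conf_text: str) -> bool:
    """Return True if the last assignment of TUNABLE in conf_text is "1".

    conf_text is assumed to use simple key="value" lines with '#' comments,
    as loader.conf does.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == TUNABLE:
            value = val.strip().strip('"')  # a later assignment overrides
    return value == "1"
```

For example, mrsas_enabled(open("/boot/loader.conf").read()) could be run
before a reboot to confirm the setting is in place; the path is an assumption
about where the tunable was put.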



[Bug 239801] mfi errors causing zfs checksum errors

2019-09-03 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

--- Comment #4 from Daniel Mafua  ---
The firmware has not been upgraded on any of the servers. 

I was able to switch from mfi to mrsas on one of the servers, but because
they're at remote locations I need to be careful on how aggressive I am on
this. I've successfully scrubbed out errors and everything is clear now.
However, this also has been the case after any reboot, and errors started
appearing again around 3-4 days later. It will probably take me two weeks to
verify it's working okay, then I'll report back.



[Bug 239801] mfi errors causing zfs checksum errors

2019-09-03 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

--- Comment #3 from ma...@tols.org ---
Hi, so my bit more detailed story is this:

We have 2 identical Dell R730xd systems, each with 12 6TB drives running as
raidz2.  Each also has 2 SSDs, which are added to the pool as ZIL and cache
drives.

Both have been running 11.1-RELEASE which gave no problems.  The systems also
have been on 11.2-RELEASE, also without any problems.  In between the upgrades
I have also done "zpool upgrade" where available.

Then I upgraded to 11.3-RELEASE and, trusting that all would be as flawless
as in the first 2 years, I gave it not much attention other than keeping an eye
on it from the monitoring host.  Unfortunately we only monitor zpool status,
which has been ONLINE throughout the entire process.

I left it running for quite a while when at some point I wanted to show the
pool to someone and found out both systems had a few 100K checksum errors on
each of the 12 drives in the pool.  Not on the SSDs that make up the ZIL and
cache.
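
Since the pool state stayed ONLINE while checksum errors accumulated, a
monitoring check would need to look at the CKSUM column as well. The following
is a minimal sketch of that idea, assuming the usual
NAME/STATE/READ/WRITE/CKSUM layout of "zpool status" output. The function name
and the sample text are hypothetical and are not taken from the affected
systems.

```python
def checksum_errors(status_text: str) -> dict:
    """Map vdev name -> CKSUM column for every row whose CKSUM is not "0".

    The counter is kept as a string because zpool status may abbreviate
    large values (for example with a K suffix).
    """
    bad = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True  # device rows follow the header
            continue
        if not in_config:
            continue
        if line.lstrip().startswith("errors:"):
            break  # end of the config section
        if len(fields) >= 5 and fields[4] != "0":
            bad[fields[0]] = fields[4]
    return bad


# Hypothetical sample: the pool is ONLINE although one disk shows errors.
SAMPLE = """\
  pool: tank
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        da0     ONLINE       0     0  212K
        da1     ONLINE       0     0     0
    logs
      da12      ONLINE       0     0     0

errors: No known data errors
"""
```

A monitoring host could alert whenever checksum_errors() returns a non-empty
mapping, instead of alerting only on the pool state.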

One of the systems was running 11.3-RELEASE-p1, and the other was running
11.3-RELEASE-p3.

My path to fixing this was this:
- Google for the issue, and find out mrsas was a potential fix
- Change the system on p3 to mrsas and reboot
- Do a zpool scrub to autoheal all broken sectors, which ended up fixing almost
  100GB of data
- Do a zpool clear to clear the counters
- Do another zpool scrub to see if the counters remain 0, which they did
- Do the same on the other system, which was on p1, and bring it to p3 in the
  process
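
The scrub/clear/scrub part of the steps above can be sketched as an ordered
command list. This is only an illustration: the function names and the pool
name are hypothetical, and the final "zpool status -v" step is an addition for
inspecting the counters, not something listed above. Note that "zpool scrub"
returns as soon as the scrub has started, so a real script would have to wait
for each scrub to finish (for example by polling zpool status) before moving
on.

```python
import subprocess


def recovery_steps(pool: str) -> list:
    """Return the zpool commands, in order, as argv lists."""
    return [
        ["zpool", "scrub", pool],         # first scrub: repair bad blocks
        ["zpool", "clear", pool],         # reset the error counters
        ["zpool", "scrub", pool],         # second scrub: counters should stay 0
        ["zpool", "status", "-v", pool],  # inspect the result (added step)
    ]


def run(steps: list, dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False.

    No waiting for scrub completion is done here; see the note above.
    """
    for argv in steps:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling run(recovery_steps("tank")) prints the commands without executing
them, which is the safer default on a remote machine.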

Hope this helps,

Marco van Tol



[Bug 239801] mfi errors causing zfs checksum errors

2019-09-03 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

Hiroki Sato  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |Open

--- Comment #2 from Hiroki Sato  ---
Can you tell me whether or not you updated the firmware, either before or after
upgrading FreeBSD?



[Bug 239801] mfi errors causing zfs checksum errors

2019-09-03 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

ma...@tols.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ma...@tols.org

--- Comment #1 from ma...@tols.org ---
I had this exact same problem with a large array on a PERC H730/P.

I could fix this by changing the controller driver from mfi to mrsas.

See
https://www.freebsd.org/cgi/man.cgi?query=mrsas&apropos=0&sektion=0&manpath=FreeBSD+11.3-RELEASE&arch=default&format=html
for details on how to do that. (loader.conf variable)

Please make sure you do a "zpool scrub" afterwards to autoheal all checksum
errors.

Marco van Tol



[Bug 239801] mfi errors causing zfs checksum errors

2019-08-13 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

Mark Linimon  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |regression



[Bug 239801] mfi errors causing zfs checksum errors

2019-08-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239801

            Bug ID: 239801
           Summary: mfi errors causing zfs checksum errors
           Product: Base System
           Version: 11.3-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: ma...@dempseyuniform.com

After upgrading to FreeBSD 11.3 I began having storage issues. (I may have
experienced the same problem when testing FreeBSD 12.0).  At first I thought it
was a problem with upgrading my ZFS pool to include the update for spacemap_v2,
but now I think it's a lower level problem.  After a day or two, I start to see
checksum errors appear on a pool. Doing a scrub on the pool just results in
more errors. If I restart the machine, I can then scrub the pool and errors no
longer appear, then after a few days it happens again.

There's only one or two machines I haven't upgraded my zpool on, but I think
it's just a coincidence that I saw the problem after upgrading the zpool
(thinking the upgrade went okay). But I haven't experienced problems on the one
machine I haven't upgraded the pool on.

All (3) servers are Dell PowerEdge servers, using a PERC RAID controller
configured as JBOD.  They have two drives configured as a ZFS mirror. One
machine had a drive failure a few months ago, which was replaced and running
fine, but otherwise all machines have been running for years with no issues and
never failed a scrub until the upgrade. They're at different locations, so that
probably rules out power issues.



mfi0 Adapter:
    Product Name: PERC H330 Adapter
   Serial Number: 59N01F6
        Firmware: 25.3.0.0016
     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID10, RAID50
  Battery Backup: not present
           NVRAM: 32K
  Onboard Memory: 0M
  Minimum Stripe: 64K
  Maximum Stripe: 64K

mfi0 Physical Drives:
 0 (  932G) JBOD  SCSI-6 S0
 1 (  932G) JBOD  SCSI-6 S1


Errors Reported in /var/log/messages

kernel: mfi0: I/O error, cmd=0xfef9a4a8, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd1: hard error cmd=write 675298440-675298991
kernel: mfi0: I/O error, cmd=0xfef99dc0, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd1: hard error cmd=write 675298504-675299015
kernel: mfi0: I/O error, cmd=0xfef9c048, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd0: hard error cmd=write 675298504-675299015
kernel: mfi0: I/O error, cmd=0xfef9ae38, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd0: hard error cmd=write 675298440-675298991
kernel: mfi0: I/O error, cmd=0xfef9c048, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd0: hard error cmd=write 675298440-675298991
kernel: mfi0: I/O error, cmd=0xfef9afd0, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd1: hard error cmd=write 675298440-675298991
kernel: mfi0: I/O error, cmd=0xfef98bb0, status=0x3c, scsi_status=0
kernel: mfi0: sense error 47, sense_key 15, asc 175, ascq 175
kernel: mfisyspd1: hard error cmd=write 675298440-675298991
kernel: mfi0: I/O error, cmd=0xfef9a6c8, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd0: hard error cmd=write 675298440-675298991
kernel: mfi0: I/O error, cmd=0xfef9b0e0, status=0x3c, scsi_status=0
kernel: mfi0: sense error 0, sense_key 0, asc 0, ascq 0
kernel: mfisyspd0: hard error cmd=write 675298440-675298991
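
To see at a glance which drives are affected, log lines like the ones above
can be tallied per device. The sketch below is one possible way to do that;
the regular expression is inferred from the "hard error" lines shown in this
report only, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "mfisyspd1: hard error cmd=write 675298440-675298991"
HARD_ERROR = re.compile(r"(mfisyspd\d+): hard error cmd=(\w+) (\d+)-(\d+)")


def count_hard_errors(log_text: str) -> Counter:
    """Count 'hard error' lines per (device, command) pair."""
    counts = Counter()
    for match in HARD_ERROR.finditer(log_text):
        device, command = match.group(1), match.group(2)
        counts[(device, command)] += 1
    return counts


# Three lines copied from the log excerpt above.
EXCERPT = """\
kernel: mfi0: I/O error, cmd=0xfef9a4a8, status=0x3c, scsi_status=0
kernel: mfisyspd1: hard error cmd=write 675298440-675298991
kernel: mfisyspd0: hard error cmd=write 675298504-675299015
"""
```

Running count_hard_errors() over the full /var/log/messages text would show
whether the errors are spread over both mirror members, as they are in the
excerpt, rather than tied to a single disk.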
