Re: Supermicro Bladeserver
On Wed, Jan 12, 2011 at 03:13 -, you wrote:
> Out of interest what change was that?

It turned out I had MSI disabled in loader.conf, apparently a leftover
from a debugging session a long time ago. The driver doesn't support
that configuration, so simply re-enabling MSI solved my problem.

Robin

--
Robin Sommer * Phone +1 (510) 722-6541 * ro...@icir.org
ICSI/LBNL    * Fax   +1 (510) 666-2956 * www.icir.org
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
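The message doesn't name the exact loader.conf line. As a minimal sketch, assuming it was one of FreeBSD's global PCI tunables `hw.pci.enable_msi` or `hw.pci.enable_msix` (a value of 0 disables MSI or MSI-X system-wide), a short script can flag such leftovers. A driver-specific MSI tunable would need to be added to `MSI_TUNABLES`; the function name `disabled_msi_tunables` is hypothetical.

```python
import re

# Assumed tunables: FreeBSD's global PCI switches for MSI and MSI-X.
# A value of "0" in loader.conf disables the corresponding interrupt type.
MSI_TUNABLES = ("hw.pci.enable_msi", "hw.pci.enable_msix")

# Matches loader.conf assignments such as: hw.pci.enable_msi="0" (quotes optional)
LINE_RE = re.compile(r'^\s*([\w.]+)\s*=\s*"?([^"#\s]*)"?')


def disabled_msi_tunables(loader_conf_text):
    """Return the MSI-related tunables that the given loader.conf text sets to 0."""
    disabled = []
    for line in loader_conf_text.splitlines():
        line = line.split("#", 1)[0]  # ignore comments and commented-out settings
        m = LINE_RE.match(line)
        if m and m.group(1) in MSI_TUNABLES and m.group(2) == "0":
            disabled.append(m.group(1))
    return disabled


if __name__ == "__main__":
    sample = '''
# left over from an old debugging session
hw.pci.enable_msi="0"
hw.pci.enable_msix="1"
#hw.pci.enable_msi="0"
'''
    print(disabled_msi_tunables(sample))
```

On real systems the input would be the contents of `/boot/loader.conf`. Removing the offending line (or setting it to "1") and rebooting restores the default, since loader tunables are only read at boot.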
Re: File system trouble with ICH9 controller
On Thu, Jun 10, 2010 at 14:06 -0700, I wrote:
> Thanks for your quick response. I don't need much in terms of
> long-term data reliability on these machines (thus the RAID 0).
> However, if MatrixRAID is unreliable even without further external
> events (like disk problems/changes), I'll turn it off.

An update on this: I have now turned off the RAID on half of my blades,
leaving the other half untouched. After a few days, 3 of the systems
still using the RAID have experienced file system corruption similar to
what I reported before, while all the blades without RAID have been
running fine. So it looks like the RAID is indeed to blame, and I'll
turn it off on all systems now.

Robin

--
Robin Sommer * Phone +1 (510) 666-2886 * ro...@icir.org
ICSI/LBNL    * Fax   +1 (510) 666-2956 * www.icir.org
File system trouble with ICH9 controller
I'm running 8.0-RELEASE-p2 (amd64) on a large number of Supermicro
SBI-7425C-T3 blades. Each blade has 2 x 500GB disks striped into a
single volume via the on-board ICH9 RAID controller. However, after
running fine for a while (days), the blades eventually crash with file
system problems such as the one below. Initially I thought it must be a
bad disk, but by now 5 different blades have shown similar problems, so
I'm suspecting an OS issue.

Has anybody seen something similar before? Could this be an
incompatibility with the RAID controller? I haven't found much recent
information on Google, but there are a number of older threads
indicating that it might not be well supported; I'm not sure whether
those still apply. Any other thoughts?

Thanks,

Robin

--- syslog ---
Jun 9 10:00:02 user.crit blade19 kernel: ar0s1a[WRITE(offset=704187858944, length=114688)]error = 5
Jun 9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704188219392, length=131072)]error = 5
Jun 9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704188891136, length=114688)]error = 5
Jun 9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704189382656, length=114688)]error = 5
Jun 9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704189743104, length=131072)]
Jun 9 10:00:02 user.crit blade19 kernel: error = 5

--- system information ---
# uname -a
FreeBSD blade5 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan 5 21:11:58 UTC 2010 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

# pciconf -lv | grep SATA
    device = '82801IB/IR/IH (ICH9 Family) SATA RAID Controller'

# atacontrol list
ATA channel 2:
    Master: ad4 ST9500325AS/0001SDM1 SATA revision 2.x
    Slave:  no device present
ATA channel 3:
    Master: ad6 ST9500325AS/0001SDM1 SATA revision 2.x
    Slave:  no device present

# dmesg | grep ata
atapci0: Intel ICH9 SATA300 controller port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xfcc0-0xfcc007ff irq 17 at device 31.2 on pci0
atapci0: [ITHREAD]
atapci0: AHCI called from vendor specific driver
atapci0: AHCI v1.20 controller with 6 3Gbps ports, PM supported
ata2: ATA channel 0 on atapci0
ata2: [ITHREAD]
ata3: ATA channel 1 on atapci0
ata3: [ITHREAD]
ata4: ATA channel 2 on atapci0
ata4: stopping AHCI engine failed
ata4: [ITHREAD]
ata5: ATA channel 3 on atapci0
ata5: stopping AHCI engine failed
ata5: [ITHREAD]
ata6: ATA channel 4 on atapci0
ata6: [ITHREAD]
ata7: ATA channel 5 on atapci0
ata7: [ITHREAD]
ad4: 476940MB Seagate ST9500325AS 0001SDM1 at ata2-master SATA300
ad6: 476940MB Seagate ST9500325AS 0001SDM1 at ata3-master SATA300
ar0: writing of DDF metadata is NOT supported yet
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master
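The `error = 5` in the g_vfs_done() syslog lines is an errno value; 5 is EIO ("Input/output error"), i.e. the writes to the `ar0s1a` provider failed with an I/O error. When comparing many blades it can help to pull these entries out of the logs mechanically. A minimal sketch, assuming only the line shape seen in this report (the function name `parse_gvfs_errors` is hypothetical):

```python
import errno
import re

# Matches GEOM g_vfs_done() failures such as
#   ar0s1a[WRITE(offset=704187858944, length=114688)]error = 5
# The "error = N" suffix is optional because syslog sometimes puts it on
# a separate line.
GVFS_RE = re.compile(
    r"(?P<provider>[\w.]+)\[(?P<op>READ|WRITE)\("
    r"offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"(?:error = (?P<error>\d+))?"
)


def parse_gvfs_errors(log_text):
    """Return (provider, op, offset, length, errno_name) tuples.

    errno_name is the symbolic errno (e.g. "EIO" for 5), or None when the
    error number is not on the same line as the request.
    """
    records = []
    for m in GVFS_RE.finditer(log_text):
        err = m.group("error")
        name = errno.errorcode.get(int(err), err) if err else None
        records.append((m.group("provider"), m.group("op"),
                        int(m.group("offset")), int(m.group("length")), name))
    return records


if __name__ == "__main__":
    sample = (
        "Jun 9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a"
        "[WRITE(offset=704188219392, length=131072)]error = 5\n"
        "Jun 9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a"
        "[WRITE(offset=704189743104, length=131072)]\n"
    )
    for rec in parse_gvfs_errors(sample):
        print(rec)
```

Run on the two sample lines, this prints one tuple per request: the first ends in `'EIO'`, the second in `None` because its error number was logged on the following line.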
Re: File system trouble with ICH9 controller
                              100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       9
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   079   074   045    Old_age   Always       -       21 (Lifetime Min/Max 20/21)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   075   075   000    Old_age   Always       -       51868
194 Temperature_Celsius     0x0022   021   040   000    Old_age   Always       -       21 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   051   050   000    Old_age   Always       -       37297906
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
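Attribute tables like the one above are easier to compare across many blades when parsed. A minimal sketch, assuming the usual `smartctl -A` column order (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and treating an attribute as failing when THRESH is non-zero and the normalized VALUE has dropped to or below it; that rule is an assumed approximation of smartctl's own check, and both function names are hypothetical.

```python
# Assumed column layout of `smartctl -A` attribute rows:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE


def parse_smart_attributes(text):
    """Parse smartctl attribute rows into a list of dicts; other lines are skipped."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG column.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        attrs.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),   # int("079") == 79; leading zeros are fine
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": " ".join(fields[9:]),  # raw value may contain spaces
        })
    return attrs


def failing(attrs):
    """Names of attributes whose normalized value has reached a non-zero threshold."""
    return [a["name"] for a in attrs if a["thresh"] and a["value"] <= a["thresh"]]


if __name__ == "__main__":
    sample = """\
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   079   074   045    Old_age   Always       -       21 (Lifetime Min/Max 20/21)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
    rows = parse_smart_attributes(sample)
    print(failing(rows))
    print({a["name"]: a["raw"] for a in rows})
```

By this check, none of the attributes in the table above has reached its threshold, and the sector-health counters (Current_Pending_Sector, Offline_Uncorrectable) are 0, which fits the thread's later conclusion that the disks were not the cause.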