Re: Supermicro Bladeserver

2011-01-11 Thread Robin Sommer

On Wed, Jan 12, 2011 at 03:13 -, you wrote:

 Out of interest what change was that?

It seems to have been a left-over from a debugging session a long
time ago: I had MSI disabled in loader.conf. The driver doesn't
support running without MSI, so simply re-enabling it solved my
problem.
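
For anyone wanting to check their own machines for the same left-over, here is a minimal sketch in Python. It assumes MSI was disabled through the standard PCI loader tunables hw.pci.enable_msi / hw.pci.enable_msix; the message above does not say which line was actually present, and the sample file content is hypothetical.

```python
import re

# Assumption: MSI was disabled via the standard PCI tunables; the original
# message does not quote the actual loader.conf line.
MSI_TUNABLES = ("hw.pci.enable_msi", "hw.pci.enable_msix")

def find_disabled_msi(loader_conf_text):
    """Return the MSI-related tunables that are explicitly set to 0."""
    disabled = []
    for raw in loader_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        m = re.match(r'^([\w.]+)\s*=\s*"?([^"\s]*)"?$', line)
        if m and m.group(1) in MSI_TUNABLES and m.group(2) == "0":
            disabled.append(m.group(1))
    return disabled

# Hypothetical loader.conf content, for illustration only.
sample = '''
# left over from an old debugging session
hw.pci.enable_msi="0"
hw.pci.enable_msix="1"
'''
print(find_disabled_msi(sample))
```

On a real system the text would come from /boot/loader.conf; removing the offending line (or setting it to 1) and rebooting re-enables MSI.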

Robin

-- 
Robin Sommer * Phone +1 (510) 722-6541 * ro...@icir.org
ICSI/LBNL* Fax   +1 (510) 666-2956 *   www.icir.org
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: File system trouble with ICH9 controller

2010-06-15 Thread Robin Sommer

On Thu, Jun 10, 2010 at 14:06 -0700, I wrote:

 Thanks for your quick response. I don't need much in terms of
 long-term data reliability on these machines (thus the RAID 0).
 However, if MatrixRAID is unreliable even without further external
 events (like disk problems/changes), I'll turn it off. 

An update on this: I have now turned off the RAID on half of my
blades, leaving the other half untouched. After a few days, 3 of
the blades still using the RAID have experienced fs corruption
similar to what I reported before, while all the blades without
RAID have been running fine.

So it looks like the RAID is indeed to blame, and I'll turn it off
for all systems now.

Robin

-- 
Robin Sommer * Phone +1 (510) 666-2886 * ro...@icir.org 
ICSI/LBNL* Fax   +1 (510) 666-2956 *   www.icir.org


File system trouble with ICH9 controller

2010-06-10 Thread Robin Sommer
I'm running 8.0-RELEASE-p2 (amd64) on a large number of Supermicro
SBI-7425C-T3 blades. Each of the blades has 2 x 500GB disks striped
into a single volume via the on-board ICH9 RAID controller.

However, after running fine for a while (days), the blades
eventually crash with file system problems such as the one below.
Initially I thought it must be a bad disk, but by now 5 different
blades have shown similar problems, so I'm suspecting some OS issue.

Has anybody seen something similar before? Could this be an
incompatibility with the RAID controller? I haven't found much
recent on Google, but there are a number of older threads indicating
that it might not be well supported. I'm not sure whether those
still apply.

Any other thoughts?

Thanks,

Robin

- syslog ---

Jun  9 10:00:02 user.crit blade19 kernel: ar0s1a[WRITE(offset=704187858944, length=114688)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704188219392, length=131072)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704188891136, length=114688)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704189382656, length=114688)]error = 5
Jun  9 10:00:02 user.crit blade19 kernel: g_vfs_done():ar0s1a[WRITE(offset=704189743104, length=131072)]
Jun  9 10:00:02 user.crit blade19 kernel: error = 5
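
For comparing such reports across many blades, the following is a minimal Python sketch that extracts the device, operation, offset and length from g_vfs_done()-style lines and summarizes them. The function name and the summary layout are assumptions made for illustration; the sample lines are taken from the log above with the syslog prefix shortened. Error 5 is EIO, a generic I/O error.

```python
import errno
import re

# Matches device, operation, offset and length in a failed-I/O report.
# "\s*" after the comma tolerates reports wrapped across lines by a mailer.
IO_RE = re.compile(r"(\w+)\[(READ|WRITE)\(offset=(\d+),\s*length=(\d+)\)\]")

def summarize(log_text):
    """Collect failed I/O requests per (device, operation)."""
    summary = {}
    for dev, op, offset, length in IO_RE.findall(log_text):
        entry = summary.setdefault((dev, op), {"count": 0, "min": None, "max": None})
        start, end = int(offset), int(offset) + int(length)
        entry["count"] += 1
        entry["min"] = start if entry["min"] is None else min(entry["min"], start)
        entry["max"] = end if entry["max"] is None else max(entry["max"], end)
    return summary

log = """\
kernel: ar0s1a[WRITE(offset=704187858944, length=114688)]error = 5
kernel: g_vfs_done():ar0s1a[WRITE(offset=704188219392, length=131072)]error = 5
kernel: g_vfs_done():ar0s1a[WRITE(offset=704189743104, length=131072)]
kernel: error = 5
"""

for (dev, op), info in summarize(log).items():
    print(dev, op, info["count"], info["min"], info["max"])
print(errno.errorcode[5])  # symbolic name of error 5
```

All failed writes in the sample fall within a span of about 2 MB at roughly 704 GB into the volume, which is the kind of pattern worth comparing between affected blades.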

- system information  --

# uname -a
FreeBSD blade5 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan  5 21:11:58 UTC 2010 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

# pciconf -lv | grep SATA
device = '82801IB/IR/IH (ICH9 Family) SATA RAID Controller'

# atacontrol list
ATA channel 2:
Master:  ad4 ST9500325AS/0001SDM1 SATA revision 2.x
Slave:   no device present
ATA channel 3:
Master:  ad6 ST9500325AS/0001SDM1 SATA revision 2.x
Slave:   no device present

# dmesg | grep ata
atapci0: Intel ICH9 SATA300 controller port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xfcc0-0xfcc007ff irq 17 at device 31.2 on pci0
atapci0: [ITHREAD]
atapci0: AHCI called from vendor specific driver
atapci0: AHCI v1.20 controller with 6 3Gbps ports, PM supported
ata2: ATA channel 0 on atapci0
ata2: [ITHREAD]
ata3: ATA channel 1 on atapci0
ata3: [ITHREAD]
ata4: ATA channel 2 on atapci0
ata4: stopping AHCI engine failed
ata4: [ITHREAD]
ata5: ATA channel 3 on atapci0
ata5: stopping AHCI engine failed
ata5: [ITHREAD]
ata6: ATA channel 4 on atapci0
ata6: [ITHREAD]
ata7: ATA channel 5 on atapci0
ata7: [ITHREAD]
ad4: 476940MB Seagate ST9500325AS 0001SDM1 at ata2-master SATA300
ad6: 476940MB Seagate ST9500325AS 0001SDM1 at ata3-master SATA300
ar0: writing of DDF metadata is NOT supported yet
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master


-- 
Robin Sommer * Phone +1 (510) 666-2886 * ro...@icir.org 
ICSI/LBNL* Fax   +1 (510) 666-2956 *   www.icir.org


Re: File system trouble with ICH9 controller

2010-06-10 Thread Robin Sommer
                                     100   100   097    Pre-fail  Always       -           0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -           9
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -           0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -           0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -           0
190 Airflow_Temperature_Cel 0x0022   079   074   045    Old_age   Always       -           21 (Lifetime Min/Max 20/21)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -           0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -           0
193 Load_Cycle_Count        0x0032   075   075   000    Old_age   Always       -           51868
194 Temperature_Celsius     0x0022   021   040   000    Old_age   Always       -           21 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   051   050   000    Old_age   Always       -           37297906
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -           0
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -           0
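
To check tables like this on many disks at once, here is a minimal Python sketch that parses attribute rows and lists attributes whose normalized value has reached its threshold. It assumes the usual smartctl column order (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value); the function names are made up for illustration, and the sample rows are a subset of the values above.

```python
def parse_attributes(table_text):
    """Parse smartctl-style attribute rows into dicts."""
    rows = []
    for line in table_text.splitlines():
        parts = line.split()
        # A complete row has ID, name, flag, value, worst, thresh, type,
        # updated, when_failed and at least one raw-value token.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": " ".join(parts[9:]),
        })
    return rows

def at_or_below_threshold(rows):
    """Names of attributes whose normalized value has reached the threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]

sample = """\
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   075   075   000    Old_age   Always       -       51868
195 Hardware_ECC_Recovered  0x001a   051   050   000    Old_age   Always       -       37297906
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
rows = parse_attributes(sample)
print(at_or_below_threshold(rows))
```

For the sample rows nothing is flagged, which matches the empty SMART error log reported for this disk.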

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.





-- 
Robin Sommer * Phone +1 (510) 666-2886 * ro...@icir.org 
ICSI/LBNL* Fax   +1 (510) 666-2956 *   www.icir.org