Bug#582275: linux-image-2.6.32-bpo.4-686: ext3 filesystem corruption with md RAID1 on Seagate disks

2010-05-23 Thread Ben Hutchings
On Sun, 2010-05-23 at 18:38 +0200, Reiner Buehl wrote:
> Hi Ben,
>
> as you might have seen from my last mails on the linux-fsdevel list, the
> problem has not disappeared. If it does not cause too much trouble, I
> would like to keep the report open at least until Ted Tso has had a
> chance to look at the fsck output. Is this possible?

Yes, that's OK.  Please add the bug address 582...@bugs.debian.org to
the cc list in further discussions.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.





2010-05-21 Thread Ben Hutchings
On Wed, 2010-05-19 at 18:50 +0200, Reiner Buehl wrote:
> Package: linux-2.6
> Version: 2.6.32-11~bpo50+1
> Severity: critical
> Justification: causes serious data loss
>
> I keep getting ext3 filesystem corruptions on one of my md RAID1
> arrays. Shortly after booting, I get messages like the following one:
[...]

I see that you've also sent a bug report to some kernel mailing lists,
and that the problem has now disappeared.  Do you still want to keep this
bug report open?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.





2010-05-19 Thread Reiner Buehl
Package: linux-2.6
Version: 2.6.32-11~bpo50+1
Severity: critical
Justification: causes serious data loss

I keep getting ext3 filesystem corruptions on one of my md RAID1 arrays. 
Shortly after booting, I get messages like the following one:

EXT3-fs error (device md1): htree_dirblock_to_tree: bad entry in directory #17269110: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0

This forces an automatic fsck at the next reboot, and that fsck fails. A manual
fsck.ext3 -y /dev/md1 takes a long time but manages to get a clean FS again.
After the reboot, it takes just a few minutes until the first of these messages
appears again.
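
For anyone who wants to count or compare these errors across boots, the fields
of such a message could be extracted with a short script. This is only a sketch:
the regular expression is modelled on the single message quoted above, not on a
documented kernel log format, and the name parse_ext3_dir_error is made up for
illustration.

```python
import re

# Pattern derived from the one message quoted in this report (an assumption,
# not an official format description).
EXT3_DIR_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>\w+)\): (?P<function>\w+): "
    r"bad entry in directory #(?P<dir_inode>\d+): (?P<reason>.+?) - "
    r"offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)

NUMERIC_FIELDS = ("dir_inode", "offset", "inode", "rec_len", "name_len")


def parse_ext3_dir_error(text):
    """Return the fields of a 'bad entry in directory' message, or None."""
    # Collapse all whitespace so that messages wrapped by a mail client
    # or pager still match the single-line pattern.
    normalized = " ".join(text.split())
    match = EXT3_DIR_ERROR.search(normalized)
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        fields[key] = int(fields[key])
    return fields
```

Applied to the message above, this should report device md1, directory inode
17269110 and a rec_len of 0; text that does not contain such a message yields
None.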

The two disks used in the RAID1 md device are both Seagate ST31000528AS drives.
They show no errors in the long and short SMART tests or in SeaTools, and
Memtest shows no memory problems. Two other RAID1 arrays connected to the same
Intel Ibex Peak 6 port SATA AHCI Controller (rev 06) show no such problems. A
RAID5 array with 4 Seagate ST3750640AS drives on a Promise PDC40718 (SATA 300
TX4) also works without problems in the same system.

I saw that sata_sil.c has a blacklist that includes mainly Seagate drives, but
I do not know whether this is related to my problem, since my system uses an
Intel SATA controller.

-- Package-specific info:
** Version:
Linux version 2.6.32-bpo.4-686 (Debian 2.6.32-11~bpo50+1) (norb...@tretkowski.de) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Mon Apr 12 16:20:13 UTC 2010

** Command line:
root=UUID=a059fdf2-4ff6-4c30-ba7f-77c85e7f5d1b ro quiet splash

** Not tainted

** Kernel log:
[    9.619225] md0: detected capacity change from 0 to 2250460889088
[    9.619426] md: md1 stopped.
[    9.619507]  md0: unknown partition table
[    9.627810] md: bind<sdf1>
[    9.627937] md: bind<sde1>
[    9.637594] raid1: raid set md1 active with 2 out of 2 mirrors
[    9.637607] md1: detected capacity change from 0 to 991614926848
[    9.637814] md: md2 stopped.
[    9.637897]  md1: unknown partition table
[    9.675765] md: bind<sdf2>
[    9.675894] md: bind<sde2>
[    9.685475] raid1: raid set md2 active with 2 out of 2 mirrors
[    9.685489] md2: detected capacity change from 0 to 8587116544
[    9.685682] md: md3 stopped.
[    9.685764]  md2: unknown partition table
[    9.747654] md: bind<sdi1>
[    9.747756] md: bind<sdh1>
[    9.757332] raid1: raid set md3 active with 2 out of 2 mirrors
[    9.757345] md3: detected capacity change from 0 to 2000396222464
[    9.757541] md: md4 stopped.
[    9.757625]  md3: unknown partition table
[    9.804386] md: bind<sdg1>
[    9.804488] md: bind<sdl1>
[    9.814098] raid1: raid set md4 active with 2 out of 2 mirrors
[    9.814111] md4: detected capacity change from 0 to 2000396222464
[    9.814323]  md4: unknown partition table
[    9.994122] device-mapper: uevent: version 1.0.3
[    9.994182] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-de...@redhat.com
[   10.009098] PM: Starting manual resume from disk
[   10.009100] PM: Resume from partition 9:2
[   10.009101] PM: Checking hibernation image.
[   10.013003] PM: Error -22 checking image file
[   10.013004] PM: Resume from disk failed.
[   10.082000] kjournald starting.  Commit interval 5 seconds
[   10.082004] EXT3-fs: mounted filesystem with ordered data mode.
[   11.327588] udevd version 125 started
[   11.898037] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2
[   11.898041] ACPI: Power Button [PWRB]
[   11.924627] Monitor-Mwait will be used to enter C-1 state
[   11.951685] Monitor-Mwait will be used to enter C-2 state
[   11.983607] Monitor-Mwait will be used to enter C-3 state
[   11.983664] processor LNXCPU:00: registered as cooling_device0
[   11.983706] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[   11.983709] ACPI: Power Button [PWRF]
[   12.102371] processor LNXCPU:01: registered as cooling_device1
[   12.183683] processor LNXCPU:02: registered as cooling_device2
[   12.269226] input: PC Speaker as /devices/platform/pcspkr/input/input4
[   12.287409] input: X10 Wireless Technology Inc USB Receiver as /devices/pci:00/:00:1a.7/usb7/7-4/7-4.4/input/input5
[   12.287462] usbcore: registered new interface driver ati_remote
[   12.287464] ati_remote: 2.2.1:ATI/X10 RF USB Remote Control
[   12.288353] usb 7-4.4: Weird data, len=1 ff 02 00 00 00 00 ...
[   12.311612] processor LNXCPU:03: registered as cooling_device3
[   12.468824] i801_smbus :00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[   12.493504] HDA Intel :00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[   12.493529] HDA Intel :00:1b.0: setting latency timer to 64
[   12.770141] input: HDA Digital PCBeep as /devices/pci:00/:00:1b.0/input/input6
[   12.772834] HDA Intel :01:00.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[   12.772836] hda_intel: Disable MSI for Nvidia chipset
[   12.772853] HDA Intel :01:00.1: setting latency timer to 64
[   14.991127] Adding 8385848k swap on /dev/md2.  Priority:-1
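
To see at a glance which partitions back which array in a log like the one
above, the md assembly messages could be summarised with a small script. This
is a rough sketch under stated assumptions: the function name
summarize_md_arrays and the regular expressions are illustrative, and it
assumes (as in this excerpt) that the "bind" lines for an array appear directly
before its "raid set ... active" line. The bind pattern accepts the member name
with or without angle brackets, since mail archives sometimes strip them.

```python
import re

# Patterns based only on the log lines shown in this report.
BIND = re.compile(r"md: bind<?(\w+)>?\s*$")
ACTIVE = re.compile(r"(raid\d+): raid set (md\d+) active with (\d+) out of (\d+)")
CAPACITY = re.compile(r"(md\d+): detected capacity change from \d+ to (\d+)")


def summarize_md_arrays(log_text):
    """Map each activated md array to its level, members and capacity."""
    arrays = {}
    pending = []  # members bound since the last array was activated
    for line in log_text.splitlines():
        bound = BIND.search(line)
        if bound:
            pending.append(bound.group(1))
            continue
        active = ACTIVE.search(line)
        if active:
            level, name, present, total = active.groups()
            arrays[name] = {
                "level": level,
                "members": pending,
                "active": int(present),
                "total": int(total),
                "bytes": None,
            }
            pending = []
            continue
        capacity = CAPACITY.search(line)
        # Ignore capacity lines for arrays whose activation is not in the log.
        if capacity and capacity.group(1) in arrays:
            arrays[capacity.group(1)]["bytes"] = int(capacity.group(2))
    return arrays
```

For the excerpt above, md1 would be reported as a raid1 array built from sdf1
and sde1 with a capacity of 991614926848 bytes. md0 is skipped because its
activation happened before the excerpt starts.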