Re: Interesting fs corruption story
On 07 Mar 2001 12:22:22 -0800, Tim Wright wrote:
> On Tue, Mar 06, 2001 at 08:10:10PM -0500, Ettore Perazzoli wrote:
> > On 06 Mar 2001 17:01:02 -0800, Tim Wright wrote:
>
> Yes, it does. I have the drive running in UDMA mode 2, and get ~16MB/s from
> 'hdparm -t -T'. I have the "use DMA automatically" option turned on in the
> kernel, so I inherit the BIOS settings which are correct.
>
> I've used standby and hibernation with complete success since.

This seemed to fix the problem for me as well. I have had DMA turned on
since then, and I have experienced no file system corruption anymore.
Thanks!

Maybe the help message for this kernel option (CONFIG_APM_ALLOW_INTS)
should report in big blocky letters that disabling it might cause major
data loss with some drive/bios combinations?.. I was not aware that I was
touching such a sensitive parameter when I rebuilt the kernel, and the help
message didn't warn me in any way.

-- 
Ettore

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Interesting fs corruption story
On 06 Mar 2001 17:01:02 -0800, Tim Wright wrote:
> Hi Ettore,
> I have no idea if this is related to your problem since you didn't mention
> that key part, but with the same drive, I managed to trash my root partition
> incredibly badly by trying to use DMA and then do APM suspend or hibernate.
> On wakeup, I'd get an 'hda: lost interrupt' but then things would appear to
> carry on.
>
> The fix for me was to rebuild the kernel and make sure CONFIG_APM_ALLOW_INTS
> was enabled. So, do you ever use power management and is this similar, or do
> you have a completely different problem ?

Wow, this sounds like this might be the problem. I just checked my
`.config' and indeed `CONFIG_APM_ALLOW_INTS' is not enabled. And indeed I
have been suspending/resuming the machine a few times before the partition
got corrupted.

So, does DMA work correctly on your system after setting this option? I
have now disabled DMA completely as a safety measure (and as suggested by
somebody else on this list), and indeed I have not had any more troubles
for now. (I have been forcing a fsck every day before turning the machine
off.)

Thanks a lot for the hint! I will now rebuild my kernel with that option
turned on.

-- 
Ettore
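
For anyone who wants to make the same check on their own kernel tree, here is a minimal sketch. The function name `apm_allow_ints_status` and the default path `/usr/src/linux/.config` are assumptions made for illustration and do not come from this thread; the only thing relied on is the usual `.config` convention that an enabled boolean option appears as `NAME=y`.

```shell
#!/bin/sh
# Sketch: report whether CONFIG_APM_ALLOW_INTS is enabled in a kernel .config.
# apm_allow_ints_status and the default path are hypothetical, for illustration.
apm_allow_ints_status() {
    config="${1:-/usr/src/linux/.config}"
    if [ ! -r "$config" ]; then
        echo "unreadable"
        return 2
    fi
    # An enabled boolean option appears as NAME=y; a disabled one appears as
    # "# NAME is not set" or is missing from the file altogether.
    if grep -q '^CONFIG_APM_ALLOW_INTS=y$' "$config"; then
        echo "enabled"
    else
        echo "disabled"
    fi
}
```

Calling `apm_allow_ints_status /path/to/.config` prints one of `enabled`, `disabled` or `unreadable`. It only inspects the configuration file, so it says nothing about the kernel that is actually running if that kernel was built from a different tree.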
Re: Interesting fs corruption story
> You could try turning off DMA (rebuild your kernel again, and turn off "use
> DMA by default").

Would this be in any way different from just `hdparm -d0 /dev/hda'?

> UDMA is known to work reliably only with a (reasonably
> broad) subset of chipsets, and it is likely that laptop chipsets get the
> least testing. If turning off DMA fixes the problem for you, we at least
> know where to start looking.

Sure I can try this, although it's hard to safely say if the problem is
fixed or not, as it's not reliably reproducible.

BTW, the Inspiron seemed to work just fine with DMA turned on, before the
drive was replaced, with the 2.2.16 kernel that Red Hat ships. (I always
had DMA turned on, and that was for about six months, without any problems
ever.)

Also, I have some friends using T20s with the same drive without any
problems, with DMA turned on.

Is there any kind of IDE DMA test I could run to see if it works reliably?

-- 
Ettore
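
If DMA is switched off at run time with `hdparm -d0 /dev/hda', it is worth confirming that the setting actually took effect. The sketch below is one way to do that, assuming `hdparm -d DEVICE` prints a `using_dma = N (on|off)` line like the one in the hdparm output posted earlier in this thread. The helper name `dma_state` is hypothetical. It reads hdparm output on standard input so that it can be tried without touching a real drive.

```shell
#!/bin/sh
# Sketch: read `hdparm -d DEVICE` output on stdin and print "on" or "off".
# dma_state is a hypothetical helper name, not part of hdparm.
dma_state() {
    # Pull the 0/1 flag out of the "using_dma = N (...)" line, if present.
    flag=$(sed -n 's/^[[:space:]]*using_dma[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' | head -n 1)
    case "$flag" in
        1) echo "on" ;;
        0) echo "off" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Typical use would be `hdparm -d0 /dev/hda` followed by `hdparm -d /dev/hda | dma_state`, both as root. This only confirms the current setting; it is not a reliability test for DMA itself.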
Interesting fs corruption story
Hello all,

I have been telling this story to a few people, and nobody seems to have a
clue about what is going on... Alan suggested that I post a description of
the problem to this list, so this is what I am doing.

So, I had a Dell Inspiron 5000 which worked great for a while. It was
running a more-or-less stock Red Hat 6.1 with the stock kernel from it.

At some point, the hard drive in that machine broke, so I had to buy a new
one. The new drive was an IBM Travelstar 20G. I installed a Debian system
on it, with a reiserfs root partition, which was the only partition besides
an ext2 /boot partition.

Everything seemed to work fine, but after a while I started getting massive
metadata corruption on it. Whenever I did an apt-get dist-upgrade,
something weird happened, such as files that could be neither stat()ed nor
unlink()ed, and directories that would make the kernel oops nicely if
written to.

I could never figure out what was wrong with it. The reiserfs people seemed
to have no clue about what was going on there.

In the meantime, I got a new machine, an IBM Thinkpad T21, which is now my
main machine. After the previous experience, I decided not to trust
reiserfs this time, so I installed using ext2.

Again, I installed Debian Woody. I needed to rebuild the kernel myself as I
needed the soundcard to work, and the stock Debian one didn't even seem to
have APM working, so I installed the 2.2.18 source from Debian, configured
it, and compiled/installed using make-kpkg and dpkg.

Unfortunately, after importing 15k mail messages or so into Gnus (which is
a pretty disk-intensive activity -- I use nnml so every mail goes into a
separate file) and apt-get upgrading a couple of times, I started getting
file system corruption again. /tmp/.X0-lock was turned into a weird file
with abnormal length and couldn't be removed, so I tried to manually force
a fsck and this resulted in a lot of problems being reported, and
lost+found getting 656 files into it. (Some of which are files from the
Gnus mail repository, and others seem to come from TeX.)

So, this looks pretty interesting to me. I got these metadata corruption
problems (no data corruption that I know of) on two different machines with
different hardware and different file systems. Maybe it's a kernel bug?

Another interesting thing is that both machines use a Travelstar 20G drive.
Maybe the drive's firmware is to blame, but I know at least two more people
who have been using that same drive on Thinkpads for quite a long time and
have had no problems at all with it. (Using both XFS and ext2.)

Some system information: (I don't have the Inspiron at hand anymore, so I
can only be detailed about the Thinkpad)

milkplus:~# /sbin/lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:02.0 CardBus bridge: Texas Instruments PCI1450 (rev 03)
00:02.1 CardBus bridge: Texas Instruments PCI1450 (rev 03)
00:03.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 09)
00:03.1 Serial controller: Xircom: Unknown device 000c
00:05.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24 [CrystalClear SoundFusion Audio Accelerator] (rev 01)
00:07.0 Bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 03)
01:00.0 VGA compatible controller: S3 Inc. 86C270-294 Savage/MX-/IX (rev 13)

milkplus:~# dmesg | grep hda
ide0: BM-DMA at 0x1850-0x1857, BIOS settings: hda:DMA, hdb:pio
hda: IBM-DJSA-220, ATA DISK drive
hda: IBM-DJSA-220, 19077MB w/1874kB Cache, CHS=2584/240/63, UDMA
hda: hda1 hda3 < hda5 hda6 > hda4

milkplus:~# hdparm /dev/hda

/dev/hda:
 multcount    =  0 (off)
 I/O support  =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 2584/240/63, sectors = 39070080, start = 0

Any idea? What am I doing wrong?

Thanks in advance,

-- 
Ettore

(I am not subscribed to the list, so please reply to my own address too.)