Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, Jan 14, 2001 at 06:59:57PM +0100, Tobias Ringstrom wrote:

> I should also add that the 3.11 driver seems to make things better, but
> not yet perfect. My intuition tells me that I get CRC errors much sooner
> with 2.1e than with 3.11.
>
> Have the timings changed from 2.1e to 3.11, and would it be easy to modify
> 3.11 to get extra safe/paranoid, but less high performance, timings?

If you use 'idebus=40' or 'idebus=50', the driver will add an extra
margin to the timings, trying to compensate for the 40 or 50 MHz PCI bus
it will be tricked into thinking it's working with. This could add a data
point, yes.

> Some extra data:
> * B seems to work in 2 with udma2
> * A seems to work in 2 with udma1, but not with udma2.

UDMA1 is 22.2 MB/sec, UDMA2 is 33.3. UDMA0 is 16.6. Could you (if you
didn't already) send me the lspci -vvxxx after the -X65 (UDMA1) command,
together with the one before? That also could tell something.

> I wouldn't say it's rock solid, and I would not trust my data to any of
> these combinations, but at least it does not break immediately (i.e. for
> less than 1 GB written).

Actually, the CRC messages are safe and only mean a data transfer is
retried. That is, only if it doesn't fail every time. They happen on
many boards and drives using UDMA even under normal correct operation :(

> The worst combination is 2.4.0 with VIA 2.1e and A in 1. Going from 2.1e
> to 3.11 helps, but it is still very bad.
>
> I'd really like to be more precise, but there are too many combinations
> to try them all, and sometimes it fails right away, and sometimes
> after several hundred megabytes.

If 'fails after several hundred megabytes' only means a single CRC error
which is recovered from correctly, then that actually means 'working and
probably would work perfectly with a shorter cable'.
--
Vojtech Pavlik
SuSE Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
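[Editor's sketch] The figures quoted in the reply above (16.6, 22.2 and 33.3 MB/sec) and the effect of 'idebus=40' can be reproduced with a small model. It is only an illustration, not the VIA driver's code. It assumes that the controller can only program cycle times in whole PCI clocks, that the real bus runs at about 33.3 MHz (30 ns per clock), and that the nominal UDMA cycle times are 120, 80 and 60 ns for modes 0 to 2. The function names are hypothetical.

```python
import math

# Assumption: the real PCI bus runs at ~33.3 MHz, i.e. 30 ns per clock.
REAL_PCI_CLOCK_NS = 30.0

# Assumption: nominal UDMA cycle times in ns per 16-bit word, modes 0-2.
NOMINAL_CYCLE_NS = {0: 120.0, 1: 80.0, 2: 60.0}


def programmed_clocks(cycle_ns, assumed_bus_mhz=33.3):
    """Whole bus clocks needed for cycle_ns if the bus ran at assumed_bus_mhz."""
    assumed_period_ns = 1000.0 / assumed_bus_mhz
    # The small epsilon keeps exact multiples from rounding up by one clock.
    return math.ceil(cycle_ns / assumed_period_ns - 1e-9)


def effective_rate_mb_s(mode, assumed_bus_mhz=33.3):
    """Rate reached on the real bus when timings are computed for assumed_bus_mhz."""
    clocks = programmed_clocks(NOMINAL_CYCLE_NS[mode], assumed_bus_mhz)
    real_cycle_ns = clocks * REAL_PCI_CLOCK_NS
    return 2.0 / real_cycle_ns * 1000.0  # 2 bytes per cycle, bytes/ns -> MB/s


if __name__ == "__main__":
    for bus in (33.3, 40, 50):
        rates = ", ".join(
            "udma%d=%.1f" % (m, effective_rate_mb_s(m, bus)) for m in (0, 1, 2)
        )
        print("idebus=%s: %s MB/s" % (bus, rates))
```

Under these assumptions, telling the driver the bus is faster than it really is makes it program more clocks per cycle, so every mode ends up with slacker real timings. That is the "extra margin" described above.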
Re: 2.4 ate my filesystem on rw-mount, getting closer
I should also add that the 3.11 driver seems to make things better, but
not yet perfect. My intuition tells me that I get CRC errors much sooner
with 2.1e than with 3.11.

Have the timings changed from 2.1e to 3.11, and would it be easy to modify
3.11 to get extra safe/paranoid, but less high performance, timings?

Some extra data:
* B seems to work in 2 with udma2
* A seems to work in 2 with udma1, but not with udma2.

I wouldn't say it's rock solid, and I would not trust my data to any of
these combinations, but at least it does not break immediately (i.e. for
less than 1 GB written).

The worst combination is 2.4.0 with VIA 2.1e and A in 1. Going from 2.1e
to 3.11 helps, but it is still very bad.

I'd really like to be more precise, but there are too many combinations
to try them all, and sometimes it fails right away, and sometimes
after several hundred megabytes.

/Tobias
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, 14 Jan 2001, Vojtech Pavlik wrote:

> > > So the drive *did* work on the vt82c686a in the A7V board? You tested it
> > > both on the Promise and on the 686a? But doesn't work on the 686a in
> > > your other board?
> >
> > Yes, on both the Promise and on the 686a. But the device revisions are
> > different. The machine that does NOT work:
> >
> > 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> > 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
> >
> > The machine that works:
> >
> > 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> > 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> >
> > The one that works is a 1 GHz Athlon, and the other is an 800 MHz
> > Pentium-III.

Of course it isn't. The vt82c686 that does not work is a 450 MHz K-6, not
a PIII.

> > > > no matter what cable I use. When I get this, the machine does not recover
> > > > most of the time, and I have to reset or power cycle.
> > >
> > > It should be able to recover in a couple (up to 10) minutes ...
> >
> > Who waits 10 minutes for a timeout? Can it be lowered?
>
> It's not a 10 minute timeout, it's a shorter timeout retried many times.
> Not my code, though - this is generic PCI IDE code, and is a huge mess.

What I get is a number of "Busy" and "Drive is not ready for command"
messages for different sectors.

> > Expect another mail with the data you requested within a couple of hours.
>
> Thanks a lot.

Ok, it took a bit longer than that, mostly because my wife and I had
unexpected (but very welcome) guests at home. It is Sunday, after all...

I have attached a tar file with "lspci -vvxxx" and "hdinfo -i" for
machine 1 and 2 to this mail, but first some comments.

I will be talking about three machines:

1) 450 MHz K-6 on an AOpen MX59 PRO II motherboard
2) 800 MHz PIII on an unknown cheap/crappy motherboard.
3) 1 GHz Athlon on an ASUS A7V motherboard.
and the following drives:

A) SAMSUNG VG34323A,
   sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2
B) ST38421A,
   mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4

Machine 3 is the machine at home, and it does not have problems with any
disks I have tried so far, and seems very stable, both with ATA100 and
ATA66.

I verified that what is happening when RH7 tries to remount / read-write
is that I get the infamous CRC errors. It does not seem to recover from
this state. At least I did not wait that long. I do not think that the
RH7 kernel 2.2.16-22 uses udma2 at any time, and that may be why it works.

Disk B does NOT work with DMA enabled with machine 1 or 2. It works
better than disk A, but it does still fail after some time. The
combination 1B was the most stable, and only failed once.

When using disk B, the computer has managed to recover from the CRC error
condition every time, as opposed to disk A which never recovers. (Busy)

Using hdparm -X65 (udma1) makes disk A work with 2.4 in machine 2. What
is the difference between udma1 and udma2?

Now I'm almost completely lost. Hope this helps. Let me know if you want
me to try something else.
/Tobias

/dev/hde:

 Model=SAMSUNG VG34323A (4.32GB), FwRev=GQ200, SerialNo=dW1921060033c8
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=14896/9/63, TrkSize=32256, SectSize=512, ECCbytes=21
 BuffType=DualPortCache, BuffSize=496kB, MaxMultSect=16, MultSect=off
 CurCHS=14896/9/63, CurSects=-531627904, LBA=yes, LBAsects=8446032
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2

00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
        Subsystem: Asustek Computer, Inc.: Unknown device 8033
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR-
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 17 a4 6b b4 4f 81 10 10 80 00 08 10 10 10 10 10
60: 03 ff 00 b0 e6 e5 e5 00 44 7c 86 0f 08 3f 00 00
70: de 80 cc 0c 0e a1 d2 00 01 b4 11 02 00 00 00 01
80: 0f 40 00 00 80 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 17 02 00 1f 00 00 00 00 6e 02 14 00
b0: 61 ec 80 e5 32 33 28 00 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
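[Editor's sketch] The "-X65 (udma1)" value used in this message follows the numbering documented in the hdparm man page: 8 + n for PIO modes, 32 + n for multiword DMA and 64 + n for Ultra DMA. The 16 + n offset for single-word DMA is taken from the ATA transfer-mode encoding. The helper names below are hypothetical. The second helper reads a "DMA modes:" line like the one in the output above, where the asterisk marks the currently selected mode.

```python
# Offsets for the hdparm -X argument, keyed by mode family.
XFER_BASE = {"pio": 8, "sdma": 16, "mdma": 32, "udma": 64}


def hdparm_x_value(mode):
    """Map a mode name such as 'udma1' to the number passed to hdparm -X."""
    family = mode.rstrip("0123456789")
    number = int(mode[len(family):])
    return XFER_BASE[family] + number


def parse_dma_modes(line):
    """Return (supported_modes, active_mode) from a 'DMA modes: ...' line."""
    names = line.split(":", 1)[1].split()
    active = None
    supported = []
    for name in names:
        if name.startswith("*"):
            name = name[1:]
            active = name
        supported.append(name)
    return supported, active


if __name__ == "__main__":
    line = "DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2"
    supported, active = parse_dma_modes(line)
    print("active:", active, "-> -X", hdparm_x_value(active))
    print("one step down: -X", hdparm_x_value("udma1"))
```

For the Samsung drive above this reports udma2 (-X 66) as the active mode, and udma1 corresponds to the -X65 that was used as the workaround.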
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, Jan 14, 2001 at 09:45:09AM +0100, Tobias Ringstrom wrote:

> On Sun, 14 Jan 2001, Vojtech Pavlik wrote:
> > On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:
> >
> > > I have now tried the SAMSUNG VG34323A disk with two other controllers at
> > > home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> > > motherboard), and there are no problems to be found with DMA enabled.
> > > Streaming 10 MB/s without glitches.
> >
> > So the drive *did* work on the vt82c686a in the A7V board? You tested it
> > both on the Promise and on the 686a? But doesn't work on the 686a in
> > your other board?
>
> Yes, on both the Promise and on the 686a. But the device revisions are
> different. The machine that does NOT work:
>
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)
>
> The machine that works:
>
> 00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> 00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
>
> The one that works is a 1 GHz Athlon, and the other is an 800 MHz
> Pentium-III.
>
> > > no matter what cable I use. When I get this, the machine does not recover
> > > most of the time, and I have to reset or power cycle.
> >
> > It should be able to recover in a couple (up to 10) minutes ...
>
> Who waits 10 minutes for a timeout? Can it be lowered?

It's not a 10 minute timeout, it's a shorter timeout retried many times.
Not my code, though - this is generic PCI IDE code, and is a huge mess.

> Expect another mail with the data you requested within a couple of hours.

Thanks a lot.

--
Vojtech Pavlik
SuSE Labs
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sun, 14 Jan 2001, Vojtech Pavlik wrote:

> On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:
>
> > I have now tried the SAMSUNG VG34323A disk with two other controllers at
> > home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> > motherboard), and there are no problems to be found with DMA enabled.
> > Streaming 10 MB/s without glitches.
>
> So the drive *did* work on the vt82c686a in the A7V board? You tested it
> both on the Promise and on the 686a? But doesn't work on the 686a in
> your other board?

Yes, on both the Promise and on the 686a. But the device revisions are
different. The machine that does NOT work:

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

The machine that works:

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
00:04.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)

The one that works is a 1 GHz Athlon, and the other is an 800 MHz
Pentium-III.

> > no matter what cable I use. When I get this, the machine does not recover
> > most of the time, and I have to reset or power cycle.
>
> It should be able to recover in a couple (up to 10) minutes ...

Who waits 10 minutes for a timeout? Can it be lowered?

Expect another mail with the data you requested within a couple of hours.

/Tobias
Re: 2.4 ate my filesystem on rw-mount, getting closer
On Sat, Jan 13, 2001 at 11:36:13PM +0100, Tobias Ringstrom wrote:

> I have now tried the SAMSUNG VG34323A disk with two other controllers at
> home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
> motherboard), and there are no problems to be found with DMA enabled.
> Streaming 10 MB/s without glitches.

So the drive *did* work on the vt82c686a in the A7V board? You tested it
both on the Promise and on the 686a? But doesn't work on the 686a in
your other board?

> However, writing to the SAMSUNG VG34323A disk with DMA enabled on either
> this machine [1] (at work, using the VIA IDE driver version 3.11)
>
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Apollo PRO] (rev 23)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
>
> or this machine [2] (at work, using the VIA IDE driver version 2.1e)
>
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

What's the manufacturer/model of these boards? Just for the record ...
What's the PCI bus speed? Or memory speed?

> I get exactly the following errors on both machines
>
> hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }
>
> no matter what cable I use. When I get this, the machine does not recover
> most of the time, and I have to reset or power cycle.

It should be able to recover in a couple (up to 10) minutes ...

> This disk works
> flawlessly on two other IDE controllers, so I do not think that the disk
> is completely broken. It must be either these chipsets or the driver in
> combination with this disk. Note that I _can_ use another UDMA66 disk
> _with_ DMA enabled on both machine [1] and [2] above without problems.
> Also, 2.2.16-22 seems to work with DMA enabled on machine [1]. I have not
> tried 2.2.16-22 with DMA enabled on machine [2].
>
> The problem I reported at first, hence the nasty subject, was a hang and a
> nasty fs corruption when RH7 tried to remount the root fs read-write. I
> examined the RH7 init scripts, or more precisely /etc/rc.sysinit, and
> discovered, to my great disgust, that the stupid thing disables the dmesg
> output on the console very early in the script. It is thus entirely
> possible that I do get the above mentioned errors when the computer seems
> to hang, and my fs gets corrupted. I will fix the script tomorrow to see
> if my assumption is correct.
>
> SUMMARY: I have a disk that with DMA enabled gives me CRC errors on two
> machines, but not on two others, independent of the cable. Both troubled
> machines do not recover from these errors. Linux 2.2.16-22 from RedHat
> works fine with DMA enabled on machine [1], [2] is unknown.
>
> I hope this makes things a lot clearer.

Yes, indeed it's much clearer now. Now to fix the bug, or at least be
able to track it closer, I'll need 'lspci -vvxxx' of the IDE pci device
in the following cases:

1) SAMSUNG VG34323A on VT82C596b/cf with RH 2.2.16-22 and DMA (working)
2) SAMSUNG VG34323A on VT82C686a/ce with RH 2.2.16-22 and DMA (working)
3) SAMSUNG VG34323A on VT82C596b/cf with 2.4.0+via3.11 and DMA
   (doesn't work, so fs readonly)
4) SAMSUNG VG34323A on VT82C686a/ce with 2.4.0+via3.11 and DMA
   (doesn't work, so fs readonly)
5) The other drive on VT82C596b/cf with 2.4.0+via3.11 and DMA (working)
6) The other drive on VT82C686a/ce with 2.4.0+via3.11 and DMA (working)

With these data I should be able to find out what's different between
the working and not working setups ...

My current theory: In UDMA, when reading, the drive provides the clock.
The IDE controller thus can read everything OK. When writing, the
controller provides the clock and for some reason the Samsung can't keep
up with the setting the driver selects for it. The question is why, and
why the driver selects the incorrect (or just too tight?) value.
--
Vojtech Pavlik
SuSE Labs
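[Editor's sketch] Comparing the 'lspci -vvxxx' dumps requested above comes down to a byte-by-byte diff of PCI configuration space. The following is a minimal illustration with hypothetical helper names. The sample bytes are the first three rows of the host bridge dump attached elsewhere in this thread, and the only register offsets relied on are the standard PCI header fields (vendor ID at 0x00, device ID at 0x02, revision at 0x08, subsystem IDs at 0x2c and 0x2e). No chipset-specific timing registers are assumed.

```python
SAMPLE_DUMP = """\
00: 06 11 05 03 06 00 10 a2 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 33 80
"""


def parse_config_dump(text):
    """Turn 'offset: byte byte ...' lines from lspci -x output into bytes."""
    space = bytearray()
    for line in text.splitlines():
        offset, sep, rest = line.partition(":")
        if not sep:
            continue  # skip blank or non-dump lines
        if int(offset, 16) != len(space):
            raise ValueError("gap in dump at offset " + offset.strip())
        space.extend(bytes.fromhex(rest))
    return bytes(space)


def word(space, offset):
    """Read a little-endian 16-bit register."""
    return int.from_bytes(space[offset:offset + 2], "little")


def diff_dumps(a, b):
    """List (offset, byte_in_a, byte_in_b) for every differing register byte."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    space = parse_config_dump(SAMPLE_DUMP)
    print("vendor %04x device %04x rev %02x" % (
        word(space, 0x00), word(space, 0x02), space[0x08]))
    print("subsystem %04x:%04x" % (word(space, 0x2c), word(space, 0x2e)))
```

Running diff_dumps on the parsed dumps of a working and a non-working setup gives the list of registers worth looking at first.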
Re: 2.4 ate my filesystem on rw-mount, getting closer
I have now tried the SAMSUNG VG34323A disk with two other controllers at
home (Promise ATA100 and VIA vt82c686a rev 0x22, both on an ASUS A7V
motherboard), and there are no problems to be found with DMA enabled.
Streaming 10 MB/s without glitches.

However, writing to the SAMSUNG VG34323A disk with DMA enabled on either
this machine [1] (at work, using the VIA IDE driver version 3.11)

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Apollo PRO] (rev 23)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)

or this machine [2] (at work, using the VIA IDE driver version 2.1e)

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 1b)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 06)

I get exactly the following errors on both machines

hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x84 { DriveStatusError BadCRC }

no matter what cable I use. When I get this, the machine does not recover
most of the time, and I have to reset or power cycle. This disk works
flawlessly on two other IDE controllers, so I do not think that the disk
is completely broken. It must be either these chipsets or the driver in
combination with this disk. Note that I _can_ use another UDMA66 disk
_with_ DMA enabled on both machine [1] and [2] above without problems.
Also, 2.2.16-22 seems to work with DMA enabled on machine [1]. I have not
tried 2.2.16-22 with DMA enabled on machine [2].

The problem I reported at first, hence the nasty subject, was a hang and a
nasty fs corruption when RH7 tried to remount the root fs read-write. I
examined the RH7 init scripts, or more precisely /etc/rc.sysinit, and
discovered, to my great disgust, that the stupid thing disables the dmesg
output on the console very early in the script. It is thus entirely
possible that I do get the above mentioned errors when the computer seems
to hang, and my fs gets corrupted.
I will fix the script tomorrow to see if my assumption is correct.

SUMMARY: I have a disk that with DMA enabled gives me CRC errors on two
machines, but not on two others, independent of the cable. Both troubled
machines do not recover from these errors. Linux 2.2.16-22 from RedHat
works fine with DMA enabled on machine [1], [2] is unknown.

I hope this makes things a lot clearer.

/Tobias
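[Editor's sketch] The two dma_intr lines quoted in this message are bitmasks of the ATA status and error registers. The small decoder below reproduces the text in braces from the hex values. The bit positions follow the ATA register layout. The labels for bits that do not appear in the log above are recalled from the kernel's messages and should be treated as assumptions, as should the function names.

```python
# ATA status register bits, highest first, with kernel-style labels.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ABORT = 0x04  # command aborted
ICRC = 0x80   # interface CRC error (shared with the old bad-block bit)


def decode_status(status):
    """Return the labels for every bit set in the status register."""
    return [name for bit, name in STATUS_BITS if status & bit]


def decode_error(error):
    """Return labels for the error register, valid when status has Error set."""
    names = []
    if error & ABORT:
        names.append("DriveStatusError")
    if error & ICRC:
        # Bit 7 reads as a CRC error only when the command was also aborted.
        names.append("BadCRC" if error & ABORT else "BadSector")
    if error & 0x40:
        names.append("UncorrectableError")
    if error & 0x10:
        names.append("SectorIdNotFound")
    if error & 0x02:
        names.append("TrackZeroNotFound")
    if error & 0x01:
        names.append("AddrMarkNotFound")
    return names


if __name__ == "__main__":
    print("status=0x51 {", " ".join(decode_status(0x51)), "}")
    print("error=0x84 {", " ".join(decode_error(0x84)), "}")
```

Decoding 0x51 and 0x84 this way yields the same labels as the log: the drive is ready, reports an error, and the error is an aborted transfer with a bad CRC on the interface.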