Re: your mail
On Mon, 21 May 2001, Thomas Palm wrote:

> there is still file corruption. I use an ASUS A7V133 (Revision 1.05,
> including Sound + Raid). My tests:
> 1st run of "diff -r srcdir destdir" -> no differences
> 2nd run of "diff -r srcdir destdir" -> 2 files differ
> 3rd run of "diff -r srcdir destdir" -> 1 file differs
> 4th run of "diff -r srcdir destdir" -> 1 file differs
> 5th run of "diff -r srcdir destdir" -> no differences

Could you check WHERE the files differ and WHERE the data come from?

I've got the same mobo AND some nasty DAT tape corruption problems...
(also, VERY rarely, on the CD burner). I've got everything on SCSI, but if
it's the DMA troubling us...

-- Lorenzo Marcantonio

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
New AIC7xxx driver - Berkeley DB1 to DB3
Can someone verify if it's legal to change the include/link in the
assembler for AIC7xxx? DB 1.85 has a header clash with DB 3 (db.h).

It SEEMS to work but I'd rather be sure (since I've got that nasty 32-byte
corruption problem on SCSI char devices...)

-- Lorenzo Marcantonio
[SCSI TAPE CORRUPTION] - AARGH... it's even on CDWR!
Yesterday I burnt some deployment CDs for W98SE (yes, THAT thing...), about
400 megs of stuff. And they DIDN'T work.

I dd'ed the image from the CD device, then compared it with the original
(luckily I still had it on my HDD): 32 defective bytes!!! They come from
32KB earlier in the image :(((

Note that I've never suffered from HD corruption. Maybe it's something on
SCSI character devices?

Crying...

-- Lorenzo Marcantonio
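A minimal sketch of the comparison step described above, assuming the original image and the copy read back with dd are both ordinary files on disk. The function name `diff_runs` and the chunk size are illustrative choices, not anything from the original mail. It reports each run of consecutive differing bytes as (offset, length) and ignores any trailing length difference between the two files.

```python
def diff_runs(path_a, path_b, chunk=1 << 20):
    """Return a list of (offset, length) runs where the two files differ.

    Only the common prefix of the two files is compared; a trailing
    length difference is not reported.
    """
    runs = []
    start = None   # offset where the current differing run began, if any
    pos = 0        # file offset of the chunk being examined
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            n = min(len(a), len(b))
            if n == 0:
                break
            if a[:n] == b[:n]:
                # Fast path: the whole chunk matches, so close any open run.
                if start is not None:
                    runs.append((start, pos - start))
                    start = None
            else:
                # Slow path: walk the chunk byte by byte.
                for i in range(n):
                    if a[i] != b[i]:
                        if start is None:
                            start = pos + i
                    elif start is not None:
                        runs.append((start, pos + i - start))
                        start = None
            pos += n
    if start is not None:
        runs.append((start, pos - start))
    return runs
```

Called as `diff_runs("image.iso", "readback.iso")` (hypothetical file names), a single corrupted cache-line-sized region would show up as one entry with length 32.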
Re: Linux-2.4.4 failure to compile
On Thu, 17 May 2001, H. Peter Anvin wrote:

> I think the header file you're talking about is the db1 header file,
> which has nothing to do with yacc -- it's the Berkeley libdb version 1,
> which is a pretty bad thing to require.

I've got it to compile (and apparently work) even with libdb3... which has
the compatibility header db_185.h (or something similar).

IIRC, libdb1 was bundled with libc till release 2.1.3. Since 2.2 they've
said 'get it at sleepycat...'.

BTW, there are ifdefs inside the driver about which header to include
(db.h or db_185.h IIRC). I still don't understand what it NEEDS libdb
for...

-- Lorenzo Marcantonio
Re: SCSI Tape Corruption - 2nd round experiment result
On Tue, 15 May 2001, Geert Uytterhoeven wrote:

> I never saw an offset different from the block size, though.
>
> Assuming you did have 32-byte errors, you had 7 errors for 1.3 GB.
>
> I have approx. 6 errors for 256 MB. But I have only 128 MB RAM.

Next test: boot with mem=32M (shall I get 0 errors?... naaahh)

-- Lorenzo Marcantonio
SCSI Tape Corruption - 2nd round experiment result
Again battling with my SDT-9000. Tonight's first experiment was:

- Moderately huge file (339443712 bytes), obtained by cat'ing some large
  tar.bz2 files, so essentially 'random data'
- Fixed block size (dd bs=32KB, mt bs=512=default)
- HW data compression at default (enabled)
- Variable machine load (varying disk access to stimulate the I/O subsys)
- New AIC7xxx driver

Test sequence:
--------------

    mt rewind
    dd if=blob.dat of=/dev/tape bs=32k
    mt rewind
    dd if=/dev/tape of=blob.da1
    md5sum blob.dat blob.da1

/dev/tape is symlinked to /dev/tapes/tape0/mtn

First test was done with very light load
Second test was done with a kernel compilation in background
Third test was done with HEAVY disk I/O in background (some mkisofs)

The SCSI driver says this at start (seems harmless, SDTR=SetDataTransferRate?):

    (scsi1:A:4:0): Sending SDTR period 19, offset f
    (scsi1:A:4:0): Received SDTR period 19, offset f
    Filtered to period 19, offset f

Only the third test got the file 100% right. Since that looked suspicious
(I've never got a good tar on this machine), I ran a fourth test (same
conditions as the third).

The differences:
(File offsets in hex; each pattern was found with no other match in the file)

First test:
    64 bytes at D9E0800 (found starting at D9D8800, 32KB before)

Second test:
    64 bytes at 2F187C0 (found starting at 2F107C0, 32KB before)
    64 bytes at A8643C0 (found starting at A8343C0, 192KB before[!])

Third test:
    No differences (sheer luck?)

Fourth test:
    32 bytes at B937640 (found starting at B8D7640, 384KB before[!!])

Conclusions (IMO):
------------------

It's the first time I've seen 64 consecutive corrupted bytes. Also, in the
fourth test the data came from MUCH earlier in the file... (maybe from some
remote cache area... I've got 512MB RAM, 1024MB swap)

The Second Experiment (As Suggested By Alex Q Chen)
---------------------------------------------------

This time I've played with the mt options.
Since I was almost asleep, I tried something radical:

    mt stoptions debug    # It should have disabled EVERYTHING but debug

It says:

    st0: Mode 0 options: buffer writes: 0, async writes: 0, read ahead: 0
    st0:can bsr: 0, two FMs: 0, fast mteom: 0, auto lock: 0,
    st0:defs for wr: 0, no block limits: 0, partitions: 0, s2 log: 0
    st0:sysv: 0

Then tested as before (same three load conditions)...

First test:
    No differences

Second test:
    No differences

Third test:
    No differences

Fourth test (par condicio :):
    64 bytes at 1039A840 (found starting at 10262840, 1248KB before)

Final Conclusion:
-----------------

Disabling the st options hasn't resolved the problem (which is, at the very
least, elusive). Still, I can do more tests (not today, it's 3:14 AM
here...).

Good night

-- Lorenzo Marcantonio
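As a cross-check on the offsets reported above, the short sketch below recomputes how far back each corrupted run's source lies, in KB and in 32 KB dd blocks. The offsets are copied from the two experiments. The names `PAIRS`, `DD_BLOCK` and `summarize` are illustrative only, and treating 32 KB as the unit of comparison is an assumption based on the `dd bs=32k` write.

```python
# (corrupted offset, source offset) pairs in hex, copied from the report above.
PAIRS = [
    (0xD9E0800, 0xD9D8800),    # experiment 1, first test
    (0x2F187C0, 0x2F107C0),    # experiment 1, second test
    (0xA8643C0, 0xA8343C0),    # experiment 1, second test
    (0xB937640, 0xB8D7640),    # experiment 1, fourth test
    (0x1039A840, 0x10262840),  # experiment 2, fourth test
]
DD_BLOCK = 32 * 1024   # block size used on the dd write (bs=32k)
RUN_ALIGN = 32         # shortest corrupted run seen, in bytes


def summarize(pairs, block=DD_BLOCK):
    """Return (distance_kb, whole_blocks, remainder, offset_mod_32) per pair."""
    rows = []
    for bad, src in pairs:
        dist = bad - src
        rows.append((dist // 1024, dist // block, dist % block, bad % RUN_ALIGN))
    return rows


if __name__ == "__main__":
    for kb, blocks, rem, misalign in summarize(PAIRS):
        print(f"{kb:5d} KB back = {blocks:2d} blocks, "
              f"remainder {rem}, offset mod 32 = {misalign}")
```

On these five pairs the distance is always a whole number of 32 KB blocks (1, 1, 6, 12 and 39), and every corrupted run starts on a 32-byte boundary. That matches the KB figures quoted in the report, though five samples prove nothing on their own.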
Re: SCSI Tape corruption - update
On Tue, 8 May 2001, Geert Uytterhoeven wrote:

> In the mean time I down/upgraded to 2.2.17 on my PPC box (CHRP LongTrail,
> Sym53c875, HP C5136A DDS1) and I can confirm that the problem does not happen
> under 2.2.17 neither.
>
> My experiences:
> - reading works fine, writing doesn't

Same here

> - 2.2.x works fine, 2.4.x doesn't (at least since 2.4.0-test1-ac10)

SAME here

> - hardware compression doesn't matter

SAME HERE

> - I have a sym53c875, Lorenzo has an Adaptec, so most likely it's not a
> SCSI hardware driver bug
> - I have a PPC, Lorenzo doesn't, so it's not CPU-specific
> - corruption is always a block of 32 bytes being replaced by 32 bytes from
> the previous tape block (depending on block size!) (approx. 6 errors per
> 256 MB)

YESSS... EXACTLY 32 consecutive bytes are different. I'll bet we've got the
same problem

> - How many successive bytes are corrupted?
> - Where do the corrupted data come from?

I'll set up some sort of binary pattern match. This afternoon I'll pinpoint
the source of the rogue bytes...

-- Lorenzo Marcantonio
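One way such a binary pattern match could look, as a rough sketch only: take the bytes of a corrupted run from the read-back copy and search for them in the original data. The function names `find_sources` and `explain_run` are hypothetical, and both buffers are assumed to fit in memory (for a file of a few hundred MB, `mmap` would be the obvious substitute).

```python
def find_sources(original: bytes, pattern: bytes):
    """Return every offset in `original` where `pattern` occurs."""
    hits = []
    pos = original.find(pattern)
    while pos != -1:
        hits.append(pos)
        # Step by one so overlapping occurrences are found as well.
        pos = original.find(pattern, pos + 1)
    return hits


def explain_run(original: bytes, readback: bytes, offset: int, length: int):
    """For one corrupted run, list (source_offset, distance_back) candidates.

    `offset` and `length` describe the run in `readback`. A positive
    distance means the rogue bytes come from earlier in the original data.
    """
    rogue = readback[offset:offset + length]
    return [(src, offset - src) for src in find_sources(original, rogue)]
```

If the rogue bytes match exactly one place in the original, as in the reports in this thread, `explain_run` returns a single pair, and the distance can be compared against the tape block size.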
Re: SCSI Tape corruption - update
On Mon, 7 May 2001, Rob Turk wrote:

> Lorenzo,
>
> Have you ruled out hardware failures? There's been a few isolated reports

That tape drive (Sony SDT-9000, less than 2 years of service) works
perfectly on Windows NT (where it was before) and even on Linux 2.2.

Also, the cartridge was brand new.

(BTW, I've even tried with DC disabled. Well, it's REALLY fast dumping
/dev/zero on tape with DC enabled :) )

-- Lorenzo Marcantonio
SCSI Tape corruption - update
As of my latest build [2.4.5-pre1] I've STILL got the tape corruption
problem. Some new facts:

(1) It happens only when writing the tape (tried exchanging tapes with a
    brand new Alpha Digital Tru64 box). I can read her tape, she can't read
    my tape. Tried with GNU tar and gzip.

(2) I suppose it isn't the fault of the AIC7xxx driver (tried both new and
    old)

(3) Playing with the block size doesn't help (even tried variable block
    size)

What can I do? Can I set some kind of trace to pinpoint the problem (in
st.c, maybe?)

-- Lorenzo Marcantonio
Re: SCSI tape corruption problem
On Sat, 14 Apr 2001, Marc SCHAEFER wrote:

> Now try this:
>
>    cd ~archive
>    mt -f /dev/tapes/tape0 rewind
>    tar cvf - . | gzip -9 | dd of=/dev/tapes/tape0 bs=32k
>
> and then:
>
>    mt -f /dev/tapes/tape0 rewind
>    dd if=/dev/tapes/tape0 bs=32k | gzip -d | tar --compare -v -f -
>
> The above is the proper way to talk to a tape drive through gzip.

I see the blocking part... but in my second experiment I used ONLY dd to
put a large file on tape...

... still, I looked into this because amverify gave me a ton of CRC
errors... (I REALLY hope that amanda uses proper blocking :)

-- Lorenzo Marcantonio
Re: SCSI Tape Corruption - update 2
On Fri, 13 Apr 2001, Nate Eldredge wrote:

> (32 bytes is the size of a cache line.) A memory tester might be
> something to try (I wrote a simple program that seemed to show the
> error better than memtest86; can send it if desired.)

Already tried that... this system has passed some 20 hours running
memtest86...

Also I've got NO OTHER memory failure symptom (and the tape fails only on
writing)

-- Lorenzo Marcantonio
Re: SCSI Tape Corruption - update 2
On Thu, 12 Apr 2001, Gérard Roudier wrote:

> using a sym53c875 controller. In this case, kernel 2.2 was fine.
>
> > Now I'll build some old 2.2 kernel to try...
>
> If 2.2 is ok with your tape, a software error in 2.4 gets very likely, in
> my opinion.

Well, the 2.2 distributed with Mandrake 7.2 works fine ... :)

Hmmm... 32 CONSECUTIVE bytes are a very peculiar error. What can it be?

Still experimenting...

-- Lorenzo Marcantonio