Re: SCSI Tape corruption - update

2001-07-20 Thread Gérard Roudier




On Fri, 20 Jul 2001, Geert Uytterhoeven wrote:

> On Sun, 8 Jul 2001, Geert Uytterhoeven wrote:
> > New findings:
> >   - The problem doesn't happen with kernels <= 2.2.17. It does happen with all
> > kernels starting with 2.2.18-pre1.
> >   - The only related stuff that changed in 2.2.18-pre1 seems to be the
> > Sym53c8xx driver itself. I'll do some more tests soon to isolate the
> > problem.
> >   - The changes to the Sym53c8xx driver in 2.2.18-pre1 are _huge_. Are the
> > individual changes between sym53c8xx-1.3g and sym53c8xx-1.7.0 available
> > somewhere?

Not completely. The reason is that I used manual diffing/patching against
various kernel versions and it would be a PITA to resurrect all
intermediate driver versions using these patches. If we consider patches
that went directly to kernel main stream without changing the driver
version, a double PITA it would be. Btw, for the sym-2.1.x series, I now use a
CVS tree and each driver release is tagged independently. For those,
it will be much easier to isolate broken changes.

> The problem is indeed introduced by the changes to the Sym53c8xx in 2.2.18-pre1.
> I managed to find some intermediate versions in the 2.3.x series, and here are the
> results:
>   - sym53c8xx-1.3g (from BK linuxppc_2_2): OK
>   - sym53c8xx-1.5e: crash in SCSI interrupt during driver init
>   - sym53c8xx-1.5f: lock up during driver init
>   - sym53c8xx-1.5g: random 32-byte error bursts when writing to tape

That's an interesting result. But the diffs between 1.3g and 1.5g are
probably very large. Patches available from ftp.tux.org should allow you
to resurrect driver versions 1.4, 1.5, 1.5a, 1.5b, 1.5c, 1.5d.

ftp://ftp.tux.org/pub/roudier/drivers/linux/sym53c8xx/README

You may, for example, apply incremental patches that address kernel 2.2.5
to a fresh kernel 2.2.5 tree and extract driver files accordingly.
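
Once the intermediate versions are available, the search for the first broken one can be organized as a plain bisection over the ordered version list. The following is a minimal sketch, assuming the failure first appears at some version and persists in every later one. `first_bad` and the `is_good` callback are hypothetical names; the callback stands in for the manual build, boot, and tape test. Versions that crash or hang during driver init do not fit this good/bad model and would have to be fixed or skipped first.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_good() is False.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Driver versions named in this thread, oldest first.
VERSIONS = ["1.3g", "1.4", "1.5", "1.5a", "1.5b", "1.5c", "1.5d",
            "1.5e", "1.5f", "1.5g"]
```

With ten versions and both endpoints already tested, this needs at most four further test runs.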

> Perhaps I can get 1.5e and 1.5g to work using some PPC-specific fixes from the
> 1.3g driver in the linuxppc_2_2 tree (it differed a bit from the 1.3g in
> Alan's 2.2.17). But even then the changes in 1.5f and 1.5g are rather small,
> compared to the changes between 1.3g and 1.5f.

Some PPC-specific changes are very probably not present in my driver
sources. I am unable to help on that point.

> So I'd be very happy if I could get my hands on more intermediate versions.
> Thanks for your help! I _really_ want to nail this one down!
>
> Gr{oetje,eeting}s,

Regards,
  Gérard.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: SCSI Tape corruption - update

2001-07-20 Thread Geert Uytterhoeven


On Sun, 8 Jul 2001, Geert Uytterhoeven wrote:
> New findings:
>   - The problem doesn't happen with kernels <= 2.2.17. It does happen with all
> kernels starting with 2.2.18-pre1.
>   - The only related stuff that changed in 2.2.18-pre1 seems to be the
> Sym53c8xx driver itself. I'll do some more tests soon to isolate the
> problem.
>   - The changes to the Sym53c8xx driver in 2.2.18-pre1 are _huge_. Are the
> individual changes between sym53c8xx-1.3g and sym53c8xx-1.7.0 available
> somewhere?

The problem is indeed introduced by the changes to the Sym53c8xx in 2.2.18-pre1.
I managed to find some intermediate versions in the 2.3.x series, and here are the
results:
  - sym53c8xx-1.3g (from BK linuxppc_2_2): OK
  - sym53c8xx-1.5e: crash in SCSI interrupt during driver init
  - sym53c8xx-1.5f: lock up during driver init
  - sym53c8xx-1.5g: random 32-byte error bursts when writing to tape

Perhaps I can get 1.5e and 1.5g to work using some PPC-specific fixes from the
1.3g driver in the linuxppc_2_2 tree (it differed a bit from the 1.3g in
Alan's 2.2.17). But even then the changes in 1.5f and 1.5g are rather small,
compared to the changes between 1.3g and 1.5f.

So I'd be very happy if I could get my hands on more intermediate versions.
Thanks for your help! I _really_ want to nail this one down!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: SCSI Tape corruption - update

2001-07-08 Thread Gérard Roudier




On Sun, 8 Jul 2001, Geert Uytterhoeven wrote:

> On Thu, 21 Jun 2001, Geert Uytterhoeven wrote:
> > On Tue, 8 May 2001, Geert Uytterhoeven wrote:
> > > In the meantime I down/upgraded to 2.2.17 on my PPC box (CHRP LongTrail,
> > > Sym53c875, HP C5136A DDS1) and I can confirm that the problem does not happen
> > > under 2.2.17 either.
> > >
> > > My experiences:
> > >   - reading works fine, writing doesn't
> > >   - 2.2.x works fine, 2.4.x doesn't (at least since 2.4.0-test1-ac10)
> > >   - hardware compression doesn't matter
> > >   - I have a sym53c875, Lorenzo has an Adaptec, so most likely it's not a
> > > SCSI hardware driver bug
> > >   - I have a PPC, Lorenzo doesn't, so it's not CPU-specific
> > >   - corruption is always a block of 32 bytes being replaced by 32 bytes from
> > > the previous tape block (depending on block size!) (approx. 6 errors per
> > > 256 MB)
> > >
> > > Lorenzo, can you please investigate the exact nature of the corruption on your
> > > system?
> > >   - How many successive bytes are corrupted?
> > >   - Where do the corrupted data come from?
> >
> > Yesterday I noticed the same corruption under 2.2.19 (yes, I run amverify after
> > backing up my system now, so it detects corruption through the gzip CRCs).
> >
> > I'll do some more tests (when I find time) to get a higher statistical
> > certainty that it really doesn't happen under earlier 2.2.x kernels.
>
> New findings:
>   - The problem doesn't happen with kernels <= 2.2.17. It does happen with all
> kernels starting with 2.2.18-pre1.
>   - The only related stuff that changed in 2.2.18-pre1 seems to be the
> Sym53c8xx driver itself. I'll do some more tests soon to isolate the
> problem.
>   - The changes to the Sym53c8xx driver in 2.2.18-pre1 are _huge_. Are the
> individual changes between sym53c8xx-1.3g and sym53c8xx-1.7.0 available
> somewhere?

No. But you can move the sym/ncr driver bundle from 2.2.18-pre1 to 2.2.17
and vice-versa.
 sym53c8xx.h, sym53c8xx_defs.h, sym53c8xx.c,
 sym53c8xx_comm.h, ncr53c8xx.h, ncr53c8xx.c
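
Moving the bundle between two unpacked kernel trees can be scripted so that all six files always travel together. This is a minimal sketch, not a tested procedure from this thread. `copy_bundle` is a hypothetical helper, and the `drivers/scsi` subdirectory is an assumption about where the files live in both trees.

```python
import shutil
from pathlib import Path

# File names as listed above.
BUNDLE = [
    "sym53c8xx.h", "sym53c8xx_defs.h", "sym53c8xx.c",
    "sym53c8xx_comm.h", "ncr53c8xx.h", "ncr53c8xx.c",
]


def copy_bundle(src_tree: str, dst_tree: str, subdir: str = "drivers/scsi") -> list:
    """Copy the driver bundle from one kernel source tree to another.

    subdir is an assumption about the location of the files in both trees.
    Raises FileNotFoundError before copying anything if a file is missing,
    so a partial bundle is never mixed into the destination tree.
    """
    src = Path(src_tree) / subdir
    dst = Path(dst_tree) / subdir
    missing = [name for name in BUNDLE if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src}: {', '.join(missing)}")
    for name in BUNDLE:
        shutil.copy2(src / name, dst / name)
    return list(BUNDLE)
```

Checking for missing files up front matters because a tree that mixes headers from one driver version with sources from another would make the test results meaningless.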

You can also download sym-1.7.3c-ncr-3.4.3b, sym-2.1.11, or both, and
try them under 2.2.17 and later 2.2 kernels.

 ftp://ftp.tux.org/pub/roudier/README-drivers-linux

Btw, I am interested in results using sym-1.7.3c and sym-2.1.11 under
kernel 2.2.17 and possibly 2.2.18.

> BTW, I wrote a small test program which tries to analyze error bursts. You can
> find it at http://home.tvd.be/cr26864/Download/genpseudorandom.c
>
> Sample test using 2 bytes of data:
>
> genpseudorandom -o -l 2  > /dev/tape
> genpseudorandom -i < /dev/tape

Unfortunately, I don't have a tape device.

> So far I always saw problems when writing even only 10 MB to tape: ca. 3-5
> bursts of 32 or 12 incorrect bytes, which are always a copy of the
> corresponding bytes in the previous block. Of course I used a much larger test
> stream to verify 2.2.17.
>
> Thanks!
>
> Gr{oetje,eeting}s,
>
>   Geert

Thanks for your testing,
  Gérard.


Re: SCSI Tape corruption - update

2001-07-08 Thread Geert Uytterhoeven


On Thu, 21 Jun 2001, Geert Uytterhoeven wrote:
> On Tue, 8 May 2001, Geert Uytterhoeven wrote:
> > In the meantime I down/upgraded to 2.2.17 on my PPC box (CHRP LongTrail,
> > Sym53c875, HP C5136A DDS1) and I can confirm that the problem does not happen
> > under 2.2.17 either.
> > 
> > My experiences:
> >   - reading works fine, writing doesn't
> >   - 2.2.x works fine, 2.4.x doesn't (at least since 2.4.0-test1-ac10)
> >   - hardware compression doesn't matter
> >   - I have a sym53c875, Lorenzo has an Adaptec, so most likely it's not a
> > SCSI hardware driver bug
> >   - I have a PPC, Lorenzo doesn't, so it's not CPU-specific
> >   - corruption is always a block of 32 bytes being replaced by 32 bytes from
> > the previous tape block (depending on block size!) (approx. 6 errors per
> > 256 MB)
> > 
> > Lorenzo, can you please investigate the exact nature of the corruption on your
> > system?
> >   - How many successive bytes are corrupted?
> >   - Where do the corrupted data come from?
> 
> Yesterday I noticed the same corruption under 2.2.19 (yes, I run amverify after
> backing up my system now, so it detects corruption through the gzip CRCs).
> 
> I'll do some more tests (when I find time) to get a higher statistical
> certainty that it really doesn't happen under earlier 2.2.x kernels.

New findings:
  - The problem doesn't happen with kernels <= 2.2.17. It does happen with all
kernels starting with 2.2.18-pre1.
  - The only related stuff that changed in 2.2.18-pre1 seems to be the
Sym53c8xx driver itself. I'll do some more tests soon to isolate the
problem.
  - The changes to the Sym53c8xx driver in 2.2.18-pre1 are _huge_. Are the
individual changes between sym53c8xx-1.3g and sym53c8xx-1.7.0 available
somewhere?

BTW, I wrote a small test program which tries to analyze error bursts. You can
find it at http://home.tvd.be/cr26864/Download/genpseudorandom.c

Sample test using 2 bytes of data:

genpseudorandom -o -l 2  > /dev/tape
genpseudorandom -i < /dev/tape

So far I always saw problems when writing even only 10 MB to tape: ca. 3-5
bursts of 32 or 12 incorrect bytes, which are always a copy of the
corresponding bytes in the previous block. Of course I used a much larger test
stream to verify 2.2.17.
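
A test stream of this kind can be checked without keeping a reference copy if every byte is a function of its offset. The sketch below is not genpseudorandom.c, whose source is not reproduced in this thread. It is a hypothetical stand-in that derives each 32-byte chunk from a SHA-256 hash of a seed and the chunk index. The names `pattern` and `mismatched_offsets` and the seed value are assumptions.

```python
import hashlib

CHUNK = 32  # SHA-256 digest size in bytes


def pattern(offset: int, length: int, seed: bytes = b"tape-test") -> bytes:
    """Return `length` bytes of the test stream starting at byte `offset`."""
    first = offset // CHUNK
    last = (offset + length + CHUNK - 1) // CHUNK
    data = b"".join(
        hashlib.sha256(seed + index.to_bytes(8, "big")).digest()
        for index in range(first, last)
    )
    start = offset - first * CHUNK
    return data[start:start + length]


def mismatched_offsets(readback: bytes, offset: int = 0,
                       seed: bytes = b"tape-test") -> list:
    """Return the stream offsets at which `readback` differs from the pattern."""
    expected = pattern(offset, len(readback), seed)
    return [offset + i
            for i, (got, want) in enumerate(zip(readback, expected))
            if got != want]
```

Writing `pattern(0, n)` to the device and passing what is read back to `mismatched_offsets` yields the raw list of bad offsets, which can then be grouped into bursts.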

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: SCSI Tape corruption - update

2001-06-21 Thread Geert Uytterhoeven


On Tue, 8 May 2001, Geert Uytterhoeven wrote:
> On Mon, 7 May 2001, Lorenzo Marcantonio wrote:
> > On Mon, 7 May 2001, Rob Turk wrote:
> > > Have you ruled out hardware failures? There's been a few isolated reports
> > 
> > That tape drive (Sony SDT-9000, less than 2 years of service) works
> > perfectly on Windows NT (where it was before) and even on Linux 2.2
> > 
> > Also the cartridge was brand new.
> 
> In the meantime I down/upgraded to 2.2.17 on my PPC box (CHRP LongTrail,
> Sym53c875, HP C5136A DDS1) and I can confirm that the problem does not happen
> under 2.2.17 either.
> 
> My experiences:
>   - reading works fine, writing doesn't
>   - 2.2.x works fine, 2.4.x doesn't (at least since 2.4.0-test1-ac10)
>   - hardware compression doesn't matter
>   - I have a sym53c875, Lorenzo has an Adaptec, so most likely it's not a
> SCSI hardware driver bug
>   - I have a PPC, Lorenzo doesn't, so it's not CPU-specific
>   - corruption is always a block of 32 bytes being replaced by 32 bytes from
> the previous tape block (depending on block size!) (approx. 6 errors per
> 256 MB)
> 
> Lorenzo, can you please investigate the exact nature of the corruption on your
> system?
>   - How many successive bytes are corrupted?
>   - Where do the corrupted data come from?

Yesterday I noticed the same corruption under 2.2.19 (yes, I run amverify after
backing up my system now, so it detects corruption through the gzip CRCs).

I'll do some more tests (when I find time) to get a higher statistical
certainty that it really doesn't happen under earlier 2.2.x kernels.
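
The gzip format stores a CRC-32 of the uncompressed data in its trailer, which is why a corrupted block shows up as soon as the stream is decompressed. This is a minimal sketch of such a check using Python's standard `gzip` module. It is not amverify, and `gzip_is_intact` is a hypothetical helper name.

```python
import gzip
import zlib


def gzip_is_intact(blob: bytes) -> bool:
    """Return True if `blob` decompresses cleanly as a gzip stream.

    gzip.decompress() checks the CRC-32 and length stored in the trailer,
    so damage to the compressed data is reported as an error.
    """
    try:
        gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

A check like this only says that a stream is damaged. It does not say where, which is why a byte-level comparison against known data is still needed to characterize the bursts.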

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: SCSI Tape corruption - update

2001-05-09 Thread Ishikawa


For comparison purposes,

I use stock kernel 2.4.4.
Use scsi tape support as module.
Tape drive is HP c1539 (aka 1533a) dds-2.
This drive is on the scsi chain of Tekram dc390, tmscsim driver
(used as module).
Hardware compression is enabled.

Under this setup,

tar cvbf 20 /dev/st0 large_directory

works perfectly, and I can read it back without problem.
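
A read-back check can be made stricter than "tar reads it without complaint" by comparing the archived bytes with the files on disk. This is a minimal sketch with Python's `tarfile` module, assuming the archive can be opened like a regular file and that the files have not changed since the backup. `verify_archive` is a hypothetical helper, and reading directly from a tape device may need tape-specific blocking that is not handled here.

```python
import hashlib
import os
import tarfile


def sha256_of(fileobj, bufsize: int = 1 << 16) -> str:
    """Hash a file-like object in fixed-size chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(bufsize), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive_path: str, root: str = ".") -> list:
    """Return names of regular-file members whose contents differ from disk."""
    bad = []
    # "r|*" reads the archive as a forward-only stream, without seeking.
    with tarfile.open(archive_path, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            archived = sha256_of(tar.extractfile(member))
            try:
                with open(os.path.join(root, member.name), "rb") as on_disk:
                    current = sha256_of(on_disk)
            except OSError:
                current = None  # unreadable or missing counts as a mismatch
            if archived != current:
                bad.append(member.name)
    return bad
```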

What software do you use for writing to tape?

Or maybe the problem is in the latest -ac tree only?

(HP has software that checks the hardware installation and
drive health.
The software runs on Windows, and it supports firmware upgrades,
a simple drive self-check, a read/write check, etc. Highly recommended.
Obviously, the software is meant to help HP tech support.
It generates a support ticket with the internal state of the firmware,
the history of recoverable media error statistics, and the like.

If the manufacturer of your tape drive has
similar test software, you might want to check
the hardware using the vendor software.)




Re: SCSI Tape corruption - update

2001-05-08 Thread Lorenzo Marcantonio


On Tue, 8 May 2001, Geert Uytterhoeven wrote:

> In the meantime I down/upgraded to 2.2.17 on my PPC box (CHRP LongTrail,
> Sym53c875, HP C5136A DDS1) and I can confirm that the problem does not happen
> under 2.2.17 either.
>
> My experiences:
>   - reading works fine, writing doesn't

Same here

>   - 2.2.x works fine, 2.4.x doesn't (at least since 2.4.0-test1-ac10)

SAME here

>   - hardware compression doesn't matter

SAME HERE

>   - I have a sym53c875, Lorenzo has an Adaptec, so most likely it's not a
> SCSI hardware driver bug
>   - I have a PPC, Lorenzo doesn't, so it's not CPU-specific
>   - corruption is always a block of 32 bytes being replaced by 32 bytes from
> the previous tape block (depending on block size!) (approx. 6 errors per
> 256 MB)

YESSS... EXACTLY 32 consecutive bytes are different. I'll bet we've got
the same problem

>   - How many successive bytes are corrupted?
>   - Where do the corrupted data come from?


Hmm, I'll set up some sort of binary pattern match. This afternoon
I'll pinpoint the source of the rogue bytes...


-- Lorenzo Marcantonio



Re: SCSI Tape corruption - update

2001-05-08 Thread Geert Uytterhoeven


On Mon, 7 May 2001, Lorenzo Marcantonio wrote:
> On Mon, 7 May 2001, Rob Turk wrote:
> > Have you ruled out hardware failures? There's been a few isolated reports
> 
> That tape drive (Sony SDT-9000, less than 2 years of service) works
> perfectly on Windows NT (where it was before) and even on Linux 2.2
> 
> Also the cartridge was brand new.

In the meantime I down/upgraded to 2.2.17 on my PPC box (CHRP LongTrail,
Sym53c875, HP C5136A DDS1) and I can confirm that the problem does not happen
under 2.2.17 either.

My experiences:
  - reading works fine, writing doesn't
  - 2.2.x works fine, 2.4.x doesn't (at least since 2.4.0-test1-ac10)
  - hardware compression doesn't matter
  - I have a sym53c875, Lorenzo has an Adaptec, so most likely it's not a
SCSI hardware driver bug
  - I have a PPC, Lorenzo doesn't, so it's not CPU-specific
  - corruption is always a block of 32 bytes being replaced by 32 bytes from
the previous tape block (depending on block size!) (approx. 6 errors per
256 MB)

Lorenzo, can you please investigate the exact nature of the corruption on your
system?
  - How many successive bytes are corrupted?
  - Where do the corrupted data come from?
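
Both questions can be answered mechanically when the written data is known. The following is a minimal sketch, assuming the written and read-back data are both available as byte strings and that `block_size` is set to the tape block size in use. `find_bursts` is a hypothetical helper. If a replacement byte happens to equal the correct byte, a single burst will be reported as two shorter ones.

```python
def find_bursts(written: bytes, readback: bytes, block_size: int) -> list:
    """Locate runs of differing bytes and test the previous-block hypothesis.

    Returns a list of (offset, length, from_previous_block) tuples, where
    from_previous_block is True when the corrupted bytes equal the bytes
    written at the same position one block earlier.
    """
    bursts = []
    n = min(len(written), len(readback))
    i = 0
    while i < n:
        if written[i] == readback[i]:
            i += 1
            continue
        start = i
        while i < n and written[i] != readback[i]:
            i += 1
        prev = start - block_size
        from_prev = (prev >= 0 and
                     readback[start:i] == written[prev:prev + (i - start)])
        bursts.append((start, i - start, from_prev))
    return bursts
```

The length column answers the first question, and the boolean answers the second for the specific case of data leaking in from the previous block.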

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: SCSI Tape corruption - update

2001-05-07 Thread Lorenzo Marcantonio


On Mon, 7 May 2001, Rob Turk wrote:

> Lorenzo,
>
> Have you ruled out hardware failures? There's been a few isolated reports

That tape drive (Sony SDT-9000, less than 2 years of service) works
perfectly on Windows NT (where it was before) and even on Linux 2.2

Also the cartridge was brand new.

(BTW, I've even tried with DC disabled. Well, it's REALLY fast
dumping /dev/zero on tape with DC enabled :) )

-- Lorenzo Marcantonio



Re: SCSI Tape corruption - update

2001-05-07 Thread Rob Turk


"Lorenzo Marcantonio" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
>
> As of my latest build [2.4.5-pre1] I've STILL got the tape corruption
> problem. Some new facts:
>
> (1) It happens only when writing the tape (tried exchanging tapes with a
> brand new Alpha Digital Tru64 box). I can read her tape, she can't read
> my tape. Tried with GNU tar and gzip.
>

Lorenzo,

Have you ruled out hardware failures? There have been a few isolated reports
about tape drives returning good status on write, where in fact they were
writing corrupt data. This can happen when the compression hardware is
malfunctioning. On many tape drives, read-back check isn't carried all the
way back to the original (uncompressed) data.

Rob





Re: SCSI Tape Corruption - update 2

2001-04-14 Thread Chip Salzenberg


In article <[EMAIL PROTECTED]> you write:
>On Fri, 13 Apr 2001, Nate Eldredge wrote:
>> (32 bytes is the size of a cache line.)  A memory tester might be
>> something to try (I wrote a simple program that seemed to show the
>> error better than memtest86; can send it if desired.)
>
>Already tried that... this system has passed some 20 hours running
>memtest86...

I suggest you try Cerberus:

  https://sourceforge.net/projects/va-ctcs/

which will viciously beat your system to within an inch of its life.
If you have any motherboard problems, they're more likely to show up
with Cerberus than with a simple memtest.
-- 
Chip Salzenberg  - a.k.a. - <[EMAIL PROTECTED]>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech

Re: SCSI Tape Corruption - update 2

2001-04-14 Thread Lorenzo Marcantonio


On Fri, 13 Apr 2001, Nate Eldredge wrote:

> (32 bytes is the size of a cache line.)  A memory tester might be
> something to try (I wrote a simple program that seemed to show the
> error better than memtest86; can send it if desired.)

Already tried that... this system has passed some 20 hours running
memtest86...

Also I've got NO OTHER memory failure symptom (and the tape fails only on
writing)

-- Lorenzo Marcantonio


Re: SCSI Tape Corruption - update 2

2001-04-14 Thread Geert Uytterhoeven


On Fri, 13 Apr 2001, Nate Eldredge wrote:
> [EMAIL PROTECTED] wrote:
> > Well, the 2.2 distributed with Mandrake 7.2 works fine ... :) 
> >
> > Hmmm... 32 CONSECUTIVE bytes are a very peculiar error. What can it be? 
> >
> > Still experimenting...
> 
> I once ran into a problem with 32-byte errors appearing in files, and
> later, in memory.  I eventually traced it to buggy motherboard cache.
> (32 bytes is the size of a cache line.)  A memory tester might be
> something to try (I wrote a simple program that seemed to show the
> error better than memtest86; can send it if desired.)

In that case I'd expect the problem to show up whatever I'm doing. So far I
could not find corrupted files on my hard disk, only when writing to tape, and
only with 2.3/2.4.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: SCSI Tape Corruption - update 2

2001-04-13 Thread Nate Eldredge

[EMAIL PROTECTED] wrote:

> Well, the 2.2 distributed with Mandrake 7.2 works fine ... :) 
>
> Hmmm... 32 CONSECUTIVE bytes are a very peculiar error. What can it be? 
>
> Still experimenting...

I once ran into a problem with 32-byte errors appearing in files, and
later, in memory.  I eventually traced it to buggy motherboard cache.
(32 bytes is the size of a cache line.)  A memory tester might be
something to try (I wrote a simple program that seemed to show the
error better than memtest86; can send it if desired.)

-- 

Nate Eldredge
[EMAIL PROTECTED]

Re: SCSI Tape Corruption - update

2001-04-13 Thread Geert Uytterhoeven

On Fri, 13 Apr 2001, Geert Uytterhoeven wrote:
> On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote:
> > It seems that the tape is written incorrectly. I wrote some large file
> > (300MB) and read it back four times. The read copies are all the same.
> > They differ from the original only in 32 consecutive bytes (the replaced
> > values SEEM random). Of course, 32 bytes in 300MB tar.gz files are TOO
> > MUCH to be accepted :)
> 
> In my case, the 32 bad bytes are always a copy of the 32 bytes 10K before (10K
> = blocksize of tar). Can you verify that's the case for you as well? For
> reference, I have approx. 6 sequences of corrupted data when writing 256 MB to
> tape. Reading gives no problems.

Forgot some things...

It also happens with dd, so it's not a bug in tar.
If I set the tar blocksize to 512 bytes, the offset changes to 512 bytes as
well.
When I set the tar blocksize to 57*512 bytes, I didn't see a problem (however,
that could have been `good luck').

The problem seems to have been there since at least 2.4.0-test1-ac10, which
means quite a few people may no longer have known good backups of their
valuable data (of course we should not run 2.[34].x kernels on our systems,
right? :-)

Since you have a different SCSI host adapter, the problem is most likely in
st.c. I was thinking of writing `predictable' data (or checksummed blocks or
so) to tape and adding some data verification tests to st.c at the very last
moment before it sends a write command to the SCSI host adapter, but I haven't
found time for that yet.
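
The `predictable data' idea can be tried from user space before touching st.c.
What follows is a hedged Python sketch, not code from st.c or from this
thread: the block layout (8-byte index, patterned payload, CRC32 trailer) and
the names `make_block`, `check_block` and `bad_blocks` are all assumptions.

```python
import struct
import zlib
from typing import List

HEADER = struct.Struct(">Q")    # 8-byte big-endian block index
TRAILER = struct.Struct(">I")   # 4-byte CRC32 of header + payload


def make_block(index: int, block_size: int) -> bytes:
    """Build one self-describing block of exactly block_size bytes."""
    payload_len = block_size - HEADER.size - TRAILER.size
    if payload_len < 0:
        raise ValueError("block_size too small for header and trailer")
    # The payload depends on the block index, so bytes copied from the
    # previous block at the same offset never match the expected ones.
    payload = bytes((index * 31 + i * 7) & 0xFF for i in range(payload_len))
    body = HEADER.pack(index) + payload
    return body + TRAILER.pack(zlib.crc32(body))


def check_block(block: bytes) -> bool:
    """Self-check of a single block using only its CRC32 trailer."""
    if len(block) < HEADER.size + TRAILER.size:
        return False
    body, trailer = block[:-TRAILER.size], block[-TRAILER.size:]
    return TRAILER.unpack(trailer)[0] == zlib.crc32(body)


def bad_blocks(stream: bytes, block_size: int) -> List[int]:
    """Return the indices of blocks that differ from what was generated."""
    bad = []
    for pos in range(0, len(stream), block_size):
        index = pos // block_size
        if stream[pos:pos + block_size] != make_block(index, block_size):
            bad.append(index)
    return bad
```

Writing `make_block(0, n)`, `make_block(1, n)`, ... to tape and running
`bad_blocks` on the data read back would show which blocks were damaged.
`check_block` needs no knowledge of the expected index, which is the kind of
test that could in principle be run just before a write command is issued.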

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: SCSI Tape Corruption - update

2001-04-13 Thread Geert Uytterhoeven

On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote:
> Still experimenting with my SDT-9000... tried connecting it to another
> controller (2940AU in place of 2904, sorry but I've only Adaptec stuff :).
> Same problem. Tried with another tape (even with an old DDS-2 tape). Same.
> Even tried another cable/removing the CDWR drive from the bus.
> 
> It seems that the tape is written incorrectly. I wrote some large file
> (300MB) and read it back four times. The read copies are all the same.
> They differ from the original only in 32 consecutive bytes (the replaced
> values SEEM random). Of course, 32 bytes in 300MB tar.gz files are TOO
> MUCH to be accepted :)

As Gérard already replied, I have the same problem on my PPC box (cf. my
postings last month) with a DDS-1 tape drive. It has 2 SCSI adapters (MESH and
Sym53c875), and it seems to happen with the '875 only (but the MESH sucks
anyway and has other problems making it unusable for my DDS-1).

In my case, the 32 bad bytes are always a copy of the 32 bytes 10K before (10K
= blocksize of tar). Can you verify that's the case for you as well? For
reference, I have approx. 6 sequences of corrupted data when writing 256 MB to
tape. Reading gives no problems.
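
For a rough sense of how rare the corruption is, the figures above can be
turned into a per-block rate. This is only back-of-the-envelope arithmetic in
Python, assuming 1 MB = 2^20 bytes and a 10K (10240-byte) tar block; the
constant names are illustrative.

```python
TOTAL_BYTES = 256 * 1024 * 1024   # 256 MB written to tape
BLOCK_BYTES = 10 * 1024           # tar block size reported above (10K)
BURSTS = 6                        # approximate number of corrupted sequences
BURST_BYTES = 32                  # length of each corrupted sequence

blocks_written = TOTAL_BYTES // BLOCK_BYTES          # 26214 whole blocks
blocks_per_burst = blocks_written / BURSTS           # 4369.0
corrupt_fraction = BURSTS * BURST_BYTES / TOTAL_BYTES

print(f"{blocks_written} blocks written")
print(f"about 1 corrupted block in {blocks_per_burst:.0f}")
print(f"{corrupt_fraction:.2e} of all bytes corrupted")
```

So only about one block in four thousand is hit, which fits a fault that is
intermittent rather than one that affects every write.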

The problem does not appear in 2.2.13 (yep, that's old, but so far the latest
2.2.x kernel that runs on my CHRP LongTrail). I have to fix later kernels
first.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: SCSI Tape Corruption - update 2

2001-04-12 Thread Lorenzo Marcantonio


On Thu, 12 Apr 2001, Gérard Roudier wrote:

> using a sym53c875 controller. In this case, kernel 2.2 was fine.
>
> > Now I'll build some old 2.2 kernel to try...
>
> If 2.2 is ok with your tape, a software error in 2.4 becomes very likely, in
> my opinion.

Well, the 2.2 distributed with Mandrake 7.2 works fine ... :)

Hmmm... 32 CONSECUTIVE bytes are a very peculiar error. What can it be?

Still experimenting...

-- Lorenzo Marcantonio


Re: SCSI Tape Corruption - update

2001-04-12 Thread Gérard Roudier




On Thu, 12 Apr 2001 [EMAIL PROTECTED] wrote:

> Still experimenting with my SDT-9000... tried connecting it to another
> controller (2940AU in place of 2904, sorry but I've only Adaptec stuff :).
> Same problem. Tried with another tape (even with an old DDS-2 tape). Same.
> Even tried another cable/removing the CDWR drive from the bus.
> 
> It seems that the tape is written incorrectly. I wrote some large file
> (300MB) and read it back four times. The read copies are all the same.
> They differ from the original only in 32 consecutive bytes (the replaced
> values SEEM random). Of course, 32 bytes in 300MB tar.gz files are TOO
> MUCH to be accepted :)

A similar problem was reported under Linux/PPC a couple of weeks ago
using a sym53c875 controller. In this case, kernel 2.2 was fine.

> Now I'll build some old 2.2 kernel to try...

If 2.2 is ok with your tape, a software error in 2.4 becomes very likely, in
my opinion.

  Gérard.


Re: SCSI Tape Corruption - update

2001-04-12 Thread Bob_Tracy

[EMAIL PROTECTED] wrote:
> It seems that the tape is written incorrectly. I wrote some large file
> (300MB) and read it back four times. The read copies are all the same.
> They differ from the original only in 32 consecutive bytes (the replaced
> values SEEM random). Of course, 32 bytes in 300MB tar.gz files are TOO
> MUCH to be accepted :)

Several years ago I ran into a problem with similar symptoms on an old
Adaptec AHA-154X controller.  Files (and most certainly "file systems"
if I had persisted) on my hard disk were getting corrupted in random
places with constant length strings of garbage.  This turned out to be
an inappropriate setting for the AHA1542_SCATTER constant: it *was* 16,
and setting it to 8 fixed my problem.  I'd look for a similar "#define"
in the header file for your SCSI device driver and try cutting the value
by half.  Why "half"?  No justification other than it worked for me, and
it's a power-of-two kind of thing that hardware seems to like :-).

--Bob