Re: 2.6.19-mm1 (md/raid1 randomly drops partitions - possible sata_uli problem)

2006-12-13 Thread Rafael J. Wysocki
On Wednesday, 13 December 2006 01:53, Neil Brown wrote:
> On Tuesday December 12, [EMAIL PROTECTED] wrote:
> > > 
> > > So when md tries to write out the superblock, it gets EIO... Odd that
> > > you aren't getting errors for normal writes.
> > > 
> > > What devices are the md/raid1 built on?
> > 
> > Sata drives, on sata_uli.
> > 
> > > > 
> > > > I'll try to reproduce it tomorrow and collect some more information.
> > > 
> > > Thanks.  More information is definitely better than less, so send over
> > > anything you can find.
> > 
> > Okay, seems to be readily reproducible, dmesg output from the failing kernel
> > attached.
> 
> Weird.  You are getting silent write errors...
> 
> Can you write to these drives at all? e.g.
> 
>   dd if=/dev/sdb3 of=/tmp/tmp count=1
>   dd if=/tmp/tmp of=/dev/sdb3 oflag=direct
> 
> (hopefully 'direct' will cause write errors to be passed up).

Unfortunately I have no access to the machine right now.

> I really think this looks like a sata problem, not an md problem.

That's possible, but everything except for the md RAID seems to work.  Strange.

I think I'll wait until the next -mm is out and check if the problem goes away. ;-)

Greetings,
Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.19-mm1 (md/raid1 randomly drops partitions - possible sata_uli problem)

2006-12-12 Thread Neil Brown
On Tuesday December 12, [EMAIL PROTECTED] wrote:
> > 
> > So when md tries to write out the superblock, it gets EIO... Odd that
> > you aren't getting errors for normal writes.
> > 
> > What devices are the md/raid1 built on?
> 
> Sata drives, on sata_uli.
> 
> > > 
> > > I'll try to reproduce it tomorrow and collect some more information.
> > 
> > Thanks.  More information is definitely better than less, so send over
> > anything you can find.
> 
> Okay, seems to be readily reproducible, dmesg output from the failing kernel
> attached.

Weird.  You are getting silent write errors...

Can you write to these drives at all? e.g.

  dd if=/dev/sdb3 of=/tmp/tmp count=1
  dd if=/tmp/tmp of=/dev/sdb3 oflag=direct

(hopefully 'direct' will cause write errors to be passed up).

I really think this looks like a sata problem, not an md problem.

NeilBrown
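
The two dd commands suggested above can be wrapped so that the exit status of the direct write is checked explicitly. This is only a sketch: the function name check_direct_write is hypothetical, it assumes GNU dd (for oflag=direct) and mktemp, and it needs enough privilege to write to the device. It writes back the same block it has just read, so it should only be pointed at a device where nothing else is modifying that block in the meantime.

```shell
#!/bin/sh
# Hypothetical helper: read the first 512-byte block of a device and write it
# straight back with O_DIRECT, so a write error is reported by dd instead of
# being absorbed by the page cache.
check_direct_write() {
    dev=$1
    tmp=$(mktemp) || return 1

    # Step 1: copy one block from the device into a scratch file.
    if ! dd if="$dev" of="$tmp" count=1 2>/dev/null; then
        echo "read from $dev failed" >&2
        rm -f "$tmp"
        return 1
    fi

    # Step 2: write the same block back, bypassing the page cache.
    if dd if="$tmp" of="$dev" oflag=direct 2>/dev/null; then
        echo "direct write to $dev succeeded"
        rc=0
    else
        echo "direct write to $dev failed" >&2
        rc=1
    fi

    rm -f "$tmp"
    return $rc
}

# Usage (as root), with the partition named in the thread:
#   check_direct_write /dev/sdb3
```

A non-zero return here, while ordinary buffered writes appear to succeed, would point at the driver or the device rather than at md.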


Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)

2006-12-12 Thread Rafael J. Wysocki
On Tuesday, 12 December 2006 00:54, Neil Brown wrote:
> On Tuesday December 12, [EMAIL PROTECTED] wrote:
> > On Monday, 11 December 2006 23:52, Neil Brown wrote:
> > > On Monday December 11, [EMAIL PROTECTED] wrote:
> > > > Hi,
> > > > 
> > > > On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> > > > > 
> > > > > Temporarily at
> > > > > 
> > > > >   http://userweb.kernel.org/~akpm/2.6.19-mm1/
> > > > > 
> > > > > Will appear later at
> > > > > 
> > > > >   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
> > > > 
> > > > It caused all of the md RAID1s on my test box to drop one of their partitions,
> > > > apparently at random.
> > > 
> > > That's clever
> > > 
> > > Do you have any kernel logs of this happening?  My guess would be the
> > > underlying device driver is returning more errors than before, but we
> > > need the logs to be sure.
> > 
> > I've only found lots of messages like this:
> > 
> > md: super_written gets error=-5, uptodate=0
> 
> So when md tries to write out the superblock, it gets EIO... Odd that
> you aren't getting errors for normal writes.
> 
> What devices are the md/raid1 built on?

Sata drives, on sata_uli.

> > 
> > I'll try to reproduce it tomorrow and collect some more information.
> 
> Thanks.  More information is definitely better than less, so send over
> anything you can find.

Okay, seems to be readily reproducible, dmesg output from the failing kernel
attached.

Greetings,
Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King


dmesg.log.gz
Description: GNU Zip compressed data


Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)

2006-12-11 Thread Neil Brown
On Tuesday December 12, [EMAIL PROTECTED] wrote:
> On Monday, 11 December 2006 23:52, Neil Brown wrote:
> > On Monday December 11, [EMAIL PROTECTED] wrote:
> > > Hi,
> > > 
> > > On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> > > > 
> > > > Temporarily at
> > > > 
> > > > http://userweb.kernel.org/~akpm/2.6.19-mm1/
> > > > 
> > > > Will appear later at
> > > > 
> > > > 
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
> > > 
> > > It caused all of the md RAID1s on my test box to drop one of their partitions,
> > > apparently at random.
> > 
> > That's clever
> > 
> > Do you have any kernel logs of this happening?  My guess would be the
> > underlying device driver is returning more errors than before, but we
> > need the logs to be sure.
> 
> I've only found lots of messages like this:
> 
> md: super_written gets error=-5, uptodate=0

So when md tries to write out the superblock, it gets EIO... Odd that
you aren't getting errors for normal writes.

What devices are the md/raid1 built on?

> 
> I'll try to reproduce it tomorrow and collect some more information.

Thanks.  More information is definitely better than less, so send over
anything you can find.

NeilBrown


Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)

2006-12-11 Thread Rafael J. Wysocki
On Monday, 11 December 2006 23:52, Neil Brown wrote:
> On Monday December 11, [EMAIL PROTECTED] wrote:
> > Hi,
> > 
> > On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> > > 
> > > Temporarily at
> > > 
> > >   http://userweb.kernel.org/~akpm/2.6.19-mm1/
> > > 
> > > Will appear later at
> > > 
> > >   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
> > 
> > It caused all of the md RAID1s on my test box to drop one of their partitions,
> > apparently at random.
> 
> That's clever
> 
> Do you have any kernel logs of this happening?  My guess would be the
> underlying device driver is returning more errors than before, but we
> need the logs to be sure.

I've only found lots of messages like this:

md: super_written gets error=-5, uptodate=0

I'll try to reproduce it tomorrow and collect some more information.

Greetings,
Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King
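
When a log contains "lots of messages like this", it can help to tally them by error code before posting. The sketch below assumes the message format is exactly the one quoted above; the function name count_super_written_errors is hypothetical. It reads kernel log text on stdin and prints one "<code> <count>" line per distinct error value (error=-5 corresponds to -EIO).

```shell
#!/bin/sh
# Hypothetical helper: tally md superblock write failures by error code.
# Reads kernel log text on stdin; prints "<code> <count>" per distinct code.
count_super_written_errors() {
    # Extract the numeric value after "error=" from matching lines only.
    sed -n 's/.*md: super_written gets error=\(-\{0,1\}[0-9][0-9]*\),.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample input: the message quoted in this thread, repeated twice.
printf '%s\n' \
    'md: super_written gets error=-5, uptodate=0' \
    'md: super_written gets error=-5, uptodate=0' |
    count_super_written_errors
# prints: -5 2
```

On a live system the same function could be fed from the kernel log, for example `dmesg | count_super_written_errors`.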


Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)

2006-12-11 Thread Neil Brown
On Monday December 11, [EMAIL PROTECTED] wrote:
> Hi,
> 
> On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> > 
> > Temporarily at
> > 
> > http://userweb.kernel.org/~akpm/2.6.19-mm1/
> > 
> > Will appear later at
> > 
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
> 
> It caused all of the md RAID1s on my test box to drop one of their partitions,
> apparently at random.

That's clever

Do you have any kernel logs of this happening?  My guess would be the
underlying device driver is returning more errors than before, but we
need the logs to be sure.

Thanks,
NeilBrown

