Re: 2.6.19-mm1 (md/raid1 randomly drops partitions - possible sata_uli problem)
On Wednesday, 13 December 2006 01:53, Neil Brown wrote:
> On Tuesday December 12, [EMAIL PROTECTED] wrote:
> > >
> > > So when md tries to write out the superblock, it gets EIO... Odd that
> > > you aren't getting errors for normal writes.
> > >
> > > What devices are the md/raid1 built on?
> >
> > Sata drives, on sata_uli.
> >
> > > > I'll try to reproduce it tomorrow and collect some more information.
> > >
> > > Thanks. More information is definitely better than less, so send over
> > > anything you can find.
> >
> > Okay, seems to be readily reproducible, dmesg output from the failing kernel
> > attached.
>
> Weird. You are getting silent write errors...
>
> Can you write to these drives at all? e.g.
>
>   dd if=/dev/sdb3 of=/tmp/tmp count=1
>   dd if=/tmp/tmp of=/dev/sdb3 oflag=direct
>
> (hopefully 'direct' will cause write errors to be passed up).

Unfortunately I have no access to the machine right now.

> I really think this looks like a sata problem, not an md problem.

That's possible, but everything except for the md RAID seems to work. Strange.

I think I'll wait until the next -mm is out and check if the problem goes away. ;-)

Greetings,
Rafael

--
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.19-mm1 (md/raid1 randomly drops partitions - possible sata_uli problem)
On Tuesday December 12, [EMAIL PROTECTED] wrote:
> >
> > So when md tries to write out the superblock, it gets EIO... Odd that
> > you aren't getting errors for normal writes.
> >
> > What devices are the md/raid1 built on?
>
> Sata drives, on sata_uli.
>
> > > I'll try to reproduce it tomorrow and collect some more information.
> >
> > Thanks. More information is definitely better than less, so send over
> > anything you can find.
>
> Okay, seems to be readily reproducible, dmesg output from the failing kernel
> attached.

Weird. You are getting silent write errors...

Can you write to these drives at all? e.g.

  dd if=/dev/sdb3 of=/tmp/tmp count=1
  dd if=/tmp/tmp of=/dev/sdb3 oflag=direct

(hopefully 'direct' will cause write errors to be passed up).

I really think this looks like a sata problem, not an md problem.

NeilBrown
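For anyone who would rather script the test than type the two dd commands, here is a rough Python sketch of the same idea: read the first block of a device with a normal buffered read, then write the identical bytes back with O_DIRECT so that a failing write has a chance of surfacing as an OSError (for example EIO) instead of being absorbed by the page cache. The function name rewrite_first_block, the 4096-byte default block size (dd with count=1 would use 512) and the direct switch are assumptions made for this illustration; none of them come from the thread. It is Linux-only when direct=True.

```python
import mmap
import os


def rewrite_first_block(path, block_size=4096, direct=True):
    """Read the first block of `path` and write the same bytes back.

    Hypothetical helper for illustration only. Returns the number of
    bytes written; raises OSError (e.g. EIO) if the write fails.
    """
    # Ordinary buffered read of the first block (the first dd command).
    with open(path, "rb") as src:
        data = src.read(block_size)
    if len(data) != block_size:
        raise ValueError(
            "short read: expected %d bytes, got %d" % (block_size, len(data))
        )

    # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, block_size)
    buf.write(data)

    flags = os.O_WRONLY
    if direct:
        flags |= os.O_DIRECT  # Linux-only: bypass the page cache
    fd = os.open(path, flags)
    try:
        # Write the unchanged data back (the second dd command).
        written = os.write(fd, buf)
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
        buf.close()
```

Calling rewrite_first_block("/dev/sdb3") as root would correspond to the two dd commands above. Since it writes back exactly what it read, the on-disk contents should be unchanged if the write succeeds, but it is still best tried on a test box only. Some filesystems reject O_DIRECT with EINVAL, which says nothing about the disk.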
Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)
On Tuesday, 12 December 2006 00:54, Neil Brown wrote:
> On Tuesday December 12, [EMAIL PROTECTED] wrote:
> > On Monday, 11 December 2006 23:52, Neil Brown wrote:
> > > On Monday December 11, [EMAIL PROTECTED] wrote:
> > > > Hi,
> > > >
> > > > On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> > > > >
> > > > > Temporarily at
> > > > >
> > > > > http://userweb.kernel.org/~akpm/2.6.19-mm1/
> > > > >
> > > > > Will appear later at
> > > > >
> > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
> > > >
> > > > It caused all of the md RAID1s on my test box to drop one of their
> > > > partitions, apparently at random.
> > >
> > > That's clever
> > >
> > > Do you have any kernel logs of this happening? My guess would be the
> > > underlying device driver is returning more errors than before, but we
> > > need the logs to be sure.
> >
> > I've only found lots of messages like this:
> >
> > md: super_written gets error=-5, uptodate=0
>
> So when md tries to write out the superblock, it gets EIO... Odd that
> you aren't getting errors for normal writes.
>
> What devices are the md/raid1 built on?

Sata drives, on sata_uli.

> > I'll try to reproduce it tomorrow and collect some more information.
>
> Thanks. More information is definitely better than less, so send over
> anything you can find.

Okay, seems to be readily reproducible, dmesg output from the failing kernel
attached.

Greetings,
Rafael

--
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King

dmesg.log.gz
Description: GNU Zip compressed data
Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)
On Tuesday December 12, [EMAIL PROTECTED] wrote:
> On Monday, 11 December 2006 23:52, Neil Brown wrote:
> > On Monday December 11, [EMAIL PROTECTED] wrote:
> > > Hi,
> > >
> > > On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> > > >
> > > > Temporarily at
> > > >
> > > > http://userweb.kernel.org/~akpm/2.6.19-mm1/
> > > >
> > > > Will appear later at
> > > >
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
> > >
> > > It caused all of the md RAID1s on my test box to drop one of their
> > > partitions, apparently at random.
> >
> > That's clever
> >
> > Do you have any kernel logs of this happening? My guess would be the
> > underlying device driver is returning more errors than before, but we
> > need the logs to be sure.
>
> I've only found lots of messages like this:
>
> md: super_written gets error=-5, uptodate=0

So when md tries to write out the superblock, it gets EIO... Odd that
you aren't getting errors for normal writes.

What devices are the md/raid1 built on?

> I'll try to reproduce it tomorrow and collect some more information.

Thanks. More information is definitely better than less, so send over
anything you can find.

NeilBrown
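As a side note on reading that log line: the kernel reports errors as negated errno values, so error=-5 corresponds to errno 5, which is EIO on Linux. A small Python sketch of pulling the number out of such a line and naming it follows. The helper decode_md_error and its regular expression are made up for this illustration and are not part of md or any kernel tool; the pattern only assumes the message format quoted above.

```python
import errno
import os
import re

# Hypothetical pattern, based only on the log line quoted in this thread.
LOG_PATTERN = re.compile(r"super_written gets error=(-?\d+), uptodate=(\d+)")


def decode_md_error(line):
    """Return (errno name, description) for an md super_written log line.

    Returns None if the line does not match the assumed format.
    """
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # The kernel logs the negated errno, so take the absolute value.
    code = abs(int(match.group(1)))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(decode_md_error("md: super_written gets error=-5, uptodate=0"))
```

The description string comes from the platform's C library, so its exact wording can differ between systems.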
Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)
On Monday, 11 December 2006 23:52, Neil Brown wrote:
> On Monday December 11, [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> > >
> > > Temporarily at
> > >
> > > http://userweb.kernel.org/~akpm/2.6.19-mm1/
> > >
> > > Will appear later at
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
> >
> > It caused all of the md RAID1s on my test box to drop one of their
> > partitions, apparently at random.
>
> That's clever
>
> Do you have any kernel logs of this happening? My guess would be the
> underlying device driver is returning more errors than before, but we
> need the logs to be sure.

I've only found lots of messages like this:

md: super_written gets error=-5, uptodate=0

I'll try to reproduce it tomorrow and collect some more information.

Greetings,
Rafael

--
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King
Re: 2.6.19-mm1 (md/raid1 randomly drops partitions)
On Monday December 11, [EMAIL PROTECTED] wrote:
> Hi,
>
> On Monday, 11 December 2006 09:58, Andrew Morton wrote:
> >
> > Temporarily at
> >
> > http://userweb.kernel.org/~akpm/2.6.19-mm1/
> >
> > Will appear later at
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19/2.6.19-mm1/
>
> It caused all of the md RAID1s on my test box to drop one of their partitions,
> apparently at random.

That's clever

Do you have any kernel logs of this happening? My guess would be the
underlying device driver is returning more errors than before, but we
need the logs to be sure.

Thanks,
NeilBrown