Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Neil Brown

On Wednesday December 6, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
> > 
> > here we have lost the "part" automatic variable in disk_name but 
> > 
> 
>   I don't think so.  Look again.  

Gulp... :-(

Yes, your patch is indeed fine.  I heartily recommend it (for whatever
that is worth).

Your mailer on the other hand  could use some work.
The latest patch that you mailed out still had only spaces, no tabs.
Nevertheless, I managed to apply it and it seems to work fine, as well
as looking all right.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Brian Kress

Neil Brown wrote:
> 
> On Wednesday December 6, [EMAIL PROTECTED] wrote:
> > Peter Samuelson wrote:
> > >
> > > [Roberto Ragusa]
> > > > BTW, here is a little patch regarding a silly problem I found
> > > > about RAID partitions naming (/proc/partitions).
> > > > No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
> > > > Well, this patch should work up to "md99".
> > >
> > > This stuff *really* should be split out into the drivers.  Brian Kress
> > > had a patch against test11 for this.  Brian?  You want to fold in this
> > > fix?
> >
> >   Sure.  I got resounding silence to posting the patch last time,
> > so I'm not sure if anyone actually wants this patch, but here it is
> > again with this fix (the non ugly version) folded in.
> >
> 
> Have you ever noticed that bad patches get a lot more response than
> good patches?  I think people like having something useful to say and
> a simple "I like it" often doesn't seem worth it, though in reality is
> very valuable.
> 
> Mayge the thing to do is include a few obvious but not very
> significant errors like:
>   - initialise a static variable to 0
>   - use /** to introduce a comment that isn't in the right format for
> the auto-documentation stuff
>   - uses lines wider than 80 characters
>   - use unnecessary extra parentheses
>   - use non-standard indenting
> 
> that way people will respond because they think they have something to
> say, and will hopefully comment further... :-)
> 

Perhaps this should be in the Linus-HOWTO.  :)

>
> But I see that you have done that  a simple compiler error (I
> think).
> 
> > diff -u --recursive linux-2.4.0-test11/fs/partitions/check.c
> > linux-2.4.0-test11-ppfix/fs/partitions/check.c
> > --- linux-2.4.0-test11/fs/partitions/check.c  Mon Nov 20 15:17:27 2000
> > +++ linux-2.4.0-test11-ppfix/fs/partitions/check.cThu Nov 23 14:30:45
> > 2000
> > @@ -83,11 +83,10 @@
> >   */
> >  char *disk_name (struct gendisk *hd, int minor, char *buf)
> >  {
> > - unsigned int part;
> >   const char *maj = hd->major_name;
> >   int unit = (minor >> hd->minor_shift) + 'a';
> > + unsigned int part = minor & ((1 << hd->minor_shift) - 1);
> >
> > - part = minor & ((1 << hd->minor_shift) - 1);
> 
> here we have lost the "part" automatic variable in disk_name but 
> 

I don't think so.  Look again.  I deleted the declaration and
the assignment and replaced it with a declare and assign all at once.
The other pieces were like that and that one being inconsisent 
offended me.  :)  This does compile, and seems to work.

> 
> But apart from this, I like it, and I think that it would be good if
> it went into 2.4.
> 
> Maybe do one more iteration (or convince me that the above isn't a
> bug), and ask for comment.  I promise to test and respond.
> Then send it to Linus, and explain what it does, and mention that it
> fixes a real issue in lvm (I assume it does as someone said so. I
> haven't actually looked into that) and leave it to him.
> 

If you want to test it, get the version I posted later
in this thread.  It has some fixes like declaring some functions
static and my mailer not eating it.  If you'd like that version
mailed to you, let me know.
Let me know how it works.


Brian Kress
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Neil Brown

On Wednesday December 6, [EMAIL PROTECTED] wrote:
> Peter Samuelson wrote:
> > 
> > [Roberto Ragusa]
> > > BTW, here is a little patch regarding a silly problem I found
> > > about RAID partitions naming (/proc/partitions).
> > > No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
> > > Well, this patch should work up to "md99".
> > 
> > This stuff *really* should be split out into the drivers.  Brian Kress
> > had a patch against test11 for this.  Brian?  You want to fold in this
> > fix?
> 
>   Sure.  I got resounding silence to posting the patch last time,
> so I'm not sure if anyone actually wants this patch, but here it is 
> again with this fix (the non ugly version) folded in.
> 

Have you ever noticed that bad patches get a lot more response than
good patches?  I think people like having something useful to say and
a simple "I like it" often doesn't seem worth it, though in reality is
very valuable.

Mayge the thing to do is include a few obvious but not very
significant errors like:
  - initialise a static variable to 0
  - use /** to introduce a comment that isn't in the right format for
the auto-documentation stuff
  - uses lines wider than 80 characters
  - use unnecessary extra parentheses 
  - use non-standard indenting

that way people will respond because they think they have something to
say, and will hopefully comment further... :-)

But I see that you have done that  a simple compiler error (I
think).

> diff -u --recursive linux-2.4.0-test11/fs/partitions/check.c
> linux-2.4.0-test11-ppfix/fs/partitions/check.c
> --- linux-2.4.0-test11/fs/partitions/check.c  Mon Nov 20 15:17:27 2000
> +++ linux-2.4.0-test11-ppfix/fs/partitions/check.cThu Nov 23 14:30:45
> 2000
> @@ -83,11 +83,10 @@
>   */
>  char *disk_name (struct gendisk *hd, int minor, char *buf)
>  {
> - unsigned int part;
>   const char *maj = hd->major_name;
>   int unit = (minor >> hd->minor_shift) + 'a';
> + unsigned int part = minor & ((1 << hd->minor_shift) - 1);
>  
> - part = minor & ((1 << hd->minor_shift) - 1);

here we have lost the "part" automatic variable in disk_name but 

(stuff deleted)
> - else
> - sprintf(buf, "%s/c%dd%dp%d", maj, ctlr, disk, part);
> - return buf;
> - }
> + if (hd->hd_name) return hd->hd_name(hd, minor, buf);
> +
>   if (part)
>   sprintf(buf, "%s%c%d", maj, unit, part);

here we use it.  Does this compile?

But apart from this, I like it, and I think that it would be good if
it went into 2.4.  

Maybe do one more iteration (or convince me that the above isn't a
bug), and ask for comment.  I promise to test and respond.
Then send it to Linus, and explain what it does, and mention that it
fixes a real issue in lvm (I assume it does as someone said so. I
haven't actually looked into that) and leave it to him.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Brian Kress

Peter Samuelson wrote:
> 
> [Brian Kress <[EMAIL PROTECTED]>]
> > I got resounding silence to posting the patch last time, so I'm not
> > sure if anyone actually wants this patch,
> 
> Well, I like it, but admittedly it's mostly in the "cleanup" category
> (though it does fix the LVM name issue) so at this point in 2.4 I guess
> Linus has more important stuff to worry about.
> 

Probably.  Maybe I can get it in when 2.5 kicks off.

>
> The best thing about your patch is that by putting the logic back in
> the individual drivers, it makes check.c not depend on your module
> configuration (so you can compile a disk module, either inside or
> outside the kernel tree, without worrying about editing or recompiling
> check.c).
> 

Yep, that's the point.  I hate having generic code have to
know things about specific drivers.

>
> > +char *DAC960_disk_name(struct gendisk *hd, int minor, char *buf)
> > +char *cciss_disk_name(struct gendisk *hd, int minor, char *buf)
> > +char *ida_disk_name(struct gendisk *hd, int minor, char *buf)
> > +char* ide_disk_name(struct gendisk *hd, int minor, char *buf)
> > +char *lvm_hd_name(struct gendisk *, int, char *);
> > +char *lvm_hd_name(struct gendisk *hd, int minor, char *buf)
> > +char *md_disk_name(struct gendisk*, int, char *);
> > +char * md_disk_name(struct gendisk *hd, int minor, char* buf)
> > +char *scsi_disk_name(struct gendisk *hd, int minor, char *buf)
> 
> These should all be 'static char *'.
> 

Yes, you're right.  I've updated that and (hopefully) fixed 
the problem (as Toon van der Pas pointed out) with my mailer mangling 
the patch.  Here's the updated patch:


Brian

- Snip 

diff -u --recursive linux-2.4.0-test11/drivers/block/DAC960.c 
linux-2.4.0-test11-ppfix/drivers/block/DAC960.c
--- linux-2.4.0-test11/drivers/block/DAC960.c   Mon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/DAC960.c Thu Nov 23 14:29:37 2000
@@ -1885,6 +1885,21 @@
 }
 
 
+static char *DAC960_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd->major - DAC960_MAJOR;
+   int disk = minor >> hd->minor_shift;
+   int part = minor & ((1 << hd->minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd->major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd->major_name, ctlr, disk);
+   return buf;
+}
+
+
 /*
   DAC960_RegisterBlockDevice registers the Block Device structures
   associated with Controller.
@@ -1945,6 +1960,7 @@
   Controller->GenericDiskInfo.nr_real = Controller->LogicalDriveCount;
   Controller->GenericDiskInfo.next = NULL;
   Controller->GenericDiskInfo.fops = _BlockDeviceOperations;
+  Controller->GenericDiskInfo.hd_name = DAC960_disk_name;
   /*
 Install the Generic Disk Information structure at the end of the list.
   */
diff -u --recursive linux-2.4.0-test11/drivers/block/cciss.c 
linux-2.4.0-test11-ppfix/drivers/block/cciss.c
--- linux-2.4.0-test11/drivers/block/cciss.cMon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cciss.c  Thu Nov 23 14:22:55 2000
@@ -1749,6 +1749,20 @@
kfree(size_buff);
 }  
 
+static char *cciss_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd->major - COMPAQ_CISS_MAJOR;
+   int disk = minor >> hd->minor_shift;
+   int part = minor & ((1 << hd->minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd->major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd->major_name, ctlr, disk);
+   return buf;
+}
+
 /*
  *  This is it.  Find all the controllers and register them.  I really hate
  *  stealing all these major device numbers.
@@ -1851,6 +1865,7 @@
hba[i]->gendisk.part = hba[i]->hd;
hba[i]->gendisk.sizes = hba[i]->sizes;
hba[i]->gendisk.nr_real = hba[i]->num_luns;
+   hba[i]->gendisk.hd_name = cciss_disk_name;
 
/* Get on the disk list */ 
hba[i]->gendisk.next = gendisk_head;
diff -u --recursive linux-2.4.0-test11/drivers/block/cpqarray.c 
linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c
--- linux-2.4.0-test11/drivers/block/cpqarray.c Mon Nov 20 15:20:29 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c   Thu Nov 23 14:14:03 2000
@@ -362,6 +362,20 @@
 }
 #endif /* MODULE */
 
+static char *ida_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd->major - COMPAQ_SMART2_MAJOR;
+   int disk = minor >> hd->minor_shift;
+   int part = minor & ((1 << hd->minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd->major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd->major_name, ctlr, disk);
+   return buf;   
+}
+
 /*
  *  This is it.  Find all the 

Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Brian Kress

Peter Samuelson wrote:
> 
> [Roberto Ragusa]
> > BTW, here is a little patch regarding a silly problem I found
> > about RAID partitions naming (/proc/partitions).
> > No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
> > Well, this patch should work up to "md99".
> 
> This stuff *really* should be split out into the drivers.  Brian Kress
> had a patch against test11 for this.  Brian?  You want to fold in this
> fix?

Sure.  I got resounding silence to posting the patch last time,
so I'm not sure if anyone actually wants this patch, but here it is 
again with this fix (the non ugly version) folded in.


Brian 


--- Patch
---

diff -u --recursive linux-2.4.0-test11/drivers/block/DAC960.c
linux-2.4.0-test11-ppfix/drivers/block/DAC960.c
--- linux-2.4.0-test11/drivers/block/DAC960.c   Mon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/DAC960.c Thu Nov 23 14:29:37
2000
@@ -1885,6 +1885,21 @@
 }
 
 
+char *DAC960_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd->major - DAC960_MAJOR;
+   int disk = minor >> hd->minor_shift;
+   int part = minor & ((1 << hd->minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd->major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd->major_name, ctlr, disk);
+   return buf;
+}
+
+
 /*
   DAC960_RegisterBlockDevice registers the Block Device structures
   associated with Controller.
@@ -1945,6 +1960,7 @@
   Controller->GenericDiskInfo.nr_real = Controller->LogicalDriveCount;
   Controller->GenericDiskInfo.next = NULL;
   Controller->GenericDiskInfo.fops = _BlockDeviceOperations;
+  Controller->GenericDiskInfo.hd_name = DAC960_disk_name;
   /*
 Install the Generic Disk Information structure at the end of the
list.
   */
diff -u --recursive linux-2.4.0-test11/drivers/block/cciss.c
linux-2.4.0-test11-ppfix/drivers/block/cciss.c
--- linux-2.4.0-test11/drivers/block/cciss.cMon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cciss.c  Thu Nov 23 14:22:55
2000
@@ -1749,6 +1749,20 @@
kfree(size_buff);
 }  
 
+char *cciss_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd->major - COMPAQ_CISS_MAJOR;
+   int disk = minor >> hd->minor_shift;
+   int part = minor & ((1 << hd->minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd->major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd->major_name, ctlr, disk);
+   return buf;
+}
+
 /*
  *  This is it.  Find all the controllers and register them.  I really
hate
  *  stealing all these major device numbers.
@@ -1851,6 +1865,7 @@
hba[i]->gendisk.part = hba[i]->hd;
hba[i]->gendisk.sizes = hba[i]->sizes;
hba[i]->gendisk.nr_real = hba[i]->num_luns;
+   hba[i]->gendisk.hd_name = cciss_disk_name;
 
/* Get on the disk list */ 
hba[i]->gendisk.next = gendisk_head;
diff -u --recursive linux-2.4.0-test11/drivers/block/cpqarray.c
linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c
--- linux-2.4.0-test11/drivers/block/cpqarray.c Mon Nov 20 15:20:29 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c   Thu Nov 23
14:14:03 2000
@@ -362,6 +362,20 @@
 }
 #endif /* MODULE */
 
+char *ida_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd->major - COMPAQ_SMART2_MAJOR;
+   int disk = minor >> hd->minor_shift;
+   int part = minor & ((1 << hd->minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd->major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd->major_name, ctlr, disk);
+   return buf;   
+}
+
 /*
  *  This is it.  Find all the controllers and register them.  I really
hate
  *  stealing all these major device numbers.
@@ -512,6 +526,7 @@
ida_gendisk[i].part = ida + (i*256);
ida_gendisk[i].sizes = ida_sizes + (i*256);
ida_gendisk[i].nr_real = 0; 
+   ida_gendisk[i].hd_name = ida_disk_name;

/* Get on the disk list */
ida_gendisk[i].next = gendisk_head;
diff -u --recursive linux-2.4.0-test11/drivers/ide/ide-probe.c
linux-2.4.0-test11-ppfix/drivers/ide/ide-probe.c
--- linux-2.4.0-test11/drivers/ide/ide-probe.c  Thu Aug  3 19:29:49 2000
+++ linux-2.4.0-test11-ppfix/drivers/ide/ide-probe.cThu Nov 23 12:32:16
2000
@@ -726,6 +726,40 @@
return 0;
 }
 
+char* ide_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int unit = (minor >> hd->minor_shift) + 'a'; 
+   unsigned int part = minor & ((1 << hd->minor_shift) - 1);
+
+   switch (hd->major) {
+   case IDE9_MAJOR:
+   

Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Brian Kress

Peter Samuelson wrote:
 
 [Roberto Ragusa]
  BTW, here is a little patch regarding a silly problem I found
  about RAID partitions naming (/proc/partitions).
  No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
  Well, this patch should work up to "md99".
 
 This stuff *really* should be split out into the drivers.  Brian Kress
 had a patch against test11 for this.  Brian?  You want to fold in this
 fix?

Sure.  I got resounding silence to posting the patch last time,
so I'm not sure if anyone actually wants this patch, but here it is 
again with this fix (the non ugly version) folded in.


Brian 


--- Patch
---

diff -u --recursive linux-2.4.0-test11/drivers/block/DAC960.c
linux-2.4.0-test11-ppfix/drivers/block/DAC960.c
--- linux-2.4.0-test11/drivers/block/DAC960.c   Mon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/DAC960.c Thu Nov 23 14:29:37
2000
@@ -1885,6 +1885,21 @@
 }
 
 
+char *DAC960_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd-major - DAC960_MAJOR;
+   int disk = minor  hd-minor_shift;
+   int part = minor  ((1  hd-minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd-major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd-major_name, ctlr, disk);
+   return buf;
+}
+
+
 /*
   DAC960_RegisterBlockDevice registers the Block Device structures
   associated with Controller.
@@ -1945,6 +1960,7 @@
   Controller-GenericDiskInfo.nr_real = Controller-LogicalDriveCount;
   Controller-GenericDiskInfo.next = NULL;
   Controller-GenericDiskInfo.fops = DAC960_BlockDeviceOperations;
+  Controller-GenericDiskInfo.hd_name = DAC960_disk_name;
   /*
 Install the Generic Disk Information structure at the end of the
list.
   */
diff -u --recursive linux-2.4.0-test11/drivers/block/cciss.c
linux-2.4.0-test11-ppfix/drivers/block/cciss.c
--- linux-2.4.0-test11/drivers/block/cciss.cMon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cciss.c  Thu Nov 23 14:22:55
2000
@@ -1749,6 +1749,20 @@
kfree(size_buff);
 }  
 
+char *cciss_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd-major - COMPAQ_CISS_MAJOR;
+   int disk = minor  hd-minor_shift;
+   int part = minor  ((1  hd-minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd-major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd-major_name, ctlr, disk);
+   return buf;
+}
+
 /*
  *  This is it.  Find all the controllers and register them.  I really
hate
  *  stealing all these major device numbers.
@@ -1851,6 +1865,7 @@
hba[i]-gendisk.part = hba[i]-hd;
hba[i]-gendisk.sizes = hba[i]-sizes;
hba[i]-gendisk.nr_real = hba[i]-num_luns;
+   hba[i]-gendisk.hd_name = cciss_disk_name;
 
/* Get on the disk list */ 
hba[i]-gendisk.next = gendisk_head;
diff -u --recursive linux-2.4.0-test11/drivers/block/cpqarray.c
linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c
--- linux-2.4.0-test11/drivers/block/cpqarray.c Mon Nov 20 15:20:29 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c   Thu Nov 23
14:14:03 2000
@@ -362,6 +362,20 @@
 }
 #endif /* MODULE */
 
+char *ida_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd-major - COMPAQ_SMART2_MAJOR;
+   int disk = minor  hd-minor_shift;
+   int part = minor  ((1  hd-minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd-major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd-major_name, ctlr, disk);
+   return buf;   
+}
+
 /*
  *  This is it.  Find all the controllers and register them.  I really
hate
  *  stealing all these major device numbers.
@@ -512,6 +526,7 @@
ida_gendisk[i].part = ida + (i*256);
ida_gendisk[i].sizes = ida_sizes + (i*256);
ida_gendisk[i].nr_real = 0; 
+   ida_gendisk[i].hd_name = ida_disk_name;

/* Get on the disk list */
ida_gendisk[i].next = gendisk_head;
diff -u --recursive linux-2.4.0-test11/drivers/ide/ide-probe.c
linux-2.4.0-test11-ppfix/drivers/ide/ide-probe.c
--- linux-2.4.0-test11/drivers/ide/ide-probe.c  Thu Aug  3 19:29:49 2000
+++ linux-2.4.0-test11-ppfix/drivers/ide/ide-probe.cThu Nov 23 12:32:16
2000
@@ -726,6 +726,40 @@
return 0;
 }
 
+char* ide_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int unit = (minor  hd-minor_shift) + 'a'; 
+   unsigned int part = minor  ((1  hd-minor_shift) - 1);
+
+   switch (hd-major) {
+   case IDE9_MAJOR:
+   unit += 2;
+   case IDE8_MAJOR:
+  

Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Brian Kress

Peter Samuelson wrote:
 
 [Brian Kress [EMAIL PROTECTED]]
  I got resounding silence to posting the patch last time, so I'm not
  sure if anyone actually wants this patch,
 
 Well, I like it, but admittedly it's mostly in the "cleanup" category
 (though it does fix the LVM name issue) so at this point in 2.4 I guess
 Linus has more important stuff to worry about.
 

Probably.  Maybe I can get it in when 2.5 kicks off.


 The best thing about your patch is that by putting the logic back in
 the individual drivers, it makes check.c not depend on your module
 configuration (so you can compile a disk module, either inside or
 outside the kernel tree, without worrying about editing or recompiling
 check.c).
 

Yep, that's the point.  I hate having generic code have to
know things about specific drivers.


  +char *DAC960_disk_name(struct gendisk *hd, int minor, char *buf)
  +char *cciss_disk_name(struct gendisk *hd, int minor, char *buf)
  +char *ida_disk_name(struct gendisk *hd, int minor, char *buf)
  +char* ide_disk_name(struct gendisk *hd, int minor, char *buf)
  +char *lvm_hd_name(struct gendisk *, int, char *);
  +char *lvm_hd_name(struct gendisk *hd, int minor, char *buf)
  +char *md_disk_name(struct gendisk*, int, char *);
  +char * md_disk_name(struct gendisk *hd, int minor, char* buf)
  +char *scsi_disk_name(struct gendisk *hd, int minor, char *buf)
 
 These should all be 'static char *'.
 

Yes, you're right.  I've updated that and (hopefully) fixed 
the problem (as Toon van der Pas pointed out) with my mailer mangling 
the patch.  Here's the updated patch:


Brian

- Snip 

diff -u --recursive linux-2.4.0-test11/drivers/block/DAC960.c 
linux-2.4.0-test11-ppfix/drivers/block/DAC960.c
--- linux-2.4.0-test11/drivers/block/DAC960.c   Mon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/DAC960.c Thu Nov 23 14:29:37 2000
@@ -1885,6 +1885,21 @@
 }
 
 
+static char *DAC960_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd-major - DAC960_MAJOR;
+   int disk = minor  hd-minor_shift;
+   int part = minor  ((1  hd-minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd-major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd-major_name, ctlr, disk);
+   return buf;
+}
+
+
 /*
   DAC960_RegisterBlockDevice registers the Block Device structures
   associated with Controller.
@@ -1945,6 +1960,7 @@
   Controller-GenericDiskInfo.nr_real = Controller-LogicalDriveCount;
   Controller-GenericDiskInfo.next = NULL;
   Controller-GenericDiskInfo.fops = DAC960_BlockDeviceOperations;
+  Controller-GenericDiskInfo.hd_name = DAC960_disk_name;
   /*
 Install the Generic Disk Information structure at the end of the list.
   */
diff -u --recursive linux-2.4.0-test11/drivers/block/cciss.c 
linux-2.4.0-test11-ppfix/drivers/block/cciss.c
--- linux-2.4.0-test11/drivers/block/cciss.cMon Nov 20 15:17:25 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cciss.c  Thu Nov 23 14:22:55 2000
@@ -1749,6 +1749,20 @@
kfree(size_buff);
 }  
 
+static char *cciss_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd-major - COMPAQ_CISS_MAJOR;
+   int disk = minor  hd-minor_shift;
+   int part = minor  ((1  hd-minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd-major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd-major_name, ctlr, disk);
+   return buf;
+}
+
 /*
  *  This is it.  Find all the controllers and register them.  I really hate
  *  stealing all these major device numbers.
@@ -1851,6 +1865,7 @@
hba[i]-gendisk.part = hba[i]-hd;
hba[i]-gendisk.sizes = hba[i]-sizes;
hba[i]-gendisk.nr_real = hba[i]-num_luns;
+   hba[i]-gendisk.hd_name = cciss_disk_name;
 
/* Get on the disk list */ 
hba[i]-gendisk.next = gendisk_head;
diff -u --recursive linux-2.4.0-test11/drivers/block/cpqarray.c 
linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c
--- linux-2.4.0-test11/drivers/block/cpqarray.c Mon Nov 20 15:20:29 2000
+++ linux-2.4.0-test11-ppfix/drivers/block/cpqarray.c   Thu Nov 23 14:14:03 2000
@@ -362,6 +362,20 @@
 }
 #endif /* MODULE */
 
+static char *ida_disk_name(struct gendisk *hd, int minor, char *buf)
+
+{
+   int ctlr = hd-major - COMPAQ_SMART2_MAJOR;
+   int disk = minor  hd-minor_shift;
+   int part = minor  ((1  hd-minor_shift) - 1);
+
+   if (part)
+   sprintf(buf, "%s/c%dd%dp%d", hd-major_name, ctlr, disk, part);
+   else
+   sprintf(buf, "%s/c%dd%d", hd-major_name, ctlr, disk);
+   return buf;   
+}
+
 /*
  *  This is it.  Find all the controllers and register them.  I really hate
  *  stealing all these major 

Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Neil Brown

On Wednesday December 6, [EMAIL PROTECTED] wrote:
 Peter Samuelson wrote:
  
  [Roberto Ragusa]
   BTW, here is a little patch regarding a silly problem I found
   about RAID partitions naming (/proc/partitions).
   No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
   Well, this patch should work up to "md99".
  
  This stuff *really* should be split out into the drivers.  Brian Kress
  had a patch against test11 for this.  Brian?  You want to fold in this
  fix?
 
   Sure.  I got resounding silence to posting the patch last time,
 so I'm not sure if anyone actually wants this patch, but here it is 
 again with this fix (the non ugly version) folded in.
 

Have you ever noticed that bad patches get a lot more response than
good patches?  I think people like having something useful to say and
a simple "I like it" often doesn't seem worth it, though in reality is
very valuable.

Mayge the thing to do is include a few obvious but not very
significant errors like:
  - initialise a static variable to 0
  - use /** to introduce a comment that isn't in the right format for
the auto-documentation stuff
  - uses lines wider than 80 characters
  - use unnecessary extra parentheses 
  - use non-standard indenting

that way people will respond because they think they have something to
say, and will hopefully comment further... :-)

But I see that you have done that  a simple compiler error (I
think).

 diff -u --recursive linux-2.4.0-test11/fs/partitions/check.c
 linux-2.4.0-test11-ppfix/fs/partitions/check.c
 --- linux-2.4.0-test11/fs/partitions/check.c  Mon Nov 20 15:17:27 2000
 +++ linux-2.4.0-test11-ppfix/fs/partitions/check.cThu Nov 23 14:30:45
 2000
 @@ -83,11 +83,10 @@
   */
  char *disk_name (struct gendisk *hd, int minor, char *buf)
  {
 - unsigned int part;
   const char *maj = hd-major_name;
   int unit = (minor  hd-minor_shift) + 'a';
 + unsigned int part = minor  ((1  hd-minor_shift) - 1);
  
 - part = minor  ((1  hd-minor_shift) - 1);

here we have lost the "part" automatic variable in disk_name but 

(stuff deleted)
 - else
 - sprintf(buf, "%s/c%dd%dp%d", maj, ctlr, disk, part);
 - return buf;
 - }
 + if (hd-hd_name) return hd-hd_name(hd, minor, buf);
 +
   if (part)
   sprintf(buf, "%s%c%d", maj, unit, part);

here we use it.  Does this compile?

But apart from this, I like it, and I think that it would be good if
it went into 2.4.  

Maybe do one more iteration (or convince me that the above isn't a
bug), and ask for comment.  I promise to test and respond.
Then send it to Linus, and explain what it does, and mention that it
fixes a real issue in lvm (I assume it does as someone said so. I
haven't actually looked into that) and leave it to him.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Brian Kress

Neil Brown wrote:
 
 On Wednesday December 6, [EMAIL PROTECTED] wrote:
  Peter Samuelson wrote:
  
   [Roberto Ragusa]
BTW, here is a little patch regarding a silly problem I found
about RAID partitions naming (/proc/partitions).
No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
Well, this patch should work up to "md99".
  
   This stuff *really* should be split out into the drivers.  Brian Kress
   had a patch against test11 for this.  Brian?  You want to fold in this
   fix?
 
Sure.  I got resounding silence to posting the patch last time,
  so I'm not sure if anyone actually wants this patch, but here it is
  again with this fix (the non ugly version) folded in.
 
 
 Have you ever noticed that bad patches get a lot more response than
 good patches?  I think people like having something useful to say and
 a simple "I like it" often doesn't seem worth it, though in reality is
 very valuable.
 
 Mayge the thing to do is include a few obvious but not very
 significant errors like:
   - initialise a static variable to 0
   - use /** to introduce a comment that isn't in the right format for
 the auto-documentation stuff
   - uses lines wider than 80 characters
   - use unnecessary extra parentheses
   - use non-standard indenting
 
 that way people will respond because they think they have something to
 say, and will hopefully comment further... :-)
 

Perhaps this should be in the Linus-HOWTO.  :)


 But I see that you have done that  a simple compiler error (I
 think).
 
  diff -u --recursive linux-2.4.0-test11/fs/partitions/check.c
  linux-2.4.0-test11-ppfix/fs/partitions/check.c
  --- linux-2.4.0-test11/fs/partitions/check.c  Mon Nov 20 15:17:27 2000
  +++ linux-2.4.0-test11-ppfix/fs/partitions/check.cThu Nov 23 14:30:45
  2000
  @@ -83,11 +83,10 @@
*/
   char *disk_name (struct gendisk *hd, int minor, char *buf)
   {
  - unsigned int part;
const char *maj = hd-major_name;
int unit = (minor  hd-minor_shift) + 'a';
  + unsigned int part = minor  ((1  hd-minor_shift) - 1);
 
  - part = minor  ((1  hd-minor_shift) - 1);
 
 here we have lost the "part" automatic variable in disk_name but 
 

I don't think so.  Look again.  I deleted the declaration and
the assignment and replaced it with a declare and assign all at once.
The other pieces were like that and that one being inconsisent 
offended me.  :)  This does compile, and seems to work.

 
 But apart from this, I like it, and I think that it would be good if
 it went into 2.4.
 
 Maybe do one more iteration (or convince me that the above isn't a
 bug), and ask for comment.  I promise to test and respond.
 Then send it to Linus, and explain what it does, and mention that it
 fixes a real issue in lvm (I assume it does as someone said so. I
 haven't actually looked into that) and leave it to him.
 

If you want to test it, get the version I posted later
in this thread.  It has some fixes like declaring some functions
static and my mailer not eating it.  If you'd like that version
mailed to you, let me know.
Let me know how it works.


Brian Kress
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-06 Thread Neil Brown

On Wednesday December 6, [EMAIL PROTECTED] wrote:
 Neil Brown wrote:
  
  here we have lost the "part" automatic variable in disk_name but 
  
 
   I don't think so.  Look again.  

Gulp... :-(

Yes, your patch is indeed fine.  I heartily recommend it (for whatever
that is worth).

Your mailer on the other hand  could use some work.
The latest patch that you mailed out still had only spaces, no tabs.
Nevertheless, I managed to apply it and it seems to work fine, as well
as looking all right.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-05 Thread Peter Samuelson


[Roberto Ragusa]
> BTW, here is a little patch regarding a silly problem I found
> about RAID partitions naming (/proc/partitions).
> No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
> Well, this patch should work up to "md99".

This stuff *really* should be split out into the drivers.  Brian Kress
had a patch against test11 for this.  Brian?  You want to fold in this
fix?

> + if (unit<10) {
> + sprintf(buf, "%s%c", maj, '0' + unit);
> + return buf;
> + }
> + else {
> + sprintf(buf, "%s%c%c", maj, '0' + unit / 10, '0' + unit % 10);
> + return buf;
> + }

Errrm, that's ugly. (:  Use %d, unless I'm missing something:

sprintf(buf, "%s%d", maj, unit);

Peter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-05 Thread Roberto Ragusa

On 06-Dec-00, Neil Brown wrote:

> The following patch isn't *correct*, but if it makes a difference for
> you, then it means that we have found the problem.. please let me
> know.
[one line patch]

Yes, it makes a difference :-) . The boot doesn't fail anymore and
all the RAID partitions are correctly detected.

BTW, here is a little patch regarding a silly problem I found
about RAID partitions naming (/proc/partitions).
No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
Well, this patch should work up to "md99".

--- fs/partitions/check.cThu Nov  2 12:22:50 2000
+++ fs/partitions/check.cWed Dec  6 00:34:46 2000
@@ -121,10 +121,17 @@
unit += 2;
case IDE0_MAJOR:
maj = "hd";
-   break;
-   case MD_MAJOR:
-   unit -= 'a'-'0';
-   break;
+   }
+   if (hd->major == MD_MAJOR) {
+   unit -= 'a';
+   if (unit<10) {
+   sprintf(buf, "%s%c", maj, '0' + unit);
+   return buf;
+   }
+   else {
+   sprintf(buf, "%s%c%c", maj, '0' + unit / 10, '0' + unit % 10);
+   return buf;
+   }
}
if (hd->major >= SCSI_DISK1_MAJOR && hd->major <= SCSI_DISK7_MAJOR) {
unit = unit + (hd->major - SCSI_DISK1_MAJOR + 1) * 16;


For 2.2.x kernels, the file to be patched is drivers/block/genhd.c.

>> Please CC to me because I'm not a LKML subscriber.
> 
> Ofcourse.  I think it is common courtesy to reply to the author, and
> cc to the list if appropriate.

OK.

-- 
   Roberto Ragusa   robertoragusa at technologist.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-05 Thread Neil Brown

On Monday December 4, [EMAIL PROTECTED] wrote:
> On 01-Dec-00, Neil Brown wrote:
> > On Friday December 1, [EMAIL PROTECTED] wrote:
> >> I found a real showstopper problem in the SoftwareRAID autodetect
> >> code; 2.4.0-test10 and 2.4.0-test11 are affected (I didn't test
> >> previous versions).
> [detailed report]
> > 
> > Fixed in 2.4.0-test12pre3.  
> 
> I tried 2.4.0-test12pre3.
> The problem is *not* fixed: kernel panic again.

My apologies.  There was a "oops in SoftwareRAID autodetect" in test10
and test11 that was fixed in test12pre3, and I just assumed that your's
was the same, and didn't look at it properly.

On looking again, your problem is definately different and quite
weird.

The fact that it works fine with "console=ttyS0" but doesn't without
is very suspicious.  It suggests to me that there is some odd
interaction going on, and that the problem may well have nothing
directly to do with RAID.

I will try to reproduce it myself, but I suspect that there is at
least an even chance that I won't be able to.  If small changes like
"console=ttyS0"  make it go away then some other small difference in
my setup could equally change the result.

The code which was Oopsing was in kfree, and this seems to compile
very differently for SMP than for UP.  I think that you were compiling
for UP - is that correct?  Could you try compiling with SMP support
and see if that makes a difference?

.. however, I went back and poured over the code for a little
while, and I think I found something.  linear.c may over-run a
kmalloced buffer, which could product exactly what you are getting.
The following patch isn't *correct*, but if it makes a difference for
you, then it means that we have found the problem.. please let me
know.


--- drivers/md/linear.c 2000/12/03 22:14:54 1.3
+++ drivers/md/linear.c 2000/12/05 21:57:42
@@ -98,7 +98,7 @@
table++;
}
}
-   table->dev1 = NULL;
+/* table->dev1 = NULL; */
 
return 0;
 

> 
> Please CC to me because I'm not a LKML subscriber.

Ofcourse.  I think it is common courtesy to reply to the author, and
cc to the list if appropriate.
I would prefer it if, when replying to me, you send the reply
explicitly to me as well as to the list, as that way I get to see it
sooner (I only read mail to linux-kernel every few days, whereas mail
explicitly addressed to me get a much higher priority).

NeilBrown

> 
> -- 
> Roberto Ragusa   robertoragusa at technologist.com
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-05 Thread Neil Brown

On Monday December 4, [EMAIL PROTECTED] wrote:
 On 01-Dec-00, Neil Brown wrote:
  On Friday December 1, [EMAIL PROTECTED] wrote:
  I found a real showstopper problem in the SoftwareRAID autodetect
  code; 2.4.0-test10 and 2.4.0-test11 are affected (I didn't test
  previous versions).
 [detailed report]
  
  Fixed in 2.4.0-test12pre3.  
 
 I tried 2.4.0-test12pre3.
 The problem is *not* fixed: kernel panic again.

My apologies.  There was a "oops in SoftwareRAID autodetect" in test10
and test11 that was fixed in test12pre3, and I just assumed that your's
was the same, and didn't look at it properly.

On looking again, your problem is definately different and quite
weird.

The fact that it works fine with "console=ttyS0" but doesn't without
is very suspicious.  It suggests to me that there is some odd
interaction going on, and that the problem may well have nothing
directly to do with RAID.

I will try to reproduce it myself, but I suspect that there is at
least an even chance that I won't be able to.  If small changes like
"console=ttyS0"  make it go away then some other small difference in
my setup could equally change the result.

The code which was Oopsing was in kfree, and this seems to compile
very differently for SMP than for UP.  I think that you were compiling
for UP - is that correct?  Could you try compiling with SMP support
and see if that makes a difference?

.. however, I went back and poured over the code for a little
while, and I think I found something.  linear.c may over-run a
kmalloced buffer, which could product exactly what you are getting.
The following patch isn't *correct*, but if it makes a difference for
you, then it means that we have found the problem.. please let me
know.


--- drivers/md/linear.c 2000/12/03 22:14:54 1.3
+++ drivers/md/linear.c 2000/12/05 21:57:42
@@ -98,7 +98,7 @@
table++;
}
}
-   table-dev1 = NULL;
+/* table-dev1 = NULL; */
 
return 0;
 

 
 Please CC to me because I'm not a LKML subscriber.

Ofcourse.  I think it is common courtesy to reply to the author, and
cc to the list if appropriate.
I would prefer it if, when replying to me, you send the reply
explicitly to me as well as to the list, as that way I get to see it
sooner (I only read mail to linux-kernel every few days, whereas mail
explicitly addressed to me get a much higher priority).

NeilBrown

 
 -- 
 Roberto Ragusa   robertoragusa at technologist.com
 
 -
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to [EMAIL PROTECTED]
 Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-05 Thread Roberto Ragusa

On 06-Dec-00, Neil Brown wrote:

 The following patch isn't *correct*, but if it makes a difference for
 you, then it means that we have found the problem.. please let me
 know.
[one line patch]

Yes, it makes a difference :-) . The boot doesn't fail anymore and
all the RAID partitions are correctly detected.

BTW, here is a little patch regarding a silly problem I found
about RAID partitions naming (/proc/partitions).
No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
Well, this patch should work up to "md99".

--- fs/partitions/check.cThu Nov  2 12:22:50 2000
+++ fs/partitions/check.cWed Dec  6 00:34:46 2000
@@ -121,10 +121,17 @@
unit += 2;
case IDE0_MAJOR:
maj = "hd";
-   break;
-   case MD_MAJOR:
-   unit -= 'a'-'0';
-   break;
+   }
+   if (hd-major == MD_MAJOR) {
+   unit -= 'a';
+   if (unit10) {
+   sprintf(buf, "%s%c", maj, '0' + unit);
+   return buf;
+   }
+   else {
+   sprintf(buf, "%s%c%c", maj, '0' + unit / 10, '0' + unit % 10);
+   return buf;
+   }
}
if (hd-major = SCSI_DISK1_MAJOR  hd-major = SCSI_DISK7_MAJOR) {
unit = unit + (hd-major - SCSI_DISK1_MAJOR + 1) * 16;


For 2.2.x kernels, the file to be patched is drivers/block/genhd.c.

 Please CC to me because I'm not a LKML subscriber.
 
 Ofcourse.  I think it is common courtesy to reply to the author, and
 cc to the list if appropriate.

OK.

-- 
   Roberto Ragusa   robertoragusa at technologist.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-05 Thread Peter Samuelson


[Roberto Ragusa]
 BTW, here is a little patch regarding a silly problem I found
 about RAID partitions naming (/proc/partitions).
 No more "md8" "md9" "md:" "md;" ... but "md8" "md9" "md10" "md11" ...
 Well, this patch should work up to "md99".

This stuff *really* should be split out into the drivers.  Brian Kress
had a patch against test11 for this.  Brian?  You want to fold in this
fix?

 + if (unit10) {
 + sprintf(buf, "%s%c", maj, '0' + unit);
 + return buf;
 + }
 + else {
 + sprintf(buf, "%s%c%c", maj, '0' + unit / 10, '0' + unit % 10);
 + return buf;
 + }

Errrm, that's ugly. (:  Use %d, unless I'm missing something:

sprintf(buf, "%s%d", maj, unit);

Peter
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-04 Thread Roberto Ragusa

On 01-Dec-00, Neil Brown wrote:
> On Friday December 1, [EMAIL PROTECTED] wrote:
>> I found a real showstopper problem in the SoftwareRAID autodetect
>> code; 2.4.0-test10 and 2.4.0-test11 are affected (I didn't test
>> previous versions).
[detailed report]
> 
> Fixed in 2.4.0-test12pre3.  

I tried 2.4.0-test12pre3.
The problem is *not* fixed: kernel panic again.

Please CC to me because I'm not a LKML subscriber.

-- 
Roberto Ragusa   robertoragusa at technologist.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: kernel panic in SoftwareRAID autodetection

2000-12-01 Thread Neil Brown

On Friday December 1, [EMAIL PROTECTED] wrote:
> Please CC to me because I'm not a LKML subscriber.
> 
> Hi,
> 
> I found a real showstopper problem in the SoftwareRAID autodetect
> code; 2.4.0-test10 and 2.4.0-test11 are affected (I didn't test
> previous versions).

Fixed in 2.4.0-test12pre3.  
If you are a raid users, it might be worth while subscribing to
[EMAIL PROTECTED]   Then you probably would have known
already

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



kernel panic in SoftwareRAID autodetection

2000-12-01 Thread Roberto Ragusa

Please CC to me because I'm not a LKML subscriber.

Hi,

I found a real showstopper problem in the SoftwareRAID autodetect
code; 2.4.0-test10 and 2.4.0-test11 are affected (I didn't test
previous versions).

I'm using two IDE disk with some RAIDed partitions:

  md5 : active raid0 hdc5[1] hda5[0] 2056064 blocks 32k chunks
  md6 : active raid0 hdc6[1] hda6[0] 1027968 blocks 32k chunks
  md7 : active raid0 hdc7[1] hda7[0] 634752 blocks 32k chunks
  md8 : active raid0 hdc8[1] hda8[0] 1027968 blocks 32k chunks
  md9 : active raid0 hdc9[1] hda9[0] 2237568 blocks 32k chunks
  md10 : active raid0 hdc10[1] hda10[0] 60288 blocks 32k chunks
  md11 : active linear hdc13[3] hdc11[1] hda13[2] hda11[0] 422912 blocks 32k rounding

that is 6 of them are RAID0 and 1 is RAIDlinear.

I compiled the 2.4.0-test11 kernel on a RH7.0 installation (using kgcc),
with the following options:

  CONFIG_MD=y
  CONFIG_BLK_DEV_MD=y
  CONFIG_MD_LINEAR=y
  CONFIG_MD_RAID0=y
  CONFIG_MD_RAID1=y
  CONFIG_MD_RAID5=m
  # CONFIG_MD_BOOT is not set
  CONFIG_AUTODETECT_RAID=y
  # CONFIG_BLK_DEV_LVM is not set
  # CONFIG_LVM_PROC_FS is not set

but when I try to boot the kernel panics during the RAID autodetect phase
(without autodetection it seems OK). Tried many times, not a random problem.
So I used "console=ttyS0" to grab the output and I found it is booting
without errors (!). Tried many times.
I had to hand copy the text from the screen to this mail.

This is what I see when everything is OK (serial console grab):

  Linux version 2.4.0-test11 ([EMAIL PROTECTED]) (gcc version egcs-2.91.66 19990314
  /Linux (egcs-1.1.2 release)) #1 Fri Dec 1 12:27:42 CET 2000
[...]
  Partition check:
   hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 hda12 hda13 hda
  14 >
   hdc: hdc1 hdc2 hdc3 hdc4 < hdc5 hdc6 hdc7 hdc8 hdc9 hdc10 hdc11 hdc12 hdc13 hdc
  14 >
   hdd: hdd1 hdd2 hdd3 hdd4
[...]
  md driver 0.90.0 MAX_MD_DEVS=256, MAX_REAL=12
  linear personality registered
  raid0 personality registered
  raid1 personality registered
  md.c: sizeof(mdp_super_t) = 4096
  autodetecting RAID arrays
  (read) hda5's sb offset: 1028032 [events: 0ace]
  (read) hda6's sb offset: 513984 [events: 0aca]
  (read) hda7's sb offset: 317376 [events: 0aca]
  (read) hda8's sb offset: 513984 [events: 0aca]
  (read) hda9's sb offset: 1118784 [events: 0096]
  (read) hda10's sb offset: 30144 [events: 0096]
  (read) hda11's sb offset: 105728 [events: 00a4]
  (read) hda13's sb offset: 105728 [events: 00a4]
  (read) hdc5's sb offset: 1028032 [events: 0ace]
  (read) hdc6's sb offset: 513984 [events: 0aca]
  (read) hdc7's sb offset: 317376 [events: 0aca]
  (read) hdc8's sb offset: 513984 [events: 0aca]
  (read) hdc9's sb offset: 1118784 [events: 0096]
  (read) hdc10's sb offset: 30144 [events: 0096]
  (read) hdc11's sb offset: 105728 [events: 00a4]
  (read) hdc13's sb offset: 105728 [events: 00a4]
  autorun ...
  considering hdc13 ...
adding hdc13 ...
adding hdc11 ...
adding hda13 ...
adding hda11 ...
  created md11
  bind
  md11: WARNING: hda13 appears to be on the same physical disk as hda11. True
   protection against single-disk failure might be compromised.
  bind
  bind
  md11: WARNING: hdc13 appears to be on the same physical disk as hdc11. True
   protection against single-disk failure might be compromised.
  bind
  running: 
  now!
  hdc13's event counter: 00a4
  hdc11's event counter: 00a4
  hda13's event counter: 00a4
  hda11's event counter: 00a4
  md11: max total readahead window set to 124k
  md11: 1 data-disks, max readahead per data-disk: 124k
  md: updating md11 RAID superblock on device
  hdc13 [events: 00a5](write) hdc13's sb offset: 105728
  hdc11 [events: 00a5](write) hdc11's sb offset: 105728
  hda13 [events: 00a5](write) hda13's sb offset: 105728
  hda11 [events: 00a5](write) hda11's sb offset: 105728
  .
  considering hdc10 ...
adding hdc10 ...
adding hda10 ...
  created md10
  bind
  bind
  running: 
  now!
  hdc10's event counter: 0096
  hda10's event counter: 0096
  md10: max total readahead window set to 496k
  md10: 2 data-disks, max readahead per data-disk: 248k
  raid0: looking at hda10
  raid0:   comparing hda10(30144) with hda10(30144)
  raid0:   END
  raid0:   ==> UNIQUE
  raid0: 1 zones
  raid0: looking at hdc10
  raid0:   comparing hdc10(30144) with hda10(30144)
  raid0:   EQUAL
  raid0: FINAL 1 zones
  zone 0
   checking hda10 ... contained as device 0
(30144) is smallest!.
   checking hdc10 ... contained as device 1
   zone->nb_dev: 2, size: 60288
  current zone offset: 30144
  done.
  raid0 : md_size is 60288 blocks.
  raid0 : conf->smallest->size is 60288 blocks.
  raid0 : nb_zone is 1.
  raid0 : Allocating 8 bytes for hash.
  md: updating md10 RAID superblock on device
  hdc10 [events: 0097](write) hdc10's sb offset: 30144
  hda10 [events: 0097](write) hda10's sb offset: 30144
  .
 

kernel panic in SoftwareRAID autodetection

2000-12-01 Thread Roberto Ragusa

Please CC to me because I'm not a LKML subscriber.

Hi,

I found a real showstopper problem in the SoftwareRAID autodetect
code; 2.4.0-test10 and 2.4.0-test11 are affected (I didn't test
previous versions).

I'm using two IDE disk with some RAIDed partitions:

  md5 : active raid0 hdc5[1] hda5[0] 2056064 blocks 32k chunks
  md6 : active raid0 hdc6[1] hda6[0] 1027968 blocks 32k chunks
  md7 : active raid0 hdc7[1] hda7[0] 634752 blocks 32k chunks
  md8 : active raid0 hdc8[1] hda8[0] 1027968 blocks 32k chunks
  md9 : active raid0 hdc9[1] hda9[0] 2237568 blocks 32k chunks
  md10 : active raid0 hdc10[1] hda10[0] 60288 blocks 32k chunks
  md11 : active linear hdc13[3] hdc11[1] hda13[2] hda11[0] 422912 blocks 32k rounding

that is 6 of them are RAID0 and 1 is RAIDlinear.

I compiled the 2.4.0-test11 kernel on a RH7.0 installation (using kgcc),
with the following options:

  CONFIG_MD=y
  CONFIG_BLK_DEV_MD=y
  CONFIG_MD_LINEAR=y
  CONFIG_MD_RAID0=y
  CONFIG_MD_RAID1=y
  CONFIG_MD_RAID5=m
  # CONFIG_MD_BOOT is not set
  CONFIG_AUTODETECT_RAID=y
  # CONFIG_BLK_DEV_LVM is not set
  # CONFIG_LVM_PROC_FS is not set

but when I try to boot the kernel panics during the RAID autodetect phase
(without autodetection it seems OK). Tried many times, not a random problem.
So I used "console=ttyS0" to grab the output and I found it is booting
without errors (!). Tried many times.
I had to hand copy the text from the screen to this mail.

This is what I see when everything is OK (serial console grab):

  Linux version 2.4.0-test11 ([EMAIL PROTECTED]) (gcc version egcs-2.91.66 19990314
  /Linux (egcs-1.1.2 release)) #1 Fri Dec 1 12:27:42 CET 2000
[...]
  Partition check:
   hda: hda1 hda2 hda3 hda4  hda5 hda6 hda7 hda8 hda9 hda10 hda11 hda12 hda13 hda
  14 
   hdc: hdc1 hdc2 hdc3 hdc4  hdc5 hdc6 hdc7 hdc8 hdc9 hdc10 hdc11 hdc12 hdc13 hdc
  14 
   hdd: hdd1 hdd2 hdd3 hdd4
[...]
  md driver 0.90.0 MAX_MD_DEVS=256, MAX_REAL=12
  linear personality registered
  raid0 personality registered
  raid1 personality registered
  md.c: sizeof(mdp_super_t) = 4096
  autodetecting RAID arrays
  (read) hda5's sb offset: 1028032 [events: 0ace]
  (read) hda6's sb offset: 513984 [events: 0aca]
  (read) hda7's sb offset: 317376 [events: 0aca]
  (read) hda8's sb offset: 513984 [events: 0aca]
  (read) hda9's sb offset: 1118784 [events: 0096]
  (read) hda10's sb offset: 30144 [events: 0096]
  (read) hda11's sb offset: 105728 [events: 00a4]
  (read) hda13's sb offset: 105728 [events: 00a4]
  (read) hdc5's sb offset: 1028032 [events: 0ace]
  (read) hdc6's sb offset: 513984 [events: 0aca]
  (read) hdc7's sb offset: 317376 [events: 0aca]
  (read) hdc8's sb offset: 513984 [events: 0aca]
  (read) hdc9's sb offset: 1118784 [events: 0096]
  (read) hdc10's sb offset: 30144 [events: 0096]
  (read) hdc11's sb offset: 105728 [events: 00a4]
  (read) hdc13's sb offset: 105728 [events: 00a4]
  autorun ...
  considering hdc13 ...
adding hdc13 ...
adding hdc11 ...
adding hda13 ...
adding hda11 ...
  created md11
  bindhda11,1
  md11: WARNING: hda13 appears to be on the same physical disk as hda11. True
   protection against single-disk failure might be compromised.
  bindhda13,2
  bindhdc11,3
  md11: WARNING: hdc13 appears to be on the same physical disk as hdc11. True
   protection against single-disk failure might be compromised.
  bindhdc13,4
  running: hdc13hdc11hda13hda11
  now!
  hdc13's event counter: 00a4
  hdc11's event counter: 00a4
  hda13's event counter: 00a4
  hda11's event counter: 00a4
  md11: max total readahead window set to 124k
  md11: 1 data-disks, max readahead per data-disk: 124k
  md: updating md11 RAID superblock on device
  hdc13 [events: 00a5](write) hdc13's sb offset: 105728
  hdc11 [events: 00a5](write) hdc11's sb offset: 105728
  hda13 [events: 00a5](write) hda13's sb offset: 105728
  hda11 [events: 00a5](write) hda11's sb offset: 105728
  .
  considering hdc10 ...
adding hdc10 ...
adding hda10 ...
  created md10
  bindhda10,1
  bindhdc10,2
  running: hdc10hda10
  now!
  hdc10's event counter: 0096
  hda10's event counter: 0096
  md10: max total readahead window set to 496k
  md10: 2 data-disks, max readahead per data-disk: 248k
  raid0: looking at hda10
  raid0:   comparing hda10(30144) with hda10(30144)
  raid0:   END
  raid0:   == UNIQUE
  raid0: 1 zones
  raid0: looking at hdc10
  raid0:   comparing hdc10(30144) with hda10(30144)
  raid0:   EQUAL
  raid0: FINAL 1 zones
  zone 0
   checking hda10 ... contained as device 0
(30144) is smallest!.
   checking hdc10 ... contained as device 1
   zone-nb_dev: 2, size: 60288
  current zone offset: 30144
  done.
  raid0 : md_size is 60288 blocks.
  raid0 : conf-smallest-size is 60288 blocks.
  raid0 : nb_zone is 1.
  raid0 : Allocating 8 bytes for hash.
  md: updating md10 RAID superblock on device
  hdc10 [events: 0097](write) hdc10's sb offset: 30144

Re: kernel panic in SoftwareRAID autodetection

2000-12-01 Thread Neil Brown

On Friday December 1, [EMAIL PROTECTED] wrote:
 Please CC to me because I'm not a LKML subscriber.
 
 Hi,
 
 I found a real showstopper problem in the SoftwareRAID autodetect
 code; 2.4.0-test10 and 2.4.0-test11 are affected (I didn't test
 previous versions).

Fixed in 2.4.0-test12pre3.  
If you are a raid users, it might be worth while subscribing to
[EMAIL PROTECTED]   Then you probably would have known
already

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/