Re: sparc64: find root device on hardware RAID
> Date: Mon, 13 Jan 2020 10:59:30 +0100 > From: Klemens Nanni > > On Fri, Jan 03, 2020 at 08:30:42PM +0100, Mark Kettenis wrote: > > Can we leave out the #ifdef __sparc64__? Unless somebody can come up > > with a really good reason for it... > The code should be safe on all platforms but I put it in to ensure I'm > not breaking stuff I cannot test, e.g. anything else than sparc64/OBP. > > deraadt expressed the same concerns. > > With the two of you arguing for removal, I'm confident enough to remove > it and make mpii(4) MI again; unless I hear arguments against it, I'll > commit this soon. Great, ok kettenis@ > Index: mpii.c > === > RCS file: /cvs/src/sys/dev/pci/mpii.c,v > retrieving revision 1.125 > diff -u -p -r1.125 mpii.c > --- mpii.c3 Jan 2020 08:39:31 - 1.125 > +++ mpii.c13 Jan 2020 09:59:01 - > @@ -930,15 +930,13 @@ mpii_scsi_probe(struct scsi_link *link) > return (EINVAL); > > link->port_wwn = letoh64(vpg.wwid); > -#ifdef __sparc64__ > /* >* WWIDs generated by LSI firmware are not IEEE NAA compliant > - * so historical practise in OBP is to set the top nibble to 3 > - * to indicate that this is a RAID volume. > + * and historical practise in OBP on sparc64 is to set the top > + * nibble to 3 to indicate that this is a RAID volume. >*/ > link->port_wwn &= 0x0fff; > link->port_wwn |= 0x3000; > -#endif > > return (0); > } >
Re: sparc64: find root device on hardware RAID
On Fri, Jan 03, 2020 at 08:30:42PM +0100, Mark Kettenis wrote:
> Can we leave out the #ifdef __sparc64__?  Unless somebody can come up
> with a really good reason for it...
The code should be safe on all platforms but I put it in to ensure I'm
not breaking stuff I cannot test, e.g. anything else than sparc64/OBP.

deraadt expressed the same concerns.

With the two of you arguing for removal, I'm confident enough to remove
it and make mpii(4) MI again; unless I hear arguments against it, I'll
commit this soon.

Index: mpii.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.125
diff -u -p -r1.125 mpii.c
--- mpii.c	3 Jan 2020 08:39:31 -0000	1.125
+++ mpii.c	13 Jan 2020 09:59:01 -0000
@@ -930,15 +930,13 @@ mpii_scsi_probe(struct scsi_link *link)
 			return (EINVAL);
 
 		link->port_wwn = letoh64(vpg.wwid);
-#ifdef __sparc64__
 		/*
 		 * WWIDs generated by LSI firmware are not IEEE NAA compliant
-		 * so historical practise in OBP is to set the top nibble to 3
-		 * to indicate that this is a RAID volume.
+		 * and historical practise in OBP on sparc64 is to set the top
+		 * nibble to 3 to indicate that this is a RAID volume.
 		 */
 		link->port_wwn &= 0x0fffffffffffffff;
 		link->port_wwn |= 0x3000000000000000;
-#endif
 
 		return (0);
 	}
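The masking in the diff above can be tried out in userland.  The sketch
below is only an illustration: both helper names are hypothetical, and
the second WWID used in the comments is made up to show a non-zero top
nibble.  It contrasts replacing the top nibble with 3 (what the diff
does) with adding 3 to it (what the earlier autoconf diff effectively
checked for); the two agree only when the top nibble is zero.

```c
#include <stdint.h>

/* Hypothetical helper: force the top nibble of a 64-bit WWID to 3. */
static uint64_t
wwid_replace_top_nibble(uint64_t wwid)
{
	wwid &= UINT64_C(0x0fffffffffffffff);	/* clear bits 63..60 */
	wwid |= UINT64_C(0x3000000000000000);	/* set them to 3 */
	return wwid;
}

/* Hypothetical helper: add 3 to the top nibble instead of replacing it. */
static uint64_t
wwid_add_top_nibble(uint64_t wwid)
{
	return wwid + UINT64_C(0x3000000000000000);
}

/*
 * For the WWID seen in this thread, 0x0aa32290d5dcd16c, both helpers
 * return 0x3aa32290d5dcd16c, which is the value found in the bootpath.
 * For a made-up WWID such as 0x5000c50012345678 they differ:
 * replacing yields 0x3000c50012345678, adding yields 0x8000c50012345678.
 */
```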
Re: sparc64: find root device on hardware RAID
> Date: Tue, 31 Dec 2019 09:12:56 +1000 > From: Jonathan Matthew > Content-Type: text/plain; charset=us-ascii > Content-Disposition: inline > > On Mon, Dec 30, 2019 at 03:36:54PM +0100, Klemens Nanni wrote: > > On Mon, Dec 30, 2019 at 06:59:35PM +1000, Jonathan Matthew wrote: > > > Here's the Solaris explanation: > > > > > > https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/sys/scsi/adapters/mpt_sas/mptsas_var.h#L195 > > Thanks for digging. > > > > > I think we should copy what they're doing here, that is, replace the high > > > bits > > > with 3 rather than adding 3 to it. I'm really not sure where we should do > > > this though. Maybe in mpii, but only on sparc64? > > As that is now a general quirk in all RAID volumes and no longer > > specific to bootpath handling, mpii seems only appropiate and autoconf > > must not know about this. > > > > Since Illumos does exactly that with mptsas_get_raid_wwid() in > > mptsas_raidconf_page_0_cb(), which after a brief look seems like the > > code path equivalent to our recent WWID addition in mpii_scsi_probe(), > > I'm inclined to just do it there. > > > > Diff below doas that. > > > > > As far as I can tell, the raid controller generates 128 bit WWIDs for raid > > > volumes, but only has a 64 bit field to report the ID to the host, so it > > > only > > > puts the vendor-specified part here (you can see the last half of the ID > > > string > > > printed when sd0 attaches matches sl->port_wwn in reverse). I think the > > > purpose of setting the high 4 bits to 3 is that 3 is not a defined NAA > > > value, > > > so you're not going to find a proper WWID coming from other device that > > > will > > > match that. > > I did not manage to recognise this detail of the reversed ID, indeed: > > > > sd0 at scsibus1 targ 0 lun 0: > > naa.600508e06cd1dcd59022a30a > > > > Feedback? OK? > > I think this is probably the most sensible thing we can do here. > ok jmatthew@ but I'd wait and see if anyone has a better idea. 
Can we leave out the #ifdef __sparc64__? Unless somebody can come up with a really good reason for it... > > Index: dev/pci/mpii.c > > === > > RCS file: /cvs/src/sys/dev/pci/mpii.c,v > > retrieving revision 1.123 > > diff -u -p -r1.123 mpii.c > > --- dev/pci/mpii.c 29 Dec 2019 21:30:21 - 1.123 > > +++ dev/pci/mpii.c 30 Dec 2019 14:26:57 - > > @@ -930,6 +930,15 @@ mpii_scsi_probe(struct scsi_link *link) > > return (EINVAL); > > > > link->port_wwn = letoh64(vpg.wwid); > > +#ifdef __sparc64__ > > + /* > > +* WWIDs generated by LSI firmware are not IEEE NAA compliant > > +* so historical practise in OBP is to set the top nibble to 3 > > +* to indicate that this is a RAID volume. > > +*/ > > + link->port_wwn &= 0x0fff; > > + link->port_wwn |= 0x3000; > > +#endif > > > > return (0); > > } > > Index: arch/sparc64/sparc64/autoconf.c > > === > > RCS file: /cvs/src/sys/arch/sparc64/sparc64/autoconf.c,v > > retrieving revision 1.133 > > diff -u -p -r1.133 autoconf.c > > --- arch/sparc64/sparc64/autoconf.c 15 Oct 2019 05:21:16 - 1.133 > > +++ arch/sparc64/sparc64/autoconf.c 30 Dec 2019 14:27:17 - > > @@ -1455,7 +1455,7 @@ device_register(struct device *dev, void > > u_int lun = bp->val[1]; > > > > if (bp->val[0] & 0x && bp->val[0] != -1) { > > - /* Fibre channel? */ > > + /* Hardware RAID or Fibre channel? */ > > if (bp->val[0] == sl->port_wwn && lun == sl->lun) { > > nail_bootdev(dev, bp); > > } > >
Re: sparc64: find root device on hardware RAID
> Date: Mon, 30 Dec 2019 18:59:35 +1000 > From: Jonathan Matthew > > On Sun, Dec 29, 2019 at 05:58:02AM +0100, Klemens Nanni wrote: > > On Sun, Dec 29, 2019 at 01:59:38PM +1000, Jonathan Matthew wrote: > > > I think we have the wrong size for the volume name, hence the difference > > > between the size reported by the controller and the size of vpg. > > Indeed, good catch! > > > > OBP's `create-raid*-volume' commands also prompt for names no longer > > than that: > > > > {0} ok 9 b c d create-raid0-volume > > ... > > Enter a volume name: [0 to 15 characters] foo > > {0} ok > > > > > try this out? > > Just works, the WWID is no longer clobbered and autoconf eventually sees > > it in the port WWN: > > > > mpii0 at pci15 dev 0 function 0 "Symbios Logic SAS2008" rev 0x03: msi > > mpii0: Solana On-Board, firmware 9.0.0.0 IR, MPI 2.0 > > scsibus1 at mpii0: 834 targets > > mpii_scsi_probe: target 0 lun 0 port_wwn 0 node_wwn 0 has > > MPII_DF_VOLUME set in flags 10 > > struct mpii_cfg_raid_vol_pg1 vpg: > > volume_id: 81, tvolume_bus: 3, volume_ioc: 0 > > wwid: aa32290d5dcd16c > > sd0 at scsibus1 targ 0 lun 0: > > naa.600508e06cd1dcd59022a30a > > device_register: RAID: > > bp->val[]: 3aa32290d5dcd16c, 0, 0 > > target: d5dcd16c, sl->target: 0 > > lun: 0, sl->lun: 0 > > sl->port_wwn: aa32290d5dcd16c, sl->node_wwn: 0 > > > > sd0: 713824MB, 512 bytes/sector, 1461911552 sectors > > > > Thanks a lot, > > OK kn > > > > Now I need to work around the first digit's mismatch; for reasons still > > unknown to me, official documentation states that the RAID volume WWID's > > first digit --if it is zero-- must be replaced with three, so the > > bootpath contains 3aa32290d5dcd16c whereas the port WWN has the correct > > aa32290d5dcd16c. 
> > Here's the Solaris explanation: > > https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/sys/scsi/adapters/mpt_sas/mptsas_var.h#L195 > > I think we should copy what they're doing here, that is, replace the high bits > with 3 rather than adding 3 to it. I'm really not sure where we should do > this though. Maybe in mpii, but only on sparc64? > > As far as I can tell, the raid controller generates 128 bit WWIDs for raid > volumes, but only has a 64 bit field to report the ID to the host, so it only > puts the vendor-specified part here (you can see the last half of the ID > string > printed when sd0 attaches matches sl->port_wwn in reverse). I think the > purpose of setting the high 4 bits to 3 is that 3 is not a defined NAA value, > so you're not going to find a proper WWID coming from other device that will > match that. Can you think of a reason why we would care about the exact WWIDs for these RAID volumes on other architectures? If not, I'd prefer to "fix" this in mpii(4) since this sounds like a deficiency in the LSI firmware.
Re: sparc64: find root device on hardware RAID
On Tue, Dec 31, 2019 at 09:12:56AM +1000, Jonathan Matthew wrote:
> I think this is probably the most sensible thing we can do here.
> ok jmatthew@ but I'd wait and see if anyone has a better idea.
I'll commit later this evening so that snapshots will just work on my
machine; this makes upgrading easier for me.

We can still improve this in-tree if someone comes up with a better
idea.
Re: sparc64: find root device on hardware RAID
On Mon, Dec 30, 2019 at 03:36:54PM +0100, Klemens Nanni wrote: > On Mon, Dec 30, 2019 at 06:59:35PM +1000, Jonathan Matthew wrote: > > Here's the Solaris explanation: > > > > https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/sys/scsi/adapters/mpt_sas/mptsas_var.h#L195 > Thanks for digging. > > > I think we should copy what they're doing here, that is, replace the high > > bits > > with 3 rather than adding 3 to it. I'm really not sure where we should do > > this though. Maybe in mpii, but only on sparc64? > As that is now a general quirk in all RAID volumes and no longer > specific to bootpath handling, mpii seems only appropiate and autoconf > must not know about this. > > Since Illumos does exactly that with mptsas_get_raid_wwid() in > mptsas_raidconf_page_0_cb(), which after a brief look seems like the > code path equivalent to our recent WWID addition in mpii_scsi_probe(), > I'm inclined to just do it there. > > Diff below doas that. > > > As far as I can tell, the raid controller generates 128 bit WWIDs for raid > > volumes, but only has a 64 bit field to report the ID to the host, so it > > only > > puts the vendor-specified part here (you can see the last half of the ID > > string > > printed when sd0 attaches matches sl->port_wwn in reverse). I think the > > purpose of setting the high 4 bits to 3 is that 3 is not a defined NAA > > value, > > so you're not going to find a proper WWID coming from other device that will > > match that. > I did not manage to recognise this detail of the reversed ID, indeed: > > sd0 at scsibus1 targ 0 lun 0: > naa.600508e06cd1dcd59022a30a > > Feedback? OK? I think this is probably the most sensible thing we can do here. ok jmatthew@ but I'd wait and see if anyone has a better idea. 
> > > Index: dev/pci/mpii.c > === > RCS file: /cvs/src/sys/dev/pci/mpii.c,v > retrieving revision 1.123 > diff -u -p -r1.123 mpii.c > --- dev/pci/mpii.c29 Dec 2019 21:30:21 - 1.123 > +++ dev/pci/mpii.c30 Dec 2019 14:26:57 - > @@ -930,6 +930,15 @@ mpii_scsi_probe(struct scsi_link *link) > return (EINVAL); > > link->port_wwn = letoh64(vpg.wwid); > +#ifdef __sparc64__ > + /* > + * WWIDs generated by LSI firmware are not IEEE NAA compliant > + * so historical practise in OBP is to set the top nibble to 3 > + * to indicate that this is a RAID volume. > + */ > + link->port_wwn &= 0x0fff; > + link->port_wwn |= 0x3000; > +#endif > > return (0); > } > Index: arch/sparc64/sparc64/autoconf.c > === > RCS file: /cvs/src/sys/arch/sparc64/sparc64/autoconf.c,v > retrieving revision 1.133 > diff -u -p -r1.133 autoconf.c > --- arch/sparc64/sparc64/autoconf.c 15 Oct 2019 05:21:16 - 1.133 > +++ arch/sparc64/sparc64/autoconf.c 30 Dec 2019 14:27:17 - > @@ -1455,7 +1455,7 @@ device_register(struct device *dev, void > u_int lun = bp->val[1]; > > if (bp->val[0] & 0x && bp->val[0] != -1) { > - /* Fibre channel? */ > + /* Hardware RAID or Fibre channel? */ > if (bp->val[0] == sl->port_wwn && lun == sl->lun) { > nail_bootdev(dev, bp); > }
Re: sparc64: find root device on hardware RAID
On Mon, Dec 30, 2019 at 06:59:35PM +1000, Jonathan Matthew wrote:
> Here's the Solaris explanation:
>
> https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/sys/scsi/adapters/mpt_sas/mptsas_var.h#L195
Thanks for digging.

> I think we should copy what they're doing here, that is, replace the high bits
> with 3 rather than adding 3 to it.  I'm really not sure where we should do
> this though.  Maybe in mpii, but only on sparc64?
As that is now a general quirk in all RAID volumes and no longer
specific to bootpath handling, mpii seems only appropriate and autoconf
must not know about this.

Since Illumos does exactly that with mptsas_get_raid_wwid() in
mptsas_raidconf_page_0_cb(), which after a brief look seems like the
code path equivalent to our recent WWID addition in mpii_scsi_probe(),
I'm inclined to just do it there.

Diff below does that.

> As far as I can tell, the raid controller generates 128 bit WWIDs for raid
> volumes, but only has a 64 bit field to report the ID to the host, so it only
> puts the vendor-specified part here (you can see the last half of the ID string
> printed when sd0 attaches matches sl->port_wwn in reverse).  I think the
> purpose of setting the high 4 bits to 3 is that 3 is not a defined NAA value,
> so you're not going to find a proper WWID coming from another device that will
> match that.
I did not manage to recognise this detail of the reversed ID, indeed:

	sd0 at scsibus1 targ 0 lun 0: naa.600508e0000000006cd1dcd59022a30a

Feedback? OK?

Index: dev/pci/mpii.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.123
diff -u -p -r1.123 mpii.c
--- dev/pci/mpii.c	29 Dec 2019 21:30:21 -0000	1.123
+++ dev/pci/mpii.c	30 Dec 2019 14:26:57 -0000
@@ -930,6 +930,15 @@ mpii_scsi_probe(struct scsi_link *link)
 			return (EINVAL);
 
 		link->port_wwn = letoh64(vpg.wwid);
+#ifdef __sparc64__
+		/*
+		 * WWIDs generated by LSI firmware are not IEEE NAA compliant
+		 * so historical practise in OBP is to set the top nibble to 3
+		 * to indicate that this is a RAID volume.
+		 */
+		link->port_wwn &= 0x0fffffffffffffff;
+		link->port_wwn |= 0x3000000000000000;
+#endif
 
 		return (0);
 	}
Index: arch/sparc64/sparc64/autoconf.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/autoconf.c,v
retrieving revision 1.133
diff -u -p -r1.133 autoconf.c
--- arch/sparc64/sparc64/autoconf.c	15 Oct 2019 05:21:16 -0000	1.133
+++ arch/sparc64/sparc64/autoconf.c	30 Dec 2019 14:27:17 -0000
@@ -1455,7 +1455,7 @@ device_register(struct device *dev, void
 		u_int lun = bp->val[1];
 
 		if (bp->val[0] & 0xffffffff00000000 && bp->val[0] != -1) {
-			/* Fibre channel? */
+			/* Hardware RAID or Fibre channel? */
 			if (bp->val[0] == sl->port_wwn && lun == sl->lun) {
 				nail_bootdev(dev, bp);
 			}
Re: sparc64: find root device on hardware RAID
On Sun, Dec 29, 2019 at 05:58:02AM +0100, Klemens Nanni wrote: > On Sun, Dec 29, 2019 at 01:59:38PM +1000, Jonathan Matthew wrote: > > I think we have the wrong size for the volume name, hence the difference > > between the size reported by the controller and the size of vpg. > Indeed, good catch! > > OBP's `create-raid*-volume' commands also prompt for names no longer > than that: > > {0} ok 9 b c d create-raid0-volume > ... > Enter a volume name: [0 to 15 characters] foo > {0} ok > > > try this out? > Just works, the WWID is no longer clobbered and autoconf eventually sees > it in the port WWN: > > mpii0 at pci15 dev 0 function 0 "Symbios Logic SAS2008" rev 0x03: msi > mpii0: Solana On-Board, firmware 9.0.0.0 IR, MPI 2.0 > scsibus1 at mpii0: 834 targets > mpii_scsi_probe: target 0 lun 0 port_wwn 0 node_wwn 0 has > MPII_DF_VOLUME set in flags 10 > struct mpii_cfg_raid_vol_pg1 vpg: > volume_id: 81, tvolume_bus: 3, volume_ioc: 0 > wwid: aa32290d5dcd16c > sd0 at scsibus1 targ 0 lun 0: > naa.600508e06cd1dcd59022a30a > device_register: RAID: > bp->val[]: 3aa32290d5dcd16c, 0, 0 > target: d5dcd16c, sl->target: 0 > lun: 0, sl->lun: 0 > sl->port_wwn: aa32290d5dcd16c, sl->node_wwn: 0 > > sd0: 713824MB, 512 bytes/sector, 1461911552 sectors > > Thanks a lot, > OK kn > > Now I need to work around the first digit's mismatch; for reasons still > unknown to me, official documentation states that the RAID volume WWID's > first digit --if it is zero-- must be replaced with three, so the > bootpath contains 3aa32290d5dcd16c whereas the port WWN has the correct > aa32290d5dcd16c. Here's the Solaris explanation: https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/sys/scsi/adapters/mpt_sas/mptsas_var.h#L195 I think we should copy what they're doing here, that is, replace the high bits with 3 rather than adding 3 to it. I'm really not sure where we should do this though. Maybe in mpii, but only on sparc64? 
As far as I can tell, the raid controller generates 128 bit WWIDs for raid
volumes, but only has a 64 bit field to report the ID to the host, so it only
puts the vendor-specified part here (you can see the last half of the ID string
printed when sd0 attaches matches sl->port_wwn in reverse).  I think the
purpose of setting the high 4 bits to 3 is that 3 is not a defined NAA value,
so you're not going to find a proper WWID coming from another device that will
match that.
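The "in reverse" observation can be checked by hand or with a few lines
of C.  This is a rough sketch, not driver code; swap64() is a
hypothetical helper written out portably so that it does not depend on
any particular byte-swap macro.  The port WWN from the debug output in
this thread is aa32290d5dcd16c, and the naa string printed when sd0
attaches ends in 6cd1dcd59022a30a.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 64-bit value. */
static uint64_t
swap64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);	/* append the lowest byte */
		v >>= 8;			/* and move on to the next */
	}
	return r;
}

/*
 * swap64(0x0aa32290d5dcd16c) yields 0x6cd1dcd59022a30a, i.e. the tail
 * of the naa string shown at attach time.
 */
```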
Re: sparc64: find root device on hardware RAID
On Mon, Dec 30, 2019 at 06:54:02AM +1000, Jonathan Matthew wrote: > I'd prefer this: > > pagelen = min(hdr.page_length * 4, sizeof(vpg)); > > just to avoid trashing the stack if the page grows in newer firmware versions. Sure, I'll commit with min() and a small comment for that, thanks.
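The effect of the min() suggested above can be sketched outside the
kernel.  clamp_pagelen() is a hypothetical helper, and treating the
header's page length as an 8-bit count of 32-bit words is an assumption
based on the "* 4" in the diff.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes that may be copied into a
 * buffer of bufsize bytes when the controller reports a page of
 * page_length 32-bit words.
 */
static size_t
clamp_pagelen(uint8_t page_length, size_t bufsize)
{
	size_t pagelen = (size_t)page_length * 4;

	/* never copy more than the destination can hold */
	return pagelen < bufsize ? pagelen : bufsize;
}
```

With the numbers reported in this thread (pagelen 64, sizeof(vpg) 80)
the clamp changes nothing; it only matters if newer firmware returns a
page larger than the structure on the stack.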
Re: sparc64: find root device on hardware RAID
On Sun, Dec 29, 2019 at 07:40:01AM +0100, Klemens Nanni wrote: > On Sun, Dec 29, 2019 at 03:30:15AM +0100, Klemens Nanni wrote: > > > link->target isn't the right place to put this, for one thing it's only > > > 16 bits > > > and the wwn is 64 bits, and it's used throughout the driver to look up > > > devices > > > in an array, so changing it will break things. I think link->port_wwn is > > > the > > > right place to store it. > > Ah, link->target was there because I played with that in earlier diffs, > > port_wwn is indeed correct here. > > > > > The fields in the page returned by the driver are also little endian, so > > > you'd > > > need letoh64(vpg.wwid) here. > > Thanks, I see that is done elsewhere in the driver, too. > > > > > You should still return 0 here, as continuing on will send the SAS device > > > page 0 request to the raid volume, which will probably upset the > > > controller > > > enough to stop talking to you. > > Oh well... that explains the hangs - I accidentially dropped the return, > > it should obviously stay and my diff should only add another page fetch. > All three points are addressed in the updated diff below, diff three to > find the root device. > > mpii(4) currently leaves the port WWN empty for RAID volumes (physical > devices are populated). autoboot(9) uses this to match the root device > as per the second diff, so we must fill in after fetching the relevant > page as suggested by mikeb and jmatthew. > > Feedback? OK? ok jmatthew@ with one tweak below. 
> > Index: mpii.c > === > RCS file: /cvs/src/sys/dev/pci/mpii.c,v > retrieving revision 1.122 > diff -u -p -r1.122 mpii.c > --- mpii.c28 Dec 2019 04:38:22 - 1.122 > +++ mpii.c29 Dec 2019 05:02:38 - > @@ -910,8 +910,28 @@ mpii_scsi_probe(struct scsi_link *link) > if (ISSET(flags, MPII_DF_HIDDEN) || ISSET(flags, MPII_DF_UNUSED)) > return (1); > > - if (ISSET(flags, MPII_DF_VOLUME)) > + if (ISSET(flags, MPII_DF_VOLUME)) { > + struct mpii_cfg_hdr hdr; > + struct mpii_cfg_raid_vol_pg1 vpg; > + size_t pagelen; > + > + address = MPII_CFG_RAID_VOL_ADDR_HANDLE | dev->dev_handle; > + > + if (mpii_req_cfg_header(sc, MPII_CONFIG_REQ_PAGE_TYPE_RAID_VOL, > + 1, address, MPII_PG_POLL, ) != 0) > + return (EINVAL); > + > + memset(, 0, sizeof(vpg)); > + pagelen = hdr.page_length * 4; I'd prefer this: pagelen = min(hdr.page_length * 4, sizeof(vpg)); just to avoid trashing the stack if the page grows in newer firmware versions. > + > + if (mpii_req_cfg_page(sc, address, MPII_PG_POLL, , 1, > + , pagelen) != 0) > + return (EINVAL); > + > + link->port_wwn = letoh64(vpg.wwid); > + > return (0); > + } > > memset(, 0, sizeof(ehdr)); > ehdr.page_type = MPII_CONFIG_REQ_PAGE_TYPE_EXTENDED;
Re: sparc64: find root device on hardware RAID
On Sun, Dec 29, 2019 at 03:30:15AM +0100, Klemens Nanni wrote:
> > link->target isn't the right place to put this, for one thing it's only 16 bits
> > and the wwn is 64 bits, and it's used throughout the driver to look up devices
> > in an array, so changing it will break things.  I think link->port_wwn is the
> > right place to store it.
> Ah, link->target was there because I played with that in earlier diffs,
> port_wwn is indeed correct here.
>
> > The fields in the page returned by the driver are also little endian, so you'd
> > need letoh64(vpg.wwid) here.
> Thanks, I see that is done elsewhere in the driver, too.
>
> > You should still return 0 here, as continuing on will send the SAS device
> > page 0 request to the raid volume, which will probably upset the controller
> > enough to stop talking to you.
> Oh well... that explains the hangs - I accidentally dropped the return,
> it should obviously stay and my diff should only add another page fetch.
All three points are addressed in the updated diff below, the third of
the three diffs needed to find the root device.

mpii(4) currently leaves the port WWN empty for RAID volumes (physical
devices are populated).  autoconf(9) uses this to match the root device
as per the second diff, so we must fill it in after fetching the relevant
page as suggested by mikeb and jmatthew.

Feedback? OK?

Index: mpii.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.122
diff -u -p -r1.122 mpii.c
--- mpii.c	28 Dec 2019 04:38:22 -0000	1.122
+++ mpii.c	29 Dec 2019 05:02:38 -0000
@@ -910,8 +910,28 @@ mpii_scsi_probe(struct scsi_link *link)
 	if (ISSET(flags, MPII_DF_HIDDEN) || ISSET(flags, MPII_DF_UNUSED))
 		return (1);
 
-	if (ISSET(flags, MPII_DF_VOLUME))
+	if (ISSET(flags, MPII_DF_VOLUME)) {
+		struct mpii_cfg_hdr		hdr;
+		struct mpii_cfg_raid_vol_pg1	vpg;
+		size_t				pagelen;
+
+		address = MPII_CFG_RAID_VOL_ADDR_HANDLE | dev->dev_handle;
+
+		if (mpii_req_cfg_header(sc, MPII_CONFIG_REQ_PAGE_TYPE_RAID_VOL,
+		    1, address, MPII_PG_POLL, &hdr) != 0)
+			return (EINVAL);
+
+		memset(&vpg, 0, sizeof(vpg));
+		pagelen = hdr.page_length * 4;
+
+		if (mpii_req_cfg_page(sc, address, MPII_PG_POLL, &hdr, 1,
+		    &vpg, pagelen) != 0)
+			return (EINVAL);
+
+		link->port_wwn = letoh64(vpg.wwid);
+
 		return (0);
+	}
 
 	memset(&ehdr, 0, sizeof(ehdr));
 	ehdr.page_type = MPII_CONFIG_REQ_PAGE_TYPE_EXTENDED;
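The letoh64() call in the diff above matters because the page fields
are little endian while sparc64 is a big-endian host.  As a minimal
userland sketch of the same idea, the hypothetical le64_decode() below
assembles a 64-bit value from little-endian bytes without relying on
the host's byte order; the byte sequence in the comment is simply the
WWID from this thread written out least significant byte first, an
assumption about how it would sit in the page buffer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode a little-endian 64-bit field from a
 * byte buffer, independent of the host's endianness.
 */
static uint64_t
le64_decode(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	/* start with the most significant byte, stored last */
	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * The bytes 6c d1 dc d5 90 22 a3 0a decode to 0x0aa32290d5dcd16c on
 * both little-endian and big-endian hosts.
 */
```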
Re: sparc64: find root device on hardware RAID
On Sun, Dec 29, 2019 at 05:58:02AM +0100, Klemens Nanni wrote:
> Now I need to work around the first digit's mismatch; for reasons still
> unknown to me, official documentation states that the RAID volume WWID's
> first digit --if it is zero-- must be replaced with three, so the
> bootpath contains 3aa32290d5dcd16c whereas the port WWN has the correct
> aa32290d5dcd16c.
>
> Fix that and the kernel will match the device and find its root device
> automatically.  I get around to a diff for that now.
https://jp.fujitsu.com/platform/server/sparc/manual/en/c120-e679/14.2.12.html
is the only freely available documentation I could find online which
mentions how to craft bootpaths for RAID volumes.  It says

	Replace the number "0" at the beginning of WWID of the RAID
	volume with "3".

which implies to me that every WWID of such RAID volumes starts with
zero, hence it must always be replaced with three; otherwise the
documentation is incomplete.

I cannot find documentation or any kind of reasoning as to *why* this
must be done; anyone who knows, please enlighten me.

So to match these, simply add an additional check in the existing code
path for Fibre Channel, which already does what we want for hardware
RAID.

This is the second of three diffs for autoconf(9) to find the root
device automatically, jmatthew previously sent the first one.  Third one
follows in a separate mail.

Feedback? OK?

Index: sparc64/autoconf.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/autoconf.c,v
retrieving revision 1.133
diff -u -p -r1.133 autoconf.c
--- sparc64/autoconf.c	15 Oct 2019 05:21:16 -0000	1.133
+++ sparc64/autoconf.c	29 Dec 2019 06:05:44 -0000
@@ -1455,8 +1455,9 @@ device_register(struct device *dev, void
 		u_int lun = bp->val[1];
 
 		if (bp->val[0] & 0xffffffff00000000 && bp->val[0] != -1) {
-			/* Fibre channel? */
-			if (bp->val[0] == sl->port_wwn && lun == sl->lun) {
+			/* Hardware RAID or Fibre channel? */
+			if ((bp->val[0] == sl->port_wwn + 0x3000000000000000 ||
+			    bp->val[0] == sl->port_wwn) && lun == sl->lun) {
 				nail_bootdev(dev, bp);
 			}
Re: sparc64: find root device on hardware RAID
On Sun, Dec 29, 2019 at 01:59:38PM +1000, Jonathan Matthew wrote: > I think we have the wrong size for the volume name, hence the difference > between the size reported by the controller and the size of vpg. Indeed, good catch! OBP's `create-raid*-volume' commands also prompt for names no longer than that: {0} ok 9 b c d create-raid0-volume ... Enter a volume name: [0 to 15 characters] foo {0} ok > try this out? Just works, the WWID is no longer clobbered and autoconf eventually sees it in the port WWN: mpii0 at pci15 dev 0 function 0 "Symbios Logic SAS2008" rev 0x03: msi mpii0: Solana On-Board, firmware 9.0.0.0 IR, MPI 2.0 scsibus1 at mpii0: 834 targets mpii_scsi_probe: target 0 lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10 struct mpii_cfg_raid_vol_pg1 vpg: volume_id: 81, tvolume_bus: 3, volume_ioc: 0 wwid: aa32290d5dcd16c sd0 at scsibus1 targ 0 lun 0: naa.600508e06cd1dcd59022a30a device_register: RAID: bp->val[]: 3aa32290d5dcd16c, 0, 0 target: d5dcd16c, sl->target: 0 lun: 0, sl->lun: 0 sl->port_wwn: aa32290d5dcd16c, sl->node_wwn: 0 sd0: 713824MB, 512 bytes/sector, 1461911552 sectors Thanks a lot, OK kn Now I need to work around the first digit's mismatch; for reasons still unknown to me, official documentation states that the RAID volume WWID's first digit --if it is zero-- must be replaced with three, so the bootpath contains 3aa32290d5dcd16c whereas the port WWN has the correct aa32290d5dcd16c. Fix that and the kernel will match the device and find its root device automatically. I get around to a diff for that now.
Re: sparc64: find root device on hardware RAID
On Sun, Dec 29, 2019 at 03:30:15AM +0100, Klemens Nanni wrote:
> On Sun, Dec 29, 2019 at 10:29:48AM +1000, Jonathan Matthew wrote:
> > > +		memset(&vpg, 0, sizeof(vpg));
> > > +		pagelen = hdr.page_length * 4;
> >
> > We probably should check that this isn't larger than the size of vpg.
> pagelen is 64, sizeof(vpg) is 80.
>
> > link->target isn't the right place to put this, for one thing it's only 16 bits
> > and the wwn is 64 bits, and it's used throughout the driver to look up devices
> > in an array, so changing it will break things.  I think link->port_wwn is the
> > right place to store it.
> Ah, link->target was there because I played with that in earlier diffs,
> port_wwn is indeed correct here.
>
> > The fields in the page returned by the driver are also little endian, so you'd
> > need letoh64(vpg.wwid) here.
> Thanks, I see that is done elsewhere in the driver, too.
>
> > You should still return 0 here, as continuing on will send the SAS device
> > page 0 request to the raid volume, which will probably upset the controller
> > enough to stop talking to you.
> Oh well... that explains the hangs - I accidentally dropped the return,
> it should obviously stay and my diff should only add another page fetch.
>
> With the return the kernel no longer loops in mpii_wait() and boots fine.
>
> vpg.wwid however is zero; the other members volume_id, volume_bus, and
> volume_ioc are set/non-zero, guid contains "LSI" and name contains
> "ssd1e" which is how I named this RAID volume in OBP.
>
> So why are those bits filled in but not the WWID?

I think we have the wrong size for the volume name, hence the difference
between the size reported by the controller and the size of vpg.

try this out?

diff --git sys/dev/pci/mpiireg.h sys/dev/pci/mpiireg.h
index 638f31171d1..11dacf86953 100644
--- sys/dev/pci/mpiireg.h
+++ sys/dev/pci/mpiireg.h
@@ -1355,7 +1355,7 @@ struct mpii_cfg_raid_vol_pg1 {
 
 	u_int8_t		guid[24];
 
-	u_int8_t		name[32];
+	u_int8_t		name[16];
 
 	u_int64_t		wwid;
 
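Why the name[32] to name[16] change above makes the WWID appear can be
illustrated with two stand-in structures.  This is a sketch under
assumptions: only guid[24], name[] and the 64-bit wwid come from the
diff, and volume_id, volume_bus and volume_ioc from the debug output;
the 4-byte header and the reserved members are guesses chosen so that
the sizes come out as the 80 and 64 bytes mentioned in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 4-byte configuration page header. */
struct cfg_hdr {
	uint8_t		page_version;
	uint8_t		page_length;	/* in 32-bit words */
	uint8_t		page_number;
	uint8_t		page_type;
};

/* Stand-in for the layout before the fix. */
struct vol_pg1_old {
	struct cfg_hdr	config_header;
	uint8_t		volume_id;
	uint8_t		volume_bus;
	uint8_t		volume_ioc;
	uint8_t		reserved1;	/* assumed padding member */
	uint8_t		guid[24];
	uint8_t		name[32];
	uint64_t	wwid;
	uint32_t	reserved2;	/* assumed */
	uint32_t	reserved3;	/* assumed */
};

/* Stand-in for the layout after the fix; only name[] shrinks. */
struct vol_pg1_new {
	struct cfg_hdr	config_header;
	uint8_t		volume_id;
	uint8_t		volume_bus;
	uint8_t		volume_ioc;
	uint8_t		reserved1;
	uint8_t		guid[24];
	uint8_t		name[16];
	uint64_t	wwid;
	uint32_t	reserved2;
	uint32_t	reserved3;
};

/* The old layout is 80 bytes and puts wwid past a 64-byte page. */
_Static_assert(sizeof(struct vol_pg1_old) == 80, "old size");
_Static_assert(offsetof(struct vol_pg1_old, wwid) == 64, "old wwid offset");

/* The new layout is 64 bytes and wwid falls inside the page. */
_Static_assert(sizeof(struct vol_pg1_new) == 64, "new size");
_Static_assert(offsetof(struct vol_pg1_new, wwid) == 48, "new wwid offset");
```

Under these assumptions a 64-byte page copied into the old structure
never reaches wwid, which would leave it at the zero written by
memset(), while the bytes that really hold the WWID land in the tail of
name[].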
Re: sparc64: find root device on hardware RAID
On Sun, Dec 29, 2019 at 10:29:48AM +1000, Jonathan Matthew wrote:
> > +		memset(&vpg, 0, sizeof(vpg));
> > +		pagelen = hdr.page_length * 4;
>
> We probably should check that this isn't larger than the size of vpg.
pagelen is 64, sizeof(vpg) is 80.

> link->target isn't the right place to put this, for one thing it's only 16 bits
> and the wwn is 64 bits, and it's used throughout the driver to look up devices
> in an array, so changing it will break things.  I think link->port_wwn is the
> right place to store it.
Ah, link->target was there because I played with that in earlier diffs,
port_wwn is indeed correct here.

> The fields in the page returned by the driver are also little endian, so you'd
> need letoh64(vpg.wwid) here.
Thanks, I see that is done elsewhere in the driver, too.

> You should still return 0 here, as continuing on will send the SAS device
> page 0 request to the raid volume, which will probably upset the controller
> enough to stop talking to you.
Oh well... that explains the hangs - I accidentally dropped the return,
it should obviously stay and my diff should only add another page fetch.

With the return the kernel no longer loops in mpii_wait() and boots fine.

vpg.wwid however is zero; the other members volume_id, volume_bus, and
volume_ioc are set/non-zero, guid contains "LSI" and name contains
"ssd1e" which is how I named this RAID volume in OBP.

So why are those bits filled in but not the WWID?
Re: sparc64: find root device on hardware RAID
On Fri, Dec 27, 2019 at 07:50:34PM +0100, Klemens Nanni wrote:
> On Fri, Dec 27, 2019 at 09:46:56AM +0100, Mike Belopuhov wrote:
> > Looks like WWID for the RAID volume can be read from the RAID Volume
> > Page 1 (mpii_cfg_raid_vol_pg1).
> jmatthew also suggested that, thanks.
>
> I'm now looking into mpii(4) and already made a rather naive attempt at
> setting the SCSI target to the volume's WWID, but to no avail.
>
> Diff below is what I booted last, but for reasons yet unknown to me the
> kernel just hangs after the debug printf()
>
> 	...
> 	mpii0 at pci15 dev 0 function 0 "Symbios Logic SAS2008" rev 0x03: msi
> 	mpii0: Solana On-Board, firmware 9.0.0.0 IR, MPI 2.0
> 	scsibus1 at mpii0: 834 targets
> 	mpii_scsi_probe: target 0 lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10
>
> and the machine must be reset.  So something is wrong with this diff and
> I need to get more familiar with this code, but one can also observe
> link->target = vpg.wwid = 0 which I did not expect; perhaps the driver
> currently does not obtain the volume's WWID at all, or obtains it
> incorrectly?  Or perhaps I am missing steps prior to fetching the page;
> current mpii(4) does not seem to use struct mpii_cfg_raid_vol_pg1 at all.
>
> Index: mpii.c
> ===
> RCS file: /cvs/src/sys/dev/pci/mpii.c,v
> retrieving revision 1.121
> diff -u -p -r1.121 mpii.c
> --- mpii.c	12 Sep 2019 22:22:53 -	1.121
> +++ mpii.c	27 Dec 2019 14:51:36 -
> @@ -909,8 +909,33 @@ mpii_scsi_probe(struct scsi_link *link)
>  	if (ISSET(flags, MPII_DF_HIDDEN) || ISSET(flags, MPII_DF_UNUSED))
>  		return (1);
>
> -	if (ISSET(flags, MPII_DF_VOLUME))
> -		return (0);
> +	if (ISSET(flags, MPII_DF_VOLUME)) {
> +		struct mpii_cfg_hdr		hdr;
> +		struct mpii_cfg_raid_vol_pg1	vpg;
> +		size_t				pagelen;
> +
> +		address = MPII_CFG_RAID_VOL_ADDR_HANDLE | dev->dev_handle;
> +
> +		if (mpii_req_cfg_header(sc, MPII_CONFIG_REQ_PAGE_TYPE_RAID_VOL,
> +		    1, address, MPII_PG_POLL, &hdr) != 0)
> +			return (EINVAL);
> +
> +		memset(&vpg, 0, sizeof(vpg));
> +		pagelen = hdr.page_length * 4;

We probably should check that this isn't larger than the size of vpg.

> +
> +		if (mpii_req_cfg_page(sc, address, MPII_PG_POLL, &hdr, 1,
> +		    &vpg, pagelen) != 0)
> +			return (EINVAL);
> +
> +		link->target = vpg.wwid;

link->target isn't the right place to put this.  For one thing, it's only
16 bits and the wwn is 64 bits.  It's also used throughout the driver to
look up devices in an array, so changing it will break things.  I think
link->port_wwn is the right place to store it.

The fields in the page returned by the driver are also little endian, so
you'd need letoh64(vpg.wwid) here.

> +
> +		printf("%s: target %x lun %x"
> +		    " port_wwn %llx node_wwn %llx"
> +		    " has MPII_DF_VOLUME set in flags %x\n",
> +		    __func__, link->target, link->lun,
> +		    link->port_wwn, link->node_wwn,
> +		    flags);

You should still return 0 here, as continuing on will send the SAS device
page 0 request to the raid volume, which will probably upset the controller
enough to stop talking to you.

> +	}
>
>  	memset(&ehdr, 0, sizeof(ehdr));
>  	ehdr.page_type = MPII_CONFIG_REQ_PAGE_TYPE_EXTENDED;
>
>
> The previous version of this diff put vpg on the heap, e.g.
>
> +	struct mpii_cfg_raid_vol_pg1	*vpg;
> +	vpg = malloc(pagelen, M_TEMP, M_WAITOK | M_CANFAIL | M_ZERO);
> 	...
>
> just like it is done for the mpii_cfg_raid_vol_pg0 struct in
> mpii_ioctl_cache(), which is where I copied the code from to fetch pages.
>
> But with this, the kernel panicked:
>
> mpii_scsi_probe: target beef lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10
> panic: kernel data fault: pc=165e310 addr=40090fae000
> panic: Unable to send mondo 1011fa4 to cpu 0: 6
> Stopped at db_enter+0x8: nop
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> * 0 0 0 0x1 0x200 0K swapper
> sun4v_send_ipi(0, 1011fa4, 0, 6, 0, 16) at sun4v_send_ipi+0xac
> db_enter_ddb(419aa7f8000, a, 1c3, 2007c68, 3f, 78) at db_enter_ddb+0x244
> db_ktrap(101, 20074a0, 1, 0, 0, 2007728) at db_ktrap+0x104
> trap(20074a0, 101, 11e55e4, 820006, 0, 78) at trap+0x2c0
> Lslowtrap_reenter(1, 20077f8, 175adf8, 20077f8, 193f570, cb) at Lslowtrap_reenter+0xf8
> panic(175adf8, 165e310, 40090fae000, 20077f8, 1ca1000, 100) at panic+0xb8
> data_access_fault(20078f0, 31, 165e310, 40090fae000, 40090fae000, 1) at
>
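The three review points above (bound the page length, decode the little-endian WWID, keep it in a 64-bit field rather than the 16-bit target) can be illustrated outside the kernel. This is only a sketch under stated assumptions: le64_decode(), page_fits() and truncate_to_target() are hypothetical helpers written for this example, not mpii(4) functions, and the byte array stands in for the wwid field of the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a little-endian 64-bit field byte by byte,
 * which gives the same result as letoh64() regardless of host endianness.
 */
static uint64_t
le64_decode(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	assert(p != NULL);
	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return (v);
}

/*
 * Hypothetical helper: the header reports the page length in 32-bit
 * words; refuse pages that would not fit into the caller's buffer.
 */
static int
page_fits(uint8_t page_length_words, size_t bufsize)
{
	return ((size_t)page_length_words * 4 <= bufsize);
}

/* What storing a 64-bit WWID in a 16-bit target field would keep. */
static uint16_t
truncate_to_target(uint64_t wwid)
{
	return ((uint16_t)wwid);
}
```

With the volume WWID from this thread, le64_decode() on the bytes 6c d1 dc d5 90 22 a3 0a yields 0x0aa32290d5dcd16c, while truncate_to_target() keeps only 0xd16c, which is why the value belongs in a 64-bit field such as port_wwn.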
Re: sparc64: find root device on hardware RAID
On Fri, Dec 27, 2019 at 07:50:34PM +0100, Klemens Nanni wrote:
> Diff below is what I booted last, but for reasons yet unknown to me the
> kernel just hangs after the debug printf()
>
> 	...
> 	mpii0 at pci15 dev 0 function 0 "Symbios Logic SAS2008" rev 0x03: msi
> 	mpii0: Solana On-Board, firmware 9.0.0.0 IR, MPI 2.0
> 	scsibus1 at mpii0: 834 targets
> 	mpii_scsi_probe: target 0 lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10
More specifically, the driver hangs here:

	void
	mpii_wait(struct mpii_softc *sc, struct mpii_ccb *ccb)
	{
		struct mutex	mtx = MUTEX_INITIALIZER(IPL_BIO);
		void		(*done)(struct mpii_ccb *);
		void		*cookie;

		done = ccb->ccb_done;
		cookie = ccb->ccb_cookie;

		ccb->ccb_done = mpii_wait_done;
		ccb->ccb_cookie = &mtx;

		/* XXX this will wait forever for the ccb to complete */

		mpii_start(sc, ccb);

		mtx_enter(&mtx);
		while (ccb->ccb_cookie != NULL)
			msleep(ccb, &mtx, PRIBIO, "mpiiwait", 0);
		mtx_leave(&mtx);

		ccb->ccb_cookie = cookie;
		done(ccb);
	}

mpii_start() returns, then mpii_wait() "wait[s] forever".

I'm still practically lost in this area.  Why does it hang there only if
I fetch the `struct mpii_cfg_raid_vol_pg1' page for its wwid member in
mpii_scsi_probe() as per the previous diff?

The commit which introduced the XXX:

	revision 1.31
	date: 2010/07/07 10:29:17;  author: dlg;  state: Exp;  lines: +52 -34;
	bring mpi_wait over to mpii for an mpsafe mechanism to sleep while
	waiting for a command to complete. this also replaces all the
	while (!ready) \ tsleep() wrapped in splbio code with mpii_wait.

	tested with bioctl runs and sensor updates on a raid volume
Re: sparc64: find root device on hardware RAID
On Fri, Dec 27, 2019 at 09:46:56AM +0100, Mike Belopuhov wrote:
> Looks like WWID for the RAID volume can be read from the RAID Volume
> Page 1 (mpii_cfg_raid_vol_pg1).
jmatthew also suggested that, thanks.

I'm now looking into mpii(4) and already made a rather naive attempt at
setting the SCSI target to the volume's WWID, but to no avail.

Diff below is what I booted last, but for reasons yet unknown to me the
kernel just hangs after the debug printf()

	...
	mpii0 at pci15 dev 0 function 0 "Symbios Logic SAS2008" rev 0x03: msi
	mpii0: Solana On-Board, firmware 9.0.0.0 IR, MPI 2.0
	scsibus1 at mpii0: 834 targets
	mpii_scsi_probe: target 0 lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10

and the machine must be reset.  So something is wrong with this diff and
I need to get more familiar with this code, but one can also observe
link->target = vpg.wwid = 0 which I did not expect; perhaps the driver
currently does not obtain the volume's WWID at all, or obtains it
incorrectly?  Or perhaps I am missing steps prior to fetching the page;
current mpii(4) does not seem to use struct mpii_cfg_raid_vol_pg1 at all.

Index: mpii.c
===
RCS file: /cvs/src/sys/dev/pci/mpii.c,v
retrieving revision 1.121
diff -u -p -r1.121 mpii.c
--- mpii.c	12 Sep 2019 22:22:53 -	1.121
+++ mpii.c	27 Dec 2019 14:51:36 -
@@ -909,8 +909,33 @@ mpii_scsi_probe(struct scsi_link *link)
 	if (ISSET(flags, MPII_DF_HIDDEN) || ISSET(flags, MPII_DF_UNUSED))
 		return (1);
 
-	if (ISSET(flags, MPII_DF_VOLUME))
-		return (0);
+	if (ISSET(flags, MPII_DF_VOLUME)) {
+		struct mpii_cfg_hdr		hdr;
+		struct mpii_cfg_raid_vol_pg1	vpg;
+		size_t				pagelen;
+
+		address = MPII_CFG_RAID_VOL_ADDR_HANDLE | dev->dev_handle;
+
+		if (mpii_req_cfg_header(sc, MPII_CONFIG_REQ_PAGE_TYPE_RAID_VOL,
+		    1, address, MPII_PG_POLL, &hdr) != 0)
+			return (EINVAL);
+
+		memset(&vpg, 0, sizeof(vpg));
+		pagelen = hdr.page_length * 4;
+
+		if (mpii_req_cfg_page(sc, address, MPII_PG_POLL, &hdr, 1,
+		    &vpg, pagelen) != 0)
+			return (EINVAL);
+
+		link->target = vpg.wwid;
+
+		printf("%s: target %x lun %x"
+		    " port_wwn %llx node_wwn %llx"
+		    " has MPII_DF_VOLUME set in flags %x\n",
+		    __func__, link->target, link->lun,
+		    link->port_wwn, link->node_wwn,
+		    flags);
+	}
 
 	memset(&ehdr, 0, sizeof(ehdr));
 	ehdr.page_type = MPII_CONFIG_REQ_PAGE_TYPE_EXTENDED;


The previous version of this diff put vpg on the heap, e.g.

+	struct mpii_cfg_raid_vol_pg1	*vpg;
+	vpg = malloc(pagelen, M_TEMP, M_WAITOK | M_CANFAIL | M_ZERO);
	...

just like it is done for the mpii_cfg_raid_vol_pg0 struct in
mpii_ioctl_cache(), which is where I copied the code from to fetch pages.

But with this, the kernel panicked:

mpii_scsi_probe: target beef lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10
panic: kernel data fault: pc=165e310 addr=40090fae000
panic: Unable to send mondo 1011fa4 to cpu 0: 6
Stopped at db_enter+0x8: nop
TID PID UID PRFLAGS PFLAGS CPU COMMAND
* 0 0 0 0x1 0x200 0K swapper
sun4v_send_ipi(0, 1011fa4, 0, 6, 0, 16) at sun4v_send_ipi+0xac
db_enter_ddb(419aa7f8000, a, 1c3, 2007c68, 3f, 78) at db_enter_ddb+0x244
db_ktrap(101, 20074a0, 1, 0, 0, 2007728) at db_ktrap+0x104
trap(20074a0, 101, 11e55e4, 820006, 0, 78) at trap+0x2c0
Lslowtrap_reenter(1, 20077f8, 175adf8, 20077f8, 193f570, cb) at Lslowtrap_reenter+0xf8
panic(175adf8, 165e310, 40090fae000, 20077f8, 1ca1000, 100) at panic+0xb8
data_access_fault(20078f0, 31, 165e310, 40090fae000, 40090fae000, 1) at data_access_fault+0x2f0
sun4v_datatrap(0, 2018000, fff8, 0, 40090ea7ec0, 16) at sun4v_datatrap+0x210
_kernel_lock(40090ea7ec0, a, 1, ac3b86cb36c, 0, 16) at _kernel_lock+0x34
scsi_xs_exec(40090ea7ec0, 40090eee710, 1c3, 2007c68, 3f, 78) at scsi_xs_exec+0x30
scsi_xs_sync(40090ea7ec0, 1c3, 1b0, 0, 193f570, 1c00) at scsi_xs_sync+0x84
scsi_test_unit_ready(c, 5, 40090ea7ec0, 193f570, 1c00, 40090ee7500) at scsi_test_unit_ready+0x38
scsi_probedev(40090ee5680, 0, 0, 0, 1c00, 40090ed28c0) at scsi_probedev+0x42c
scsi_probe(40090ee5680, 0, , 0, 193f570, 73) at scsi_probe+0x98
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}>

I need to get more familiar with this code.
Re: sparc64: find root device on hardware RAID
Klemens Nanni writes:

> On Thu, Dec 26, 2019 at 07:49:06PM +0100, Mark Kettenis wrote:
>> Well, there's your problem.  mpii(4) doesn't fill in the WWNs for
>> the logical volume so there is nothing that can be matched to the WWN
>> from the bootpath.
> Obvious now that you mention it.
>
>> > See below a diff for the debug printf() I use to look at those values.
>> > Complete console log from OBP prompt to multiuser follows to show the
>> > boot process and debug output for all devices.
>> >
>> > What I find odd is how 0aa32290d5dcd16c is the WWID of the RAID volume,
>> > and yet all devices attaching to scsibus*, including those not being part
>> > of the RAID, show the very same bp->val[0] of 3aa32290d5dcd16c.
>>
>> bp->val[0] comes from the boot path; there is only one.
> Ha, of course.  I confused myself by printing it for every device
> passing that code path where it is used as target, hence debug printfs
> showing the same value for multiple devices.
>
>> As you can see, the WWNs are filled in for the other disks (sd1, cd0)
>> that attach to the controller.  So you probably need some additional
>> code in mpii(4) to fill in the WWNs for logical volumes.  I recommend
>> talking to dlg@ and jmatthew@ directly about that.
> That makes sense, I didn't look toward mpii(4) yet.
>
> Thank you for pointing things out and asking such questions, this is
> very helpful guidance.  I'm looking further into the controller
> driver now.

Looks like WWID for the RAID volume can be read from the RAID Volume
Page 1 (mpii_cfg_raid_vol_pg1).

Cheers,
Mike
Re: sparc64: find root device on hardware RAID
On Thu, Dec 26, 2019 at 07:49:06PM +0100, Mark Kettenis wrote:
> Well, there's your problem.  mpii(4) doesn't fill in the WWNs for
> the logical volume so there is nothing that can be matched to the WWN
> from the bootpath.
Obvious now that you mention it.

> > See below a diff for the debug printf() I use to look at those values.
> > Complete console log from OBP prompt to multiuser follows to show the
> > boot process and debug output for all devices.
> >
> > What I find odd is how 0aa32290d5dcd16c is the WWID of the RAID volume,
> > and yet all devices attaching to scsibus*, including those not being part
> > of the RAID, show the very same bp->val[0] of 3aa32290d5dcd16c.
>
> bp->val[0] comes from the boot path; there is only one.
Ha, of course.  I confused myself by printing it for every device
passing that code path where it is used as target, hence debug printfs
showing the same value for multiple devices.

> As you can see, the WWNs are filled in for the other disks (sd1, cd0)
> that attach to the controller.  So you probably need some additional
> code in mpii(4) to fill in the WWNs for logical volumes.  I recommend
> talking to dlg@ and jmatthew@ directly about that.
That makes sense, I didn't look toward mpii(4) yet.

Thank you for pointing things out and asking such questions, this is
very helpful guidance.  I'm looking further into the controller
driver now.
Re: sparc64: find root device on hardware RAID
> Date: Thu, 26 Dec 2019 19:02:26 +0100
> From: Klemens Nanni
>
> On Thu, Dec 26, 2019 at 06:25:10PM +0100, Mark Kettenis wrote:
> > Hmm, you really shouldn't end up there if you're booting by WWN.  I
> > guess the
> >
> >     bp->val[0] == sl->port_wwn
> >
> > check is failing in your case.  What are the values of:
> >
> >     bp->val[0]
> >     sl->port_wwn
> >     sl->node_wwn
> >
> > in your case?
> For the RAID volume sd0:
>
>     bp->val[0] = 0x3aa32290d5dcd16c
>     sl->port_wwn = 0
>     sl->node_wwn = 0

Well, there's your problem.  mpii(4) doesn't fill in the WWNs for
the logical volume so there is nothing that can be matched to the WWN
from the bootpath.

> See below a diff for the debug printf() I use to look at those values.
> Complete console log from OBP prompt to multiuser follows to show the
> boot process and debug output for all devices.
>
> What I find odd is how 0aa32290d5dcd16c is the WWID of the RAID volume,
> and yet all devices attaching to scsibus*, including those not being part
> of the RAID, show the very same bp->val[0] of 3aa32290d5dcd16c.

bp->val[0] comes from the boot path; there is only one.

As you can see, the WWNs are filled in for the other disks (sd1, cd0)
that attach to the controller.  So you probably need some additional
code in mpii(4) to fill in the WWNs for logical volumes.  I recommend
talking to dlg@ and jmatthew@ directly about that.

Cheers,

Mark

P.S. dlg@ and jmatthew@ may also be interested in the following
     unrecognized device:

> vendor "Symbios Logic", unknown product 0x007e (class mass storage subclass SAS, rev 0x03) at pci19 dev 0 function 0 not configured
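The mismatch discussed above can be shown in isolation. The two values differ only in the top nibble (0aa32290d5dcd16c reported for the volume, 3aa32290d5dcd16c in the boot path), and the diff that was eventually proposed in this thread rewrites that nibble to 3 before storing port_wwn. The sketch below is a user-space illustration only: the full-width mask constants are assumptions (the constants in the archived diff are truncated), and raid_wwid_to_obp() and bootpath_matches() are hypothetical names.

```c
#include <stdint.h>

/* Assumed full-width constants; the archived diff shows them truncated. */
#define WWID_LOW_MASK		0x0fffffffffffffffULL
#define WWID_RAID_NIBBLE	0x3000000000000000ULL

/* Rewrite the top nibble to 3, the form OBP uses for RAID volumes here. */
static uint64_t
raid_wwid_to_obp(uint64_t wwid)
{
	return ((wwid & WWID_LOW_MASK) | WWID_RAID_NIBBLE);
}

/* Same shape as the boot path comparison quoted in the thread. */
static int
bootpath_matches(uint64_t bp_wwn, uint64_t port_wwn, unsigned int bp_lun,
    unsigned int lun)
{
	return (bp_wwn == port_wwn && bp_lun == lun);
}
```

With port_wwn left at 0 the comparison against 0x3aa32290d5dcd16c fails, which is the situation reported above; once the driver stores raid_wwid_to_obp(0x0aa32290d5dcd16c), it succeeds.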
Re: sparc64: find root device on hardware RAID
On Thu, Dec 26, 2019 at 06:25:10PM +0100, Mark Kettenis wrote:
> Hmm, you really shouldn't end up there if you're booting by WWN.  I
> guess the
>
>     bp->val[0] == sl->port_wwn
>
> check is failing in your case.  What are the values of:
>
>     bp->val[0]
>     sl->port_wwn
>     sl->node_wwn
>
> in your case?
For the RAID volume sd0:

    bp->val[0] = 0x3aa32290d5dcd16c
    sl->port_wwn = 0
    sl->node_wwn = 0

See below a diff for the debug printf() I use to look at those values.
Complete console log from OBP prompt to multiuser follows to show the
boot process and debug output for all devices.

What I find odd is how 0aa32290d5dcd16c is the WWID of the RAID volume,
and yet all devices attaching to scsibus*, including those not being part
of the RAID, show the very same bp->val[0] of 3aa32290d5dcd16c.


Index: sparc64/autoconf.c
===
RCS file: /cvs/src/sys/arch/sparc64/sparc64/autoconf.c,v
retrieving revision 1.133
diff -u -p -r1.133 autoconf.c
--- sparc64/autoconf.c	15 Oct 2019 05:21:16 -	1.133
+++ sparc64/autoconf.c	26 Dec 2019 17:47:20 -
@@ -1455,6 +1455,24 @@ device_register(struct device *dev, void
 		u_int lun = bp->val[1];
 
 		if (bp->val[0] & 0x && bp->val[0] != -1) {
+			printf("\n%s: RAID:\n"
+				"\tbp->val[]: %lx, %lx, %lx\n"
+				"\ttarget: %x, sl->target: %x, sl->adapter_target: %x\n"
+				"\tlun: %x, sl->lun: %x\n"
+				"\tpartition: '%c'\n"
+				"\tsl->port_wwn: %llx\n"
+				"\tsl->node_wwn: %llx\n"
+				"\tsl->id->d_type: %d, sl->id: %s\n",
+				__func__,
+				bp->val[0], bp->val[1], bp->val[2],
+				target, sl->target, sl->adapter_target,
+				lun, sl->lun,
+				(int)bp->val[2],
+				sl->port_wwn,
+				sl->node_wwn,
+				(sl->id == NULL ? -1 : sl->id->d_type), (sl->id == NULL ? 0 : (const char *)(sl->id + 1))
+			);
+
 			/* Fibre channel? */
 			if (bp->val[0] == sl->port_wwn && lun == sl->lun) {
 				nail_bootdev(dev, bp);


{0} ok boot
NOTICE: Entering OpenBoot.
NOTICE: Fetching Guest MD from HV.
NOTICE: Starting additional cpus.
NOTICE: Initializing LDC services.
NOTICE: Probing PCI devices.
NOTICE: Finished PCI probing.

SPARC T4-2, No Keyboard
Copyright (c) 1998, 2018, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.38.16, 64. GB memory available, Serial #100254168.
Ethernet address 0:21:28:f9:c1:d8, Host ID: 85f9c1d8.

Boot device: raid  File and args: /bsd.debug
OpenBSD IEEE 1275 Bootblock 1.4
..>> OpenBSD BOOT 1.15
ERROR: /iscsi-hba: No iscsi-network-bootpath property
Booting /pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0:a/bsd.debug
9688952@0x100+2184@0x193d778+196104@0x1c0+3998200@0x1c2fe08
symbols @ 0xfe458400 165+625008+428736 start=0x100
[ using 1054936 bytes of bsd ELF symbol table ]
console is /virtual-devices@100/console@1
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2019 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.6-current (GENERIC.MP) #31: Thu Dec 26 18:47:42 CET 2019
    kn@xxx:/sys/arch/sparc64/compile/GENERIC.MP
real mem = 68719476736 (65536MB)
avail mem = 67497238528 (64370MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: SPARC T4-2
cpu0 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu1 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu2 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu3 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu4 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu5 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu6 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu7 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu8 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu9 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu10 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu11 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu12 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu13 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu14 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu15 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu16 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu17 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu18 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu19 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu20 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu21 at mainbus0: SPARC-T4 (rev 0.0) @ 2847.862 MHz
cpu22 at mainbus0: SPARC-T4 (rev 0.0)
Re: sparc64: find root device on hardware RAID
> Date: Thu, 26 Dec 2019 17:55:24 +0100
> From: Klemens Nanni
>
> On Thu, Dec 26, 2019 at 09:50:16AM +0100, Mark Kettenis wrote:
> > What happens if the bootpath doesn't specify a partition?  Currently
> > we end up with bp->val[2] being zero in that case, which means we end
> > up booting from 'a' which is the default partition.  So I think you
> > need to check for that case when you call setroot().
> Right, if no partition is specified val[2] is zero - for both current
> code as well as with my diff.
>
> So in setroot() I should have checked whether val[2] was set.
>
> Note how the current code accounts for bp being NULL and sets bootdv
> accordingly, but then uses bp->val[2] unconditionally.  This looks like
> a potential NULL dereference, perhaps no one has ever hit it?
>
> 	bp = nbootpath == 0 ? NULL : &bootpath[nbootpath-1];
> 	bootdv = (bp == NULL) ? NULL : bp->dev;
>
> #if NMPATH > 0
> 	if (bootdv != NULL)
> 		bootdv = mpath_bootdv(bootdv);
> #endif
>
> 	setroot(bootdv, bp->val[2], RB_USERREQ | RB_HALT);
>
> > However, that means the partition != 0 check is probably meaningless...
> Indeed, when omitting the partition OBP passes the bootpath as is
>
> 	{0} ok boot /pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0 /bsd.debug
> 	...
> 	Boot device: /pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0  File and args: /bsd.debug
>
> but stand/ofwboot/ofdev.c:devopen() will default to "a"
>
> 	OpenBSD IEEE 1275 Bootblock 1.4
> 	..>> OpenBSD BOOT 1.15
> 	ERROR: /iscsi-hba: No iscsi-network-bootpath property
> 	Booting /pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0:a/bsd.debug
>
> So the kernel will always see a partition with such a given bootpath,
> hence the val[2] check is always true in that place.
>
> > Also note that booting by WWN isn't just for fibre-channel and
> > hardware RAID.  It works for single disks as well on some controllers.
> Ah, OK.  I erroneously inferred this from reading the code.
>
> > Why does matching the LUN fail?  If I read the code correctly, you'll
> > end up with bp->val[1] = 0.  What is the value of sl->lun?
> The LUN always matches, but the target does not.  Both lun = bp->val[1]
> and sl->lun are zero.
>
> However, the last check
>
> 	if (bp->val[0] & 0x && bp->val[0] != -1) {
> 		...
> 		if (target == sl->target && lun == sl->lun) {
> 			nail_bootdev(dev, bp);
> 			return;
> 		}
> 	}
>
> fails because of target mismatch.
>
> On my sd0 being the RAID volume I see target = bp->val[0] being
> 0xd5dcd16c (lower 32 bits of the WWID) while sl->target is zero.

Hmm, you really shouldn't end up there if you're booting by WWN.  I
guess the

    bp->val[0] == sl->port_wwn

check is failing in your case.  What are the values of:

    bp->val[0]
    sl->port_wwn
    sl->node_wwn

in your case?
Re: sparc64: find root device on hardware RAID
On Thu, Dec 26, 2019 at 09:50:16AM +0100, Mark Kettenis wrote:
> What happens if the bootpath doesn't specify a partition?  Currently
> we end up with bp->val[2] being zero in that case, which means we end
> up booting from 'a' which is the default partition.  So I think you
> need to check for that case when you call setroot().
Right, if no partition is specified val[2] is zero - for both current
code as well as with my diff.

So in setroot() I should have checked whether val[2] was set.

Note how the current code accounts for bp being NULL and sets bootdv
accordingly, but then uses bp->val[2] unconditionally.  This looks like
a potential NULL dereference, perhaps no one has ever hit it?

	bp = nbootpath == 0 ? NULL : &bootpath[nbootpath-1];
	bootdv = (bp == NULL) ? NULL : bp->dev;

#if NMPATH > 0
	if (bootdv != NULL)
		bootdv = mpath_bootdv(bootdv);
#endif

	setroot(bootdv, bp->val[2], RB_USERREQ | RB_HALT);

> However, that means the partition != 0 check is probably meaningless...
Indeed, when omitting the partition OBP passes the bootpath as is

	{0} ok boot /pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0 /bsd.debug
	...
	Boot device: /pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0  File and args: /bsd.debug

but stand/ofwboot/ofdev.c:devopen() will default to "a"

	OpenBSD IEEE 1275 Bootblock 1.4
	..>> OpenBSD BOOT 1.15
	ERROR: /iscsi-hba: No iscsi-network-bootpath property
	Booting /pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0:a/bsd.debug

So the kernel will always see a partition with such a given bootpath,
hence the val[2] check is always true in that place.

> Also note that booting by WWN isn't just for fibre-channel and
> hardware RAID.  It works for single disks as well on some controllers.
Ah, OK.  I erroneously inferred this from reading the code.

> Why does matching the LUN fail?  If I read the code correctly, you'll
> end up with bp->val[1] = 0.  What is the value of sl->lun?
The LUN always matches, but the target does not.  Both lun = bp->val[1]
and sl->lun are zero.

However, the last check

	if (bp->val[0] & 0x && bp->val[0] != -1) {
		...
		if (target == sl->target && lun == sl->lun) {
			nail_bootdev(dev, bp);
			return;
		}
	}

fails because of target mismatch.

On my sd0 being the RAID volume I see target = bp->val[0] being
0xd5dcd16c (lower 32 bits of the WWID) while sl->target is zero.
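The target mismatch comes from narrowing: the boot path value is 64 bits wide, and assigning it to a 32-bit unsigned target keeps only the low half. A small sketch of that, plus the "is this a full WWN?" test, is below. The mask constant is truncated in the archived message, so UPPER32_MASK is an assumption (a test of the upper 32 bits), and looks_like_wwn() and as_target() are hypothetical names used only here.

```c
#include <stdint.h>

/* Assumption: the check tests the upper 32 bits; the constant is cut off
 * in the archived message. */
#define UPPER32_MASK	0xffffffff00000000ULL

/* A value with upper bits set, other than the all-ones "unset" marker. */
static int
looks_like_wwn(uint64_t val0)
{
	return ((val0 & UPPER32_MASK) != 0 && val0 != UINT64_MAX);
}

/* What a 32-bit unsigned target keeps of a 64-bit boot path value. */
static uint32_t
as_target(uint64_t val0)
{
	return ((uint32_t)val0);
}
```

For 0x3aa32290d5dcd16c, as_target() gives 0xd5dcd16c, which can never equal a small SCSI target number such as 0, so only the WWN comparison has a chance of matching.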
Re: sparc64: find root device on hardware RAID
> Date: Thu, 26 Dec 2019 07:38:47 +0100
> From: Klemens Nanni
>
> Hardware RAID volume bootpaths are similar to Fibre channel ones in that
> they use the disk's WWID (lower 16 bits only) as SCSI target, e.g.
>
> 	/pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0:a
>
> Currently device_register() does recognise devices with their WWN as
> target but only continues to nail_bootdev() them if their lun and port
> match as well.
>
> Diff below makes device_register() look at the bootpath's partition and
> --for lack of a better criterion-- the device name as well to test for
> hardware RAIDs.
>
> But to be able to match the partition, partition index calculation must
> be deferred from bootpath_build() to diskconf() so it becomes possible
> to check whether partition "a" (index 0) has been specified in the
> bootpath - this is currently not possible because the partition gets
> turned into an index right away which makes it indistinguishable from
> val[2]'s default initialised value.
>
> As a bonus, this now also makes the kernel print back the bootpath
> correctly with regard to "a" partitions:
>
> 	bootpath: /pci@400,0/pci@2,0/pci@0,0/pci@e,0/scsi@0,0/disk@3aa32290d5dcd16c,0:a
> 	root on sd0a (5eb46f4312eeecb7.a) swap on sd0b dump on sd0b
>
>
> Is there a better way to detect hardware RAIDs as such?
>
>
> Index: sparc64/autoconf.c
> ===
> RCS file: /cvs/src/sys/arch/sparc64/sparc64/autoconf.c,v
> retrieving revision 1.133
> diff -u -p -r1.133 autoconf.c
> --- sparc64/autoconf.c	15 Oct 2019 05:21:16 -	1.133
> +++ sparc64/autoconf.c	26 Dec 2019 05:56:20 -
> @@ -525,7 +525,7 @@ bootpath_build(void)
>  				 * be an ethernet media specification, so be
>  				 * sure to skip all letters.
>  				 */
> -				bp->val[2] = *++cp - 'a';
> +				bp->val[2] = *++cp;
>  				while (*cp != '\0' && *cp != '/')
>  					cp++;
>  			}
> @@ -611,7 +611,7 @@ bootpath_print(struct bootpath *bp)
>  		else
>  			printf("/%s@%lx,%lx", bp->name, bp->val[0], bp->val[1]);
>  		if (bp->val[2] != 0)
> -			printf(":%c", (int)bp->val[2] + 'a');
> +			printf(":%c", (int)bp->val[2]);
>  		bp++;
>  	}
>  	printf("\n");
> @@ -813,7 +813,7 @@ diskconf(void)
>  	bootdv = mpath_bootdv(bootdv);
>  #endif
>
> -	setroot(bootdv, bp->val[2], RB_USERREQ | RB_HALT);
> +	setroot(bootdv, bp->val[2] - 'a', RB_USERREQ | RB_HALT);
>  	dumpconf();
>  }
>
> @@ -1453,7 +1453,9 @@ device_register(struct device *dev, void
>  		    (struct scsibus_softc *)dev->dv_parent;
>  		u_int target = bp->val[0];
>  		u_int lun = bp->val[1];
> +		int partition = bp->val[2];
>
> +		/* Is target a full WWN/WWID? */
>  		if (bp->val[0] & 0x && bp->val[0] != -1) {
>  			/* Fibre channel? */
>  			if (bp->val[0] == sl->port_wwn && lun == sl->lun) {
> @@ -1468,6 +1470,12 @@ device_register(struct device *dev, void
>  			    sl->id->d_type == DEVID_NAA &&
>  			    memcmp(sl->id + 1, &bp->val[0], 8) == 0)
>  				nail_bootdev(dev, bp);
> +
> +			/* Hardware RAID? */
> +			/* XXX: how to detect properly? */
> +			if (strcmp(devname, "sd") == 0 && partition != 0) {
> +				nail_bootdev(dev, bp);
> +			}
>  			return;

What happens if the bootpath doesn't specify a partition?  Currently
we end up with bp->val[2] being zero in that case, which means we end
up booting from 'a' which is the default partition.  So I think you
need to check for that case when you call setroot().  However, that
means the partition != 0 check is probably meaningless...

Also note that booting by WWN isn't just for fibre-channel and
hardware RAID.  It works for single disks as well on some controllers.

Why does matching the LUN fail?  If I read the code correctly, you'll
end up with bp->val[1] = 0.  What is the value of sl->lun?
sparc64: find root device on hardware RAID
Hardware RAID volume bootpaths are similar to Fibre channel ones in that
they use the disk's WWID (lower 16 bits only) as SCSI target, e.g.

	/pci@400/pci@2/pci@0/pci@e/scsi@0/disk@w3aa32290d5dcd16c,0:a

Currently device_register() does recognise devices with their WWN as
target but only continues to nail_bootdev() them if their lun and port
match as well.

Diff below makes device_register() look at the bootpath's partition and
--for lack of a better criterion-- the device name as well to test for
hardware RAIDs.

But to be able to match the partition, partition index calculation must
be deferred from bootpath_build() to diskconf() so it becomes possible
to check whether partition "a" (index 0) has been specified in the
bootpath - this is currently not possible because the partition gets
turned into an index right away which makes it indistinguishable from
val[2]'s default initialised value.

As a bonus, this now also makes the kernel print back the bootpath
correctly with regard to "a" partitions:

	bootpath: /pci@400,0/pci@2,0/pci@0,0/pci@e,0/scsi@0,0/disk@3aa32290d5dcd16c,0:a
	root on sd0a (5eb46f4312eeecb7.a) swap on sd0b dump on sd0b


Is there a better way to detect hardware RAIDs as such?


Index: sparc64/autoconf.c
===
RCS file: /cvs/src/sys/arch/sparc64/sparc64/autoconf.c,v
retrieving revision 1.133
diff -u -p -r1.133 autoconf.c
--- sparc64/autoconf.c	15 Oct 2019 05:21:16 -	1.133
+++ sparc64/autoconf.c	26 Dec 2019 05:56:20 -
@@ -525,7 +525,7 @@ bootpath_build(void)
 				 * be an ethernet media specification, so be
 				 * sure to skip all letters.
 				 */
-				bp->val[2] = *++cp - 'a';
+				bp->val[2] = *++cp;
 				while (*cp != '\0' && *cp != '/')
 					cp++;
 			}
@@ -611,7 +611,7 @@ bootpath_print(struct bootpath *bp)
 		else
 			printf("/%s@%lx,%lx", bp->name, bp->val[0], bp->val[1]);
 		if (bp->val[2] != 0)
-			printf(":%c", (int)bp->val[2] + 'a');
+			printf(":%c", (int)bp->val[2]);
 		bp++;
 	}
 	printf("\n");
@@ -813,7 +813,7 @@ diskconf(void)
 	bootdv = mpath_bootdv(bootdv);
 #endif
 
-	setroot(bootdv, bp->val[2], RB_USERREQ | RB_HALT);
+	setroot(bootdv, bp->val[2] - 'a', RB_USERREQ | RB_HALT);
 	dumpconf();
 }
 
@@ -1453,7 +1453,9 @@ device_register(struct device *dev, void
 		    (struct scsibus_softc *)dev->dv_parent;
 		u_int target = bp->val[0];
 		u_int lun = bp->val[1];
+		int partition = bp->val[2];
 
+		/* Is target a full WWN/WWID? */
 		if (bp->val[0] & 0x && bp->val[0] != -1) {
 			/* Fibre channel? */
 			if (bp->val[0] == sl->port_wwn && lun == sl->lun) {
@@ -1468,6 +1470,12 @@ device_register(struct device *dev, void
 			    sl->id->d_type == DEVID_NAA &&
 			    memcmp(sl->id + 1, &bp->val[0], 8) == 0)
 				nail_bootdev(dev, bp);
+
+			/* Hardware RAID? */
+			/* XXX: how to detect properly? */
+			if (strcmp(devname, "sd") == 0 && partition != 0) {
+				nail_bootdev(dev, bp);
+			}
 			return;
 		}
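Storing the partition letter instead of its index, as this diff does, means the conversion at the setroot() call has to cope with two cases: no boot path at all, and a boot path without a partition (val[2] still 0, where val[2] - 'a' would be negative). A minimal user-space sketch of such a guard follows; struct bootpath_sketch, partition_index() and partition_specified() are hypothetical and only mirror the val[2] convention described above.

```c
#include <stddef.h>

/* Hypothetical stand-in: val[2] holds the partition letter, or 0 if the
 * boot path did not name one. */
struct bootpath_sketch {
	long val[3];
};

/* Did the boot path name a partition at all? */
static int
partition_specified(const struct bootpath_sketch *bp)
{
	return (bp != NULL && bp->val[2] != 0);
}

/* Letter to index, falling back to 0 ('a') when nothing was specified. */
static int
partition_index(const struct bootpath_sketch *bp)
{
	if (!partition_specified(bp))
		return (0);
	return ((int)(bp->val[2] - 'a'));
}
```

A caller would pass partition_index(bp) where the diff passes bp->val[2] - 'a'; the helper returns 0 for both a missing boot path and a missing partition, and 1 for a boot path ending in ":b".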