On Fri, Dec 27, 2019 at 07:50:34PM +0100, Klemens Nanni wrote:
> On Fri, Dec 27, 2019 at 09:46:56AM +0100, Mike Belopuhov wrote:
> > Looks like WWID for the RAID volume can be read from the RAID Volume
> > Page 1 (mpii_cfg_raid_vol_pg1).
> jmatthew also suggested that, thanks.
> 
> I'm now looking into mpii(4) and already made a rather naive attempt at
> setting the SCSI target to the volume's WWID, but to no avail.
> 
> The diff below is what I booted last, but for reasons yet unknown to me the
> kernel just hangs after the debug printf()
> 
>       ...
>       mpii0 at pci15 dev 0 function 0 "Symbios Logic SAS2008" rev 0x03: msi
>       mpii0: Solana On-Board, firmware 9.0.0.0 IR, MPI 2.0
>       scsibus1 at mpii0: 834 targets
>       mpii_scsi_probe: target 0 lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10
> 
> and the machine must be reset.  So something is wrong with this diff and
> I need to get more familiar with this code, but one can also observe
> link->target = vpg.wwid = 0, which I did not expect;  perhaps the driver
> currently does not obtain the volume's WWID at all, or obtains it
> incorrectly?  Or perhaps I am missing steps prior to fetching the page;
> current mpii(4) does not seem to use struct mpii_cfg_raid_vol_pg1 at all.
> 
> Index: mpii.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/mpii.c,v
> retrieving revision 1.121
> diff -u -p -r1.121 mpii.c
> --- mpii.c    12 Sep 2019 22:22:53 -0000      1.121
> +++ mpii.c    27 Dec 2019 14:51:36 -0000
> @@ -909,8 +909,33 @@ mpii_scsi_probe(struct scsi_link *link)
>       if (ISSET(flags, MPII_DF_HIDDEN) || ISSET(flags, MPII_DF_UNUSED))
>               return (1);
>  
> -     if (ISSET(flags, MPII_DF_VOLUME))
> -             return (0);
> +     if (ISSET(flags, MPII_DF_VOLUME)) {
> +             struct mpii_cfg_hdr hdr;
> +             struct mpii_cfg_raid_vol_pg1 vpg;
> +             size_t pagelen;
> +
> +             address = MPII_CFG_RAID_VOL_ADDR_HANDLE | dev->dev_handle;
> +
> +             if (mpii_req_cfg_header(sc, MPII_CONFIG_REQ_PAGE_TYPE_RAID_VOL,
> +                 1, address, MPII_PG_POLL, &hdr) != 0)
> +                     return (EINVAL);
> +
> +             memset(&vpg, 0, sizeof(vpg));
> +             pagelen = hdr.page_length * 4;

We probably should check that this isn't larger than the size of vpg.
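
Something along these lines, as a rough standalone sketch rather than driver
code.  struct cfg_hdr_sketch and checked_pagelen() are made-up names for
illustration; the only thing taken from the diff is that page_length counts
32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the driver's config page header.  Only the
 * one field used by the diff is modelled here.
 */
struct cfg_hdr_sketch {
	uint8_t page_length;	/* page length in 32-bit words */
};

/*
 * Return the page length in bytes, or 0 if the page would not fit in a
 * destination buffer of bufsize bytes.
 */
static size_t
checked_pagelen(const struct cfg_hdr_sketch *hdr, size_t bufsize)
{
	size_t pagelen = (size_t)hdr->page_length * 4;

	if (pagelen > bufsize)
		return 0;	/* caller should bail out instead of reading */
	return pagelen;
}
```

In the diff this would amount to comparing pagelen against sizeof(vpg) and
returning an error before calling mpii_req_cfg_page().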

> +
> +             if (mpii_req_cfg_page(sc, address, MPII_PG_POLL, &hdr, 1,
> +                 &vpg, pagelen) != 0)
> +                     return (EINVAL);
> +
> +             link->target = vpg.wwid;

link->target isn't the right place to put this.  For one thing, it's only
16 bits while the WWN is 64 bits.  It's also used throughout the driver to
look up devices in an array, so changing it will break things.  I think
link->port_wwn is the right place to store it.

The fields in the page returned by the driver are also little-endian, so you'd
need letoh64(vpg.wwid) here.
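
To illustrate both points outside the kernel, here is a small sketch.
le64_decode() is a portable stand-in for letoh64(), and struct link_sketch
with store_wwid() is a made-up miniature of the two link fields mentioned
above, not the real struct scsi_link.

```c
#include <stdint.h>

/* Portable stand-in for letoh64(): decode 8 little-endian bytes. */
static uint64_t
le64_decode(const uint8_t b[8])
{
	uint64_t v = 0;
	int i;

	/* Start at the most significant byte, which is stored last. */
	for (i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return v;
}

/* Hypothetical miniature of the two link fields discussed above. */
struct link_sketch {
	uint16_t target;	/* 16 bits, used as an array index */
	uint64_t port_wwn;	/* 64 bits, room for a full WWID */
};

/* Store the decoded WWID in port_wwn and leave target untouched. */
static void
store_wwid(struct link_sketch *l, const uint8_t raw[8])
{
	l->port_wwn = le64_decode(raw);
}
```

Assigning the same 64-bit value to a 16-bit field would keep only the low
16 bits, which is the other reason not to put it in the target.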

> +
> +             printf("%s: target %x lun %x"
> +                 " port_wwn %llx node_wwn %llx"
> +                 " has MPII_DF_VOLUME set in flags %x\n",
> +                 __func__, link->target, link->lun,
> +                 link->port_wwn, link->node_wwn,
> +                 flags);


You should still return 0 here.  Otherwise the code continues on and sends the
SAS device page 0 request to the RAID volume, which will probably upset the
controller enough to stop talking to you.
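
The control flow I have in mind looks roughly like this.  It is a toy model,
not the driver: probe_sketch(), the *_SKETCH flag values and the request
counter are all invented for illustration, and the counter merely stands in
for issuing the SAS device page 0 request.

```c
#include <stdint.h>

/* Hypothetical flag values; the real ones live in the driver. */
#define DF_HIDDEN_SKETCH	0x0004
#define DF_UNUSED_SKETCH	0x0008
#define DF_VOLUME_SKETCH	0x0010

/* Counts how often the simulated SAS device page 0 request is made. */
static int sas_page0_requests;

static int
probe_sketch(uint32_t flags, uint64_t wwid, uint64_t *port_wwn)
{
	if (flags & (DF_HIDDEN_SKETCH | DF_UNUSED_SKETCH))
		return 1;

	if (flags & DF_VOLUME_SKETCH) {
		*port_wwn = wwid;
		return 0;	/* volumes stop here */
	}

	/* Only plain SAS devices get this far. */
	sas_page0_requests++;
	return 0;
}
```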

> +     }
>  
>       memset(&ehdr, 0, sizeof(ehdr));
>       ehdr.page_type = MPII_CONFIG_REQ_PAGE_TYPE_EXTENDED;
> 
> 
> The previous version of this diff put vpg on the heap, e.g.
> 
> +             struct mpii_cfg_raid_vol_pg1 *vpg;
> +             vpg = malloc(pagelen, M_TEMP, M_WAITOK | M_CANFAIL | M_ZERO);
>               ...
> 
> just like it is done for the mpii_cfg_raid_vol_pg0 struct in
> mpii_ioctl_cache(), which is where I copied the code from to fetch pages.
> 
> But with this, the kernel panicked:
> 
> mpii_scsi_probe: target beef lun 0 port_wwn 0 node_wwn 0 has MPII_DF_VOLUME set in flags 10
> panic: kernel data fault: pc=165e310 addr=40090fae000
> panic: Unable to send mondo 1011fa4 to cpu 0: 6
> Stopped at      db_enter+0x8:   nop
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *     0      0      0     0x10000      0x200    0K swapper
> sun4v_send_ipi(0, 1011fa4, 0, 6, 0, 16) at sun4v_send_ipi+0xac
> db_enter_ddb(419aa7f8000, a, 1c3, 2007c68, 3f, 78) at db_enter_ddb+0x244
> db_ktrap(101, 20074a0, 1, 0, 0, 2007728) at db_ktrap+0x104
> trap(20074a0, 101, 11e55e4, 820006, 0, 78) at trap+0x2c0
> Lslowtrap_reenter(1, 20077f8, 175adf8, 20077f8, 193f570, cb) at Lslowtrap_reenter+0xf8
> panic(175adf8, 165e310, 40090fae000, 20077f8, 1ca1000, 100) at panic+0xb8
> data_access_fault(20078f0, 31, 165e310, 40090fae000, 40090fae000, 1) at data_access_fault+0x2f0
> sun4v_datatrap(0, 2018000, fffffffffffffff8, 0, 40090ea7ec0, 16) at sun4v_datatrap+0x210
> _kernel_lock(40090ea7ec0, a, 1, ac3b86cb36c0000, 0, 16) at _kernel_lock+0x34
> scsi_xs_exec(40090ea7ec0, 40090eee710, 1c3, 2007c68, 3f, 78) at scsi_xs_exec+0x30
> scsi_xs_sync(40090ea7ec0, 1c3, 1b0, 0, 193f570, 1c00) at scsi_xs_sync+0x84
> scsi_test_unit_ready(c, 5, 40090ea7ec0, 193f570, 1c00, 40090ee7500) at scsi_test_unit_ready+0x38
> scsi_probedev(40090ee5680, 0, 0, 0, 1c00, 40090ed28c0) at scsi_probedev+0x42c
> scsi_probe(40090ee5680, 0, ffffffffffffffff, 0, 193f570, 73) at scsi_probe+0x98
> 
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}>
> 
> I need to get more familiar with this code.
> 
