Marcus Glocker <mar...@nazgul.ch> wrote:

> On Tue, Mar 23, 2021 at 09:52:42AM -0600, Theo de Raadt wrote:
> 
> > Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> > 
> > > > > Index: sys/dev/sdmmc/sdmmc_scsi.c
> > > > > ===================================================================
> > > > > RCS file: /cvs/src/sys/dev/sdmmc/sdmmc_scsi.c,v
> > > > > retrieving revision 1.59
> > > > > diff -u -p -u -p -r1.59 sdmmc_scsi.c
> > > > > --- sys/dev/sdmmc/sdmmc_scsi.c        15 Oct 2020 13:22:13 -0000      
> > > > > 1.59
> > > > > +++ sys/dev/sdmmc/sdmmc_scsi.c        23 Mar 2021 07:32:52 -0000
> > > > > @@ -365,9 +365,8 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
> > > > >       /* A read or write operation. */
> > > > >       sdmmc_scsi_decode_rw(xs, &blockno, &blockcnt);
> > > > >  
> > > > > -     if (blockno >= tgt->card->csd.capacity ||
> > > > > -         blockno + blockcnt > tgt->card->csd.capacity) {
> > > > > -             DPRINTF(("%s: out of bounds %u-%u >= %u\n", DEVNAME(sc),
> > > > > +     if (blockno + blockcnt >= tgt->card->csd.capacity) {
> > > > > +             DPRINTF(("%s: out of bounds %u+%u >= %u\n", DEVNAME(sc),
> > > > >                   blockno, blockcnt, tgt->card->csd.capacity));
> > > > >               xs->error = XS_DRIVER_STUFFUP;
> > > > >               scsi_done(xs);
> > > 
> > > Apart from a potential overflow, the original condition looks correct
> > > to me.  If csd.capacity is 16 and blockno is 0 and blockcnt is 16, the
> > > operation should succeed, shouldn't it?
> 
> Yes, you're right.  In my case the patch suppressed the issue by
> skipping the last block of the disk.
>  
> > Hmm.  Maybe this is a specific SD card with a bug accessing the last sector.
> > 
> > A manual dd using
> > 
> >   skip='c partition size -1' count=1 bs=512
> > 
> > Is failing on that SD card, but it is working on other SD cards.  Which 
> > appear
> > to all be SD v2 cards.
> > 
> > Maybe his specific SD card has an off-by-one bug relating to the last 
> > sector,
> > and we need to determine a way to handle that, or skip the last sector on
> > all devices of this sub-class?
> > 
> > I don't see any issue in the controller code.   Marcus, can you move the 
> > card
> > to another machine (different controller) and use dd to read the last 
> > sector?
> > Though I suspect that might not provide enough clue either, the "fail to 
> > read"
> > behaviour might vary between controllers...
> > 
> > > > blockno and blockcnt are both 32-bit, so this feels dangerously close
> > > > to integer wrap, and I believe the first condition cannot be deleted.
> > > 
> > > But the second condition can still wrap.  So it probably needs to be
> > > something like:
> > > 
> > >    if (blockno >= tgt->card->csd.capacity ||
> > >        blockcnt > tgt->card->csd.capacity - blockno)
> > 
> > All 3 variables are int.  Not sure how moving the + to - on the other
> > side changes anything.
> 
> In the meantime I figured out how to trigger the SD controller command
> to fail.  It always happens if you read more than one block through one
> command, where the last block is accessing the last disk block.  If you
> only access the last disk block directly, the command doesn't fail.
> Some examples:
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=15591935 bs=4096 count=1
>       nblks=8, datalen=4096, blklen=512, arg=0x76f4ff8, cmd=0x8052
>       CMD FAIL!
>       bcmsdhost0: transfer timeout!
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=31183871 bs=2048 count=1
>       CMD FAIL!
>       bcmsdhost0: transfer timeout!
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=62367743 bs=1024 count=1
>       CMD FAIL!
>       bcmsdhost0: transfer timeout!
> 
>       # dd if=/dev/rsd0c of=/dev/null skip=124735487 bs=512 count=1
>       < all good >
> 
> I don't think it's an issue with the microSD card since I have NetBSD
> installed on a different card of exactly the same type/size, and there I
> also can trigger the issue:
> 
>       # dd if=/dev/rld0c of=/dev/null skip=15591935 bs=4096 count=1
>       [ 179.4170221] ld0c: error reading fsbn 124735480 of 124735480-124735487
>       (ld0 bn 124735480; cn 60905 tn 63 sn 24), retrying
> 
> Also I have tried that on the Pine64/sximmc(4) with the same microSD
> card, and there all is hunky-dory.
> 
> It seems to be a behavior/bug of the BCM2835 SD controller.  But I'm
> not sure what's the right approach to work around that in the driver, or
> even if there is a real fix by changing the command code.
> 

I think the bug is in bcm2835_sdhost.c bcmsdhost_exec_command()

This chunk is fishy, why is it increasing the transaction size?

                if (nblks == 0 || (cmd->c_datalen % cmd->c_blklen) != 0)
                        ++nblks;

I guess this does not occur for this operation, but still it is weird.

Maybe investigate which of the "goto done" cases gets hit; I am pretty
sure the controller is indicating which failure mode occurs.

Surely c_error isn't zero when this function completes?


> I mean one can just change the disklabel to skip the last block, but
> it's not really a solution since somebody lazy like me, who does a
> very simple disklabel where 'a'=root, and 'b'=swap is the rest of the
> disk can easily trigger that "bug" through e.g. savecore(8) which we
> run by default during boot.  At least in this case it's a very good
> debugger :-)

We would need to change the disklabel on all existing and future SD
cards, since the cards are removable and can move between controllers.
That won't work.

Reply via email to