On Tue, Mar 23, 2021 at 09:52:42AM -0600, Theo de Raadt wrote:

> Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> 
> > > > Index: sys/dev/sdmmc/sdmmc_scsi.c
> > > > ===================================================================
> > > > RCS file: /cvs/src/sys/dev/sdmmc/sdmmc_scsi.c,v
> > > > retrieving revision 1.59
> > > > diff -u -p -u -p -r1.59 sdmmc_scsi.c
> > > > --- sys/dev/sdmmc/sdmmc_scsi.c  15 Oct 2020 13:22:13 -0000      1.59
> > > > +++ sys/dev/sdmmc/sdmmc_scsi.c  23 Mar 2021 07:32:52 -0000
> > > > @@ -365,9 +365,8 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs)
> > > >         /* A read or write operation. */
> > > >         sdmmc_scsi_decode_rw(xs, &blockno, &blockcnt);
> > > >  
> > > > -       if (blockno >= tgt->card->csd.capacity ||
> > > > -           blockno + blockcnt > tgt->card->csd.capacity) {
> > > > -               DPRINTF(("%s: out of bounds %u-%u >= %u\n", DEVNAME(sc),
> > > > +       if (blockno + blockcnt >= tgt->card->csd.capacity) {
> > > > +               DPRINTF(("%s: out of bounds %u+%u >= %u\n", DEVNAME(sc),
> > > >                     blockno, blockcnt, tgt->card->csd.capacity));
> > > >                 xs->error = XS_DRIVER_STUFFUP;
> > > >                 scsi_done(xs);
> > 
> > Apart from a potential overflow, the original condition looks correct
> > to me.  If csd.capacity is 16 and blockno is 0 and blockcnt is 16, the
> > operation should succeed, shouldn't it?

Yes, you're right.  In my case the patch suppressed the issue by
skipping the last block of the disk.
 
> Hmm.  Maybe this is a specific SD card with a bug accessing the last sector.
> 
> A manual dd using
> 
>   skip='c partition size -1' count=1 bs=512
> 
> Is failing on that SD card, but it is working on other SD cards.  Which appear
> to all be SD v2 cards.
> 
> Maybe his specific SD card has an off-by-one bug relating to the last sector,
> and we need to determine a way to handle that, or skip the last sector on
> all devices of this sub-class?
> 
> I don't see any issue in the controller code.   Marcus, can you move the card
> to another machine (different controller) and use dd to read the last sector?
> Though I suspect that might not provide enough clue either, the "fail to read"
> behaviour might vary between controllers...
> 
> > > blockno and blockcnt are both 32-bit, so this feels dangerously close
> > > to integer wrap, and I believe the first condition cannot be deleted.
> > 
> > But the second condition can still wrap.  So it probably needs to be
> > something like:
> > 
> >    if (blockno >= tgt->card->csd.capacity ||
> >        blockcnt > tgt->card->csd.capacity - blockno)
> 
> All 3 variables are int.  Not sure how moving the + to - on the other
> side changes anything.

In the meantime I figured out how to trigger the SD controller command
to fail.  It always happens if you read more than one block through one
command, where the last block is accessing the last disk block.  If you
only access the last disk block directly, the command doesn't fail.
Some examples:

        # dd if=/dev/rsd0c of=/dev/null skip=15591935 bs=4096 count=1
        nblks=8, datalen=4096, blklen=512, arg=0x76f4ff8, cmd=0x8052
        CMD FAIL!
        bcmsdhost0: transfer timeout!

        # dd if=/dev/rsd0c of=/dev/null skip=31183871 bs=2048 count=1
        CMD FAIL!
        bcmsdhost0: transfer timeout!

        # dd if=/dev/rsd0c of=/dev/null skip=62367743 bs=1024 count=1
        CMD FAIL!
        bcmsdhost0: transfer timeout!

        # dd if=/dev/rsd0c of=/dev/null skip=124735487 bs=512 count=1
        < all good >

I don't think it's an issue with the microSD card since I have NetBSD
installed on a different card of exactly the same type/size, and there I
also can trigger the issue:

        # dd if=/dev/rld0c of=/dev/null skip=15591935 bs=4096 count=1
        [ 179.4170221] ld0c: error reading fsbn 124735480 of 124735480-124735487
        (ld0 bn 124735480; cn 60905 tn 63 sn 24), retrying

Also I have tried that on the Pine64/sximmc(4) with the same microSD
card, and there all is hunky-dory.

It seems to be a behavior/bug of the BCM2835 SD controller.  But I'm
not sure what's the right approach to work around that in the driver, or
even if there is a real fix by changing the command code.

I mean one can just change the disklabel to skip the last block, but
it's not really a solution since somebody lazy like me, who does a
very simple disklabel where 'a'=root, and 'b'=swap is the rest of the
disk can easily trigger that "bug" through e.g. savecore(8) which we
run by default during boot.  At least in this case it's a very good
debugger :-)

Reply via email to