Marcus Glocker <mar...@nazgul.ch> wrote: > On Tue, Mar 23, 2021 at 09:52:42AM -0600, Theo de Raadt wrote: > > > Mark Kettenis <mark.kette...@xs4all.nl> wrote: > > > > > > > Index: sys/dev/sdmmc/sdmmc_scsi.c > > > > > =================================================================== > > > > > RCS file: /cvs/src/sys/dev/sdmmc/sdmmc_scsi.c,v > > > > > retrieving revision 1.59 > > > > > diff -u -p -u -p -r1.59 sdmmc_scsi.c > > > > > --- sys/dev/sdmmc/sdmmc_scsi.c 15 Oct 2020 13:22:13 -0000 > > > > > 1.59 > > > > > +++ sys/dev/sdmmc/sdmmc_scsi.c 23 Mar 2021 07:32:52 -0000 > > > > > @@ -365,9 +365,8 @@ sdmmc_scsi_cmd(struct scsi_xfer *xs) > > > > > /* A read or write operation. */ > > > > > sdmmc_scsi_decode_rw(xs, &blockno, &blockcnt); > > > > > > > > > > - if (blockno >= tgt->card->csd.capacity || > > > > > - blockno + blockcnt > tgt->card->csd.capacity) { > > > > > - DPRINTF(("%s: out of bounds %u-%u >= %u\n", DEVNAME(sc), > > > > > + if (blockno + blockcnt >= tgt->card->csd.capacity) { > > > > > + DPRINTF(("%s: out of bounds %u+%u >= %u\n", DEVNAME(sc), > > > > > blockno, blockcnt, tgt->card->csd.capacity)); > > > > > xs->error = XS_DRIVER_STUFFUP; > > > > > scsi_done(xs); > > > > > > Apart from a potential overflow, the original condition looks correct > > > to me. If csd.capacity is 16 and blockno is 0 and blockcnt is 16, the > > > operation should succeed, shouldn't it? > > Yes, you're right. In my case the patch suppressed the issue by > skipping the last block of the disk. > > > Hmm. Maybe this is a specific SD card with a bug accessing the last sector. > > > > A manual dd using > > > > skip='c partition size -1' count=1 bs=512 > > > > Is failing on that SD card, but it is working on other SD cards. Which > > appear > > to all be SD v2 cards. > > > > Maybe his specific SD card has an off-by-one bug relating to the last > > sector, > > and we need to determine a way to handle that, or skip the last sector on > > all devices of this sub-class? > > > > I don't see any issue in the controller code. Marcus, can you move the > > card > > to another machine (different controller) and use dd to read the last > > sector? > > Though I suspect that might not provide enough clue either, the "fail to > > read" > > behaviour might vary between controllers... > > > > > > blockno and blockcnt are both 32-bit, so this feels dangerously close > > > > to integer wrap, and I believe the first condition cannot be deleted. > > > > > > But the second condition can still wrap. So it probably needs to be > > > something like: > > > > > > if (blockno >= tgt->card->csd.capacity || > > > blockcnt > tgt->card->csd.capacity - blockno) > > > > All 3 variables are int. Not sure how moving the + to - on the other > > side changes anything. > > In the meantime I figured out how to trigger the SD controller command > to fail. It always happens if you read more than one block through one > command, where the last block is accessing the last disk block. If you > only access the last disk block directly, the command doesn't fail. > Some examples: > > # dd if=/dev/rsd0c of=/dev/null skip=15591935 bs=4096 count=1 > nblks=8, datalen=4096, blklen=512, arg=0x76f4ff8, cmd=0x8052 > CMD FAIL! > bcmsdhost0: transfer timeout! > > # dd if=/dev/rsd0c of=/dev/null skip=31183871 bs=2048 count=1 > CMD FAIL! > bcmsdhost0: transfer timeout! > > # dd if=/dev/rsd0c of=/dev/null skip=62367743 bs=1024 count=1 > CMD FAIL! > bcmsdhost0: transfer timeout! > > # dd if=/dev/rsd0c of=/dev/null skip=124735487 bs=512 count=1 > < all good > > > I don't think it's an issue with the microSD card since I have NetBSD > installed on a different card of exactly the same type/size, and there I > also can trigger the issue: > > # dd if=/dev/rld0c of=/dev/null skip=15591935 bs=4096 count=1 > [ 179.4170221] ld0c: error reading fsbn 124735480 of 124735480-124735487 > (ld0 bn 124735480; cn 60905 tn 63 sn 24), retrying > > Also I have tried that on the Pine64/sximmc(4) with the same microSD > card, and there all is hunky-dory. > > It seems to be a behavior/bug of the BCM2835 SD controller. But I'm > not sure what's the right approach to work around that in the driver, or > even if there is a real fix by changing the command code. >
I think the bug is in bcm2835_sdhost.c bcmsdhost_exec_command() This chunk is fishy, why is it increasing the transaction size? if (nblks == 0 || (cmd->c_datalen % cmd->c_blklen) != 0) ++nblks; I guess this does not occur for this operation, but still it is weird. Maybe investigate which of the "goto done" cases gets hit; I am pretty sure the controller is indicating which failure mode occurs. Surely c_error isn't zero when this function completes? > I mean one can just change the disklabel to skip the last block, but > it's not really a solution since somebody lazy like me, who does a > very simple disklabel where 'a'=root, and 'b'=swap is the rest of the > disk can easily trigger that "bug" through e.g. savecore(8) which we > run by default during boot. At least in this case it's a very good > debugger :-) We would need to change the disklabel on all existing and future SD cards, since the cards are removable and can move between controllers. That won't work.