On 5/30/20 12:59 AM, John Snow wrote: > outl 0xcf8 0x8000fa24 > outl 0xcfc 0xe106c000 (Writes e106c00 to BAR5 for 0:31:2)
We might eventually display this in the reproducer output. > > outl 0xcf8 0x8000fa04 > outw 0xcfc 0x7 (Enables BM, Memory IO and PIO for 0:31:2) > > outl 0xcf8 0x8000fb20 (Enables 0:31:3, I guess? My PCI knowledge is > iffy. We set the enable bit and select BAR4, but then we don't actually > write to 0xcfc again to set anything.) > > > write 0x0 0x3 0x2780e7 > - write these three bytes to addr 0 in memory. > > write 0xe106c22c 0xd 0x1130c218021130c218021130c2 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSCTL] @ 0x2c: > 0x18c23011 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSERR] @ 0x30: > 0xc2301102 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSACT] @ 0x34: > 0x30110218 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxCI] @ 0x38: > 0x000000c2 > > write 0xe106c218 0x15 0x110010110010110010110010110010110010110010 > > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxCMD] @ 0x18: > 0x11100011 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:Reserved] @ 0x1c: > 0x00111000 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxTFD] @ 0x20: > 0x10001110 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSIG] @ 0x24: > 0x11100011 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSSTS] @ 0x28: > 0x00111000 > - ahci_port_write ahci(0x555c950f71a0)[2]: port write [reg:PxSCTL] @ 0x2c: > 0x00000010 > > Not all of those register writes are actually important for the bug, so > I simplified them to the fewest writes and fewest bits. > > outl 0xcf8 0x8000fa24 > outl 0xcfc 0xe106c000 > outl 0xcf8 0x8000fa04 > outw 0xcfc 0x7 > outl 0xcf8 0x8000fb20 > write 0x0 0x3 0x2780e7 > write 0xe106c22c 0x4 0x01000000 > write 0xe106c238 0x2 0x02 > write 0xe106c218 0x4 0x11000000 > write 0xe106c22c 0x1 0x00 > > > 1. PxSCTL write arms the DET bit. It isn't intended to be left on when > PxCMD.ST (Start) is issued. It's not clear what should happen if this DOES > occur. (Undefined behavior, at the very least.) > See AHCI 1.3 section 3.3.1.1 "Offset 2Ch: PxSCTL – Port x Serial ATA Control > (SCR2: SControl)" > > This bit is intended to send a reset signal to attached SATA drives. > QEMU just synchronously resets the drive because we can. > > > 2. PxCI is not intended to be written to when PxCMD.ST is unset. The spec > suggests that when ST transitions from '1' to '0' that this field is cleared, > but it does not suggest what happens when it transitions from '0' to '1'. > QEMU will happily set the register. > > > 3. PxCMD write: This sets PxCMD.ST and PxCMD.FRE, which engages the AHCI > device in earnest. > > At this point, AHCI sees outstanding commands and tries to process them. > The FIS receive address is never programmed, so it's at zero. We start > reading a FIS there: > > 15712@1590789960.784835:handle_cmd_fis_dump ahci(0x55b4c56621a0)[2]: FIS: > 0x00: 27 80 e7 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x40: 34 40 70 01 01 14 eb 20 00 00 00 00 01 00 00 00 > 0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > This is translated as: > 0x27 SATA_FIS_TYPE_REGISTER_H2D > 0x80 SATA_FIS_REG_H2D_UPDATE_COMMAND_REGISTER > 0xe7 command = FLUSH CACHE > > This will engage ide_flush_cache() (core.c) > > > At this point I get a little confused. I wouldn't think we'd have a > BlockBackend here for ide_flush to work on, but it seems to think we do and > allows the flush command to proceed. We then immediately try to cancel this > flush, but bdrv_aio_cancel can't tolerate an aiocb with a null BDS and panics. > > ...Hm, it should be the case that blk_do_flush detects this as > ENOMEDIUM, but are we maybe just canceling this request too fast? I > actually can't trigger this through the console, but I can trigger it by > redirecting input from a .txt file. > > Yup, OK: if you look in blk_aio_prwv, we schedule a oneshot to invoke > the callback on a synchronous failure, but we are managing to inject the > reset command before the oneshot gets a chance to run. What is not clear to me is, can this be reproduced by a real guest, or only replaying the fuzzer payload (via the qtest chardev)? Very nicely detailed analysis btw! Various parts are worth being copied in the fix commit description. > > I think either blk_aio_cancel or bdrv_aio_cancel needs to check that > there isn't a dangling BH callback -- it seems wrong to make it as far > as bdrv_aio_cancel when there's no BDS, but the IDE device no longer has > any idea why its callback hasn't returned yet, and blk_aio_cancel is the > only mechanism it has to kick the state machine forward. >