On Monday, 2 October 2023 John Snow <js...@redhat.com> wrote: > Which reset pathway are you testing that causes the problem?
The test centres on a VM-initiated bus reset because a DMA write has stalled (I deliberately discard the iSCSI response). > I'm not fully clear on why checking for DRQ is legitimate here. It was the best condition I could find, there’s probably something better. DRQ_STAT seems to be set by ide_sector_start_dma() and cleared when the transfer ends. It would be far simpler if s->nsector was trustworthy, but it’s massaged by ide_set_signature() immediately after being zeroed. > Does this new condition fire before or after the registers have been reset as > part > of the reset ...? After, the flow is as follows: - DMA transfer started - Guest triggers AHCI reset - ahci_reset_port() calls ide_bus_reset() calls ide_reset() - ide_reset() clears state including LBA48 support etc - ide_bus_reset() attempts to cancel pending async DMA operation - bdrv_aio_cancel() sends async cancel request then polls for response - Completion of DMA request arrives - ide_dma_cb() calculates sector number by calling ide_get_sector() - Because of the controller state after reset sector number is 0 - Next part of transfer is done > I trust you there's a problem, but I don't know the specifics of it > just yet to understand your proposed fix. (I have a nagging fear that > it might trigger in more circumstances than we want it to, but I need > more info to audit that.) Hopefully the above clarifies things. I’ve done my best to make the fix very targeted but this is a complex interaction in subsystems I have little knowledge of. > I'm also concerned about -- depending on WHEN this conditional will > fire -- the effects of processing the end-of-transfer block on a newly > reset (or about-to-be reset) device. I understand, do you think there’s a better approach? Regards Simon