On Monday, 2 October 2023 John Snow <js...@redhat.com> wrote:

> Which reset pathway are you testing that causes the problem?

The test centres on a VM-initiated bus reset because a DMA write has stalled (I 
deliberately discard the iSCSI response).

> I'm not fully clear on why checking for DRQ is legitimate here.

It was the best condition I could find; there’s probably something better. 
DRQ_STAT seems to be set by ide_sector_start_dma() and cleared when the 
transfer ends.

It would be far simpler if s->nsector was trustworthy, but it’s massaged by 
ide_set_signature() immediately after being zeroed.

> Does this new condition fire before or after the registers have been reset
> as part of the reset ...?

After. The flow is as follows:
- DMA transfer started
- Guest triggers AHCI reset
- ahci_reset_port() calls ide_bus_reset(), which calls ide_reset()
- ide_reset() clears state including LBA48 support etc
- ide_bus_reset() attempts to cancel pending async DMA operation
- bdrv_aio_cancel() sends async cancel request then polls for response
- Completion of DMA request arrives
- ide_dma_cb() calculates sector number by calling ide_get_sector()
- Because the controller state has already been reset, the sector number is 0
- The next part of the transfer is carried out using that sector number
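
To make the ordering concrete, here is a minimal toy model of the sequence above. It is not QEMU code: ToyDev and every toy_* function are hypothetical names made up for illustration, and the "cancel" simply runs the pending completion synchronously, as a stand-in for a cancel that polls until the request completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the device registers; this is NOT QEMU's IDEState. */
typedef struct {
    uint64_t sector;     /* current LBA as read back from the registers */
    bool lba48;          /* addressing mode, cleared by reset */
    bool dma_pending;    /* an async DMA request is in flight */
    int64_t next_write;  /* LBA the continuation targeted, -1 if none */
} ToyDev;

/* Guest starts a DMA write at the given LBA. */
static void toy_dma_start(ToyDev *d, uint64_t lba)
{
    assert(d);
    d->sector = lba;
    d->lba48 = true;
    d->dma_pending = true;
    d->next_write = -1;
}

/* Completion callback: derives the next LBA from the registers. */
static void toy_dma_cb(ToyDev *d)
{
    d->dma_pending = false;
    d->next_write = (int64_t)d->sector;  /* stale if a reset ran first */
}

/* Reset clears the register state, including the addressing mode. */
static void toy_reset_registers(ToyDev *d)
{
    d->sector = 0;
    d->lba48 = false;
}

/* Cancel by draining: the pending completion runs inside the cancel. */
static void toy_cancel_pending(ToyDev *d)
{
    if (d->dma_pending) {
        toy_dma_cb(d);
    }
}

/* Reset in the order described above: registers first, then cancel. */
static void toy_bus_reset(ToyDev *d)
{
    toy_reset_registers(d);
    toy_cancel_pending(d);
}

/* Returns the LBA that the post-reset continuation would target. */
static int64_t toy_run(uint64_t start_lba)
{
    ToyDev d = {0};
    toy_dma_start(&d, start_lba);
    toy_bus_reset(&d);
    return d.next_write;
}
```

In this model toy_run(1000) returns 0 rather than 1000: the completion runs after the registers were cleared, so the continuation is aimed at sector 0 instead of the original location.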

> I trust you there's a problem, but I don't know the specifics of it
> just yet to understand your proposed fix. (I have a nagging fear that
> it might trigger in more circumstances than we want it to, but I need
> more info to audit that.)

Hopefully the above clarifies things. I’ve done my best to make the fix very 
targeted but this is a complex interaction in subsystems I have little 
knowledge of.

> I'm also concerned about -- depending on WHEN this conditional will
> fire -- the effects of processing the end-of-transfer block on a newly
> reset (or about-to-be reset) device.

I understand. Do you think there’s a better approach?

Regards
Simon
