Re: [Qemu-devel] [PATCH 2/2] ide: fix crash in IDE cdrom read

John Snow Tue, 12 Dec 2017 15:41:20 -0800

Hi Den, the long email about IDE stuff:

On 11/30/2017 07:01 AM, Denis V. Lunev wrote:
> On 11/29/2017 02:50 AM, John Snow wrote:
>>
>> On 11/28/2017 07:10 AM, Denis V. Lunev wrote:
>>> There is the following crash reported from the field in QEMU 2.9:
>>>     bdrv_inc_in_flight (bs=bs@entry=0x0)
>>>     blk_aio_prwv
>>>     blk_aio_preadv
>>>     ide_buffered_readv
>>>     cd_read_sector
>> Is ide_atapi_cmd_reply_end missing from the call stack here for some
>> reason? ide_data_readw /should/ have end_transfer_func set to that if it
>> was processing an ATAPI command; otherwise it shouldn't be able to get
>> here under normal circumstances...
> this is my fault. I have lost this line while removing unnecessary into
> from callstack. Here if the full one.
>


Well, at least we're back on terra firma here.

I'm going to talk myself through what the problem looks like here, and I
have some questions for you at the bottom of the email, so you can just
skim if you're not interested in the details.

> (gdb) bt
> #0  bdrv_inc_in_flight (bs=bs@entry=0x0) at block/io.c:561
> #1  0x000055b224160d37 in blk_aio_prwv (blk=0x55b2265405a0,
>     offset=offset@entry=285122560, bytes=2048,
> qiov=qiov@entry=0x55b22a31de20,
>     co_entry=co_entry@entry=0x55b224160400 <blk_aio_read_entry>,
>     flags=flags@entry=0, cb=cb@entry=0x55b224048e00
> <ide_buffered_readv_cb>,
>     opaque=opaque@entry=0x55b22a31de00) at block/block-backend.c:1151
> #2  0x000055b224160db5 in blk_aio_preadv (blk=<optimized out>,
>     offset=offset@entry=285122560, qiov=qiov@entry=0x55b22a31de20,
>     flags=flags@entry=0, cb=cb@entry=0x55b224048e00
> <ide_buffered_readv_cb>,
>     opaque=opaque@entry=0x55b22a31de00) at block/block-backend.c:1256
> #3  0x000055b22404bd8d in ide_buffered_readv (s=s@entry=0x55b22a712a68,
>     sector_num=556880, iov=iov@entry=0x55b22a712d60,
>     nb_sectors=nb_sectors@entry=4,
>     cb=cb@entry=0x55b22404f2a0 <cd_read_sector_cb>,
>     opaque=opaque@entry=0x55b22a712a68) at hw/ide/core.c:637
> #4  0x000055b22404e2c1 in cd_read_sector (s=0x55b22a712a68)
>     at hw/ide/atapi.c:198
> #5  ide_atapi_cmd_reply_end (s=0x55b22a712a68) at hw/ide/atapi.c:272
> #6  0x000055b224049704 in ide_data_readw (opaque=<optimized out>,
>     addr=<optimized out>) at hw/ide/core.c:2263
> #7  0x000055b223edd7f0 in portio_read (opaque=0x55b22a836000, addr=0,
> size=2)
>     at /usr/src/debug/qemu-2.9.0/ioport.c:180
> #8  0x000055b223ee8e3b in memory_region_read_accessor (mr=0x55b22a836000,
>     addr=0, value=0x7f843b5be7c0, size=2, shift=0, mask=65535, attrs=...)
>     at /usr/src/debug/qemu-2.9.0/memory.c:435
> #9  0x000055b223ee6369 in access_with_adjusted_size (addr=addr@entry=0,
>     value=value@entry=0x7f843b5be7c0, size=size@entry=2,
>     access_size_min=<optimized out>, access_size_max=<optimized out>,
>     access=access@entry=0x55b223ee8e10 <memory_region_read_accessor>,
>     mr=mr@entry=0x55b22a836000, attrs=attrs@entry=...)
>     at /usr/src/debug/qemu-2.9.0/memory.c:592
> #10 0x000055b223ee9d36 in memory_region_dispatch_read1 (attrs=..., size=2,
>     pval=0x7f843b5be7c0, addr=0, mr=0x55b22a836000)
>     at /usr/src/debug/qemu-2.9.0/memory.c:1238
> #11 memory_region_dispatch_read (mr=mr@entry=0x55b22a836000,
>     addr=addr@entry=0, pval=pval@entry=0x7f843b5be7c0, size=size@entry=2,
>     attrs=attrs@entry=...) at /usr/src/debug/qemu-2.9.0/memory.c:1269
> #12 0x000055b223e9bdb2 in address_space_read_continue (
>     as=as@entry=0x55b2247db8c0 <address_space_io>, addr=addr@entry=496,
> ---Type <return> to continue, or q <return> to quit---
>     attrs=..., attrs@entry=...,
>     buf=buf@entry=0x7f844dc103fe <Address 0x7f844dc103fe out of bounds>,
>     len=len@entry=2, addr1=0, l=2, mr=0x55b22a836000)
>     at /usr/src/debug/qemu-2.9.0/exec.c:2844
> #13 0x000055b223e9be67 in address_space_read_full (
>     as=0x55b2247db8c0 <address_space_io>, addr=addr@entry=496, attrs=...,
>     buf=buf@entry=0x7f844dc103fe <Address 0x7f844dc103fe out of bounds>,
>     len=2, len@entry=602521191) at /usr/src/debug/qemu-2.9.0/exec.c:2895
> #14 0x000055b223e9bfce in address_space_read (len=602521191,
>     buf=0x7f844dc103fe <Address 0x7f844dc103fe out of bounds>, attrs=...,
>     addr=496, as=<optimized out>)
>     at /usr/src/debug/qemu-2.9.0/include/exec/memory.h:1718
> #15 address_space_rw (as=<optimized out>, addr=addr@entry=496, attrs=...,
>     attrs@entry=...,
>     buf=buf@entry=0x7f844dc103fe <Address 0x7f844dc103fe out of bounds>,
>     len=len@entry=2, is_write=is_write@entry=false)
>     at /usr/src/debug/qemu-2.9.0/exec.c:2909
> #16 0x000055b223ee5271 in kvm_handle_io (count=512, size=2,
>     direction=<optimized out>, data=<optimized out>, attrs=..., port=496)
>     at /usr/src/debug/qemu-2.9.0/kvm-all.c:1828
> #17 kvm_cpu_exec (cpu=cpu@entry=0x55b229032000)
>     at /usr/src/debug/qemu-2.9.0/kvm-all.c:2058
> #18 0x000055b223ed1c22 in qemu_kvm_cpu_thread_fn (arg=0x55b229032000)
>     at /usr/src/debug/qemu-2.9.0/cpus.c:1119
> #19 0x00007f8443993e25 in start_thread (arg=0x7f843b5bf700)
>     at pthread_create.c:308
> #20 0x00007f84436c134d in clone ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
> 
> 
> 
>>>     ide_data_readw
>> How do we get here? ide_is_pio_out ought to be false here; do you know
>> what command was being processed? Do you have a reproducer?
>>
>> Knowing both the ATA and ATAPI commands under consideration here would
>> be helpful.
>>
> 
> if you prefer, I can upload core/binary to some storage.
> Here is the state.
> 

This might be helpful for me to poke around at, but unfortunately it
looks like I can't see the ATAPI command that was being processed from
the trace given. I don't think there's a way for me to retrieve it from
a core, either.

> (gdb) p *s
> $3 = {bus = 0x55b22a7129f0, unit = 0 '\000', drive_kind = IDE_CD,
>   cylinders = 0, heads = 0, sectors = 0, chs_trans = 0, nb_sectors =
> 11588488,
>   mult_sectors = 16, identify_set = 1,
>   identify_data = "\300\205", '\000' <repeats 18 times>, "MQ0000 1", ' '
> <repeats 12 times>, "\003\000\000\002\004\000.2+5    EQUMD DVR-MO", ' '
> <repeats 28 times>,
> "\000\000\001\000\000\003\000\000\000\000\000\000\a", '\000' <repeats 17
> times>,
> "\a\000\a\000\003\000\264\000\264\000,\001\264\000\000\000\000\000\036\000\036",
> '\000' <repeats 15 times>, "\036", '\000' <repeats 15 times>, "? ",
> '\000' <repeats 333 times>, drive_serial = 1,
>   drive_serial_str = "QM00001", '\000' <repeats 13 times>,
>   drive_model_str = "QEMU DVD-ROM", '\000' <repeats 28 times>, wwn = 0,
>   feature = 0 '\000', error = 0 '\000', nsector = 2, sector = 0 '\000',
>   lcyl = 0 '\000', hcyl = 8 '\b', hob_feature = 0 '\000',
>   hob_nsector = 3 '\003', hob_sector = 0 '\000', hob_lcyl = 0 '\000',
>   hob_hcyl = 8 '\b', select = 32 ' ', status = 80 'P', lba48 = 0 '\000',

status = BUSY_STAT
nsector = ATAPI_INT_REASON_IO (set in ide_atapi_cmd_reply_end)
byte_count_limit = 2048 (hcyl << 8)

>   blk = 0x55b2265405a0, version = "2.5+\000\000\000\000", events = {
>     eject_request = false, new_media = false}, sense_key = 0 '\000',
>   asc = 0 '\000', tray_open = false, tray_locked = false,
>   cdrom_changed = 0 '\000', packet_transfer_size = 55296,
>   elementary_transfer_size = 0, io_buffer_index = 2048, lba = 139220,

packet_transfer_size = 55296, (27 cd sectors)
elementary_transfer_size = 0,
io_buffer_index = 2048,
lba = 139220, or 0x021fd4


This is either an initial read request, or a rebuffering.
In the normative case, initial reads look like this:

IDE:
ide_ioport_write
  ide_exec_cmd (0xA0)
    cmd_packet
      ide_transfer_start
        s->end_transfer_func = ide_atapi_cmd
        [data_ptr and data_end set to meaningful values]
[Return to guest, await PIO write of scsi cdb]
ide_data_writew
  ide_atapi_cmd
    [soft error if !blk->bs]
    cmd_read_cd
      ide_atapi_cmd_read
        ide_atapi_cmd_read_pio
          ide_atapi_cmd_reply_end
            cd_read_sector
              [segv if !blk->bs]

AHCI:
handle_reg_h2d_fis
  [scsi CDB is copied into s->io_buffer]
  ide_exec_cmd
    cmd_packet
      ide_transfer_start
        [data_ptr and data_end set to transfer scsi cdb]
          ahci_start_transfer
            [shortcut, no transfer]
            s->data_ptr set to s->data_end [*]
              ide_atapi_cmd

* (but data_end likely is still != s->io_buffer)

from here the callback is the same as above: we'll wind up at
cd_read_sector. Neither of these cases match your traceback.

If it was the initial buffering, it would have occurred synchronously
with receipt of the SCSI CDB, but your traces place it as a consequence
of an attempt to read data, so that makes me suspect it's a rebuffering
attempt, and the data in s->io_buffer is... cd data? I guess? did you
have a CD in here at one point?

>   cd_sector_size = 2048, atapi_dma = 0, acct = {bytes = 2048,

So we're using PIO paths when applicable. The s->feature register being
0 seems to agree.

>     start_time_ns = 433417215764666, type = BLOCK_ACCT_READ}, pio_aiocb
> = 0x0,
>   iov = {iov_base = 0x55b22a8ae000, iov_len = 2048}, qiov = {
>     iov = 0x55b22a712d50, niov = 1, nalloc = -1, size = 2048},
>   buffered_requests = {lh_first = 0x0}, io_buffer_offset = 0,
>   io_buffer_size = 0, sg = {sg = 0x0, nsg = 0, nalloc = 0, size = 0,
>     dev = 0x0, as = 0x0}, req_nb_sectors = 0,
>   end_transfer_func = 0x55b22404e190 <ide_atapi_cmd_reply_end>,
>   data_ptr = 0x55b22a8ae800 "", data_end = 0x55b22a8ae800 "",
>   io_buffer = 0x55b22a8ae000

s->data_ptr  = 0x55b22a8ae800
s->data_end  = 0x55b22a8ae800
s->io_buffer = 0x55b22a8ae000
s->io_buffer_offset = 0

Uh, hm. This does not look like an in-progress PIO transfer of any kind...

This says to me that ide_dummy_transfer_stop was called at some point
and nothing has started up in its place.

This seems to conflict with the trace, which looks like a rebuffering.
This looks like... undefined behavior entirely. I am not sure I have a
good guess as to what's happening.

> ")8\"\t\031\"\232\vx.\311l\301\270o$\335\025\257\064F\271\rI,R\342\032\315\nԆV\341od\251\023>\024\370\323\060A\256\337\300텋\024\233\201U\221T^\202\303\036\"E%\262\230\367ξ\fW\302\360\277\347\334\022\253\a\025\216\rj\334z\355>)\230/\021\255%a^\306\001\032",

... hmm

0x29 56 34 09 19

0x29 is... almost a read command (0x28) but it isn't. 0x29 shouldn't map
to anything, actually. So this must be data we've buffered, or something
left over from god knows what.

> 
>   io_buffer_total_len = 131076, cur_io_buffer_offset = 0,
>   cur_io_buffer_len = 0, end_transfer_fn_idx = 0 '\000',
>   sector_write_timer = 0x55b22a45d5c0, irq_count = 0, ext_error = 0 '\000',
>   mdata_size = 0, mdata_storage = 0x0, media_changed = 0,
>   dma_cmd = IDE_DMA_READ, smart_enabled = 1 '\001', smart_autosave = 1
> '\001',
>   smart_errors = 0, smart_selftest_count = 0 '\000',
> ---Type <return> to continue, or q <return> to quit---
>   smart_selftest_data = 0x55b22a827000 "", ncq_queues = 0}
> (gdb)
> 
> In general, I can upload the binary and core to the location
> you want. Send it to me in person. By the way. I am staring to
> receive these crashes in number. 2-3 in a day BUT from one
> specific host. This could be important.
> 
>>>     portio_read
>>>     memory_region_read_accessor
>>>     access_with_adjusted_size
>>>     memory_region_dispatch_read1
>>>     memory_region_dispatch_read
>>>     address_space_read_continue
>>>     address_space_read_full
>>>     address_space_read
>>>     address_space_rw
>>>     kvm_handle_io
>>>     kvm_cpu_exec
>>>     qemu_kvm_cpu_thread_fn
>>>     start_thread
>>>     clone
>> The thing that looks to be incongruent here is that we appear to be
>> servicing a ATAPI reply; that reply is either the kind that uses
>> preformatted data in a buffer, or the kind that buffers a read.
>>
>> If it buffers a read, it should require CHECK_READY which requires a medium.
>>
>> If it does not buffer a read, it should not be able to invoke
>> cd_read_sector or any bdrv function from ide_data_readw.
>>
>> If it neither serves preformatted data nor buffers a read, it should not
>> allow ide_data_readw to perform any action at all.
> the guest could misbehave! This is not a reason to crash.
> For example there is a race or something like this. What
> if the guest will execute read from the device without
> ANY sanity checks? QEMU should not crash. This is the point
> and this is the approach taken in the original patch.
> 

The stuff quoted above isn't participatory, it's mandatory. I don't know
how we're bypassing CHECK_READY. I want to understand what's happening
before I pepper in sanity checks.

in 2.9.0, you have this code:

    if ((cmd->flags & CHECK_READY) &&
        (!media_present(s) || !blk_is_inserted(s->blk)))
    {
        ide_atapi_cmd_error(s, NOT_READY, ASC_MEDIUM_NOT_PRESENT);
        return;
    }

with media_present defined as:

static inline int media_present(IDEState *s)
{
    return !s->tray_open && s->nb_sectors > 0;
}

... however, from your gdb output, we do have:

tray_open = false
nb_sectors = 11588488

but blk_is_inserted is still going to check for the presence of 'bs'
attached to the Block Backend, so this check should still prevent us
from executing a read command out of the gate.

Is the guest executing a read and then are we racing with the removal of
the bds? Do you expect a CD to be inserted on this guest?

nb_sectors I would expect to be 0; based on ide_cd_change_cb:

    blk_get_geometry(s->blk, &nb_sectors);

this should set nb_sectors to 0 if there's no blk->bs.

I'm sort of lost, I can't really easily construct what exactly happened
here.

> 
>>> Indeed, the CDROM device without media has blk->bs == NULL. We should
>>> check that the media is really available for the device like has been done
>>> in SCSI code.
>>>
>>> May be the patch adds a bit more check than necessary, but this is not be
>>> the problem. We should always stay on the safe side.
>>>
>>> Signed-off-by: Denis V. Lunev <d...@openvz.org>
>>> CC: John Snow <js...@redhat.com>
>>> CC: Kevin Wolf <kw...@redhat.com>
>>> CC: Stefan Hajnoczi <stefa...@redhat.com>
>>> ---
>>>  hw/ide/atapi.c | 32 ++++++++++++++++++++++++++++----
>>>  hw/ide/core.c  |  4 ++--
>>>  2 files changed, 30 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index c0509c8bf5..fa50c0ccf6 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -119,6 +119,11 @@ cd_read_sector_sync(IDEState *s)
>>>  
>>>      trace_cd_read_sector_sync(s->lba);
>>>  
>>> +    if (!blk_is_available(s->blk)) {
>>> +        ret = -ENOMEDIUM;
>>> +        goto fail;
>>> +    }
>>> +
>>>      switch (s->cd_sector_size) {
>>>      case 2048:
>>>          ret = blk_pread(s->blk, (int64_t)s->lba << ATAPI_SECTOR_BITS,
>>> @@ -132,8 +137,8 @@ cd_read_sector_sync(IDEState *s)
>>>          }
>>>          break;
>>>      default:
>>> -        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
>>> -        return -EIO;
>>> +        ret = -EIO;
>>> +        goto fail;
>>>      }
>>>  
>>>      if (ret < 0) {
>>> @@ -145,6 +150,10 @@ cd_read_sector_sync(IDEState *s)
>>>      }
>>>  
>>>      return ret;
>>> +
>>> +fail:
>>> +    block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
>>> +    return ret;
>>>  }
>>>  
>>>  static void cd_read_sector_cb(void *opaque, int ret)
>>> @@ -174,9 +183,15 @@ static void cd_read_sector_cb(void *opaque, int ret)
>>>  
>>>  static int cd_read_sector(IDEState *s)
>>>  {
>>> +    int err;
>>> +
>>>      if (s->cd_sector_size != 2048 && s->cd_sector_size != 2352) {
>>> -        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
>>> -        return -EINVAL;
>>> +        err = -EINVAL;
>>> +        goto fail;
>>> +    }
>>> +    if (!blk_is_available(s->blk)) {
>>> +        err = -ENOMEDIUM;
>>> +        goto fail;
>>>      }
>>>  
>>>      s->iov.iov_base = (s->cd_sector_size == 2352) ?
>>> @@ -195,6 +210,10 @@ static int cd_read_sector(IDEState *s)
>>>  
>>>      s->status |= BUSY_STAT;
>>>      return 0;
>>> +
>>> +fail:
>>> +    block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
>>> +    return err;
>>>  }
>>>  
>>>  void ide_atapi_cmd_ok(IDEState *s)
>>> @@ -404,6 +423,11 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, 
>>> int ret)
>>>          goto eot;
>>>      }
>>>  
>>> +    if (!blk_is_available(s->blk)) {
>>> +        ide_atapi_cmd_read_dma_cb(s, -ENOMEDIUM);
>>> +        return;
>>> +    }
>>> +
>> I'm not sure this is OK, because it's an error but not setting the sense
>> code or raising an IRQ; it's only calling ide_set_inactive, so this
>> might be a problem. Normally the error code from the data read is
>> handled earlier in the call by ide_handle_rw_error which can set the
>> proper codes.
>>
> this follows the approach used by the error handling in this call.
> I this code is not made worse. We can hit to the same processing
> with the different error upper.
> 
>>>      s->io_buffer_index = 0;
>>>      if (s->cd_sector_size == 2352) {
>>>          n = 1;
>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>> index 471d0c928b..71780fc9d1 100644
>>> --- a/hw/ide/core.c
>>> +++ b/hw/ide/core.c
>>> @@ -758,7 +758,7 @@ static void ide_sector_read(IDEState *s)
>>>  
>>>      trace_ide_sector_read(sector_num, n);
>>>  
>>> -    if (!ide_sect_range_ok(s, sector_num, n)) {
>>> +    if (!ide_sect_range_ok(s, sector_num, n) || !blk_is_available(s->blk)) 
>>> {
>>>          ide_rw_error(s);
>>>          block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_READ);
>>>          return;
>>> @@ -1023,7 +1023,7 @@ static void ide_sector_write(IDEState *s)
>>>  
>>>      trace_ide_sector_write(sector_num, n);
>>>  
>>> -    if (!ide_sect_range_ok(s, sector_num, n)) {
>>> +    if (!ide_sect_range_ok(s, sector_num, n) || !blk_is_available(s->blk)) 
>>> {
>>>          ide_rw_error(s);
>>>          block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_WRITE);
>>>          return;
>>>
>> Since it's not a new regression (this is being reported against 2.9) I
>> am somewhat hesitant to rush it into 2.11-rc3 without a little more info.
>>
>> That said, here's a list of the ATAPI commands we service that either
>> return or CAN return data, but do not enforce CHECK_READY:
>>
>>     [ 0x03 ] = { cmd_request_sense,                 ALLOW_UA },
>>     [ 0x12 ] = { cmd_inquiry,                       ALLOW_UA },
>>     [ 0x46 ] = { cmd_get_configuration,             ALLOW_UA },
>>     [ 0x4a ] = { cmd_get_event_status_notification, ALLOW_UA },
>>     [ 0x5a ] = { cmd_mode_sense, /* (10) */         0 },
>>     [ 0xbd ] = { cmd_mechanism_status,              0 },
>>
>> These six all invoke ide_atapi_cmd_reply in some way (which allows the
>> guest to begin reading the reply via PIO mechanisms if DMA is not set):
>>
>> cmd_request_sense:     ide_atapi_cmd_reply(s, 18, max_len);
>> cmd_inquiry:           ide_atapi_cmd_reply(s, idx, max_len);
>> cmd_get_configuration: ide_atapi_cmd_reply(s, len, max_len);
>> cmd_get_event_status_notification:
>>                        ide_atapi_cmd_reply(s, used_len, max_len);
>> cmd_mode_sense:        ide_atapi_cmd_reply(s, 16, max_len);
>>                        ide_atapi_cmd_reply(s, 24, max_len);
>>                        ide_atapi_cmd_reply(s, 30, max_len);
>> cmd_mechanism_status:  ide_atapi_cmd_reply(s, 8, max_len);
>>
>> ide_atapi_cmd_reply itself sets lba to -1, which should inform
>> ide_atapi_cmd_reply_end not to attempt to buffer any new data. These
>> *should* be safe. The reply sizes are also all small enough that they
>> are almost certainly getting buffered in one single transfer without
>> attempting to buffer more data.
>>
>> In the normative PIO case then, reads will consume from this buffer
>> until empty, then we'll call ide_atapi_cmd_reply_end through
>> end_transfer_func and hit the end of transfer logic.
>>
>> I'm not sure I see how this crash is happening; it doesn't look like
>> this path allows for invoking the block read functions and everything
>> else is guarded by NONDATA or CHECK_READY.
>>
>> Unless this is an interesting interaction with ide_atapi_cmd_reply
>> setting the DRQ bit which may trick the pio read function to allow PIO
>> reads to come in during this time?
>>
>> Hypothetically:
>>
>> cmd_packet sets end_transfer_func to ide_atapi_cmd so it can process
>> further once that PIO is completed. (The PIO may be emulated
>> synchronously, in the case of AHCI.) In the true PIO case, it may be
>> asynchronous.
>>
>> Now, in the AHCI case, the guest has control back and the CDROM is now
>> executing a command to, perhaps, read data via DMA. Then,
>> simultaneously, the guest issues a PIO read and because the DRQ bit is
>> set for DMA, the PIO read also goes through.
>>
>> This still requires a media, though... and a very broken guest doing
>> something naughty on purpose.
>>
>> I can't quite connect the dots to see how this crash is possible in
>> general... it'd help to have:
>>
>> - The ATAPI command being processed at the time
>> - The IDE command being processed at the time
>> - s->status
>>
>> as a minimum, and then perhaps optionally some of the ATAPI loop
>> variables, like packet_transfer_size, elementary_transfer_size, and
>> io_buffer_index. Or a reproducer!
>>
>> Sorry I'm not more help and I wrote too much again :(
> imho this does not matter. Once again - the guest can misbehave.

I never said that if a guest acting out of spec was the problem I
wouldn't fix it. I just want to find the root cause.

If this has been broken since 2.9, 2.11-rc3 is too late for a bandaid
applied to something I can't diagnose. Let's discuss this for 2.12 and I
will keep trying to figure out what the root cause is.


Some questions for you:

(1) Is the guest Linux? Do we know why this one machine might be
tripping up QEMU? (Is it running a fuzzer, a weird OS, etc...?)
(2) Does the VM actually have a CDROM inserted at some point? Is it
possible we're racing on some kind of eject or graph manipulation failure?
(3) Is this using AHCI or IDE?


If I can't figure it out within a week or so from here I'll just check
in the band-aid with some /* FIXME */ comments attached.

--js

Re: [Qemu-devel] [PATCH 2/2] ide: fix crash in IDE cdrom read

Reply via email to