Re: [Qemu-devel] [PATCH 2/2] atapi migration: Throw recoverable error to avoid recovery

John Snow Tue, 09 Dec 2014 22:16:34 -0800


On 12/09/2014 01:15 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" <dgilb...@redhat.com>

(With the previous atapi_dma flag recovery)
If migration happens between the ATAPI command being written and the
bmdma being started, the DMA is dropped.  Eventually the guest times
out and recovers, but that can take many seconds.
(This is rare: on a ping-pong migration test reading the CD continuously,
I hit this on roughly 1 in 30 to 1 in 50 migrations.)

I don't think we've got enough state to be able to recover safely
at this point, so I throw a 'medium error, no seek complete',
on the assumption that guests will try to recover from what looks
like a dirty CD.
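For reference, the error chosen here comes from the SCSI Primary Commands (SPC) sense tables: sense key 0x03 is MEDIUM ERROR, and ASC/ASCQ 0x02/0x00 is NO SEEK COMPLETE. The sketch below is a standalone illustration, not QEMU's code path. As far as I can tell, QEMU stores the sense key and ASC in IDEState and only builds the buffer when the guest issues REQUEST SENSE. The helper build_fixed_sense() and all macro names are hypothetical. It shows the 18-byte fixed-format sense data a guest would read back after this fake error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Codes from the SCSI Primary Commands (SPC) sense tables. */
#define SENSE_KEY_MEDIUM_ERROR   0x03
#define SPC_ASC_NO_SEEK_COMPLETE 0x02 /* paired with ASCQ 0x00 */
#define SENSE_RESP_CURRENT_FIXED 0x70 /* response code: current error, fixed format */
#define SENSE_FIXED_LEN          18

/* Hypothetical helper (not a QEMU function): fill a fixed-format sense buffer. */
static void build_fixed_sense(uint8_t *buf, size_t len,
                              uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
    assert(len >= SENSE_FIXED_LEN);
    memset(buf, 0, SENSE_FIXED_LEN);
    buf[0]  = SENSE_RESP_CURRENT_FIXED;
    buf[2]  = sense_key & 0x0f;    /* sense key lives in the low nibble of byte 2 */
    buf[7]  = SENSE_FIXED_LEN - 8; /* additional sense length: bytes 8..17 */
    buf[12] = asc;                 /* additional sense code */
    buf[13] = ascq;                /* additional sense code qualifier */
}
```

A Linux guest that reads back sense key 0x03 treats the failed read as a media fault and retries, which is the behaviour the patch relies on. Per the commit message, that has only been observed on Linux, so it is not a guarantee for other guests.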

OK, it's a hack; the real solution is probably to push a lot of
ATAPI state into the migration stream, but this is a fix that
works with no stream changes. Tested only on Linux (both RHEL5,
pre-libata, and RHEL7).

Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
---
  hw/ide/atapi.c    | 17 +++++++++++++++++
  hw/ide/internal.h |  2 ++
  hw/ide/pci.c      | 11 +++++++++++
  3 files changed, 30 insertions(+)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index c63b7e5..e17799c 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -394,6 +394,23 @@ static void ide_atapi_cmd_read(IDEState *s, int lba, int nb_sectors,
      }
  }

+
+/* Called by *_restart_bh when the transfer function points
+ * to ide_atapi_cmd
+ */
+void ide_atapi_dma_restart(IDEState *s)
+{
+    /*
+     * I'm not sure we have enough stored to restart the command
+     * safely, so give the guest an error it should recover from.
+     * I'm assuming most guests will try to recover from something
+     * listed as a medium error on a CD; it seems to work on Linux.
+     * This would be more of a problem if we did any other type of
+     * DMA operation.
+     */
+    ide_atapi_cmd_error(s, MEDIUM_ERROR, ASC_NO_SEEK_COMPLETE);
+}
+


Is this safe for non-data commands? Can we even get there in such a case?

  static inline uint8_t ide_atapi_set_profile(uint8_t *buf, uint8_t *index,
                                              uint16_t profile)
  {
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 8a3eca4..8b65285 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -289,6 +289,7 @@ typedef struct IDEDMAOps IDEDMAOps;
  #define ATAPI_INT_REASON_TAG            0xf8

  /* same constants as bochs */
+#define ASC_NO_SEEK_COMPLETE                 0x02
  #define ASC_ILLEGAL_OPCODE                   0x20
  #define ASC_LOGICAL_BLOCK_OOR                0x21
  #define ASC_INV_FIELD_IN_CMD_PACKET          0x24
@@ -529,6 +530,7 @@ void ide_dma_error(IDEState *s);

  void ide_atapi_cmd_ok(IDEState *s);
  void ide_atapi_cmd_error(IDEState *s, int sense_key, int asc);
+void ide_atapi_dma_restart(IDEState *s);
  void ide_atapi_io_error(IDEState *s, int ret);

  void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val);
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index bee5ad3..e3f2054 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -235,6 +235,17 @@ static void bmdma_restart_bh(void *opaque)
          }
      } else if (error_status & IDE_RETRY_FLUSH) {
          ide_flush_cache(bmdma_active_if(bm));
+    } else {
+        IDEState *s = bmdma_active_if(bm);
+
+        /*
+         * We've not got any bits to tell us about ATAPI - but
+         * we do have the end_transfer_func that tells us what
+         * we're trying to do.
+         */
+        if (s->end_transfer_func == ide_atapi_cmd) {
+            ide_atapi_dma_restart(s);
+        }

OK, so when the restart routines get invoked we add a hook to see if we were in the middle of an ATAPI command and acknowledge that we don't know how to properly handle this.

Isn't this going to run on every vmstate change, though? I think we don't clear out end_transfer_func on success, so this might fire off more than we want it to, although I guess end_transfer_func is usually going to get set to ide_atapi_cmd_reply_end if it finishes normally...

      }
  }
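To make the end_transfer_func concern concrete, here is a toy model of the new else-branch in bmdma_restart_bh. All names (FakeIDEState, fake_restart_bh and so on) are hypothetical stand-ins for the QEMU types. The model assumes only what the patch shows: the hook fires whenever the pointer still equals the ATAPI command handler. If a completed command left the pointer at that handler, every later restart would inject another fake error. If completion moves it elsewhere (for example to something like ide_atapi_cmd_reply_end), the hook stays quiet. Whether the real code ever leaves the pointer in the first state is the open question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct FakeIDEState FakeIDEState;
typedef void EndTransferFunc(FakeIDEState *s);

struct FakeIDEState {
    EndTransferFunc *end_transfer_func; /* what the device expects to run next */
    int fake_errors;                    /* how often the restart hook fired */
};

/* Hypothetical stand-ins for ide_atapi_cmd / ide_atapi_cmd_reply_end. */
static void fake_atapi_cmd(FakeIDEState *s) { (void)s; }
static void fake_atapi_cmd_reply_end(FakeIDEState *s) { (void)s; }

/* Stand-in for ide_atapi_dma_restart: just record the injected error. */
static void fake_atapi_dma_restart(FakeIDEState *s)
{
    s->fake_errors++;
}

/* Same shape as the patch's else-branch: dispatch on the function pointer. */
static void fake_restart_bh(FakeIDEState *s)
{
    assert(s != NULL);
    if (s->end_transfer_func == fake_atapi_cmd) {
        fake_atapi_dma_restart(s);
    }
}
```

The fake_errors counter plays the same role as a trace point would on real hardware: counting how often the hook fires across a ping-pong run shows whether it triggers only inside the race window or on every restart.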

Indeed a hack, but it's probably appropriate: if our code cannot in fact handle ATAPI migration, throwing an error or disabling migration is the correct thing to do, but I don't think users would be very happy with the second option. I feel that this is an OK workaround because it should not introduce spurious errors or retries for cases where we manage to avoid migrating in the middle of the loop. This will at least let the currently broken case limp along until we fix it properly.

What makes me most curious is how this plays out in Windows if this case is triggered. Add a trace around the fake error and see whether you can observe it being called during a ping-pong test while Windows reads a CD.
