Hi On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <arm...@redhat.com> wrote: > When configured for interrupts (property "chardev" given), we receive > the shared memory from an ivshmem server. We do so asynchronously > after realize() completes, by setting up callbacks with > qemu_chr_add_handlers(). > > Keeping server I/O out of realize() that way avoids delays due to a > slow server. This is probably relevant only for hot plug. > > However, this funny "no shared memory, yet" state of the device also > causes a raft of issues that are hard or impossible to work around: > > * The guest is exposed to this state: when we enter and leave it its > shared memory contents is apruptly replaced, and device register > IVPosition changes. > > This is a known issue. We document that guests should not access > the shared memory after device initialization until the IVPosition > register becomes non-negative. > > For cold plug, the funny state is unlikely to be visible in > practice, because we normally receive the shared memory long before > the guest gets around to mess with the device. > > For hot plug, the timing is tighter, but the relative slowness of > PCI device configuration has a good chance to hide the funny state. > > In either case, guests complying with the documented procedure are > safe. > > * Migration becomes racy. > > If migration completes before the shared memory setup completes on > the source, shared memory contents is silently lost. Fortunately, > migration is rather unlikely to win this race. > > If the shared memory's ramblock arrives at the destination before > shared memory setup completes, migration fails. > > There is no known way for a management application to wait for > shared memory setup to complete. > > All you can do is retry failed migration. You can improve your > chances by leaving more time between running the destination QEMU > and the migrate command. > > To mitigate silent memory loss, you need to ensure the server > initializes shared memory exactly the same on source and > destination. > > These issues are entirely undocumented so far. > > I'd expect the server to be almost always fast enough to hide these > issues. But then rare catastrophic races are in a way the worst kind. > > This is way more trouble than I'm willing to take from any device. > Kill the funny state by receiving shared memory synchronously in > realize(). If your hot plug hangs, go kill your ivshmem server. > > For easier review, this commit only makes the receive synchronous, it > doesn't add the necessary error propagation. Without that, the funny > state persists. The next commit will do that, and kill it off for > real. > > Signed-off-by: Markus Armbruster <arm...@redhat.com> > --- > hw/misc/ivshmem.c | 70 > +++++++++++++++++++++++++++++++++++++--------------- > tests/ivshmem-test.c | 26 ++++++------------- > 2 files changed, 57 insertions(+), 39 deletions(-) > > diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c > index c366087..352937f 100644 > --- a/hw/misc/ivshmem.c > +++ b/hw/misc/ivshmem.c > @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t > *buf, int size) > process_msg(s, incoming_posn, incoming_fd); > } > > -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int > size) > +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd) > { > - IVShmemState *s = opaque; > - int tmp; > - int64_t version; > + int64_t msg; > + int n, ret; > > - if (!fifo_update_and_get_i64(s, buf, size, &version)) { > - return; > - } > + n = 0; > + do { > + ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n, > + sizeof(msg) - n); > + if (ret < 0 && ret != -EINTR) { > + /* TODO error handling */ > + return INT64_MIN; > + } > + n += ret; > + } while (n < sizeof(msg)); > > - tmp = qemu_chr_fe_get_msgfd(s->server_chr); > - if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) { > + *pfd = qemu_chr_fe_get_msgfd(s->server_chr); > + return msg; > +} > + > +static void ivshmem_recv_setup(IVShmemState *s) > +{ > + int64_t msg; > + int fd; > + > + msg = ivshmem_recv_msg(s, &fd); > + if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) { > fprintf(stderr, "incompatible version, you are connecting to a > ivshmem-" > "server using a different protocol please check your > setup\n"); > - qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s); > return; > } > > - IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n"); > - qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read, > - NULL, s); > + /* > + * Receive more messages until we got shared memory. > + */ > + do { > + msg = ivshmem_recv_msg(s, &fd); > + process_msg(s, msg, fd); > + } while (msg != -1); > + > + assert(memory_region_is_mapped(&s->ivshmem));
It is easy to trigger at runtime, I suggest to report an error instead. looks good otherwise > } > > /* Select the MSI-X vectors used by device. > @@ -903,19 +923,29 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error > **errp) > IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n", > s->server_chr->filename); > > - if (ivshmem_setup_interrupts(s) < 0) { > - error_setg(errp, "failed to initialize interrupts"); > - return; > - } > - > /* we allocate enough space for 16 peers and grow as needed */ > resize_peers(s, 16); > s->vm_id = -1; > > pci_register_bar(dev, 2, attr, &s->bar); > > - qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, > - ivshmem_check_version, NULL, s); > + /* > + * Receive setup messages from server synchronously. > + * Older versions did it asynchronously, but that creates a > + * number of entertaining race conditions. > + * TODO Propagate errors! Without that, we still have races > + * on errors. > + */ > + ivshmem_recv_setup(s); > + if (memory_region_is_mapped(&s->ivshmem)) { > + qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, > + ivshmem_read, NULL, s); > + } > + > + if (ivshmem_setup_interrupts(s) < 0) { > + error_setg(errp, "failed to initialize interrupts"); > + return; > + } > } else { > /* just map the file immediately, we're not using a server */ > int fd; > diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c > index c1dd7bb..68d6840 100644 > --- a/tests/ivshmem-test.c > +++ b/tests/ivshmem-test.c > @@ -309,35 +309,23 @@ static void test_ivshmem_server(bool msi) > ret = ivshmem_server_start(&server); > g_assert_cmpint(ret, ==, 0); > > - setup_vm_with_server(&state1, nvectors, msi); > - s1 = &state1; > - setup_vm_with_server(&state2, nvectors, msi); > - s2 = &state2; > - > - /* check state before server sends stuff */ > - g_assert_cmpuint(in_reg(s1, IVPOSITION), ==, 0xffffffff); > - g_assert_cmpuint(in_reg(s2, IVPOSITION), ==, 0xffffffff); > - g_assert_cmpuint(qtest_readb(s1->qtest, (uintptr_t)s1->mem_base), ==, > 0x00); > - > thread.server = &server; > ret = pipe(thread.pipe); > g_assert_cmpint(ret, ==, 0); > thread.thread = g_thread_new("ivshmem-server", server_thread, &thread); > g_assert(thread.thread != NULL); > > - /* waiting for devices to become operational */ > - while (g_get_monotonic_time() < end_time) { > - g_usleep(1000); > - if ((int)in_reg(s1, IVPOSITION) >= 0 && > - (int)in_reg(s2, IVPOSITION) >= 0) { > - break; > - } > - } > + setup_vm_with_server(&state1, nvectors, msi); > + s1 = &state1; > + setup_vm_with_server(&state2, nvectors, msi); > + s2 = &state2; > > /* check got different VM ids */ > vm1 = in_reg(s1, IVPOSITION); > vm2 = in_reg(s2, IVPOSITION); > - g_assert_cmpuint(vm1, !=, vm2); > + g_assert_cmpint(vm1, >=, 0); > + g_assert_cmpint(vm2, >=, 0); > + g_assert_cmpint(vm1, !=, vm2); > > /* check number of MSI-X vectors */ > global_qtest = s1->qtest; > -- > 2.4.3 > > -- Marc-André Lureau