Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? Yeah, if it's possible. But if you take a look at virtio-net, you'll see that virtio_push is called immediately after calling qemu_sendv_packet while virtio-blk does that in the callback. This is only if the packet was sent immediately. I was referring to the case where the packet is queued. I see. I usually don't see packets get queued in the net layer. What would be the effect to devices? Restraint sending packets? Yes. This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. An assumption
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? Yeah, if it's possible. But if you take a look at virtio-net, you'll see that virtio_push is called immediately after calling qemu_sendv_packet while virtio-blk does that in the callback. This is only if the packet was sent immediately. I was referring to the case where the packet is queued. I see. I usually don't see packets get queued in the net layer. What would be the effect to devices?
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
2011/1/6 Michael S. Tsirkin m...@redhat.com: On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? Yeah, if it's possible. But if you take a look at virtio-net, you'll see that virtio_push is called immediately after calling qemu_sendv_packet while virtio-blk does that in the callback. This is only if the packet was sent immediately. I was referring to the case where the packet is queued. I see. I usually don't see packets get queued in the
Re: [PATCH 09/21] Introduce event-tap.
2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. Thanks, Yoshi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/21] Introduce event-tap.
On Tue, Jan 4, 2011 at 11:02 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. Thanks Yoshi. I think I understand what you're saying. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/21] Introduce event-tap.
On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. An assumption devices make is that a buffer is unused once completion callback was invoked. Does this violate that assumption? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? Yeah, if it's possible. But if you take a look at virtio-net, you'll see that virtio_push is called immediately after calling qemu_sendv_packet while virtio-blk does that in the callback. This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. An assumption devices make is that a buffer is unused once completion callback was invoked. Does this violate that assumption? No, it shouldn't. In case of net with net layer replay, we copy the content of the requests, and in case of block, because we haven't called the callback yet, the requests remains fresh. Yoshi -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? Yeah, if it's possible. But if you take a look at virtio-net, you'll see that virtio_push is called immediately after calling qemu_sendv_packet while virtio-blk does that in the callback. This is only if the packet was sent immediately. I was referring to the case where the packet is queued. This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. An assumption devices make is that a buffer is unused once completion callback was invoked. Does this violate that assumption? No, it shouldn't. In case of net with net layer replay, we copy the content of the requests, and in case of block, because we haven't called the callback yet, the requests remains fresh. Yoshi Yes, as long as you copy it should be fine. Maybe it's a good idea for event-tap to queue all packets to avoid the copy and avoid
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? Yeah, if it's possible. But if you take a look at virtio-net, you'll see that virtio_push is called immediately after calling qemu_sendv_packet while virtio-blk does that in the callback. This is only if the packet was sent immediately. I was referring to the case where the packet is queued. I see. I usually don't see packets get queued in the net layer. What would be the effect to devices? Restraint sending packets? This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. An assumption devices make is that a buffer is unused once completion callback was invoked. Does this violate that assumption? No, it shouldn't. In case of net with net layer replay, we copy the content of the requests, and in case of block, because we haven't called the
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote: 2011/1/4 Michael S. Tsirkin m...@redhat.com: On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. Stefan, After doing some heavy load tests, I realized that we have to take a hybrid approach to replay for now. This is because when a device moves to the next state (e.g. virtio decreases inuse) is different between net and block. For example, virtio-net decreases inuse upon returning from the net layer, but virtio-blk does that inside of the callback. For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete. For RX, virtio-net calls virtqueue_flush from virtio_net_receive. Both are invoked from a callback. If we only use pio/mmio replay, even though event-tap tries to replay net requests, some get lost because the state has proceeded already. It seems that all you need to do to avoid this is to delay the callback? Yeah, if it's possible. But if you take a look at virtio-net, you'll see that virtio_push is called immediately after calling qemu_sendv_packet while virtio-blk does that in the callback. This is only if the packet was sent immediately. I was referring to the case where the packet is queued. I see. I usually don't see packets get queued in the net layer. What would be the effect to devices? Restraint sending packets? Yes. This doesn't happen with block, because the state is still old enough to replay. Note that using hybrid approach won't cause duplicated requests on the secondary. An assumption devices make is that a buffer is unused once completion
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
2010/11/30 Marcelo Tosatti mtosa...@redhat.com: On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req, + BlockDriverState *bs, BlockRequest *reqs, + int num_reqs, BlockDriverCompletionFunc *cb, + void *opaque, bool is_multiwrite) +{ + int i; + + blk_req-num_reqs = num_reqs; + blk_req-num_cbs = num_reqs; + blk_req-device_name = qemu_strdup(bs-device_name); + blk_req-is_multiwrite = is_multiwrite; + + for (i = 0; i num_reqs; i++) { + blk_req-reqs[i].sector = reqs[i].sector; + blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors; + blk_req-reqs[i].qiov = reqs[i].qiov; + blk_req-reqs[i].cb = cb; + blk_req-reqs[i].opaque = opaque; + blk_req-cb[i] = reqs[i].cb; + blk_req-opaque[i] = reqs[i].opaque; + } +} bdrv_aio_flush should also be logged, so that guest initiated flush is respected on replay. In the current implementation w/o flush logging, there might be order inversion after replay? Yoshi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/21] Introduce event-tap.
2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? I'm wondering whether abort is too strong, but I think you're right because stopping Kemari may not be enough as an error handling. +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. If we move event-tap into block layer, I guess it won't be necessary. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. It's my mistake, sorry about that. We're just in way of moving replay from pio/mmio to net/block, and I mixed up. I'll revert it to pio/mmio replay in the next spin. BTW, I would like to ask a question regarding this. There is a callback which net/block calls after processing the requests, and is there a clean way to set this callback on the failovered host upon replay? +static void event_tap_blk_load(QEMUFile *f, EventTapBlkReq *blk_req) +{ + BlockRequest *req; + ram_addr_t page_addr; + int i, j, len; + + len = qemu_get_byte(f); + blk_req-device_name = qemu_malloc(len + 1); + qemu_get_buffer(f, (uint8_t *)blk_req-device_name, len); + blk_req-device_name[len] = '\0'; + blk_req-num_reqs = qemu_get_byte(f); + + for (i = 0; i blk_req-num_reqs; i++) { + req = blk_req-reqs[i]; + req-sector = qemu_get_be64(f); + req-nb_sectors = qemu_get_be32(f); + req-qiov = qemu_malloc(sizeof(QEMUIOVector)); It would make sense to have common QEMUIOVector load/save functions instead of inlining this code here. OK. +static int event_tap_load(QEMUFile *f, void *opaque, int version_id) +{ + EventTapLog *log, *next; + int mode; + + event_tap_state = EVENT_TAP_LOAD; + + QTAILQ_FOREACH_SAFE(log, event_list, node, next) { + QTAILQ_REMOVE(event_list, log, node); + event_tap_free_log(log); + } + + /* loop until EOF */ + while ((mode = qemu_get_byte(f)) != 0) { + EventTapLog *log = event_tap_alloc_log(); + + log-mode = mode; + switch (log-mode EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_IOPORT: + event_tap_ioport_load(f, log-ioport); + break; + case EVENT_TAP_MMIO: + event_tap_mmio_load(f, log-mmio); + break; + case 0: + DPRINTF(No
Re: [PATCH 09/21] Introduce event-tap.
On Tue, Nov 30, 2010 at 9:50 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? Fancier error handling would work too. For example cleaning up, turning off Kemari, and producing an error message with error_report(). In that case we need to think through the state of the environment carefully and make sure we don't cause secondary failures (like memory leaks). BTW, I would like to ask a question regarding this. There is a callback which net/block calls after processing the requests, and is there a clean way to set this callback on the failovered host upon replay? I think this is a limitation in the current design. If requests are re-issued by Kemari at the net/block level, how will the higher layers know about these requests? How will they be prepared to accept callbacks? Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/21] Introduce event-tap.
2010/11/30 Stefan Hajnoczi stefa...@gmail.com: On Tue, Nov 30, 2010 at 9:50 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/29 Stefan Hajnoczi stefa...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? Fancier error handling would work too. For example cleaning up, turning off Kemari, and producing an error message with error_report(). In that case we need to think through the state of the environment carefully and make sure we don't cause secondary failures (like memory leaks). Turning off Kemari should include canceling the transaction which notifies the secondary. So same as you commented for ft_trans_file error handling, I would implement better handling for event-tap again. BTW, I would like to ask a question regarding this. There is a callback which net/block calls after processing the requests, and is there a clean way to set this callback on the failovered host upon replay? I think this is a limitation in the current design. If requests are re-issued by Kemari at the net/block level, how will the higher layers know about these requests? How will they be prepared to accept callbacks? That's why we're using pio/mmio replay at this moment. With a dirty hack in device emulators setting callbacks before replay, block/net replay seems to work, but I don't think that to be a correct solution. Yoshi Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
On Tue, Nov 30, 2010 at 06:28:55PM +0900, Yoshiaki Tamura wrote: 2010/11/30 Marcelo Tosatti mtosa...@redhat.com: On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req, + BlockDriverState *bs, BlockRequest *reqs, + int num_reqs, BlockDriverCompletionFunc *cb, + void *opaque, bool is_multiwrite) +{ + int i; + + blk_req-num_reqs = num_reqs; + blk_req-num_cbs = num_reqs; + blk_req-device_name = qemu_strdup(bs-device_name); + blk_req-is_multiwrite = is_multiwrite; + + for (i = 0; i num_reqs; i++) { + blk_req-reqs[i].sector = reqs[i].sector; + blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors; + blk_req-reqs[i].qiov = reqs[i].qiov; + blk_req-reqs[i].cb = cb; + blk_req-reqs[i].opaque = opaque; + blk_req-cb[i] = reqs[i].cb; + blk_req-opaque[i] = reqs[i].opaque; + } +} bdrv_aio_flush should also be logged, so that guest initiated flush is respected on replay. In the current implementation w/o flush logging, there might be order inversion after replay? Yoshi Yes, since a vcpu is allowed to continue after synchronization is scheduled via a bh. For virtio-blk, for example: 1) bdrv_aio_write, event queued. 2) bdrv_aio_flush 3) bdrv_aio_write, event queued. On replay, there is no flush between the two writes. Why can't synchronization be done from event-tap itself, synchronously, to avoid this kind of problem? The way you hook synchronization into savevm seems unclean. Perhaps better separation between standard savevm path and FT savevm would make it cleaner. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
Marcelo Tosatti wrote: On Tue, Nov 30, 2010 at 06:28:55PM +0900, Yoshiaki Tamura wrote: 2010/11/30 Marcelo Tosattimtosa...@redhat.com: On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Keiohmura@lab.ntt.co.jp +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req, +BlockDriverState *bs, BlockRequest *reqs, +int num_reqs, BlockDriverCompletionFunc *cb, +void *opaque, bool is_multiwrite) +{ +int i; + +blk_req-num_reqs = num_reqs; +blk_req-num_cbs = num_reqs; +blk_req-device_name = qemu_strdup(bs-device_name); +blk_req-is_multiwrite = is_multiwrite; + +for (i = 0; i num_reqs; i++) { +blk_req-reqs[i].sector = reqs[i].sector; +blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors; +blk_req-reqs[i].qiov = reqs[i].qiov; +blk_req-reqs[i].cb = cb; +blk_req-reqs[i].opaque = opaque; +blk_req-cb[i] = reqs[i].cb; +blk_req-opaque[i] = reqs[i].opaque; +} +} bdrv_aio_flush should also be logged, so that guest initiated flush is respected on replay. In the current implementation w/o flush logging, there might be order inversion after replay? Yoshi Yes, since a vcpu is allowed to continue after synchronization is scheduled via a bh. For virtio-blk, for example: 1) bdrv_aio_write, event queued. 2) bdrv_aio_flush 3) bdrv_aio_write, event queued. On replay, there is no flush between the two writes. Why can't synchronization be done from event-tap itself, synchronously, to avoid this kind of problem? Thanks. I would fix it. The way you hook synchronization into savevm seems unclean. Perhaps better separation between standard savevm path and FT savevm would make it cleaner. I think you're mentioning about the changes in migration.c? Yoshi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.
On Tue, Nov 30, 2010 at 07:35:54PM +0900, Yoshiaki Tamura wrote: Marcelo Tosatti wrote: On Tue, Nov 30, 2010 at 06:28:55PM +0900, Yoshiaki Tamura wrote: 2010/11/30 Marcelo Tosattimtosa...@redhat.com: On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Keiohmura@lab.ntt.co.jp +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req, +BlockDriverState *bs, BlockRequest *reqs, +int num_reqs, BlockDriverCompletionFunc *cb, +void *opaque, bool is_multiwrite) +{ +int i; + +blk_req-num_reqs = num_reqs; +blk_req-num_cbs = num_reqs; +blk_req-device_name = qemu_strdup(bs-device_name); +blk_req-is_multiwrite = is_multiwrite; + +for (i = 0; i num_reqs; i++) { +blk_req-reqs[i].sector = reqs[i].sector; +blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors; +blk_req-reqs[i].qiov = reqs[i].qiov; +blk_req-reqs[i].cb = cb; +blk_req-reqs[i].opaque = opaque; +blk_req-cb[i] = reqs[i].cb; +blk_req-opaque[i] = reqs[i].opaque; +} +} bdrv_aio_flush should also be logged, so that guest initiated flush is respected on replay. In the current implementation w/o flush logging, there might be order inversion after replay? Yoshi Yes, since a vcpu is allowed to continue after synchronization is scheduled via a bh. For virtio-blk, for example: 1) bdrv_aio_write, event queued. 2) bdrv_aio_flush 3) bdrv_aio_write, event queued. On replay, there is no flush between the two writes. Why can't synchronization be done from event-tap itself, synchronously, to avoid this kind of problem? Thanks. I would fix it. The way you hook synchronization into savevm seems unclean. Perhaps better separation between standard savevm path and FT savevm would make it cleaner. I think you're mentioning about the changes in migration.c? Yoshi The important point is to stop vcpu activity after the event is queued, and resume once synchronization is performed. Stopping the vm after Kemari event queueing should do it, once Michael's stable migration patchset is in (and net/block layers fixed). -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/21] Introduce event-tap.
On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. +static void event_tap_blk_load(QEMUFile *f, EventTapBlkReq *blk_req) +{ + BlockRequest *req; + ram_addr_t page_addr; + int i, j, len; + + len = qemu_get_byte(f); + blk_req-device_name = qemu_malloc(len + 1); + qemu_get_buffer(f, (uint8_t *)blk_req-device_name, len); + blk_req-device_name[len] = '\0'; + blk_req-num_reqs = qemu_get_byte(f); + + for (i = 0; i blk_req-num_reqs; i++) { + req = blk_req-reqs[i]; + req-sector = qemu_get_be64(f); + req-nb_sectors = qemu_get_be32(f); + req-qiov = qemu_malloc(sizeof(QEMUIOVector)); It would make sense to have common QEMUIOVector load/save functions instead of inlining this code here. +static int event_tap_load(QEMUFile *f, void *opaque, int version_id) +{ + EventTapLog *log, *next; + int mode; + + event_tap_state = EVENT_TAP_LOAD; + + QTAILQ_FOREACH_SAFE(log, event_list, node, next) { + QTAILQ_REMOVE(event_list, log, node); + event_tap_free_log(log); + } + + /* loop until EOF */ + while ((mode = qemu_get_byte(f)) != 0) { + EventTapLog *log = event_tap_alloc_log(); + + log-mode = mode; + switch (log-mode EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_IOPORT: + event_tap_ioport_load(f, log-ioport); + break; + case EVENT_TAP_MMIO: + event_tap_mmio_load(f, log-mmio); + break; + case 0: + DPRINTF(No event\n); + break; + default: + fprintf(stderr, Unknown state %d\n, log-mode); + return -1; log is leaked here... + } + + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_load(f, log-net_req); + break; + case EVENT_TAP_BLK: + event_tap_blk_load(f, log-blk_req); + break; + default: + fprintf(stderr, Unknown state %d\n, log-mode); + return -1; ...and here. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to
Re: [PATCH 09/21] Introduce event-tap.
On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req, +BlockDriverState *bs, BlockRequest *reqs, +int num_reqs, BlockDriverCompletionFunc *cb, +void *opaque, bool is_multiwrite) +{ +int i; + +blk_req-num_reqs = num_reqs; +blk_req-num_cbs = num_reqs; +blk_req-device_name = qemu_strdup(bs-device_name); +blk_req-is_multiwrite = is_multiwrite; + +for (i = 0; i num_reqs; i++) { +blk_req-reqs[i].sector = reqs[i].sector; +blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors; +blk_req-reqs[i].qiov = reqs[i].qiov; +blk_req-reqs[i].cb = cb; +blk_req-reqs[i].opaque = opaque; +blk_req-cb[i] = reqs[i].cb; +blk_req-opaque[i] = reqs[i].opaque; +} +} bdrv_aio_flush should also be logged, so that guest initiated flush is respected on replay. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html