Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2011-01-06 Thread Yoshiaki Tamura
2011/1/4 Michael S. Tsirkin m...@redhat.com:
 On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
 2011/1/4 Michael S. Tsirkin m...@redhat.com:
  On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
  2011/1/4 Michael S. Tsirkin m...@redhat.com:
   On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
   2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
tamura.yoshi...@lab.ntt.co.jp wrote:
event-tap controls when to start FT transaction, and provides proxy
functions to called from net/block devices.  While FT transaction, 
it
queues up net/block requests, and flush them when the transaction 
gets
completed.
   
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.target |    1 +
 block.h         |    9 +
 event-tap.c     |  794 
+++
 event-tap.h     |   34 +++
 net.h           |    4 +
 net/queue.c     |    1 +
 6 files changed, 843 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
   
event_tap_state is checked at the beginning of several functions.  If
there is an unexpected state the function silently returns.  Should
these checks really be assert() so there is an abort and backtrace if
the program ever reaches this state?
   
+typedef struct EventTapBlkReq {
+    char *device_name;
+    int num_reqs;
+    int num_cbs;
+    bool is_multiwrite;
   
Is multiwrite logging necessary?  If event tap is called from within
the block layer then multiwrite is turned into one or more
bdrv_aio_writev() calls.
   
+static void event_tap_replay(void *opaque, int running, int reason)
+{
+    EventTapLog *log, *next;
+
+    if (!running) {
+        return;
+    }
+
+    if (event_tap_state != EVENT_TAP_LOAD) {
+        return;
+    }
+
+    event_tap_state = EVENT_TAP_REPLAY;
+
+    QTAILQ_FOREACH(log, event_list, node) {
+        EventTapBlkReq *blk_req;
+
+        /* event resume */
+        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
+        case EVENT_TAP_NET:
+            event_tap_net_flush(log-net_req);
+            break;
+        case EVENT_TAP_BLK:
+            blk_req = log-blk_req;
+            if ((log-mode  EVENT_TAP_TYPE_MASK) == 
EVENT_TAP_IOPORT) {
+                switch (log-ioport.index) {
+                case 0:
+                    cpu_outb(log-ioport.address, 
log-ioport.data);
+                    break;
+                case 1:
+                    cpu_outw(log-ioport.address, 
log-ioport.data);
+                    break;
+                case 2:
+                    cpu_outl(log-ioport.address, 
log-ioport.data);
+                    break;
+                }
+            } else {
+                /* EVENT_TAP_MMIO */
+                cpu_physical_memory_rw(log-mmio.address,
+                                       log-mmio.buf,
+                                       log-mmio.len, 1);
+            }
+            break;
   
Why are net tx packets replayed at the net level but blk requests are
replayed at the pio/mmio level?
   
I expected everything to replay either as pio/mmio or as net/block.
  
   Stefan,
  
   After doing some heavy load tests, I realized that we have to
   take a hybrid approach to replay for now.  This is because when a
   device moves to the next state (e.g. virtio decreases inuse) is
   different between net and block.  For example, virtio-net
   decreases inuse upon returning from the net layer,
   but virtio-blk
   does that inside of the callback.
  
   For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
   For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
   Both are invoked from a callback.
  
   If we only use pio/mmio
   replay, even though event-tap tries to replay net requests, some
   get lost because the state has proceeded already.
  
   It seems that all you need to do to avoid this is to
   delay the callback?
 
  Yeah, if it's possible.  But if you take a look at virtio-net,
  you'll see that virtio_push is called immediately after calling
  qemu_sendv_packet
  while virtio-blk does that in the callback.
 
  This is only if the packet was sent immediately.
  I was referring to the case where the packet is queued.

 I see.  I usually don't see packets get queued in the net layer.
 What would be the effect to devices?  Restraint sending packets?

 Yes.

 
  
   This doesn't
   happen with block, because the state is still old enough to
   replay.  Note that using hybrid approach won't cause duplicated
   requests on the secondary.
  
   An assumption 

Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2011-01-06 Thread Michael S. Tsirkin
On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote:
 2011/1/4 Michael S. Tsirkin m...@redhat.com:
  On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
  2011/1/4 Michael S. Tsirkin m...@redhat.com:
   On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
   2011/1/4 Michael S. Tsirkin m...@redhat.com:
On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
 tamura.yoshi...@lab.ntt.co.jp wrote:
 event-tap controls when to start FT transaction, and provides 
 proxy
 functions to called from net/block devices.  While FT 
 transaction, it
 queues up net/block requests, and flush them when the transaction 
 gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
  Makefile.target |    1 +
  block.h         |    9 +
  event-tap.c     |  794 
 +++
  event-tap.h     |   34 +++
  net.h           |    4 +
  net/queue.c     |    1 +
  6 files changed, 843 insertions(+), 0 deletions(-)
  create mode 100644 event-tap.c
  create mode 100644 event-tap.h

 event_tap_state is checked at the beginning of several functions.  
 If
 there is an unexpected state the function silently returns.  Should
 these checks really be assert() so there is an abort and backtrace 
 if
 the program ever reaches this state?

 +typedef struct EventTapBlkReq {
 +    char *device_name;
 +    int num_reqs;
 +    int num_cbs;
 +    bool is_multiwrite;

 Is multiwrite logging necessary?  If event tap is called from 
 within
 the block layer then multiwrite is turned into one or more
 bdrv_aio_writev() calls.

 +static void event_tap_replay(void *opaque, int running, int 
 reason)
 +{
 +    EventTapLog *log, *next;
 +
 +    if (!running) {
 +        return;
 +    }
 +
 +    if (event_tap_state != EVENT_TAP_LOAD) {
 +        return;
 +    }
 +
 +    event_tap_state = EVENT_TAP_REPLAY;
 +
 +    QTAILQ_FOREACH(log, event_list, node) {
 +        EventTapBlkReq *blk_req;
 +
 +        /* event resume */
 +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_NET:
 +            event_tap_net_flush(log-net_req);
 +            break;
 +        case EVENT_TAP_BLK:
 +            blk_req = log-blk_req;
 +            if ((log-mode  EVENT_TAP_TYPE_MASK) == 
 EVENT_TAP_IOPORT) {
 +                switch (log-ioport.index) {
 +                case 0:
 +                    cpu_outb(log-ioport.address, 
 log-ioport.data);
 +                    break;
 +                case 1:
 +                    cpu_outw(log-ioport.address, 
 log-ioport.data);
 +                    break;
 +                case 2:
 +                    cpu_outl(log-ioport.address, 
 log-ioport.data);
 +                    break;
 +                }
 +            } else {
 +                /* EVENT_TAP_MMIO */
 +                cpu_physical_memory_rw(log-mmio.address,
 +                                       log-mmio.buf,
 +                                       log-mmio.len, 1);
 +            }
 +            break;

 Why are net tx packets replayed at the net level but blk requests 
 are
 replayed at the pio/mmio level?

 I expected everything to replay either as pio/mmio or as net/block.
   
Stefan,
   
After doing some heavy load tests, I realized that we have to
take a hybrid approach to replay for now.  This is because when a
device moves to the next state (e.g. virtio decreases inuse) is
different between net and block.  For example, virtio-net
decreases inuse upon returning from the net layer,
but virtio-blk
does that inside of the callback.
   
For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
Both are invoked from a callback.
   
If we only use pio/mmio
replay, even though event-tap tries to replay net requests, some
get lost because the state has proceeded already.
   
It seems that all you need to do to avoid this is to
delay the callback?
  
   Yeah, if it's possible.  But if you take a look at virtio-net,
   you'll see that virtio_push is called immediately after calling
   qemu_sendv_packet
   while virtio-blk does that in the callback.
  
   This is only if the packet was sent immediately.
   I was referring to the case where the packet is queued.
 
  I see.  I usually don't see packets get queued in the net layer.
  What would be the effect to devices?  

Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2011-01-06 Thread Yoshiaki Tamura
2011/1/6 Michael S. Tsirkin m...@redhat.com:
 On Thu, Jan 06, 2011 at 05:47:27PM +0900, Yoshiaki Tamura wrote:
 2011/1/4 Michael S. Tsirkin m...@redhat.com:
  On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
  2011/1/4 Michael S. Tsirkin m...@redhat.com:
   On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
   2011/1/4 Michael S. Tsirkin m...@redhat.com:
On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
 tamura.yoshi...@lab.ntt.co.jp wrote:
 event-tap controls when to start FT transaction, and provides 
 proxy
 functions to called from net/block devices.  While FT 
 transaction, it
 queues up net/block requests, and flush them when the 
 transaction gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
  Makefile.target |    1 +
  block.h         |    9 +
  event-tap.c     |  794 
 +++
  event-tap.h     |   34 +++
  net.h           |    4 +
  net/queue.c     |    1 +
  6 files changed, 843 insertions(+), 0 deletions(-)
  create mode 100644 event-tap.c
  create mode 100644 event-tap.h

 event_tap_state is checked at the beginning of several functions. 
  If
 there is an unexpected state the function silently returns.  
 Should
 these checks really be assert() so there is an abort and 
 backtrace if
 the program ever reaches this state?

 +typedef struct EventTapBlkReq {
 +    char *device_name;
 +    int num_reqs;
 +    int num_cbs;
 +    bool is_multiwrite;

 Is multiwrite logging necessary?  If event tap is called from 
 within
 the block layer then multiwrite is turned into one or more
 bdrv_aio_writev() calls.

 +static void event_tap_replay(void *opaque, int running, int 
 reason)
 +{
 +    EventTapLog *log, *next;
 +
 +    if (!running) {
 +        return;
 +    }
 +
 +    if (event_tap_state != EVENT_TAP_LOAD) {
 +        return;
 +    }
 +
 +    event_tap_state = EVENT_TAP_REPLAY;
 +
 +    QTAILQ_FOREACH(log, event_list, node) {
 +        EventTapBlkReq *blk_req;
 +
 +        /* event resume */
 +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_NET:
 +            event_tap_net_flush(log-net_req);
 +            break;
 +        case EVENT_TAP_BLK:
 +            blk_req = log-blk_req;
 +            if ((log-mode  EVENT_TAP_TYPE_MASK) == 
 EVENT_TAP_IOPORT) {
 +                switch (log-ioport.index) {
 +                case 0:
 +                    cpu_outb(log-ioport.address, 
 log-ioport.data);
 +                    break;
 +                case 1:
 +                    cpu_outw(log-ioport.address, 
 log-ioport.data);
 +                    break;
 +                case 2:
 +                    cpu_outl(log-ioport.address, 
 log-ioport.data);
 +                    break;
 +                }
 +            } else {
 +                /* EVENT_TAP_MMIO */
 +                cpu_physical_memory_rw(log-mmio.address,
 +                                       log-mmio.buf,
 +                                       log-mmio.len, 1);
 +            }
 +            break;

 Why are net tx packets replayed at the net level but blk requests 
 are
 replayed at the pio/mmio level?

 I expected everything to replay either as pio/mmio or as 
 net/block.
   
Stefan,
   
After doing some heavy load tests, I realized that we have to
take a hybrid approach to replay for now.  This is because when a
device moves to the next state (e.g. virtio decreases inuse) is
different between net and block.  For example, virtio-net
decreases inuse upon returning from the net layer,
but virtio-blk
does that inside of the callback.
   
For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
Both are invoked from a callback.
   
If we only use pio/mmio
replay, even though event-tap tries to replay net requests, some
get lost because the state has proceeded already.
   
It seems that all you need to do to avoid this is to
delay the callback?
  
   Yeah, if it's possible.  But if you take a look at virtio-net,
   you'll see that virtio_push is called immediately after calling
   qemu_sendv_packet
   while virtio-blk does that in the callback.
  
   This is only if the packet was sent immediately.
   I was referring to the case where the packet is queued.
 
  I see.  I usually don't see packets get queued in the 

Re: [PATCH 09/21] Introduce event-tap.

2011-01-04 Thread Yoshiaki Tamura
2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
 tamura.yoshi...@lab.ntt.co.jp wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
  Makefile.target |    1 +
  block.h         |    9 +
  event-tap.c     |  794 
 +++
  event-tap.h     |   34 +++
  net.h           |    4 +
  net/queue.c     |    1 +
  6 files changed, 843 insertions(+), 0 deletions(-)
  create mode 100644 event-tap.c
  create mode 100644 event-tap.h

 event_tap_state is checked at the beginning of several functions.  If
 there is an unexpected state the function silently returns.  Should
 these checks really be assert() so there is an abort and backtrace if
 the program ever reaches this state?

 +typedef struct EventTapBlkReq {
 +    char *device_name;
 +    int num_reqs;
 +    int num_cbs;
 +    bool is_multiwrite;

 Is multiwrite logging necessary?  If event tap is called from within
 the block layer then multiwrite is turned into one or more
 bdrv_aio_writev() calls.

 +static void event_tap_replay(void *opaque, int running, int reason)
 +{
 +    EventTapLog *log, *next;
 +
 +    if (!running) {
 +        return;
 +    }
 +
 +    if (event_tap_state != EVENT_TAP_LOAD) {
 +        return;
 +    }
 +
 +    event_tap_state = EVENT_TAP_REPLAY;
 +
 +    QTAILQ_FOREACH(log, event_list, node) {
 +        EventTapBlkReq *blk_req;
 +
 +        /* event resume */
 +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_NET:
 +            event_tap_net_flush(log-net_req);
 +            break;
 +        case EVENT_TAP_BLK:
 +            blk_req = log-blk_req;
 +            if ((log-mode  EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
 +                switch (log-ioport.index) {
 +                case 0:
 +                    cpu_outb(log-ioport.address, log-ioport.data);
 +                    break;
 +                case 1:
 +                    cpu_outw(log-ioport.address, log-ioport.data);
 +                    break;
 +                case 2:
 +                    cpu_outl(log-ioport.address, log-ioport.data);
 +                    break;
 +                }
 +            } else {
 +                /* EVENT_TAP_MMIO */
 +                cpu_physical_memory_rw(log-mmio.address,
 +                                       log-mmio.buf,
 +                                       log-mmio.len, 1);
 +            }
 +            break;

 Why are net tx packets replayed at the net level but blk requests are
 replayed at the pio/mmio level?

 I expected everything to replay either as pio/mmio or as net/block.

Stefan,

After doing some heavy load tests, I realized that we have to
take a hybrid approach to replay for now.  This is because when a
device moves to the next state (e.g. virtio decreases inuse) is
different between net and block.  For example, virtio-net
decreases inuse upon returning from the net layer, but virtio-blk
does that inside of the callback.  If we only use pio/mmio
replay, even though event-tap tries to replay net requests, some
get lost because the state has proceeded already.  This doesn't
happen with block, because the state is still old enough to
replay.  Note that using hybrid approach won't cause duplicated
requests on the secondary.

Thanks,

Yoshi
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/21] Introduce event-tap.

2011-01-04 Thread Stefan Hajnoczi
On Tue, Jan 4, 2011 at 11:02 AM, Yoshiaki Tamura
tamura.yoshi...@lab.ntt.co.jp wrote:
 After doing some heavy load tests, I realized that we have to
 take a hybrid approach to replay for now.  This is because when a
 device moves to the next state (e.g. virtio decreases inuse) is
 different between net and block.  For example, virtio-net
 decreases inuse upon returning from the net layer, but virtio-blk
 does that inside of the callback.  If we only use pio/mmio
 replay, even though event-tap tries to replay net requests, some
 get lost because the state has proceeded already.  This doesn't
 happen with block, because the state is still old enough to
 replay.  Note that using hybrid approach won't cause duplicated
 requests on the secondary.

Thanks Yoshi.  I think I understand what you're saying.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/21] Introduce event-tap.

2011-01-04 Thread Michael S. Tsirkin
On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
 2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
  On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
  tamura.yoshi...@lab.ntt.co.jp wrote:
  event-tap controls when to start FT transaction, and provides proxy
  functions to called from net/block devices.  While FT transaction, it
  queues up net/block requests, and flush them when the transaction gets
  completed.
 
  Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
  Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
  ---
   Makefile.target |    1 +
   block.h         |    9 +
   event-tap.c     |  794 
  +++
   event-tap.h     |   34 +++
   net.h           |    4 +
   net/queue.c     |    1 +
   6 files changed, 843 insertions(+), 0 deletions(-)
   create mode 100644 event-tap.c
   create mode 100644 event-tap.h
 
  event_tap_state is checked at the beginning of several functions.  If
  there is an unexpected state the function silently returns.  Should
  these checks really be assert() so there is an abort and backtrace if
  the program ever reaches this state?
 
  +typedef struct EventTapBlkReq {
  +    char *device_name;
  +    int num_reqs;
  +    int num_cbs;
  +    bool is_multiwrite;
 
  Is multiwrite logging necessary?  If event tap is called from within
  the block layer then multiwrite is turned into one or more
  bdrv_aio_writev() calls.
 
  +static void event_tap_replay(void *opaque, int running, int reason)
  +{
  +    EventTapLog *log, *next;
  +
  +    if (!running) {
  +        return;
  +    }
  +
  +    if (event_tap_state != EVENT_TAP_LOAD) {
  +        return;
  +    }
  +
  +    event_tap_state = EVENT_TAP_REPLAY;
  +
  +    QTAILQ_FOREACH(log, event_list, node) {
  +        EventTapBlkReq *blk_req;
  +
  +        /* event resume */
  +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
  +        case EVENT_TAP_NET:
  +            event_tap_net_flush(log-net_req);
  +            break;
  +        case EVENT_TAP_BLK:
  +            blk_req = log-blk_req;
  +            if ((log-mode  EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
  +                switch (log-ioport.index) {
  +                case 0:
  +                    cpu_outb(log-ioport.address, log-ioport.data);
  +                    break;
  +                case 1:
  +                    cpu_outw(log-ioport.address, log-ioport.data);
  +                    break;
  +                case 2:
  +                    cpu_outl(log-ioport.address, log-ioport.data);
  +                    break;
  +                }
  +            } else {
  +                /* EVENT_TAP_MMIO */
  +                cpu_physical_memory_rw(log-mmio.address,
  +                                       log-mmio.buf,
  +                                       log-mmio.len, 1);
  +            }
  +            break;
 
  Why are net tx packets replayed at the net level but blk requests are
  replayed at the pio/mmio level?
 
  I expected everything to replay either as pio/mmio or as net/block.
 
 Stefan,
 
 After doing some heavy load tests, I realized that we have to
 take a hybrid approach to replay for now.  This is because when a
 device moves to the next state (e.g. virtio decreases inuse) is
 different between net and block.  For example, virtio-net
 decreases inuse upon returning from the net layer,
 but virtio-blk
 does that inside of the callback.

For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
Both are invoked from a callback.

 If we only use pio/mmio
 replay, even though event-tap tries to replay net requests, some
 get lost because the state has proceeded already.

It seems that all you need to do to avoid this is to
delay the callback?

 This doesn't
 happen with block, because the state is still old enough to
 replay.  Note that using hybrid approach won't cause duplicated
 requests on the secondary.

An assumption devices make is that a buffer is unused once
completion callback was invoked. Does this violate that assumption?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2011-01-04 Thread Yoshiaki Tamura
2011/1/4 Michael S. Tsirkin m...@redhat.com:
 On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
 2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
  On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
  tamura.yoshi...@lab.ntt.co.jp wrote:
  event-tap controls when to start FT transaction, and provides proxy
  functions to called from net/block devices.  While FT transaction, it
  queues up net/block requests, and flush them when the transaction gets
  completed.
 
  Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
  Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
  ---
   Makefile.target |    1 +
   block.h         |    9 +
   event-tap.c     |  794 
  +++
   event-tap.h     |   34 +++
   net.h           |    4 +
   net/queue.c     |    1 +
   6 files changed, 843 insertions(+), 0 deletions(-)
   create mode 100644 event-tap.c
   create mode 100644 event-tap.h
 
  event_tap_state is checked at the beginning of several functions.  If
  there is an unexpected state the function silently returns.  Should
  these checks really be assert() so there is an abort and backtrace if
  the program ever reaches this state?
 
  +typedef struct EventTapBlkReq {
  +    char *device_name;
  +    int num_reqs;
  +    int num_cbs;
  +    bool is_multiwrite;
 
  Is multiwrite logging necessary?  If event tap is called from within
  the block layer then multiwrite is turned into one or more
  bdrv_aio_writev() calls.
 
  +static void event_tap_replay(void *opaque, int running, int reason)
  +{
  +    EventTapLog *log, *next;
  +
  +    if (!running) {
  +        return;
  +    }
  +
  +    if (event_tap_state != EVENT_TAP_LOAD) {
  +        return;
  +    }
  +
  +    event_tap_state = EVENT_TAP_REPLAY;
  +
  +    QTAILQ_FOREACH(log, event_list, node) {
  +        EventTapBlkReq *blk_req;
  +
  +        /* event resume */
  +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
  +        case EVENT_TAP_NET:
  +            event_tap_net_flush(log-net_req);
  +            break;
  +        case EVENT_TAP_BLK:
  +            blk_req = log-blk_req;
  +            if ((log-mode  EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
  +                switch (log-ioport.index) {
  +                case 0:
  +                    cpu_outb(log-ioport.address, log-ioport.data);
  +                    break;
  +                case 1:
  +                    cpu_outw(log-ioport.address, log-ioport.data);
  +                    break;
  +                case 2:
  +                    cpu_outl(log-ioport.address, log-ioport.data);
  +                    break;
  +                }
  +            } else {
  +                /* EVENT_TAP_MMIO */
  +                cpu_physical_memory_rw(log-mmio.address,
  +                                       log-mmio.buf,
  +                                       log-mmio.len, 1);
  +            }
  +            break;
 
  Why are net tx packets replayed at the net level but blk requests are
  replayed at the pio/mmio level?
 
  I expected everything to replay either as pio/mmio or as net/block.

 Stefan,

 After doing some heavy load tests, I realized that we have to
 take a hybrid approach to replay for now.  This is because when a
 device moves to the next state (e.g. virtio decreases inuse) is
 different between net and block.  For example, virtio-net
 decreases inuse upon returning from the net layer,
 but virtio-blk
 does that inside of the callback.

 For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
 For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
 Both are invoked from a callback.

 If we only use pio/mmio
 replay, even though event-tap tries to replay net requests, some
 get lost because the state has proceeded already.

 It seems that all you need to do to avoid this is to
 delay the callback?

Yeah, if it's possible.  But if you take a look at virtio-net,
you'll see that virtio_push is called immediately after calling
qemu_sendv_packet while virtio-blk does that in the callback.


 This doesn't
 happen with block, because the state is still old enough to
 replay.  Note that using hybrid approach won't cause duplicated
 requests on the secondary.

 An assumption devices make is that a buffer is unused once
 completion callback was invoked. Does this violate that assumption?

No, it shouldn't.  In case of net with net layer replay, we copy
the content of the requests, and in case of block, because we
haven't called the callback yet, the requests remains fresh.

Yoshi


 --
 MST


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2011-01-04 Thread Michael S. Tsirkin
On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
 2011/1/4 Michael S. Tsirkin m...@redhat.com:
  On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
  2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
   On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
   tamura.yoshi...@lab.ntt.co.jp wrote:
   event-tap controls when to start FT transaction, and provides proxy
   functions to called from net/block devices.  While FT transaction, it
   queues up net/block requests, and flush them when the transaction gets
   completed.
  
   Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
   Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
   ---
    Makefile.target |    1 +
    block.h         |    9 +
    event-tap.c     |  794 
   +++
    event-tap.h     |   34 +++
    net.h           |    4 +
    net/queue.c     |    1 +
    6 files changed, 843 insertions(+), 0 deletions(-)
    create mode 100644 event-tap.c
    create mode 100644 event-tap.h
  
   event_tap_state is checked at the beginning of several functions.  If
   there is an unexpected state the function silently returns.  Should
   these checks really be assert() so there is an abort and backtrace if
   the program ever reaches this state?
  
   +typedef struct EventTapBlkReq {
   +    char *device_name;
   +    int num_reqs;
   +    int num_cbs;
   +    bool is_multiwrite;
  
   Is multiwrite logging necessary?  If event tap is called from within
   the block layer then multiwrite is turned into one or more
   bdrv_aio_writev() calls.
  
   +static void event_tap_replay(void *opaque, int running, int reason)
   +{
   +    EventTapLog *log, *next;
   +
   +    if (!running) {
   +        return;
   +    }
   +
   +    if (event_tap_state != EVENT_TAP_LOAD) {
   +        return;
   +    }
   +
   +    event_tap_state = EVENT_TAP_REPLAY;
   +
   +    QTAILQ_FOREACH(log, event_list, node) {
   +        EventTapBlkReq *blk_req;
   +
   +        /* event resume */
   +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
   +        case EVENT_TAP_NET:
   +            event_tap_net_flush(log-net_req);
   +            break;
   +        case EVENT_TAP_BLK:
   +            blk_req = log-blk_req;
   +            if ((log-mode  EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) 
   {
   +                switch (log-ioport.index) {
   +                case 0:
   +                    cpu_outb(log-ioport.address, log-ioport.data);
   +                    break;
   +                case 1:
   +                    cpu_outw(log-ioport.address, log-ioport.data);
   +                    break;
   +                case 2:
   +                    cpu_outl(log-ioport.address, log-ioport.data);
   +                    break;
   +                }
   +            } else {
   +                /* EVENT_TAP_MMIO */
   +                cpu_physical_memory_rw(log-mmio.address,
   +                                       log-mmio.buf,
   +                                       log-mmio.len, 1);
   +            }
   +            break;
  
   Why are net tx packets replayed at the net level but blk requests are
   replayed at the pio/mmio level?
  
   I expected everything to replay either as pio/mmio or as net/block.
 
  Stefan,
 
  After doing some heavy load tests, I realized that we have to
  take a hybrid approach to replay for now.  This is because when a
  device moves to the next state (e.g. virtio decreases inuse) is
  different between net and block.  For example, virtio-net
  decreases inuse upon returning from the net layer,
  but virtio-blk
  does that inside of the callback.
 
  For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
  For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
  Both are invoked from a callback.
 
  If we only use pio/mmio
  replay, even though event-tap tries to replay net requests, some
  get lost because the state has proceeded already.
 
  It seems that all you need to do to avoid this is to
  delay the callback?
 
 Yeah, if it's possible.  But if you take a look at virtio-net,
 you'll see that virtio_push is called immediately after calling
 qemu_sendv_packet
 while virtio-blk does that in the callback.

This is only if the packet was sent immediately.
I was referring to the case where the packet is queued.

 
  This doesn't
  happen with block, because the state is still old enough to
  replay.  Note that using hybrid approach won't cause duplicated
  requests on the secondary.
 
  An assumption devices make is that a buffer is unused once
  completion callback was invoked. Does this violate that assumption?
 
 No, it shouldn't.  In case of net with net layer replay, we copy
 the content of the requests, and in case of block, because we
 haven't called the callback yet, the requests remains fresh.
 
 Yoshi
 

Yes, as long as you copy it should be fine.  Maybe it's a good idea for
event-tap to queue all packets to avoid the copy and avoid 

Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2011-01-04 Thread Yoshiaki Tamura
2011/1/4 Michael S. Tsirkin m...@redhat.com:
 On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
 2011/1/4 Michael S. Tsirkin m...@redhat.com:
  On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
  2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
   On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
   tamura.yoshi...@lab.ntt.co.jp wrote:
   event-tap controls when to start FT transaction, and provides proxy
   functions to called from net/block devices.  While FT transaction, it
   queues up net/block requests, and flush them when the transaction gets
   completed.
  
   Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
   Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
   ---
    Makefile.target |    1 +
    block.h         |    9 +
    event-tap.c     |  794 
   +++
    event-tap.h     |   34 +++
    net.h           |    4 +
    net/queue.c     |    1 +
    6 files changed, 843 insertions(+), 0 deletions(-)
    create mode 100644 event-tap.c
    create mode 100644 event-tap.h
  
   event_tap_state is checked at the beginning of several functions.  If
   there is an unexpected state the function silently returns.  Should
   these checks really be assert() so there is an abort and backtrace if
   the program ever reaches this state?
  
   +typedef struct EventTapBlkReq {
   +    char *device_name;
   +    int num_reqs;
   +    int num_cbs;
   +    bool is_multiwrite;
  
   Is multiwrite logging necessary?  If event tap is called from within
   the block layer then multiwrite is turned into one or more
   bdrv_aio_writev() calls.
  
   +static void event_tap_replay(void *opaque, int running, int reason)
   +{
   +    EventTapLog *log, *next;
   +
   +    if (!running) {
   +        return;
   +    }
   +
   +    if (event_tap_state != EVENT_TAP_LOAD) {
   +        return;
   +    }
   +
   +    event_tap_state = EVENT_TAP_REPLAY;
   +
   +    QTAILQ_FOREACH(log, event_list, node) {
   +        EventTapBlkReq *blk_req;
   +
   +        /* event resume */
   +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
   +        case EVENT_TAP_NET:
   +            event_tap_net_flush(log-net_req);
   +            break;
   +        case EVENT_TAP_BLK:
   +            blk_req = log-blk_req;
   +            if ((log-mode  EVENT_TAP_TYPE_MASK) == 
   EVENT_TAP_IOPORT) {
   +                switch (log-ioport.index) {
   +                case 0:
   +                    cpu_outb(log-ioport.address, log-ioport.data);
   +                    break;
   +                case 1:
   +                    cpu_outw(log-ioport.address, log-ioport.data);
   +                    break;
   +                case 2:
   +                    cpu_outl(log-ioport.address, log-ioport.data);
   +                    break;
   +                }
   +            } else {
   +                /* EVENT_TAP_MMIO */
   +                cpu_physical_memory_rw(log-mmio.address,
   +                                       log-mmio.buf,
   +                                       log-mmio.len, 1);
   +            }
   +            break;
  
   Why are net tx packets replayed at the net level but blk requests are
   replayed at the pio/mmio level?
  
   I expected everything to replay either as pio/mmio or as net/block.
 
  Stefan,
 
  After doing some heavy load tests, I realized that we have to
  take a hybrid approach to replay for now.  This is because when a
  device moves to the next state (e.g. virtio decreases inuse) is
  different between net and block.  For example, virtio-net
  decreases inuse upon returning from the net layer,
  but virtio-blk
  does that inside of the callback.
 
  For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
  For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
  Both are invoked from a callback.
 
  If we only use pio/mmio
  replay, even though event-tap tries to replay net requests, some
  get lost because the state has proceeded already.
 
  It seems that all you need to do to avoid this is to
  delay the callback?

 Yeah, if it's possible.  But if you take a look at virtio-net,
 you'll see that virtio_push is called immediately after calling
 qemu_sendv_packet
 while virtio-blk does that in the callback.

 This is only if the packet was sent immediately.
 I was referring to the case where the packet is queued.

I see.  I usually don't see packets get queued in the net layer.
What would be the effect to devices?  Restraint sending packets?


 
  This doesn't
  happen with block, because the state is still old enough to
  replay.  Note that using hybrid approach won't cause duplicated
  requests on the secondary.
 
  An assumption devices make is that a buffer is unused once
  completion callback was invoked. Does this violate that assumption?

 No, it shouldn't.  In case of net with net layer replay, we copy
 the content of the requests, and in case of block, because we
 haven't called the 

Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2011-01-04 Thread Michael S. Tsirkin
On Tue, Jan 04, 2011 at 10:45:13PM +0900, Yoshiaki Tamura wrote:
 2011/1/4 Michael S. Tsirkin m...@redhat.com:
  On Tue, Jan 04, 2011 at 09:20:53PM +0900, Yoshiaki Tamura wrote:
  2011/1/4 Michael S. Tsirkin m...@redhat.com:
   On Tue, Jan 04, 2011 at 08:02:54PM +0900, Yoshiaki Tamura wrote:
   2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
tamura.yoshi...@lab.ntt.co.jp wrote:
event-tap controls when to start FT transaction, and provides proxy
functions to called from net/block devices.  While FT transaction, it
queues up net/block requests, and flush them when the transaction 
gets
completed.
   
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.target |    1 +
 block.h         |    9 +
 event-tap.c     |  794 
+++
 event-tap.h     |   34 +++
 net.h           |    4 +
 net/queue.c     |    1 +
 6 files changed, 843 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
   
event_tap_state is checked at the beginning of several functions.  If
there is an unexpected state the function silently returns.  Should
these checks really be assert() so there is an abort and backtrace if
the program ever reaches this state?
   
+typedef struct EventTapBlkReq {
+    char *device_name;
+    int num_reqs;
+    int num_cbs;
+    bool is_multiwrite;
   
Is multiwrite logging necessary?  If event tap is called from within
the block layer then multiwrite is turned into one or more
bdrv_aio_writev() calls.
   
+static void event_tap_replay(void *opaque, int running, int reason)
+{
+    EventTapLog *log, *next;
+
+    if (!running) {
+        return;
+    }
+
+    if (event_tap_state != EVENT_TAP_LOAD) {
+        return;
+    }
+
+    event_tap_state = EVENT_TAP_REPLAY;
+
+    QTAILQ_FOREACH(log, event_list, node) {
+        EventTapBlkReq *blk_req;
+
+        /* event resume */
+        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
+        case EVENT_TAP_NET:
+            event_tap_net_flush(log-net_req);
+            break;
+        case EVENT_TAP_BLK:
+            blk_req = log-blk_req;
+            if ((log-mode  EVENT_TAP_TYPE_MASK) == 
EVENT_TAP_IOPORT) {
+                switch (log-ioport.index) {
+                case 0:
+                    cpu_outb(log-ioport.address, log-ioport.data);
+                    break;
+                case 1:
+                    cpu_outw(log-ioport.address, log-ioport.data);
+                    break;
+                case 2:
+                    cpu_outl(log-ioport.address, log-ioport.data);
+                    break;
+                }
+            } else {
+                /* EVENT_TAP_MMIO */
+                cpu_physical_memory_rw(log-mmio.address,
+                                       log-mmio.buf,
+                                       log-mmio.len, 1);
+            }
+            break;
   
Why are net tx packets replayed at the net level but blk requests are
replayed at the pio/mmio level?
   
I expected everything to replay either as pio/mmio or as net/block.
  
   Stefan,
  
   After doing some heavy load tests, I realized that we have to
   take a hybrid approach to replay for now.  This is because when a
   device moves to the next state (e.g. virtio decreases inuse) is
   different between net and block.  For example, virtio-net
   decreases inuse upon returning from the net layer,
   but virtio-blk
   does that inside of the callback.
  
   For TX, virtio-net calls virtqueue_push from virtio_net_tx_complete.
   For RX, virtio-net calls virtqueue_flush from virtio_net_receive.
   Both are invoked from a callback.
  
   If we only use pio/mmio
   replay, even though event-tap tries to replay net requests, some
   get lost because the state has proceeded already.
  
   It seems that all you need to do to avoid this is to
   delay the callback?
 
  Yeah, if it's possible.  But if you take a look at virtio-net,
  you'll see that virtio_push is called immediately after calling
  qemu_sendv_packet
  while virtio-blk does that in the callback.
 
  This is only if the packet was sent immediately.
  I was referring to the case where the packet is queued.
 
 I see.  I usually don't see packets get queued in the net layer.
 What would be the effect to devices?  Restraint sending packets?

Yes.

 
  
   This doesn't
   happen with block, because the state is still old enough to
   replay.  Note that using hybrid approach won't cause duplicated
   requests on the secondary.
  
   An assumption devices make is that a buffer is unused once
   completion 

Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2010-11-30 Thread Yoshiaki Tamura
2010/11/30 Marcelo Tosatti mtosa...@redhat.com:
 On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp

 +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
 +                                    BlockDriverState *bs, BlockRequest 
 *reqs,
 +                                    int num_reqs, BlockDriverCompletionFunc 
 *cb,
 +                                    void *opaque, bool is_multiwrite)
 +{
 +    int i;
 +
 +    blk_req-num_reqs = num_reqs;
 +    blk_req-num_cbs = num_reqs;
 +    blk_req-device_name = qemu_strdup(bs-device_name);
 +    blk_req-is_multiwrite = is_multiwrite;
 +
 +    for (i = 0; i  num_reqs; i++) {
 +        blk_req-reqs[i].sector = reqs[i].sector;
 +        blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors;
 +        blk_req-reqs[i].qiov = reqs[i].qiov;
 +        blk_req-reqs[i].cb = cb;
 +        blk_req-reqs[i].opaque = opaque;
 +        blk_req-cb[i] = reqs[i].cb;
 +        blk_req-opaque[i] = reqs[i].opaque;
 +    }
 +}

 bdrv_aio_flush should also be logged, so that guest initiated flush is
 respected on replay.

In the current implementation w/o flush logging, there might be
order inversion after replay?

Yoshi
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/21] Introduce event-tap.

2010-11-30 Thread Yoshiaki Tamura
2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
 tamura.yoshi...@lab.ntt.co.jp wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
  Makefile.target |    1 +
  block.h         |    9 +
  event-tap.c     |  794 
 +++
  event-tap.h     |   34 +++
  net.h           |    4 +
  net/queue.c     |    1 +
  6 files changed, 843 insertions(+), 0 deletions(-)
  create mode 100644 event-tap.c
  create mode 100644 event-tap.h

 event_tap_state is checked at the beginning of several functions.  If
 there is an unexpected state the function silently returns.  Should
 these checks really be assert() so there is an abort and backtrace if
 the program ever reaches this state?

I'm wondering whether abort is too strong, but I think you're
right because stopping Kemari may not be enough as an error
handling.


 +typedef struct EventTapBlkReq {
 +    char *device_name;
 +    int num_reqs;
 +    int num_cbs;
 +    bool is_multiwrite;

 Is multiwrite logging necessary?  If event tap is called from within
 the block layer then multiwrite is turned into one or more
 bdrv_aio_writev() calls.

If we move event-tap into block layer, I guess it won't be
necessary.

 +static void event_tap_replay(void *opaque, int running, int reason)
 +{
 +    EventTapLog *log, *next;
 +
 +    if (!running) {
 +        return;
 +    }
 +
 +    if (event_tap_state != EVENT_TAP_LOAD) {
 +        return;
 +    }
 +
 +    event_tap_state = EVENT_TAP_REPLAY;
 +
 +    QTAILQ_FOREACH(log, event_list, node) {
 +        EventTapBlkReq *blk_req;
 +
 +        /* event resume */
 +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_NET:
 +            event_tap_net_flush(log-net_req);
 +            break;
 +        case EVENT_TAP_BLK:
 +            blk_req = log-blk_req;
 +            if ((log-mode  EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
 +                switch (log-ioport.index) {
 +                case 0:
 +                    cpu_outb(log-ioport.address, log-ioport.data);
 +                    break;
 +                case 1:
 +                    cpu_outw(log-ioport.address, log-ioport.data);
 +                    break;
 +                case 2:
 +                    cpu_outl(log-ioport.address, log-ioport.data);
 +                    break;
 +                }
 +            } else {
 +                /* EVENT_TAP_MMIO */
 +                cpu_physical_memory_rw(log-mmio.address,
 +                                       log-mmio.buf,
 +                                       log-mmio.len, 1);
 +            }
 +            break;

 Why are net tx packets replayed at the net level but blk requests are
 replayed at the pio/mmio level?

 I expected everything to replay either as pio/mmio or as net/block.

It's my mistake, sorry about that.  We're just in way of moving
replay from pio/mmio to net/block, and I mixed up.  I'll revert
it to pio/mmio replay in the next spin.

BTW, I would like to ask a question regarding this.  There is a
callback which net/block calls after processing the requests, and
is there a clean way to set this callback on the failovered
host upon replay?

 +static void event_tap_blk_load(QEMUFile *f, EventTapBlkReq *blk_req)
 +{
 +    BlockRequest *req;
 +    ram_addr_t page_addr;
 +    int i, j, len;
 +
 +    len = qemu_get_byte(f);
 +    blk_req-device_name = qemu_malloc(len + 1);
 +    qemu_get_buffer(f, (uint8_t *)blk_req-device_name, len);
 +    blk_req-device_name[len] = '\0';
 +    blk_req-num_reqs = qemu_get_byte(f);
 +
 +    for (i = 0; i  blk_req-num_reqs; i++) {
 +        req = blk_req-reqs[i];
 +        req-sector = qemu_get_be64(f);
 +        req-nb_sectors = qemu_get_be32(f);
 +        req-qiov = qemu_malloc(sizeof(QEMUIOVector));

 It would make sense to have common QEMUIOVector load/save functions
 instead of inlining this code here.

OK.

 +static int event_tap_load(QEMUFile *f, void *opaque, int version_id)
 +{
 +    EventTapLog *log, *next;
 +    int mode;
 +
 +    event_tap_state = EVENT_TAP_LOAD;
 +
 +    QTAILQ_FOREACH_SAFE(log, event_list, node, next) {
 +        QTAILQ_REMOVE(event_list, log, node);
 +        event_tap_free_log(log);
 +    }
 +
 +    /* loop until EOF */
 +    while ((mode = qemu_get_byte(f)) != 0) {
 +        EventTapLog *log = event_tap_alloc_log();
 +
 +        log-mode = mode;
 +        switch (log-mode  EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_IOPORT:
 +            event_tap_ioport_load(f, log-ioport);
 +            break;
 +        case EVENT_TAP_MMIO:
 +            event_tap_mmio_load(f, log-mmio);
 +            break;
 +        case 0:
 +            DPRINTF(No 

Re: [PATCH 09/21] Introduce event-tap.

2010-11-30 Thread Stefan Hajnoczi
On Tue, Nov 30, 2010 at 9:50 AM, Yoshiaki Tamura
tamura.yoshi...@lab.ntt.co.jp wrote:
 2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
 tamura.yoshi...@lab.ntt.co.jp wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
  Makefile.target |    1 +
  block.h         |    9 +
  event-tap.c     |  794 
 +++
  event-tap.h     |   34 +++
  net.h           |    4 +
  net/queue.c     |    1 +
  6 files changed, 843 insertions(+), 0 deletions(-)
  create mode 100644 event-tap.c
  create mode 100644 event-tap.h

 event_tap_state is checked at the beginning of several functions.  If
 there is an unexpected state the function silently returns.  Should
 these checks really be assert() so there is an abort and backtrace if
 the program ever reaches this state?

Fancier error handling would work too.  For example cleaning up,
turning off Kemari, and producing an error message with
error_report().  In that case we need to think through the state of
the environment carefully and make sure we don't cause secondary
failures (like memory leaks).

 BTW, I would like to ask a question regarding this.  There is a
 callback which net/block calls after processing the requests, and
 is there a clean way to set this callback on the failovered
 host upon replay?

I think this is a limitation in the current design.  If requests are
re-issued by Kemari at the net/block level, how will the higher layers
know about these requests?  How will they be prepared to accept
callbacks?

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/21] Introduce event-tap.

2010-11-30 Thread Yoshiaki Tamura
2010/11/30 Stefan Hajnoczi stefa...@gmail.com:
 On Tue, Nov 30, 2010 at 9:50 AM, Yoshiaki Tamura
 tamura.yoshi...@lab.ntt.co.jp wrote:
 2010/11/29 Stefan Hajnoczi stefa...@gmail.com:
 On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
 tamura.yoshi...@lab.ntt.co.jp wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
  Makefile.target |    1 +
  block.h         |    9 +
  event-tap.c     |  794 
 +++
  event-tap.h     |   34 +++
  net.h           |    4 +
  net/queue.c     |    1 +
  6 files changed, 843 insertions(+), 0 deletions(-)
  create mode 100644 event-tap.c
  create mode 100644 event-tap.h

 event_tap_state is checked at the beginning of several functions.  If
 there is an unexpected state the function silently returns.  Should
 these checks really be assert() so there is an abort and backtrace if
 the program ever reaches this state?

 Fancier error handling would work too.  For example cleaning up,
 turning off Kemari, and producing an error message with
 error_report().  In that case we need to think through the state of
 the environment carefully and make sure we don't cause secondary
 failures (like memory leaks).

Turning off Kemari should include canceling the transaction which
notifies the secondary.  So same as you commented for
ft_trans_file error handling, I would implement better handling
for event-tap again.

 BTW, I would like to ask a question regarding this.  There is a
 callback which net/block calls after processing the requests, and
 is there a clean way to set this callback on the failovered
 host upon replay?

 I think this is a limitation in the current design.  If requests are
 re-issued by Kemari at the net/block level, how will the higher layers
 know about these requests?  How will they be prepared to accept
 callbacks?

That's why we're using pio/mmio replay at this moment.  With a
dirty hack in device emulators setting callbacks before replay,
block/net replay seems to work, but I don't think that to be a
correct solution.

Yoshi


 Stefan
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2010-11-30 Thread Marcelo Tosatti
On Tue, Nov 30, 2010 at 06:28:55PM +0900, Yoshiaki Tamura wrote:
 2010/11/30 Marcelo Tosatti mtosa...@redhat.com:
  On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote:
  event-tap controls when to start FT transaction, and provides proxy
  functions to called from net/block devices.  While FT transaction, it
  queues up net/block requests, and flush them when the transaction gets
  completed.
 
  Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
  Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 
  +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
  +                                    BlockDriverState *bs, BlockRequest 
  *reqs,
  +                                    int num_reqs, 
  BlockDriverCompletionFunc *cb,
  +                                    void *opaque, bool is_multiwrite)
  +{
  +    int i;
  +
  +    blk_req-num_reqs = num_reqs;
  +    blk_req-num_cbs = num_reqs;
  +    blk_req-device_name = qemu_strdup(bs-device_name);
  +    blk_req-is_multiwrite = is_multiwrite;
  +
  +    for (i = 0; i  num_reqs; i++) {
  +        blk_req-reqs[i].sector = reqs[i].sector;
  +        blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors;
  +        blk_req-reqs[i].qiov = reqs[i].qiov;
  +        blk_req-reqs[i].cb = cb;
  +        blk_req-reqs[i].opaque = opaque;
  +        blk_req-cb[i] = reqs[i].cb;
  +        blk_req-opaque[i] = reqs[i].opaque;
  +    }
  +}
 
  bdrv_aio_flush should also be logged, so that guest initiated flush is
  respected on replay.
 
 In the current implementation w/o flush logging, there might be
 order inversion after replay?
 
 Yoshi

Yes, since a vcpu is allowed to continue after synchronization is
scheduled via a bh. For virtio-blk, for example:

1) bdrv_aio_write, event queued.
2) bdrv_aio_flush
3) bdrv_aio_write, event queued.

On replay, there is no flush between the two writes.

Why can't synchronization be done from event-tap itself, synchronously,
to avoid this kind of problem?

The way you hook synchronization into savevm seems unclean. Perhaps
better separation between standard savevm path and FT savevm would make
it cleaner.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2010-11-30 Thread Yoshiaki Tamura

Marcelo Tosatti wrote:

On Tue, Nov 30, 2010 at 06:28:55PM +0900, Yoshiaki Tamura wrote:

2010/11/30 Marcelo Tosattimtosa...@redhat.com:

On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote:

event-tap controls when to start FT transaction, and provides proxy
functions to called from net/block devices.  While FT transaction, it
queues up net/block requests, and flush them when the transaction gets
completed.

Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Keiohmura@lab.ntt.co.jp



+static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
+BlockDriverState *bs, BlockRequest *reqs,
+int num_reqs, BlockDriverCompletionFunc 
*cb,
+void *opaque, bool is_multiwrite)
+{
+int i;
+
+blk_req-num_reqs = num_reqs;
+blk_req-num_cbs = num_reqs;
+blk_req-device_name = qemu_strdup(bs-device_name);
+blk_req-is_multiwrite = is_multiwrite;
+
+for (i = 0; i  num_reqs; i++) {
+blk_req-reqs[i].sector = reqs[i].sector;
+blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors;
+blk_req-reqs[i].qiov = reqs[i].qiov;
+blk_req-reqs[i].cb = cb;
+blk_req-reqs[i].opaque = opaque;
+blk_req-cb[i] = reqs[i].cb;
+blk_req-opaque[i] = reqs[i].opaque;
+}
+}


bdrv_aio_flush should also be logged, so that guest initiated flush is
respected on replay.


In the current implementation w/o flush logging, there might be
order inversion after replay?

Yoshi


Yes, since a vcpu is allowed to continue after synchronization is
scheduled via a bh. For virtio-blk, for example:

1) bdrv_aio_write, event queued.
2) bdrv_aio_flush
3) bdrv_aio_write, event queued.

On replay, there is no flush between the two writes.

Why can't synchronization be done from event-tap itself, synchronously,
to avoid this kind of problem?


Thanks.  I would fix it.


The way you hook synchronization into savevm seems unclean. Perhaps
better separation between standard savevm path and FT savevm would make
it cleaner.


I think you're mentioning about the changes in migration.c?

Yoshi
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: [PATCH 09/21] Introduce event-tap.

2010-11-30 Thread Marcelo Tosatti
On Tue, Nov 30, 2010 at 07:35:54PM +0900, Yoshiaki Tamura wrote:
 Marcelo Tosatti wrote:
 On Tue, Nov 30, 2010 at 06:28:55PM +0900, Yoshiaki Tamura wrote:
 2010/11/30 Marcelo Tosattimtosa...@redhat.com:
 On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.
 
 Signed-off-by: Yoshiaki Tamuratamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Keiohmura@lab.ntt.co.jp
 
 +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
 +BlockDriverState *bs, BlockRequest 
 *reqs,
 +int num_reqs, 
 BlockDriverCompletionFunc *cb,
 +void *opaque, bool is_multiwrite)
 +{
 +int i;
 +
 +blk_req-num_reqs = num_reqs;
 +blk_req-num_cbs = num_reqs;
 +blk_req-device_name = qemu_strdup(bs-device_name);
 +blk_req-is_multiwrite = is_multiwrite;
 +
 +for (i = 0; i  num_reqs; i++) {
 +blk_req-reqs[i].sector = reqs[i].sector;
 +blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors;
 +blk_req-reqs[i].qiov = reqs[i].qiov;
 +blk_req-reqs[i].cb = cb;
 +blk_req-reqs[i].opaque = opaque;
 +blk_req-cb[i] = reqs[i].cb;
 +blk_req-opaque[i] = reqs[i].opaque;
 +}
 +}
 
 bdrv_aio_flush should also be logged, so that guest initiated flush is
 respected on replay.
 
 In the current implementation w/o flush logging, there might be
 order inversion after replay?
 
 Yoshi
 
 Yes, since a vcpu is allowed to continue after synchronization is
 scheduled via a bh. For virtio-blk, for example:
 
 1) bdrv_aio_write, event queued.
 2) bdrv_aio_flush
 3) bdrv_aio_write, event queued.
 
 On replay, there is no flush between the two writes.
 
 Why can't synchronization be done from event-tap itself, synchronously,
 to avoid this kind of problem?
 
 Thanks.  I would fix it.
 
 The way you hook synchronization into savevm seems unclean. Perhaps
 better separation between standard savevm path and FT savevm would make
 it cleaner.
 
 I think you're mentioning about the changes in migration.c?
 
 Yoshi

The important point is to stop vcpu activity after the event is queued,
and resume once synchronization is performed. Stopping the vm after
Kemari event queueing should do it, once Michael's stable migration
patchset is in (and net/block layers fixed).

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/21] Introduce event-tap.

2010-11-29 Thread Stefan Hajnoczi
On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura
tamura.yoshi...@lab.ntt.co.jp wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.

 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
 ---
  Makefile.target |    1 +
  block.h         |    9 +
  event-tap.c     |  794 
 +++
  event-tap.h     |   34 +++
  net.h           |    4 +
  net/queue.c     |    1 +
  6 files changed, 843 insertions(+), 0 deletions(-)
  create mode 100644 event-tap.c
  create mode 100644 event-tap.h

event_tap_state is checked at the beginning of several functions.  If
there is an unexpected state the function silently returns.  Should
these checks really be assert() so there is an abort and backtrace if
the program ever reaches this state?

 +typedef struct EventTapBlkReq {
 +    char *device_name;
 +    int num_reqs;
 +    int num_cbs;
 +    bool is_multiwrite;

Is multiwrite logging necessary?  If event tap is called from within
the block layer then multiwrite is turned into one or more
bdrv_aio_writev() calls.

 +static void event_tap_replay(void *opaque, int running, int reason)
 +{
 +    EventTapLog *log, *next;
 +
 +    if (!running) {
 +        return;
 +    }
 +
 +    if (event_tap_state != EVENT_TAP_LOAD) {
 +        return;
 +    }
 +
 +    event_tap_state = EVENT_TAP_REPLAY;
 +
 +    QTAILQ_FOREACH(log, event_list, node) {
 +        EventTapBlkReq *blk_req;
 +
 +        /* event resume */
 +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_NET:
 +            event_tap_net_flush(log-net_req);
 +            break;
 +        case EVENT_TAP_BLK:
 +            blk_req = log-blk_req;
 +            if ((log-mode  EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) {
 +                switch (log-ioport.index) {
 +                case 0:
 +                    cpu_outb(log-ioport.address, log-ioport.data);
 +                    break;
 +                case 1:
 +                    cpu_outw(log-ioport.address, log-ioport.data);
 +                    break;
 +                case 2:
 +                    cpu_outl(log-ioport.address, log-ioport.data);
 +                    break;
 +                }
 +            } else {
 +                /* EVENT_TAP_MMIO */
 +                cpu_physical_memory_rw(log-mmio.address,
 +                                       log-mmio.buf,
 +                                       log-mmio.len, 1);
 +            }
 +            break;

Why are net tx packets replayed at the net level but blk requests are
replayed at the pio/mmio level?

I expected everything to replay either as pio/mmio or as net/block.

 +static void event_tap_blk_load(QEMUFile *f, EventTapBlkReq *blk_req)
 +{
 +    BlockRequest *req;
 +    ram_addr_t page_addr;
 +    int i, j, len;
 +
 +    len = qemu_get_byte(f);
 +    blk_req-device_name = qemu_malloc(len + 1);
 +    qemu_get_buffer(f, (uint8_t *)blk_req-device_name, len);
 +    blk_req-device_name[len] = '\0';
 +    blk_req-num_reqs = qemu_get_byte(f);
 +
 +    for (i = 0; i  blk_req-num_reqs; i++) {
 +        req = blk_req-reqs[i];
 +        req-sector = qemu_get_be64(f);
 +        req-nb_sectors = qemu_get_be32(f);
 +        req-qiov = qemu_malloc(sizeof(QEMUIOVector));

It would make sense to have common QEMUIOVector load/save functions
instead of inlining this code here.

 +static int event_tap_load(QEMUFile *f, void *opaque, int version_id)
 +{
 +    EventTapLog *log, *next;
 +    int mode;
 +
 +    event_tap_state = EVENT_TAP_LOAD;
 +
 +    QTAILQ_FOREACH_SAFE(log, event_list, node, next) {
 +        QTAILQ_REMOVE(event_list, log, node);
 +        event_tap_free_log(log);
 +    }
 +
 +    /* loop until EOF */
 +    while ((mode = qemu_get_byte(f)) != 0) {
 +        EventTapLog *log = event_tap_alloc_log();
 +
 +        log-mode = mode;
 +        switch (log-mode  EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_IOPORT:
 +            event_tap_ioport_load(f, log-ioport);
 +            break;
 +        case EVENT_TAP_MMIO:
 +            event_tap_mmio_load(f, log-mmio);
 +            break;
 +        case 0:
 +            DPRINTF(No event\n);
 +            break;
 +        default:
 +            fprintf(stderr, Unknown state %d\n, log-mode);
 +            return -1;

log is leaked here...

 +        }
 +
 +        switch (log-mode  ~EVENT_TAP_TYPE_MASK) {
 +        case EVENT_TAP_NET:
 +            event_tap_net_load(f, log-net_req);
 +            break;
 +        case EVENT_TAP_BLK:
 +            event_tap_blk_load(f, log-blk_req);
 +            break;
 +        default:
 +            fprintf(stderr, Unknown state %d\n, log-mode);
 +            return -1;

...and here.

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to 

Re: [PATCH 09/21] Introduce event-tap.

2010-11-29 Thread Marcelo Tosatti
On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote:
 event-tap controls when to start FT transaction, and provides proxy
 functions to called from net/block devices.  While FT transaction, it
 queues up net/block requests, and flush them when the transaction gets
 completed.
 
 Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
 Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp

 +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req,
 +BlockDriverState *bs, BlockRequest *reqs,
 +int num_reqs, BlockDriverCompletionFunc 
 *cb,
 +void *opaque, bool is_multiwrite)
 +{
 +int i;
 +
 +blk_req-num_reqs = num_reqs;
 +blk_req-num_cbs = num_reqs;
 +blk_req-device_name = qemu_strdup(bs-device_name);
 +blk_req-is_multiwrite = is_multiwrite;
 +
 +for (i = 0; i  num_reqs; i++) {
 +blk_req-reqs[i].sector = reqs[i].sector;
 +blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors;
 +blk_req-reqs[i].qiov = reqs[i].qiov;
 +blk_req-reqs[i].cb = cb;
 +blk_req-reqs[i].opaque = opaque;
 +blk_req-cb[i] = reqs[i].cb;
 +blk_req-opaque[i] = reqs[i].opaque;
 +}
 +}   

bdrv_aio_flush should also be logged, so that guest initiated flush is
respected on replay.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html