date:20130307

This series tries to let guest instead of qemu to send the gratuitous packets
after migration when guest is capable of doing this. This is needed since it's
impossible for qemu to keep track of all configurations (e.g 802.1Q) and mac
addresses (more than one mac address may be used by guest). So qemu can't build
gratuitous packets for all those configurations properly. The only solution is
let guest driver who knew all needed information to do this.

The series first introduces a new runstate which just tracks the state when the
migration is finished and guest is about to start. And then we can just trying
to notify the guest to send the GARP after changing from this state to
running. A model specific announcing method were also also introduced to let
each kinds of nic do its own notification. When there's no such method register
for the nic, the old style of sending RARP were kept. And the last two patches
implemented the virtio-net method of notification.

Changes from V6:
- introduce a new runstate instead of using a global variable check the state

Changes from V5:
- use a global variable to decide whether an announcement is needed after 
migration
- align with virtio spec and let guest ack the announcement notification through
  control vq instead of config status writing

Changes from V4:
- keep the old behavior that send the gratuitous packets only after migration
- decide whether to send gratuitous packets by previous runstate instead of a 
dedicated parameter
- check virtio_net_started() instead of VIRTIO_NET_S_LINK_UP before issue the 
config update interrupt
- move VIRTIO_NET_S_ANNOUNCE to 0x100 and supress guest config write to RO bits
- cleanups suggested by Michael

Tested with migration within 802.1Q.

Jason Wang (5):
  runstate: introduce prelaunch-migrate state
  net: announce self after vm is started
  net: model specific announcing support
  virtio-net: notify guest to annouce itself
  virtio-net: compat guest announce

 hw/pc.h   |6 +-
 hw/virtio-net.c   |   30 ++
 hw/virtio-net.h   |   15 ++-
 include/net/net.h |2 ++
 migration.c   |4 +---
 qapi-schema.json  |5 -
 savevm.c  |8 ++--
 vl.c  |8 +++-
 8 files changed, 69 insertions(+), 9 deletions(-)

[Qemu-devel] [PATCH V7 1/5] runstate: introduce prelaunch-migrate state

Sometimes, we need track the state when guest is just about to start after
migration. There's not a accurate state available which do this accurately
(consider qemu may started with -S in destination).

So this patch introduces a new state prelaunch-migrate which just tracks this
state, it covers the case both w/ and w/o -S in destination. The first user of
this is the support of doing announce by guest.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 migration.c  |3 +--
 qapi-schema.json |5 -
 vl.c |4 +++-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/migration.c b/migration.c
index 11725ae..ecdf2c5 100644
--- a/migration.c
+++ b/migration.c
@@ -107,10 +107,9 @@ static void process_incoming_migration_co(void *opaque)
 /* Make sure all file formats flush their mutable metadata */
 bdrv_invalidate_cache_all();
 
+runstate_set(RUN_STATE_PRELAUNCH_MIGRATE);
 if (autostart) {
 vm_start();
-} else {
-runstate_set(RUN_STATE_PAUSED);
 }
 }
 
diff --git a/qapi-schema.json b/qapi-schema.json
index 28b070f..baa6361 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -174,11 +174,14 @@
 # @suspended: guest is suspended (ACPI S3)
 #
 # @watchdog: the watchdog action is configured to pause and has been triggered
+#
+# @migrate-prelaunch: migration is completed and QEMU were started with -S
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
-'running', 'save-vm', 'shutdown', 'suspended', 'watchdog' ] }
+'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
+'prelaunch-migrate'] }
 
 ##
 # @SnapshotInfo
diff --git a/vl.c b/vl.c
index c03edf1..5dd2e0e 100644
--- a/vl.c
+++ b/vl.c
@@ -534,7 +534,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
 
 { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
-{ RUN_STATE_INMIGRATE, RUN_STATE_PAUSED },
+{ RUN_STATE_INMIGRATE, RUN_STATE_PRELAUNCH_MIGRATE },
 
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
@@ -580,6 +580,8 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 
+{ RUN_STATE_PRELAUNCH_MIGRATE, RUN_STATE_RUNNING },
+
 { RUN_STATE_MAX, RUN_STATE_MAX },
 };
 
-- 
1.7.1

[Qemu-devel] [PATCH V7 2/5] net: announce self after vm is started

Since we may want to send garp by guest if guest driver is capable of it. We
need trigger this event after vm is started. So this patch do this when the
state is changing from RUN_STATE_PRELAUNCH_MIGRATE to RUN_STATE_RUNINNG.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 migration.c |1 -
 vl.c|4 
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index ecdf2c5..3599d2d 100644
--- a/migration.c
+++ b/migration.c
@@ -100,7 +100,6 @@ static void process_incoming_migration_co(void *opaque)
 fprintf(stderr, load of migration failed\n);
 exit(0);
 }
-qemu_announce_self();
 DPRINTF(successfully loaded vm state\n);
 
 bdrv_clear_incoming_migration_all();
diff --git a/vl.c b/vl.c
index 5dd2e0e..ccf6ea7 100644
--- a/vl.c
+++ b/vl.c
@@ -1681,11 +1681,15 @@ void vm_state_notify(int running, RunState state)
 void vm_start(void)
 {
 if (!runstate_is_running()) {
+RunState prev_run_state = current_run_state;
 cpu_enable_ticks();
 runstate_set(RUN_STATE_RUNNING);
 vm_state_notify(1, RUN_STATE_RUNNING);
 resume_all_vcpus();
 monitor_protocol_event(QEVENT_RESUME, NULL);
+if (prev_run_state == RUN_STATE_PRELAUNCH_MIGRATE) {
+qemu_announce_self();
+}
 }
 }
 
-- 
1.7.1

[Qemu-devel] [PATCH V7 4/5] virtio-net: notify guest to annouce itself

It's hard to track all mac addresses and their configurations (e.g vlan) in
qemu. Without those information, it's impossible to build proper garp packet
after migration. The only possible solution to this is let guest ( who knew all
configurations) to do this.

So, this patch introduces a new rw config status bit of virtio-net,
VIRTIO_NET_S_ANNOUNCE which is used to notify guest to announce presence of its
link through config update interrupt. When gust have done the announcement, it
should ack the notification through VIRTIO_NET_CTRL_ANNOUNCE_ACK cmd. This
feature is negotiated by a new feature bit VIRTIO_NET_F_ANNOUNCE.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/virtio-net.c |   30 ++
 hw/virtio-net.h |   15 ++-
 2 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 009f09e..278c42c 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -610,6 +610,18 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
 
 return VIRTIO_NET_OK;
 }
+
+static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
+  struct iovec *iov, unsigned int iov_cnt)
+{
+if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
+n-status = ~VIRTIO_NET_S_ANNOUNCE;
+return VIRTIO_NET_OK;
+} else {
+return VIRTIO_NET_ERR;
+}
+}
+
 static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIONet *n = to_virtio_net(vdev);
@@ -639,6 +651,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, 
VirtQueue *vq)
 status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
 } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
 status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
+} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
 } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
 status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
 }
@@ -1275,6 +1289,21 @@ static void virtio_net_cleanup(NetClientState *nc)
 n-nic = NULL;
 }
 
+static int virtio_net_announce(NetClientState *nc)
+{
+VirtIONet *n = qemu_get_nic_opaque(nc);
+
+if (n-vdev.guest_features  (0x1  VIRTIO_NET_F_GUEST_ANNOUNCE)
+ n-vdev.guest_features  (0x1  VIRTIO_NET_F_CTRL_VQ)
+ virtio_net_started(n, n-vdev.status)) {
+n-status |= VIRTIO_NET_S_ANNOUNCE;
+virtio_notify_config(n-vdev);
+return 0;
+}
+
+return -1;
+}
+
 static NetClientInfo net_virtio_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
@@ -1282,6 +1311,7 @@ static NetClientInfo net_virtio_info = {
 .receive = virtio_net_receive,
 .cleanup = virtio_net_cleanup,
 .link_status_changed = virtio_net_set_link_status,
+.announce = virtio_net_announce,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index e654c13..c0f3de5 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -43,12 +43,13 @@
 #define VIRTIO_NET_F_CTRL_RX18  /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN  19  /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21  /* Guest can announce itself */
 #define VIRTIO_NET_F_MQ 22  /* Device supports Receive Flow
  * Steering */
-
 #define VIRTIO_NET_F_CTRL_MAC_ADDR   23 /* Set MAC address */
 
 #define VIRTIO_NET_S_LINK_UP1   /* Link is up */
+#define VIRTIO_NET_S_ANNOUNCE   2   /* Announcement is needed */
 
 #define TX_TIMER_INTERVAL 15 /* 150 us */
 
@@ -152,6 +153,17 @@ struct virtio_net_ctrl_mac {
  #define VIRTIO_NET_CTRL_VLAN_DEL 1
 
 /*
+ * Control link announce acknowledgement
+ *
+ * The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
+ * driver has recevied the notification and device would clear the
+ * VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
+ * this command.
+ */
+#define VIRTIO_NET_CTRL_ANNOUNCE   3
+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+
+/*
  * Control Multiqueue
  *
  * The command VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
@@ -180,6 +192,7 @@ struct virtio_net_ctrl_mq {
 DEFINE_PROP_BIT(guest_tso6, _state, _field, VIRTIO_NET_F_GUEST_TSO6, 
true), \
 DEFINE_PROP_BIT(guest_ecn, _state, _field, VIRTIO_NET_F_GUEST_ECN, 
true), \
 DEFINE_PROP_BIT(guest_ufo, _state, _field, VIRTIO_NET_F_GUEST_UFO, 
true), \
+DEFINE_PROP_BIT(guest_announce, _state, _field, 
VIRTIO_NET_F_GUEST_ANNOUNCE, true), \
 DEFINE_PROP_BIT(host_tso4, _state, _field, VIRTIO_NET_F_HOST_TSO4, 
true), \
 DEFINE_PROP_BIT(host_tso6, _state, _field, VIRTIO_NET_F_HOST_TSO6, 
true), \

[Qemu-devel] [PATCH V7 3/5] net: model specific announcing support

This patch introduces a function pointer in NetClientInfo which is called during
self announcement. With this, each kind of card can announce the link with a
model specific way. The old method is still kept for cards that have not
implemented this or old guest. The first user would be virtio-net.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 include/net/net.h |2 ++
 savevm.c  |8 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index cb049a1..49031b4 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,6 +44,7 @@ typedef ssize_t (NetReceiveIOV)(NetClientState *, const 
struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
+typedef int (NetAnnounce)(NetClientState *);
 
 typedef struct NetClientInfo {
 NetClientOptionsKind type;
@@ -55,6 +56,7 @@ typedef struct NetClientInfo {
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 NetPoll *poll;
+NetAnnounce *announce;
 } NetClientInfo;
 
 struct NetClientState {
diff --git a/savevm.c b/savevm.c
index a8a53ef..fd69e32 100644
--- a/savevm.c
+++ b/savevm.c
@@ -76,12 +76,16 @@ static int announce_self_create(uint8_t *buf,
 
 static void qemu_announce_self_iter(NICState *nic, void *opaque)
 {
+NetClientState *nc = qemu_get_queue(nic);
+NetAnnounce *announce = nc-info-announce;
 uint8_t buf[60];
 int len;
 
-len = announce_self_create(buf, nic-conf-macaddr.a);
+if (!announce || announce(nc)) {
+len = announce_self_create(buf, nic-conf-macaddr.a);
 
-qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
+qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
+}
 }
 
 
-- 
1.7.1

[Qemu-devel] [PATCH V7 5/5] virtio-net: compat guest announce

Disable guest announce for pre-1.5.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/pc.h |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/hw/pc.h b/hw/pc.h
index f2c1b1c..2de7cd4 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -221,6 +221,10 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
 .property = vectors,\
 /* DEV_NVECTORS_UNSPECIFIED as a uint32_t string */\
 .value= stringify(0x),\
-   }
+},{\
+.driver   = virtio-net-pci,   \
+.property = guest_announce, \
+.value= off,\
+}
 
 #endif
-- 
1.7.1

Re: [Qemu-devel] [PATCH v6 11/24] hw/nand.c: correct the sense of the BUSY/READY status bit

2013-03-07 Thread Edgar E. Iglesias

On Thu, Mar 07, 2013 at 12:11:51PM +1000, Peter Crosthwaite wrote:
 Hi Kuo Jung, Peter,
 
 This patch fixes bugs for us in Zynq Nand (cc Wendy Liang). Can we get
 a cherry pick of this?

I've applied this one.

Thanks,
Edgar


 
 Regards,
 Peter
 
 On Wed, Mar 6, 2013 at 5:27 PM, Kuo-Jung Su dant...@gmail.com wrote:
  The BIT6 of Status Register(SR):
 
  SR[6] behaves the same as R/B# pin
  SR[6] = 0 indicates the device is busy;
  SR[6] = 1 means the device is ready
 
  Some NAND flash controller (i.e. ftnandc021) relies on the SR[6]
  to determine if the NAND flash erase/program is success or error timeout.
 
  P.S:
  The exmaple NAND flash datasheet could be found at following link:
  http://www.mxic.com.tw/QuickPlace/hq/PageLibrary4825740B00298A3B.nsf/h_Index/8FEA549237D2F7674825795800104C26/$File/MX30LF1G08AA,%203V,%201Gb,%20v1.1.pdf
 
  Signed-off-by: Kuo-Jung Su dant...@gmail.com
 
 Reviewed-by: Peter Crosthwaite peter.crosthwa...@xilinx.com
 
  ---
   hw/nand.c |3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
 
  diff --git a/hw/nand.c b/hw/nand.c
  index 4a71265..61e918f 100644
  --- a/hw/nand.c
  +++ b/hw/nand.c
  @@ -46,7 +46,7 @@
   # define NAND_IOSTATUS_PLANE1  (1  2)
   # define NAND_IOSTATUS_PLANE2  (1  3)
   # define NAND_IOSTATUS_PLANE3  (1  4)
  -# define NAND_IOSTATUS_BUSY(1  6)
  +# define NAND_IOSTATUS_READY(1  6)
   # define NAND_IOSTATUS_UNPROTCT(1  7)
 
   # define MAX_PAGE  0x800
  @@ -231,6 +231,7 @@ static void nand_reset(DeviceState *dev)
   s-iolen = 0;
   s-offset = 0;
   s-status = NAND_IOSTATUS_UNPROTCT;
  +s-status |= NAND_IOSTATUS_READY;
   }
 
   static inline void nand_pushio_byte(NANDFlashState *s, uint8_t value)
  --
  1.7.9.5

Re: [Qemu-devel] problems with freeBSD

2013-03-07 Thread Aurelien Jarno

On Wed, Mar 06, 2013 at 07:53:51PM -0500, Kevin O'Connor wrote:
 On Thu, Mar 07, 2013 at 12:12:08AM +0100, Aurelien Jarno wrote:
  On Wed, Mar 06, 2013 at 08:21:11AM +, Dietmar Maurer wrote:
   Using qemu 1.4.0:
   
   # qemu -hda test.raw -m 512 -cdrom 
   pfSense-LiveCD-2.0.2-RELEASE-amd64-20121207-2239.iso
   
   Results in:
   
   trap 12: page fault while in kernel mode
   ...
   stopped at x86bios_emu_rdw+0x2f: movzwl (%rbx),%eax
   
   Any ideas? Can somebody reproduce that?
   
   To get the FreeBSD VM boot use the console, enter the boot loader, then:
   # set hint.atkbd.0.disabled=1
   # boot
   
   But that disables the keyboard.
  
  I was actually digging about that problem. It is indeed present in
  version 1.4.0, but is fixed in the current git master. The problem is
  actually not directly in QEMU but in seabios, the update to version
  1.7.2.1 commit 5c75fb10) fixes the issue. Maybe it is worth 
  cherry-picking it into stable-1.4 (hence the Cc:). In the meantime
  using bios.bin from master with QEMU version 1.4.0 should also fix the
  issue.
  
  What is strange is the seabios commit fixing the issue:
  
  commit 4219149ad2b783abfa61e80e9e9f6910db0c76c9
  Author: Kevin O'Connor ke...@koconnor.net
  Date:   Sun Feb 17 10:56:10 2013 -0500
  
  build: Don't require $(OUT) to be a sub-directory of the main 
  directory.
 
 That change is definitely just build related - I don't see how it
 could impact the final SeaBIOS binary.  How did you conclude that this
 commit is what fixes the issue?
 

I did a git bisect to find the commit fixing the issue. Then, as I was
not believing the result, I tried the following sequence a dozen of
times (for some unknown reasons the FreeBSD install CD doesn't exhibit
the issue, so I used the Debian GNU/kFreeBSD installer):

| mkdir qemu-freebsd-bug
| cd qemu-freebsd-bug
|
| wget 
http://ftp.debian.org/debian/dists/squeeze/main/installer-kfreebsd-amd64/current/images/netboot/mini.iso
 
|
| git clone git://git.qemu.org/qemu.git
| cd qemu
| git checkout -b stable-1.4 v1.4.0
| ./configure --target-list=x86_64-softmmu
| make
| cd ..
|
| git clone git://git.seabios.org/seabios.git
| cd seabios
| git checkout -b 1.7.2-stable origin/1.7.2-stable
| git reset --hard 4219149ad2b783abfa61e80e9e9f6910db0c76c9
| make
| cp out/bios.bin ../qemu/pc-bios
| cd..
|
| # debian-installer boots correctly 
| ./qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cdrom mini.iso
|
| cd seabios
| git reset --hard d75c22fcb6521dad11428b65789d92f89675c600 
| git clean -fdx
| make
| cp out/bios.bin ../qemu/pc-bios
| cd ..
|
| # debian-installer fails to boot
| ./qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cdrom mini.iso 


Maybe I am doing something wrong or there is a bug in my toolchain
(Debian Sid). It would be nice if someone could try to reproduce that on
another distro/system.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] sysbus: add no_user for devices using mmio or IRQ or GPIO

Peter Maydell peter.mayd...@linaro.org writes:

 On 5 March 2013 04:16, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 04/03/2013 18:58, Peter Maydell ha scritto:
  Mass-mark these devices as no_user.
 There is no such thing as a 'no-user' device -- Anthony
 (http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00896.html)

 We should figure out what we might be trying to use 'no-user'
 for, and consistently use it that way. Or alternatively we
 should remove it (perhaps replacing it with other flags).
 Mass-marking all the sysbus devices when we don't have a
 consistent sane defined semantics for the flag seems like
 a bad idea.

 $ x86_64-softmmu/qemu-system-x86_64 -device xlnx,,ps7-usb
 (qemu) info qtree
 bus: main-system-bus
   type System
   dev: xlnx,ps7-usb, id 
 maxframes = 128
 irq 1
 mmio /1000
 bus: usb-bus.0
   type usb-bus

 I have no idea what this means, but I'm pretty sure that no matter how I
 configure it, it will never work.

 I agree. Exhibiting something that's a suboptimal user
 experience doesn't count as defining consistent semantics
 for no_user, though :-)

Can we make some progress towards something that makes more sense?

First step: reasons for marking a device no_user.

From a user point of view, I think there's just one: -device/device_add
cannot possibly result in a working device.  Coherent enough semantics
for no_user, in my opinion, but I readily concede it's not particularly
useful for maintaining it as infrastructure and device models evolve.

For that, we need to drill down into reasons for cannot possibly work.
Here are two, please add more:

* Can't make required connections

  -device can make only one connection: to a qbus (first order
  approximation).  Usually suffices for devices on real buses like
  PCI, SCSI, ...  Doesn't make any useful connection on sysbus (put in
  quotes because it's not really a bus).  Cases in between may exist:
  the bus connection is real enough, but the device wants additional
  connections.

  Additional connections are made by setup code that's not reachable
  from -device.  If they're required for the device to work, then
  -device cannot possibly result in a working device.  Worse, -device
  could happily claim success all the same.

  Are these additional connections always required?

  The need for additional connections depends only on the device, right?

  Automated static reasoning on setup code to determine the value of a
  makes additional connections flag feels intractable.  A runtime
  check could be feasible, though.

* Resource collision with board

  The device must connect to some fixed resource, but the board already
  connects something to it.  Without no_user, this should result in an
  error message of sorts, which is much better than the silent breakage
  above.  Whether the message makes any sense to the user is a different
  question.

  Unlike above, this isn't just about the device, it's about the device
  and the board.  Right now, one no_user has to make do for all boards.
  Hmm.

  A -device can also resource-collide with another -device.  But that's
  outside no_user's mission: -device *can* result in a working device
  there, just not in this particular combination.

  In a declarative world, we could automatically filter out devices who
  need resources that are never available on this board.  qdev/QOM is
  moving away from declarative.  Ideas?

  [and I don't think this device
 can be added via the monitor but not the command line
 counts as consistent or coherent...]

no_user applies equally to -device and device_add.

The command line additionally provides a number of settings the board
setup code may (or may not) use, and no_user has no effect on them.

Coherent enough for me.

 Yes, the right thing to do would be to QOMify memory regions and
 introduce pins, but that's a bit more than the amount of time I have now
 for this.

 ...plus it means that when we do have these things we
 have to go round and identify the cases where no_user
 was set only because we didn't have the features before.

 My attitude here really is yes, it's not great but it's
 been like this forever and we don't seem to have had a
 huge flood of user complaints, so better not to mess
 with it unless what you're doing is going to amount to
 some kind of cleanup.

Never underestimate users' capability to suffer quietly.

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
 Il 06/03/2013 20:03, Peter Lieven ha scritto:
  Am 06.03.2013 19:48, schrieb Jeff Cody:
  On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
  Il 06/03/2013 19:14, Jeff Cody ha scritto:
  QCOW breaks with it using a normal raw posix file as a device.  As a
  test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
  drive mounted, and try to partition and format it.  QEMU now asserts.
 
  The nicety of being able to using truncate during a write call,
  especially for VHDX (which can have relatively large block/cluster
  sizes), so to grow the file sparsely in a dynamically allocated file.
 
  Perhaps we need two APIs, truncate and revalidate.
 
  Truncate should be a no-op if (!bs-growable).
 
  Revalidate could be called by the block_resize monitor command with no
  size specified.
 
  Paolo
 
  I think that is a good solution.  Is it better to have truncate and
  revalidate, or truncate and grow, with grow being a subset of
  truncate, with fewer restrictions?  There may still be operations
  where it is OK to grow a file, but not OK to shrink it.

What semantics would the both operations have? Is truncate the same as
it used to be? I don't really understand what revalidate would do, it
sounds like a read-only operation from its name?

  Or as a first step:
  
  a) Call brdv_drain_all() only if the device is shrinked (independently of 
  !bs-growable)
  b) Call brdv_drain_all() inside iscsi_truncate() because it is a special 
  requirement there
  c) Fix the value of bs-growable for all drivers
 
 Let's start from (c).  bdrv_file_open sets bs-growable = 1.  I think it
 should be removed and only the file protocol should set it.

This is probably right.

 Then we can add bdrv_revalidate and, for block_resize, call
 bdrv_revalidate+bdrv_truncate.  For bs-growable = 0 
 !bs-drv-bdrv_truncate, bdrv_truncate can just check that the actual
 size is the same or bigger as the one requested, and fail otherwise.

This one not so much. bs-growable does not mean that you can use
bdrv_truncate. It rather means that you may write beyond the end of the
file even without truncating it first. Mabye bs-auto_grow would be a
better for it.

So bs-growable == true implies that bdrv_truncate() should be allowed
as well, because obviously changing the BDS size is possbile, but it's
not true the other way round.

Kevin

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking


On 06.03.2013 21:39, Paolo Bonzini wrote:

Il 06/03/2013 20:03, Peter Lieven ha scritto:

Am 06.03.2013 19:48, schrieb Jeff Cody:

On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:

Il 06/03/2013 19:14, Jeff Cody ha scritto:

QCOW breaks with it using a normal raw posix file as a device.  As a
test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
drive mounted, and try to partition and format it.  QEMU now asserts.

The nicety of being able to using truncate during a write call,
especially for VHDX (which can have relatively large block/cluster
sizes), so to grow the file sparsely in a dynamically allocated file.


Perhaps we need two APIs, truncate and revalidate.

Truncate should be a no-op if (!bs-growable).

Revalidate could be called by the block_resize monitor command with no
size specified.

Paolo


I think that is a good solution.  Is it better to have truncate and
revalidate, or truncate and grow, with grow being a subset of
truncate, with fewer restrictions?  There may still be operations
where it is OK to grow a file, but not OK to shrink it.


Or as a first step:

a) Call brdv_drain_all() only if the device is shrinked (independently of 
!bs-growable)
b) Call brdv_drain_all() inside iscsi_truncate() because it is a special 
requirement there
c) Fix the value of bs-growable for all drivers


Let's start from (c).  bdrv_file_open sets bs-growable = 1.  I think it
should be removed and only the file protocol should set it.

Then we can add bdrv_revalidate and, for block_resize, call
bdrv_revalidate+bdrv_truncate.  For bs-growable = 0 
!bs-drv-bdrv_truncate, bdrv_truncate can just check that the actual
size is the same or bigger as the one requested, and fail otherwise.

Paolo



Regarding brd_drain_all(). Is the fix right to call it only on device shrink?
In this case it has to be added to iscsi_truncate as well.

Peter

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking


On 07.03.2013 09:50, Kevin Wolf wrote:

Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:

Il 06/03/2013 20:03, Peter Lieven ha scritto:

Am 06.03.2013 19:48, schrieb Jeff Cody:

On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:

Il 06/03/2013 19:14, Jeff Cody ha scritto:

QCOW breaks with it using a normal raw posix file as a device.  As a
test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
drive mounted, and try to partition and format it.  QEMU now asserts.

The nicety of being able to using truncate during a write call,
especially for VHDX (which can have relatively large block/cluster
sizes), so to grow the file sparsely in a dynamically allocated file.


Perhaps we need two APIs, truncate and revalidate.

Truncate should be a no-op if (!bs-growable).

Revalidate could be called by the block_resize monitor command with no
size specified.

Paolo


I think that is a good solution.  Is it better to have truncate and
revalidate, or truncate and grow, with grow being a subset of
truncate, with fewer restrictions?  There may still be operations
where it is OK to grow a file, but not OK to shrink it.


What semantics would the both operations have? Is truncate the same as
it used to be? I don't really understand what revalidate would do, it
sounds like a read-only operation from its name?


Or as a first step:

a) Call brdv_drain_all() only if the device is shrinked (independently of 
!bs-growable)
b) Call brdv_drain_all() inside iscsi_truncate() because it is a special 
requirement there
c) Fix the value of bs-growable for all drivers


Let's start from (c).  bdrv_file_open sets bs-growable = 1.  I think it
should be removed and only the file protocol should set it.


This is probably right.


If bs-growable is 1 for all drivers, whats the fix status of CVE-2008-0928? 
This
flag was introduced as a fix for this problem.

bdrv_check_byte_request() does nothing useful if bs-growable is 1.

Peter

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 06.03.2013 um 19:27 hat Peter Lieven geschrieben:
 Am 06.03.2013 19:06, schrieb Paolo Bonzini:
  Il 06/03/2013 18:50, Peter Lieven ha scritto:
  Commit 9a665b2b made bdrv_truncate() call bdrv_drain_all(), but this 
  breaks
  QCOW images, as well other future image formats (such as VHDX) that may 
  call
  bdrv_truncate(bs-file) from within a read/write operation.  For 
  example, QCOW
  will cause an assert, due to tracked_requests not being empty (since the
  read/write that called bdrv_truncate() is still in progress).
  
  I'm not sure such bdrv_truncate calls are necessary.  QCOW2 doesn't have
  them (almost; there is one in qcow2_write_compressed, I'm not even sure
  that one is necessary though), and I think QCOW's breaks using it with a
  block device as a backing file.
 
 I think we have to check the sense of bs-growable nevertheless. It is set
 to 1 for all drivers which can't be right?!

For everything that goes through bdrv_file_open(), which are protocol
drivers, not format drivers. It is required for files so that formats
can write past EOF; for other drivers that can't actually grow their
backing storage it doesn't hurt because they will return an eror anyway
when you write to it.

bdrv_truncate() works and bs-growable == true are totally different
things, so even though it doesn't look right at the first sight,
bs-growable does its job correctly.

In your other mail you're talking about setting it for raw, qcow2, VHDX.
This would be discussing on the entirely wrong level, this isn't about
formats, but about protocols (file, host_device, nbd, iscsi, http,
vvfat...), that are below the format drivers.

Kevin

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 07.03.2013 um 09:53 hat Peter Lieven geschrieben:
 On 06.03.2013 21:39, Paolo Bonzini wrote:
 Il 06/03/2013 20:03, Peter Lieven ha scritto:
 Am 06.03.2013 19:48, schrieb Jeff Cody:
 On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
 Il 06/03/2013 19:14, Jeff Cody ha scritto:
 QCOW breaks with it using a normal raw posix file as a device.  As a
 test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
 drive mounted, and try to partition and format it.  QEMU now asserts.
 
 The nicety of being able to using truncate during a write call,
 especially for VHDX (which can have relatively large block/cluster
 sizes), so to grow the file sparsely in a dynamically allocated file.
 
 Perhaps we need two APIs, truncate and revalidate.
 
 Truncate should be a no-op if (!bs-growable).
 
 Revalidate could be called by the block_resize monitor command with no
 size specified.
 
 Paolo
 
 I think that is a good solution.  Is it better to have truncate and
 revalidate, or truncate and grow, with grow being a subset of
 truncate, with fewer restrictions?  There may still be operations
 where it is OK to grow a file, but not OK to shrink it.
 
 Or as a first step:
 
 a) Call brdv_drain_all() only if the device is shrinked (independently of 
 !bs-growable)
 b) Call brdv_drain_all() inside iscsi_truncate() because it is a special 
 requirement there
 c) Fix the value of bs-growable for all drivers
 
 Let's start from (c).  bdrv_file_open sets bs-growable = 1.  I think it
 should be removed and only the file protocol should set it.
 
 Then we can add bdrv_revalidate and, for block_resize, call
 bdrv_revalidate+bdrv_truncate.  For bs-growable = 0 
 !bs-drv-bdrv_truncate, bdrv_truncate can just check that the actual
 size is the same or bigger as the one requested, and fail otherwise.
 
 Paolo
 
 
 Regarding brd_drain_all(). Is the fix right to call it only on device shrink?
 In this case it has to be added to iscsi_truncate as well.

The real fix would bdrv_drain(bs). I hope we're not too far away from
that today, even though we're not quite there.

Kevin

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 07.03.2013 um 09:56 hat Peter Lieven geschrieben:
 On 07.03.2013 09:50, Kevin Wolf wrote:
 Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
 Il 06/03/2013 20:03, Peter Lieven ha scritto:
 Am 06.03.2013 19:48, schrieb Jeff Cody:
 On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
 Il 06/03/2013 19:14, Jeff Cody ha scritto:
 QCOW breaks with it using a normal raw posix file as a device.  As a
 test: qemu-img create -f qcow test.qcow 5G.  Now run qemu with that
 drive mounted, and try to partition and format it.  QEMU now asserts.
 
 The nicety of being able to using truncate during a write call,
 especially for VHDX (which can have relatively large block/cluster
 sizes), so to grow the file sparsely in a dynamically allocated file.
 
 Perhaps we need two APIs, truncate and revalidate.
 
 Truncate should be a no-op if (!bs-growable).
 
 Revalidate could be called by the block_resize monitor command with no
 size specified.
 
 Paolo
 
 I think that is a good solution.  Is it better to have truncate and
 revalidate, or truncate and grow, with grow being a subset of
 truncate, with fewer restrictions?  There may still be operations
 where it is OK to grow a file, but not OK to shrink it.
 
 What semantics would the both operations have? Is truncate the same as
 it used to be? I don't really understand what revalidate would do, it
 sounds like a read-only operation from its name?
 
 Or as a first step:
 
 a) Call brdv_drain_all() only if the device is shrinked (independently of 
 !bs-growable)
 b) Call brdv_drain_all() inside iscsi_truncate() because it is a special 
 requirement there
 c) Fix the value of bs-growable for all drivers
 
 Let's start from (c).  bdrv_file_open sets bs-growable = 1.  I think it
 should be removed and only the file protocol should set it.
 
 This is probably right.
 
 If bs-growable is 1 for all drivers, whats the fix status of CVE-2008-0928? 
 This
 flag was introduced as a fix for this problem.
 
 bdrv_check_byte_request() does nothing useful if bs-growable is 1.

Don't ignore the difference between bdrv_open() and bdrv_file_open().
Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
opened through bdrv_open() and has bs-growable = false. Its bs-file is
using the file protocol (raw-posix driver) and opened by
bdrv_file_open(). This one has bs-file-growable = true so that qcow2
can write to newly allocated areas without calling bdrv_truncate()
first.

Kevin

Re: [Qemu-devel] [PATCH v4 3/6] add backup related monitor commands

Am 06.03.2013 um 18:39 hat Dietmar Maurer geschrieben:
How about variable block sizes? I mean this is a stream format that
has a header for each block anyway. Include a size there and be done.
  
   You can make that as complex as you want. I simply do not need that.
  
  Then you also don't need the performance that you lose by using NBD.
 
 Why exactly?

Because your format kills more performance than any NBD connection
could.

Kevin

Re: [Qemu-devel] [PATCH v4 3/6] add backup related monitor commands

On Wed, Mar 06, 2013 at 02:42:57PM +, Dietmar Maurer wrote:
Maybe you'd better use a different output format that doesn't
restrict you to 64k writes.
  
   The output format is not really the restriction. The problem is that
   an additional IPC layer add overhead, an d I do not want that (because it 
   is
  totally unnecessary).
  
  I missed the reason why you cannot increase the block size.  
 
 When we run backup, we need to read such block on every write from the guest.
 So if we increase block size we get additional delays.

Don't increase the bitmap block size.

Just let the block job do larger reads.  This is the bulk of the I/O
workload.  You can use large reads here independently of the bitmap
block size.

That way guest writes still only have a 64 KB read overhead but you
reduce the overhead of doing so many 64 KB writes from the backup block
job.

Stefan

Re: [Qemu-devel] [PATCH] Fix the wrong description in qemu manual

Copying qemu-trivial.

Lei Li li...@linux.vnet.ibm.com writes:

 Fix LP#1151450 the wrong description in qemu manual:

 'qemu-system-x86_84' should be 'qemu-system-x86_64'.

 Signed-off-by: Lei Li li...@linux.vnet.ibm.com
 ---
  qemu-options.hx |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/qemu-options.hx b/qemu-options.hx
 index 6f9334a..cd76f2a 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
 @@ -2132,7 +2132,7 @@
 gluster[+transport]://[server[:port]]/volname/image[?socket=...]
  
  Example
  @example
 -qemu-system-x86_84 --drive file=gluster://192.0.2.1/testvol/a.img
 +qemu-system-x86_64 --drive file=gluster://192.0.2.1/testvol/a.img
  @end example
  
  See also @url{http://www.gluster.org}.

Re: [Qemu-devel] [PATCHv3 0/2] bridge helper: includedir conf arg

On Thu, Mar 07, 2013 at 12:32:08AM -0600, Doug Goldstein wrote:
 The goal is to support an 'includedir' to include all files within a
 directory specified in the bridge.conf file. The rationale is to allow
 libvirt to be able to configure interfaces to for use by unprivileged
 users by just simply generating a new configuration file to the directory.
 
 Change from v2:
 - Integrated review changes from Corey Bryant
 - Integrated review changes from Stefan Hajnoczi
 
 Change from v1:
 - Reversed patch order to make the series clearer
 - Integrated review changes from Corey Bryant
 - Integrated review changes from Stefan Hajnoczi
 
 Doug Goldstein (2):
   bridge helper: unified error cleanup for parse_acl_file
   bridge helper: support conf dirs
 
  qemu-bridge-helper.c | 83 
 
  1 file changed, 71 insertions(+), 12 deletions(-)
 
 -- 
 1.7.12.4
 
 

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 07.03.2013 um 10:03 schrieb Kevin Wolf kw...@redhat.com:

Am 07.03.2013 um 09:56 hat Peter Lieven geschrieben:
On 07.03.2013 09:50, Kevin Wolf wrote:
Am 06.03.2013 um 21:39 hat Paolo Bonzini geschrieben:
Il 06/03/2013 20:03, Peter Lieven ha scritto:
Am 06.03.2013 19:48, schrieb Jeff Cody:
On Wed, Mar 06, 2013 at 07:31:51PM +0100, Paolo Bonzini wrote:
Il 06/03/2013 19:14, Jeff Cody ha scritto:
QCOW breaks with it using a normal raw posix file as a device. As a
test: qemu-img create -f qcow test.qcow 5G. Now run qemu with that
drive mounted, and try to partition and format it. QEMU now asserts.

The nicety of being able to using truncate during a write call,
especially for VHDX (which can have relatively large block/cluster
sizes), so to grow the file sparsely in a dynamically allocated file.

Perhaps we need two APIs, truncate and revalidate.

Truncate should be a no-op if (!bs-growable).

Revalidate could be called by the block_resize monitor command with no
size specified.

Paolo

I think that is a good solution. Is it better to have truncate and
revalidate, or truncate and grow, with grow being a subset of
truncate, with fewer restrictions? There may still be operations
where it is OK to grow a file, but not OK to shrink it.

What semantics would the both operations have? Is truncate the same as
it used to be? I don't really understand what revalidate would do, it
sounds like a read-only operation from its name?

Or as a first step:

a) Call brdv_drain_all() only if the device is shrinked (independently of
!bs-growable)
b) Call brdv_drain_all() inside iscsi_truncate() because it is a special
requirement there
c) Fix the value of bs-growable for all drivers

Let's start from (c). bdrv_file_open sets bs-growable = 1. I think it
should be removed and only the file protocol should set it.

This is probably right.

If bs-growable is 1 for all drivers, whats the fix status of CVE-2008-0928?
This
flag was introduced as a fix for this problem.

bdrv_check_byte_request() does nothing useful if bs-growable is 1.

Don't ignore the difference between bdrv_open() and bdrv_file_open().
Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
opened through bdrv_open() and has bs-growable = false. Its bs-file is
using the file protocol (raw-posix driver) and opened by
bdrv_file_open(). This one has bs-file-growable = true so that qcow2
can write to newly allocated areas without calling bdrv_truncate()
first.

Sorry, I have to admin I am little confused by what is happening in bdrv_open().

However, what I can say is that bs-growable is 1 for an iSCSI backed
harddrive and I wonder how this can happen if bdrv_file_open is not used for
opening it because that is the only place where bs-growable is set to 1.

cmdline:
x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive
if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0
-vnc :1 -boot dc -monitor stdio

Peter

Kevin

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
  If bs-growable is 1 for all drivers, whats the fix status of 
  CVE-2008-0928? This
  flag was introduced as a fix for this problem.
  
  bdrv_check_byte_request() does nothing useful if bs-growable is 1.
  
  Don't ignore the difference between bdrv_open() and bdrv_file_open().
  Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
  opened through bdrv_open() and has bs-growable = false. Its bs-file is
  using the file protocol (raw-posix driver) and opened by
  bdrv_file_open(). This one has bs-file-growable = true so that qcow2
  can write to newly allocated areas without calling bdrv_truncate()
  first.
 
 Sorry, I have to admin I am little confused by what is happening in 
 bdrv_open().
 
 However, what I can say is that bs-growable is 1 for an iSCSI backed
 harddrive and I wonder how this can happen if bdrv_file_open is not used for
 opening it because that is the only place where bs-growable is set to 1.
 
 cmdline:
 x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive 
 if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0
  -vnc :1 -boot dc -monitor stdio

It is used for the iscsi driver. You have a raw BDS (growable == false)
on top of an iscsi one (growable == true).

Kevin

Re: [Qemu-devel] [PATCH v4 3/6] add backup related monitor commands

2013-03-07 Thread Dietmar Maurer

You can make that as complex as you want. I simply do not need that.
  
   Then you also don't need the performance that you lose by using NBD.
 
  Why exactly?
 
 Because your format kills more performance than any NBD connection could.

That is not true.

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking


Am 07.03.2013 um 10:22 schrieb Kevin Wolf kw...@redhat.com:

 Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
 If bs-growable is 1 for all drivers, whats the fix status of 
 CVE-2008-0928? This
 flag was introduced as a fix for this problem.
 
 bdrv_check_byte_request() does nothing useful if bs-growable is 1.
 
 Don't ignore the difference between bdrv_open() and bdrv_file_open().
 Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
 opened through bdrv_open() and has bs-growable = false. Its bs-file is
 using the file protocol (raw-posix driver) and opened by
 bdrv_file_open(). This one has bs-file-growable = true so that qcow2
 can write to newly allocated areas without calling bdrv_truncate()
 first.
 
 Sorry, I have to admin I am little confused by what is happening in 
 bdrv_open().
 
 However, what I can say is that bs-growable is 1 for an iSCSI backed
 harddrive and I wonder how this can happen if bdrv_file_open is not used for
 opening it because that is the only place where bs-growable is set to 1.
 
 cmdline:
 x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive 
 if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0
  -vnc :1 -boot dc -monitor stdio
 
 It is used for the iscsi driver. You have a raw BDS (growable == false)
 on top of an iscsi one (growable == true).

Ok, but growable == true is wrong for the iSCSI driver isn`t it?

Peter

Re: [Qemu-devel] [PATCH v4 3/6] add backup related monitor commands

2013-03-07 Thread Dietmar Maurer

  When we run backup, we need to read such block on every write from the
 guest.
  So if we increase block size we get additional delays.
 
 Don't increase the bitmap block size.
 
 Just let the block job do larger reads.  This is the bulk of the I/O 
 workload.  You
 can use large reads here independently of the bitmap block size.
 
 That way guest writes still only have a 64 KB read overhead but you reduce the
 overhead of doing so many 64 KB writes from the backup block job.

If there are many writes from the guest, you still get many 64KB block.

Anyways, and additional RPC layer always adds overhead, and It can be 
completely avoided.
Maybe not much, but I want to make backup as efficient as possible.

Re: [Qemu-devel] [PATCH 0/3] * make netlayer re-entrant *

On Thu, Mar 07, 2013 at 10:06:52AM +0800, liu ping fan wrote:
On Wed, Mar 6, 2013 at 5:30 AM, mdroth mdr...@linux.vnet.ibm.com wrote:
On Sun, Mar 03, 2013 at 09:21:19PM +0800, Liu Ping Fan wrote:
From: Liu Ping Fan pingf...@linux.vnet.ibm.com

This series aim to make netlayer re-entrant, so netlayer can
run out of biglock safely.

I think most of the locking considerations are still applicable either
way, but this series seems to be written under the assumption that
we'll be associating hubs/ports with separate AioContexts to facilitate
driving the event handling outside of the iothread. Is this the case?

Yes.
From what I gathered from the other thread, the path forward was to
replace the global iohandler list that we currently use to drive
NetClient events and replace it with a GSource and GMainContext, rather
than relying on AioContexts.

Not quite sure about it. Seems that AioContext is built on GSource, so
I think they are similar, and AioContext is easy to reuse.

I do agree that the event handlers currently grouped under
iohandler.c:io_handlers look like a nice fit for AioContexts, but other
things like slirp and chardevs seem better served by a more general
mechanism like GSources/GMainContexts. The chardev flow control patches
seem to be doing something similar already as well.

I have made some fix for this series, apart from the concert about
GSource/ AioContext. Hope to discuss it clearly in next version and
fix it too. BTW what can we benefit from AioContext besides those from
GSource

This is a good discussion. I'd like to hear more about using glib event
loop concepts instead of rolling our own. Here are my thoughts after
exploring the glib main loop vs AioContext.

AioContext supports two things:

1. BH which is similar to GIdle for scheduling functions that get called
from the event loop.

Note that GIdle doesn't work in aio_poll() because we don't run
integrate the glib event loop.

2. aio_poll() which goes a step beyond iohandlers because the
-io_flush() handler signals whether there are pending aio requests.
This way aio_poll() can be called in a loop until all pending
requests have finished.

Imagine block/iscsi.c which has a TCP socket to the iSCSI target.
We're ready to receive from the socket but we only want to wait until
all pending requests complete. That means the socket fd is always
looking for G_IO_IN events but we shouldn't wait unless there are
actually iSCSI requests pending.

This feature is important for the synchronous (nested event loop)
functionality that QEMU relies on for bdrv_drain_all() and
synchronous I/O (bdrv_read(), bdrv_write(), bdrv_flush()).

The glib equivalent to aio_poll() is g_main_context_iteration():

https://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#g-main-context-iteration

But note that the return value is different. g_main_context_iteration()
tells you if any event handlers were called. aio_poll() tells you
whether more calls are necessary in order to reach a quiescent state
(all requests completed).

I guess it's time to look at the chardev flow control patches to see how
glib event loop concepts are being used.

Stefan

Re: [Qemu-devel] [PATCH 0/3] * make netlayer re-entrant *

On Thu, Mar 07, 2013 at 10:06:52AM +0800, liu ping fan wrote:
 On Wed, Mar 6, 2013 at 5:30 AM, mdroth mdr...@linux.vnet.ibm.com wrote:
  On Sun, Mar 03, 2013 at 09:21:19PM +0800, Liu Ping Fan wrote:
  From: Liu Ping Fan pingf...@linux.vnet.ibm.com
 
  This series aim to make netlayer re-entrant, so netlayer can
  run out of biglock safely.
 
  I think most of the locking considerations are still applicable either
  way, but this series seems to be written under the assumption that
  we'll be associating hubs/ports with separate AioContexts to facilitate
  driving the event handling outside of the iothread. Is this the case?
 
 Yes.
  From what I gathered from the other thread, the path forward was to
  replace the global iohandler list that we currently use to drive
  NetClient events and replace it with a GSource and GMainContext, rather
  than relying on AioContexts.
 
 Not quite sure about it. Seems that AioContext is built on GSource, so
 I think they are similar, and AioContext is easy to reuse.
 
  I do agree that the event handlers currently grouped under
  iohandler.c:io_handlers look like a nice fit for AioContexts, but other
  things like slirp and chardevs seem better served by a more general
  mechanism like GSources/GMainContexts. The chardev flow control patches
  seem to be doing something similar already as well.
 
 I have made some fix for this series, apart from the concert about
 GSource/ AioContext.  Hope to discuss it clearly in next version and
 fix it too. BTW what can we benefit from AioContext besides those from
 GSource

One thing I forgot to add: net/ doesn't use BH or
qemu_aio_set_fd_handler() so AioContext isn't strictly necessary.

Stefan

Re: [Qemu-devel] [PER] Re: socket, mcast looping back frames - IPv6 broken

On Wed, Mar 06, 2013 at 02:15:25PM +0100, Samuel Thibault wrote:
 Stefan Hajnoczi, le Wed 06 Mar 2013 13:29:37 +0100, a écrit :
  On Tue, Mar 05, 2013 at 05:35:10PM +0100, Samuel Thibault wrote:
  Unfortunately net/socket.c does not have the concept of a link-layer
  address, so we cannot easily filter out multicast packets coming from
  our NIC's address.
  
  Are you aware of a way to filter out just the packets sent by *this*
  process?
 
 I haven't seen any in the Linux source code.  One thing that should
 work, however, is to use recvfrom, and drop whatever comes from our
 sockname.

Sounds like a plan :).

Stefan

Re: [Qemu-devel] [PATCH] qdev: DEVICE_DELETED event

Eric Blake ebl...@redhat.com writes:

 [adding libvirt]

 On 03/06/2013 07:52 AM, Paolo Bonzini wrote:
 Il 06/03/2013 15:44, Eric Blake ha scritto:
 Question - if libvirt misses the event (for example, if libvirtd
 requests a remove, but then gets restarted, and the event arrives before
 libvirtd is back up), is there a way to poll whether the the removal has
 completed?  The event is great to minimize polling overhead in the
 common case, but we generally provide this sort of information via a
 pollable interface at the same time.

General rule: no event without a way to poll.

 Yes, you can use qom-list on /machine/peripheral.

 Which means libvirt should be patched to use qom-list right now, even
 before this event makes it in, in order to avoid its current bug of
 reusing a device id before the deletion has completed.  Adding the event
 is still useful, as polling is never nice; and we already know how to
 make libvirt do conditional code based on whether query-events shows
 that the event has been added to minimize polling where possible.

Yes.

[Qemu-devel] [Bug 1151450] [NEW] wrong description in qemu manual

2013-03-07 Thread cyberyoung

Public bug reported:


Description:
man qemu, there is a line:
qemu-system-x86_84 --drive file=gluster://192.0.2.1/testvol/a.img
seems should be:
qemu-system-x86_64 --drive file=gluster://192.0.2.1/testvol/a.img

Additional info:
* operating system
arch linux x86_64
* package version(s)
1.4.0
* config and/or log files etc.


Steps to reproduce:
man qemu

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1151450

Title:
  wrong description in qemu manual

Status in QEMU:
  New

Bug description:
  
  Description:
  man qemu, there is a line:
  qemu-system-x86_84 --drive file=gluster://192.0.2.1/testvol/a.img
  seems should be:
  qemu-system-x86_64 --drive file=gluster://192.0.2.1/testvol/a.img

  Additional info:
  * operating system
  arch linux x86_64
  * package version(s)
  1.4.0
  * config and/or log files etc.

  
  Steps to reproduce:
  man qemu

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1151450/+subscriptions

Re: [Qemu-devel] [PATCH] Fix the wrong description in qemu manual

On Thu, Mar 07, 2013 at 03:50:26PM +0800, Lei Li wrote:
 Fix LP#1151450 the wrong description in qemu manual:
 
 'qemu-system-x86_84' should be 'qemu-system-x86_64'.
 
 Signed-off-by: Lei Li li...@linux.vnet.ibm.com
 ---
  qemu-options.hx |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Thanks, applied to the trivial patches tree:
https://github.com/stefanha/qemu/commits/trivial-patches

Stefan

[Qemu-devel] [PATCHv3] qdev: DEVICE_DELETED event

libvirt has a long-standing bug: when removing the device,
it can request removal but does not know when the
removal completes. Add an event so we can fix this in a robust way.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Changes from v2:
- move event toward the end of device_unparent,
  so that parents are reported after their children,
  as suggested by Paolo
Changes from v1:
- move to device_unparent
- address comments by Andreas and Eric


 QMP/qmp-events.txt| 15 +++
 hw/qdev.c |  6 ++
 include/monitor/monitor.h |  1 +
 monitor.c |  1 +
 qapi-schema.json  |  4 +++-
 5 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index b2698e4..f2f115a 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -136,6 +136,21 @@ Example:
 Note: The ready to complete status is always reset by a BLOCK_JOB_ERROR
 event.
 
+DEVICE_DELETED
+-
+
+Emitted whenever the device removal completion is acknowledged
+by the guest. At this point, it's safe to reuse the specified device ID.
+Device removal can be initiated by the guest or by HMP/QMP commands.
+
+Data:
+
+- device: device name (json-string)
+
+{ event: DEVICE_DELETED,
+  data: { device: virtio-net-pci-0 },
+  timestamp: { seconds: 1265044230, microseconds: 450486 } }
+
 DEVICE_TRAY_MOVED
 -
 
diff --git a/hw/qdev.c b/hw/qdev.c
index 689cd54..393e83e 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -29,6 +29,7 @@
 #include sysemu/sysemu.h
 #include qapi/error.h
 #include qapi/visitor.h
+#include qapi/qmp/qjson.h
 
 int qdev_hotplug = 0;
 static bool qdev_hot_added = false;
@@ -778,6 +779,11 @@ static void device_unparent(Object *obj)
 object_unref(OBJECT(dev-parent_bus));
 dev-parent_bus = NULL;
 }
+if (dev-id) {
+QObject *data = qobject_from_jsonf({ 'device': %s }, dev-id);
+monitor_protocol_event(QEVENT_DEVICE_DELETED, data);
+qobject_decref(data);
+}
 }
 
 static void device_class_init(ObjectClass *class, void *data)
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 87fb49c..b868760 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -39,6 +39,7 @@ typedef enum MonitorEvent {
 QEVENT_BLOCK_JOB_CANCELLED,
 QEVENT_BLOCK_JOB_ERROR,
 QEVENT_BLOCK_JOB_READY,
+QEVENT_DEVICE_DELETED,
 QEVENT_DEVICE_TRAY_MOVED,
 QEVENT_SUSPEND,
 QEVENT_SUSPEND_DISK,
diff --git a/monitor.c b/monitor.c
index 32a6e74..2a5e7b6 100644
--- a/monitor.c
+++ b/monitor.c
@@ -457,6 +457,7 @@ static const char *monitor_event_names[] = {
 [QEVENT_BLOCK_JOB_CANCELLED] = BLOCK_JOB_CANCELLED,
 [QEVENT_BLOCK_JOB_ERROR] = BLOCK_JOB_ERROR,
 [QEVENT_BLOCK_JOB_READY] = BLOCK_JOB_READY,
+[QEVENT_DEVICE_DELETED] = DEVICE_DELETED,
 [QEVENT_DEVICE_TRAY_MOVED] = DEVICE_TRAY_MOVED,
 [QEVENT_SUSPEND] = SUSPEND,
 [QEVENT_SUSPEND_DISK] = SUSPEND_DISK,
diff --git a/qapi-schema.json b/qapi-schema.json
index 28b070f..bb361e1 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -2354,7 +2354,9 @@
 # Notes: When this command completes, the device may not be removed from the
 #guest.  Hot removal is an operation that requires guest cooperation.
 #This command merely requests that the guest begin the hot removal
-#process.
+#process.  Completion of the device removal process is signaled with a
+#DEVICE_DELETED event. Guest reset will automatically complete removal
+#for all devices.
 #
 # Since: 0.14.0
 ##
-- 
MST

Re: [Qemu-devel] [PATCH] qdev: DEVICE_DELETED event

Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Mar 06, 2013 at 02:57:22PM +0100, Andreas Färber wrote:
 Am 06.03.2013 14:00, schrieb Michael S. Tsirkin:
  libvirt has a long-standing bug: when removing the device,
  it can request removal but does not know when does the
  removal complete. Add an event so we can fix this in a robust way.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 Sounds like a good idea to me. :)
 
 [...]
  diff --git a/hw/qdev.c b/hw/qdev.c
  index 689cd54..f30d251 100644
  --- a/hw/qdev.c
  +++ b/hw/qdev.c
  @@ -29,6 +29,7 @@
   #include sysemu/sysemu.h
   #include qapi/error.h
   #include qapi/visitor.h
  +#include qapi/qmp/qjson.h
   
   int qdev_hotplug = 0;
   static bool qdev_hot_added = false;
  @@ -267,6 +268,11 @@ void qdev_init_nofail(DeviceState *dev)
   /* Unlink device from bus and free the structure.  */
   void qdev_free(DeviceState *dev)
   {
  +if (dev-id) {
  +QObject *data = qobject_from_jsonf({ 'device': %s }, dev-id);
  +monitor_protocol_event(QEVENT_DEVICE_DELETED, data);
  +qobject_decref(data);
  +}
   object_unparent(OBJECT(dev));
   }
   
 
 I'm pretty sure this is the wrong place to fire the notification. We
 should rather do this when the device is actually deleted - which
 qdev_free() does *not* actually guarantee, as criticized in the s390x
 and unref'ing contexts.
 I would suggest to place your code into device_unparent() instead.
 
 Another thing to consider is what data to pass to the event: Not all
 devices have an ID.

 If they don't they were not created by management so management is
 probably not interested in them being removed.

 We could always add a 'path' key later if this assumption
 proves incorrect.

In old qdev, ID was all we had, because paths were busted.  Thus,
management had no choice but use IDs.

If I understand modern qdev correctly, we got a canonical path.  Old
APIs like device_del still accept only ID.  Should new APIs still be
designed that way?  Or should they always accept / provide the canonical
path, plus optional ID for convenience?

 We should still have a canonical path when we fire
 this event in either qdev_free() or in device_unparent() before the if
 (dev-parent_bus) block though. That would be a question for Anthony,
 not having a use case for the event I am indifferent there.
 
 Further, thinking of objects such as virtio-rng backends or future
 blockdev/chardev objects, might it make sense to turn this into a
 generic object deletion event rather than a device event?
 
 Andreas

 Backend deletion doesn't normally have guest interaction right?
 So why do we need an event?

We need an event because device_del may send its reply before it
completes the job.

device_del does that when it deletion needs to interact with the guest,
which can take unbounded time.

Conversely, we don't need an event when a QMP always completes the job
(as far as observable by the QMP client) before it sends its reply.  Off
hand, I can't see why backend deletion would do anything else.

I'm always reluctant to abstract when there are fewer than two
different, concrete things to abstract from.  Right now, we got just
one: device models.

Re: [Qemu-devel] qemu-img do not verify encrytption

On Thu, Mar 07, 2013 at 02:49:30PM +0800, yue-kvm wrote:
 i create qcow2 format image with -o encryption, and assign password.
 
 but when i qemu-img info encryt.qcow2 ,  it display encryt.qcow2 info no 
 master  what i input, even just hit enter.
 the 'Password Authentication ' of qemu-img may be invalid.

Don't worry, your data is encrypted.  The qemu-img info password check
is unnecessary because qcow2 metadata is unencrypted.

I didn't get a password prompt from qemu-img info on an encrypted qcow2
file using qemu.git/master.

Which version of qemu-img are you using?  Perhaps the unnecessary prompt
has been fixed in a later version.

Stefan

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 07.03.2013 um 10:25 hat Peter Lieven geschrieben:
 
 Am 07.03.2013 um 10:22 schrieb Kevin Wolf kw...@redhat.com:
 
  Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
  If bs-growable is 1 for all drivers, whats the fix status of 
  CVE-2008-0928? This
  flag was introduced as a fix for this problem.
  
  bdrv_check_byte_request() does nothing useful if bs-growable is 1.
  
  Don't ignore the difference between bdrv_open() and bdrv_file_open().
  Typically you have two BDSes: On top there is e.g. a qcow2 BDS that is
  opened through bdrv_open() and has bs-growable = false. Its bs-file is
  using the file protocol (raw-posix driver) and opened by
  bdrv_file_open(). This one has bs-file-growable = true so that qcow2
  can write to newly allocated areas without calling bdrv_truncate()
  first.
  
  Sorry, I have to admin I am little confused by what is happening in 
  bdrv_open().
  
  However, what I can say is that bs-growable is 1 for an iSCSI backed
  harddrive and I wonder how this can happen if bdrv_file_open is not used 
  for
  opening it because that is the only place where bs-growable is set to 1.
  
  cmdline:
  x86_64-softmmu/qemu-system-x86_64 -k de -enable-kvm -m 1024 -drive 
  if=virtio,file=iscsi://172.21.200.31/iqn.2001-05.com.equallogic:0-8a0906-16470e107-713001aa6de511e0-001-test/0
   -vnc :1 -boot dc -monitor stdio
  
  It is used for the iscsi driver. You have a raw BDS (growable == false)
  on top of an iscsi one (growable == true).
 
 Ok, but growable == true is wrong for the iSCSI driver isn`t it?

I guess it depends on your definition. If you say that growable includes
the capability of growing the image, then yes, it's wrong. If you only
interpret it as permission to write beyond EOF (if the driver supports
that), then it's right even though any such attempt will result in an
error.

Practically speaking, the difference is between -EIO returned from
bdrv_check_byte_request() and -EIO from iscsi_aio_write16_cb(), i.e.
the result is the same.

Kevin

Re: [Qemu-devel] [PATCH] sysbus: add no_user for devices using mmio or IRQ or GPIO

2013-03-07 Thread Peter Maydell

On 7 March 2013 16:48, Markus Armbruster arm...@redhat.com wrote:
 Can we make some progress towards something that makes more sense?

 First step: reasons for marking a device no_user.

 From a user point of view, I think there's just one: -device/device_add
 cannot possibly result in a working device.  Coherent enough semantics
 for no_user, in my opinion, but I readily concede it's not particularly
 useful for maintaining it as infrastructure and device models evolve.

Ideally it would be nice to move to a model where the error message
was the device you've created has some required connections which
haven't been wired up, and then for some devices you'd always get
that error [until we implemented syntax for doing the wiring] and
for some you'd only get it if you messed up the command line switches.

 For that, we need to drill down into reasons for cannot possibly work.
 Here are two, please add more:

 * Can't make required connections

 * Resource collision with board

   The device must connect to some fixed resource, but the board already
   connects something to it.  Without no_user, this should result in an
   error message of sorts, which is much better than the silent breakage
   above.  Whether the message makes any sense to the user is a different
   question.

Do we have any concrete examples of this? I don't think devices should
be making connections to resources themselves. If the only things wiring
up devices are (a) the board models and (b) anything the user says on
the command line, then the conflict only happens if the user says
-device foo,bar=0x42 and bar 0x42 is in use; in that case the message
will make sense to them.

I think a third case for no-user is this device is actually an
abstract base class (eg arm_gic_common), which we could deal with
by checking the TypeInfo instead.

Case four: we really don't expect anybody to be trying to wire this
up dynamically, which would apply to things like the on-cpu peripherals
for some ARM cores. There it is really just an attempt at being friendly
by cutting down the length of the devices list.

Speaking of friendlyness, it might be helpful if the '-devices help'
list was somehow more structured than a single long list and if
more devices had description text?

  [and I don't think this device
 can be added via the monitor but not the command line
 counts as consistent or coherent...]

 no_user applies equally to -device and device_add.

In the codebase as it stands, it applies only to -device.
I agree that we should be consistent here, which we could do
by applying Christian's patch or some variation to make device_add
honour no_user. (Or by removing no_user altogether :-))

-- PMM

Re: [Qemu-devel] [PATCH 3/5] aio: add a ThreadPool instance to AioContext

On Wed, Mar 06, 2013 at 06:24:37PM +0100, Paolo Bonzini wrote:
 Il 06/03/2013 16:45, Stefan Hajnoczi ha scritto:
  This patch adds a ThreadPool to AioContext.  It's possible that some
  AioContext instances will never use the ThreadPool, so defer creation
  until aio_get_thread_pool().
 
 What lock should protect against doing this twice?

aio_get_thread_pool() is not thread-safe and I don't think it needs to
be.  I imagine it will only be called from the thread that runs the
AioContext.

Stefan

Re: [Qemu-devel] [PATCH V7 0/5] Send the gratuitous by guest

On Thu, Mar 07, 2013 at 04:23:46PM +0800, Jason Wang wrote:
 This series tries to let guest instead of qemu to send the gratuitous packets
 after migration when guest is capable of doing this. This is needed since it's
 impossible for qemu to keep track of all configurations (e.g 802.1Q) and mac
 addresses (more than one mac address may be used by guest). So qemu can't 
 build
 gratuitous packets for all those configurations properly. The only solution is
 let guest driver who knew all needed information to do this.
 
 The series first introduces a new runstate which just tracks the state when 
 the
 migration is finished and guest is about to start. And then we can just trying
 to notify the guest to send the GARP after changing from this state to
 running. A model specific announcing method were also also introduced to let
 each kinds of nic do its own notification. When there's no such method 
 register
 for the nic, the old style of sending RARP were kept. And the last two patches
 implemented the virtio-net method of notification.

Do we want to retry SELF_ANNOUNCE_ROUNDS?

 Changes from V6:
 - introduce a new runstate instead of using a global variable check the state
 
 Changes from V5:
 - use a global variable to decide whether an announcement is needed after 
 migration
 - align with virtio spec and let guest ack the announcement notification 
 through
   control vq instead of config status writing
 
 Changes from V4:
 - keep the old behavior that send the gratuitous packets only after migration

I wonder why it's a sane thing to do. How about simply sending the event after 
load?

 - decide whether to send gratuitous packets by previous runstate instead of a 
 dedicated parameter
 - check virtio_net_started() instead of VIRTIO_NET_S_LINK_UP before issue the 
 config update interrupt
 - move VIRTIO_NET_S_ANNOUNCE to 0x100 and supress guest config write to RO 
 bits
 - cleanups suggested by Michael
 
 Tested with migration within 802.1Q.
 
 Jason Wang (5):
   runstate: introduce prelaunch-migrate state
   net: announce self after vm is started
   net: model specific announcing support
   virtio-net: notify guest to annouce itself
   virtio-net: compat guest announce
 
  hw/pc.h   |6 +-
  hw/virtio-net.c   |   30 ++
  hw/virtio-net.h   |   15 ++-
  include/net/net.h |2 ++
  migration.c   |4 +---
  qapi-schema.json  |5 -
  savevm.c  |8 ++--
  vl.c  |8 +++-
  8 files changed, 69 insertions(+), 9 deletions(-)

Re: [Qemu-devel] [PATCH] qdev: DEVICE_DELETED event

On Thu, Mar 07, 2013 at 10:55:23AM +0100, Markus Armbruster wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Mar 06, 2013 at 02:57:22PM +0100, Andreas Färber wrote:
  Am 06.03.2013 14:00, schrieb Michael S. Tsirkin:
   libvirt has a long-standing bug: when removing the device,
   it can request removal but does not know when does the
   removal complete. Add an event so we can fix this in a robust way.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
  
  Sounds like a good idea to me. :)
  
  [...]
   diff --git a/hw/qdev.c b/hw/qdev.c
   index 689cd54..f30d251 100644
   --- a/hw/qdev.c
   +++ b/hw/qdev.c
   @@ -29,6 +29,7 @@
#include sysemu/sysemu.h
#include qapi/error.h
#include qapi/visitor.h
   +#include qapi/qmp/qjson.h

int qdev_hotplug = 0;
static bool qdev_hot_added = false;
   @@ -267,6 +268,11 @@ void qdev_init_nofail(DeviceState *dev)
/* Unlink device from bus and free the structure.  */
void qdev_free(DeviceState *dev)
{
   +if (dev-id) {
   +QObject *data = qobject_from_jsonf({ 'device': %s }, dev-id);
   +monitor_protocol_event(QEVENT_DEVICE_DELETED, data);
   +qobject_decref(data);
   +}
object_unparent(OBJECT(dev));
}

  
  I'm pretty sure this is the wrong place to fire the notification. We
  should rather do this when the device is actually deleted - which
  qdev_free() does *not* actually guarantee, as criticized in the s390x
  and unref'ing contexts.
  I would suggest to place your code into device_unparent() instead.
  
  Another thing to consider is what data to pass to the event: Not all
  devices have an ID.
 
  If they don't they were not created by management so management is
  probably not interested in them being removed.
 
  We could always add a 'path' key later if this assumption
  proves incorrect.
 
 In old qdev, ID was all we had, because paths were busted.  Thus,
 management had no choice but use IDs.
 
 If I understand modern qdev correctly, we got a canonical path.  Old
 APIs like device_del still accept only ID.  Should new APIs still be
 designed that way?  Or should they always accept / provide the canonical
 path, plus optional ID for convenience?

What are advantages of exposing the path to users in this way?
Looks like maintainance hassle without real benefits?

  We should still have a canonical path when we fire
  this event in either qdev_free() or in device_unparent() before the if
  (dev-parent_bus) block though. That would be a question for Anthony,
  not having a use case for the event I am indifferent there.
  
  Further, thinking of objects such as virtio-rng backends or future
  blockdev/chardev objects, might it make sense to turn this into a
  generic object deletion event rather than a device event?
  
  Andreas
 
  Backend deletion doesn't normally have guest interaction right?
  So why do we need an event?
 
 We need an event because device_del may send its reply before it
 completes the job.
 
 device_del does that when it deletion needs to interact with the guest,
 which can take unbounded time.
 
 Conversely, we don't need an event when a QMP always completes the job
 (as far as observable by the QMP client) before it sends its reply.  Off
 hand, I can't see why backend deletion would do anything else.
 
 I'm always reluctant to abstract when there are fewer than two
 different, concrete things to abstract from.  Right now, we got just
 one: device models.

Re: [Qemu-devel] [PATCH 5/5] threadpool: drop global thread pool

On Wed, Mar 06, 2013 at 05:35:09PM +0100, Paolo Bonzini wrote:
 Il 06/03/2013 16:45, Stefan Hajnoczi ha scritto:
  Now that each AioContext has a ThreadPool and the main loop AioContext
  can be fetched with qemu_get_aio_context(), we can eliminate the concept
  of a global thread pool from thread-pool.c.
  
  The submit functions must take a ThreadPool* argument.
 
 This is certainly ok for thread-pool.c.  For raw-posix and raw-win32,
 what about adding already a bdrv_get_aio_context() function and using
 that in paio_submit?  Is it putting the cart before the horse?

Thanks, we might as well do that.  Then I won't have to go back and
modify block/raw-posix.c and block/raw-win32.c in the next patch series
which lets BlockDriverState bind to an AioContext.

Stefan

Re: [Qemu-devel] [PATCH] coroutine: use AioContext for CoQueue BH

On Wed, Mar 06, 2013 at 04:03:49PM +0100, Paolo Bonzini wrote:
 Il 06/03/2013 15:53, Stefan Hajnoczi ha scritto:
  CoQueue uses a BH to awake coroutines that were made ready to run again
  using qemu_co_queue_next() or qemu_co_queue_restart_all().  The BH
  currently runs in the iothread AioContext and would break coroutines
  that run in a different AioContext.
  
  This is a slightly tricky problem because the lifetime of the BH exceeds
  that of the CoQueue.  This means coroutines can be awoken after CoQueue
  itself has been freed.  Also, there is no qemu_co_queue_destroy()
  function which we could use to handle freeing resources.
  
  Introducing qemu_co_queue_destroy() has a ripple effect of requiring us
  to also add qemu_co_mutex_destroy() and qemu_co_rwlock_destroy(), as
  well as updating all callers.  Avoid doing that.
  
  We also cannot switch from BH to GIdle function because aio_poll() does
  not dispatch GIdle functions.  (GIdle functions make memory management
  slightly easier because they free themselves.)
  
  Finally, I don't want to move unlock_queue and unlock_bh into
  AioContext.  That would break encapsulation - AioContext isn't supposed
  to know about CoQueue.
  
  This patch implements a different solution: each qemu_co_queue_next() or
  qemu_co_queue_restart_all() call creates a new BH and list of coroutines
  to wake up.  Callers tend to invoke qemu_co_queue_next() and
  qemu_co_queue_restart_all() occasionally after blocking I/O, so creating
  a new BH for each call shouldn't be massively inefficient.
  
  Note that this patch does not add an interface for specifying the
  AioContext.  That is left to future patches which will convert CoQueue,
  CoMutex, and CoRwlock to expose AioContext.
  
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
  ---
   include/block/coroutine.h |  1 +
   qemu-coroutine-lock.c | 59 
  ---
   2 files changed, 42 insertions(+), 18 deletions(-)
  
  diff --git a/include/block/coroutine.h b/include/block/coroutine.h
  index c31fae3..a978162 100644
  --- a/include/block/coroutine.h
  +++ b/include/block/coroutine.h
  @@ -104,6 +104,7 @@ bool qemu_in_coroutine(void);
*/
   typedef struct CoQueue {
   QTAILQ_HEAD(, Coroutine) entries;
  +AioContext *ctx;
   } CoQueue;
   
   /**
  diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
  index 97ef01c..ae986b3 100644
  --- a/qemu-coroutine-lock.c
  +++ b/qemu-coroutine-lock.c
  @@ -29,28 +29,34 @@
   #include block/aio.h
   #include trace.h
   
  -static QTAILQ_HEAD(, Coroutine) unlock_bh_queue =
  -QTAILQ_HEAD_INITIALIZER(unlock_bh_queue);
  -static QEMUBH* unlock_bh;
  +/* Coroutines are awoken from a BH to allow the current coroutine to 
  complete
  + * its flow of execution.  The BH may run after the CoQueue has been 
  destroyed,
  + * so keep BH data in a separate heap-allocated struct.
  + */
  +typedef struct {
  +QEMUBH *bh;
  +QTAILQ_HEAD(, Coroutine) entries;
  +} CoQueueNextData;
   
   static void qemu_co_queue_next_bh(void *opaque)
   {
  +CoQueueNextData *data = opaque;
   Coroutine *next;
   
   trace_qemu_co_queue_next_bh();
  -while ((next = QTAILQ_FIRST(unlock_bh_queue))) {
  -QTAILQ_REMOVE(unlock_bh_queue, next, co_queue_next);
  +while ((next = QTAILQ_FIRST(data-entries))) {
  +QTAILQ_REMOVE(data-entries, next, co_queue_next);
   qemu_coroutine_enter(next, NULL);
   }
  +
  +qemu_bh_delete(data-bh);
  +g_slice_free(CoQueueNextData, data);
   }
   
   void qemu_co_queue_init(CoQueue *queue)
   {
   QTAILQ_INIT(queue-entries);
  -
  -if (!unlock_bh) {
  -unlock_bh = qemu_bh_new(qemu_co_queue_next_bh, NULL);
  -}
  +queue-ctx = NULL;
 
 What about adding an accessor for qemu_aio_context and using it?  Then
 you can just use aio_bh_new in qemu_co_queue_do_restart.

Your wish is my command.  I'll add this patch to the threadpool series
where I've already introduced qemu_get_aio_context().

Stefan

Re: [Qemu-devel] [PATCH V7 0/5] Send the gratuitous by guest

On 03/07/2013 06:04 PM, Michael S. Tsirkin wrote:
 On Thu, Mar 07, 2013 at 04:23:46PM +0800, Jason Wang wrote:
 This series tries to let guest instead of qemu to send the gratuitous packets
 after migration when guest is capable of doing this. This is needed since 
 it's
 impossible for qemu to keep track of all configurations (e.g 802.1Q) and mac
 addresses (more than one mac address may be used by guest). So qemu can't 
 build
 gratuitous packets for all those configurations properly. The only solution 
 is
 let guest driver who knew all needed information to do this.

 The series first introduces a new runstate which just tracks the state when 
 the
 migration is finished and guest is about to start. And then we can just 
 trying
 to notify the guest to send the GARP after changing from this state to
 running. A model specific announcing method were also also introduced to let
 each kinds of nic do its own notification. When there's no such method 
 register
 for the nic, the old style of sending RARP were kept. And the last two 
 patches
 implemented the virtio-net method of notification.
 Do we want to retry SELF_ANNOUNCE_ROUNDS?

Yes, we do the announcement several times like in the past.
 Changes from V6:
 - introduce a new runstate instead of using a global variable check the state

 Changes from V5:
 - use a global variable to decide whether an announcement is needed after 
 migration
 - align with virtio spec and let guest ack the announcement notification 
 through
   control vq instead of config status writing

 Changes from V4:
 - keep the old behavior that send the gratuitous packets only after migration
 I wonder why it's a sane thing to do. How about simply sending the event 
 after load?

The aim is to limit the change of the behaviour to focus on migration.

We may also need this after cont, and then maybe we can just do this
unconditionally in vm_start().
 - decide whether to send gratuitous packets by previous runstate instead of 
 a dedicated parameter
 - check virtio_net_started() instead of VIRTIO_NET_S_LINK_UP before issue 
 the config update interrupt
 - move VIRTIO_NET_S_ANNOUNCE to 0x100 and supress guest config write to RO 
 bits
 - cleanups suggested by Michael

 Tested with migration within 802.1Q.

 Jason Wang (5):
   runstate: introduce prelaunch-migrate state
   net: announce self after vm is started
   net: model specific announcing support
   virtio-net: notify guest to annouce itself
   virtio-net: compat guest announce

  hw/pc.h   |6 +-
  hw/virtio-net.c   |   30 ++
  hw/virtio-net.h   |   15 ++-
  include/net/net.h |2 ++
  migration.c   |4 +---
  qapi-schema.json  |5 -
  savevm.c  |8 ++--
  vl.c  |8 +++-
  8 files changed, 69 insertions(+), 9 deletions(-)

[Qemu-devel] [PATCH 0/7] linux-user updates

From: Riku Voipio riku.voi...@linaro.org

Hi,

I did a dig through the archive for linux-user patches
that have slipped through during my inactivity. The ones
in this patchset appear good and pass smoketest.

I will send these as pull request once I'm back from travel 
and can update my git repository.

Riku

Dillon Amburgey (3):
  linux-user: Add Alpha socket constants
  linux-user: Support setgroups syscall with no groups
  linux-user: Add more sparc syscall numbers

John Rigby (2):
  linux-user/syscall.c: handle FUTEX_WAIT_BITSET in do_futex
  linux-user: fix futex strace of FUTEX_CLOCK_REALTIME

Laurent Vivier (2):
  linux-user: improve print_fcntl()
  linux-user: correct semctl() and shmctl()

 linux-user/socket.h   |  69 
 linux-user/sparc/syscall_nr.h |   2 +
 linux-user/strace.c   | 103 ++
 linux-user/syscall.c  |  81 -
 4 files changed, 205 insertions(+), 50 deletions(-)

-- 
1.8.1.2

[Qemu-devel] [PATCH 1/7] linux-user: Add Alpha socket constants

From: Dillon Amburgey dill...@dillona.com

Without these, some networking programs will not work

Signed-off-by: Dillon Amburgey dill...@dillona.com
Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Riku Voipio riku.voi...@linaro.org
---
 linux-user/socket.h | 69 +
 1 file changed, 69 insertions(+)

diff --git a/linux-user/socket.h b/linux-user/socket.h
index 93d4782..339cae5 100644
--- a/linux-user/socket.h
+++ b/linux-user/socket.h
@@ -87,6 +87,75 @@
 
#define TARGET_SOCK_MAX (SOCK_PACKET + 1)
 
+#elif defined(TARGET_ALPHA)
+
+/* For setsockopt(2) */
+#define TARGET_SOL_SOCKET   0x
+
+#define TARGET_SO_DEBUG 0x0001
+#define TARGET_SO_REUSEADDR 0x0004
+#define TARGET_SO_KEEPALIVE 0x0008
+#define TARGET_SO_DONTROUTE 0x0010
+#define TARGET_SO_BROADCAST 0x0020
+#define TARGET_SO_LINGER0x0080
+#define TARGET_SO_OOBINLINE 0x0100
+/* To add :#define TARGET_SO_REUSEPORT 0x0200 */
+
+#define TARGET_SO_TYPE  0x1008
+#define TARGET_SO_ERROR 0x1007
+#define TARGET_SO_SNDBUF0x1001
+#define TARGET_SO_RCVBUF0x1002
+#define TARGET_SO_SNDBUFFORCE   0x100a
+#define TARGET_SO_RCVBUFFORCE   0x100b
+#define TARGET_SO_RCVLOWAT  0x1010
+#define TARGET_SO_SNDLOWAT  0x1011
+#define TARGET_SO_RCVTIMEO  0x1012
+#define TARGET_SO_SNDTIMEO  0x1013
+#define TARGET_SO_ACCEPTCONN0x1014
+#define TARGET_SO_PROTOCOL  0x1028
+#define TARGET_SO_DOMAIN0x1029
+
+/* linux-specific, might as well be the same as on i386 */
+#define TARGET_SO_NO_CHECK  11
+#define TARGET_SO_PRIORITY  12
+#define TARGET_SO_BSDCOMPAT 14
+
+#define TARGET_SO_PASSCRED  17
+#define TARGET_SO_PEERCRED  18
+#define TARGET_SO_BINDTODEVICE 25
+
+/* Socket filtering */
+#define TARGET_SO_ATTACH_FILTER26
+#define TARGET_SO_DETACH_FILTER27
+
+#define TARGET_SO_PEERNAME  28
+#define TARGET_SO_TIMESTAMP 29
+#define TARGET_SCM_TIMESTAMPTARGET_SO_TIMESTAMP
+
+#define TARGET_SO_PEERSEC   30
+#define TARGET_SO_PASSSEC   34
+#define TARGET_SO_TIMESTAMPNS   35
+#define TARGET_SCM_TIMESTAMPNS  TARGET_SO_TIMESTAMPNS
+
+/* Security levels - as per NRL IPv6 - don't actually do anything */
+#define TARGET_SO_SECURITY_AUTHENTICATION   19
+#define TARGET_SO_SECURITY_ENCRYPTION_TRANSPORT 20
+#define TARGET_SO_SECURITY_ENCRYPTION_NETWORK   21
+
+#define TARGET_SO_MARK  36
+
+#define TARGET_SO_TIMESTAMPING  37
+#define TARGET_SCM_TIMESTAMPING TARGET_SO_TIMESTAMPING
+
+#define TARGET_SO_RXQ_OVFL 40
+
+#define TARGET_SO_WIFI_STATUS   41
+#define TARGET_SCM_WIFI_STATUS  TARGET_SO_WIFI_STATUS
+#define TARGET_SO_PEEK_OFF  42
+
+/* Instruct lower device to use last 4-bytes of skb data as FCS */
+#define TARGET_SO_NOFCS 43
+
 #else
 
/* For setsockopt(2) */
-- 
1.8.1.2

[Qemu-devel] [PATCH 4/7] linux-user: fix futex strace of FUTEX_CLOCK_REALTIME

From: John Rigby john.ri...@linaro.org

Handle same as existing FUTEX_PRIVATE_FLAG.

Signed-off-by: John Rigby john.ri...@linaro.org
Signed-off-by: Riku Voipio riku.voi...@linaro.org
---
 linux-user/strace.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 9a18146..0fbae3c 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1498,6 +1498,12 @@ if( cmd == val ) { \
 cmd = ~FUTEX_PRIVATE_FLAG;
 }
 #endif
+#ifdef FUTEX_CLOCK_REALTIME
+if (cmd  FUTEX_CLOCK_REALTIME) {
+gemu_log(FUTEX_CLOCK_REALTIME|);
+cmd = ~FUTEX_CLOCK_REALTIME;
+}
+#endif
 print_op(FUTEX_WAIT)
 print_op(FUTEX_WAKE)
 print_op(FUTEX_FD)
-- 
1.8.1.2

[Qemu-devel] [PATCH 2/7] linux-user: improve print_fcntl()

From: Laurent Vivier laur...@vivier.eu

Signed-off-by: Laurent Vivier laur...@vivier.eu
Signed-off-by: Riku Voipio riku.voi...@linaro.org
---
 linux-user/strace.c | 97 +++--
 1 file changed, 79 insertions(+), 18 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 4e91a6e..9a18146 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -462,18 +462,6 @@ UNUSED static struct flags mmap_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags fcntl_flags[] = {
-FLAG_TARGET(F_DUPFD),
-FLAG_TARGET(F_GETFD),
-FLAG_TARGET(F_SETFD),
-FLAG_TARGET(F_GETFL),
-FLAG_TARGET(F_SETFL),
-FLAG_TARGET(F_GETLK),
-FLAG_TARGET(F_SETLK),
-FLAG_TARGET(F_SETLKW),
-FLAG_END,
-};
-
 UNUSED static struct flags clone_flags[] = {
 FLAG_GENERIC(CLONE_VM),
 FLAG_GENERIC(CLONE_FS),
@@ -867,12 +855,85 @@ print_fcntl(const struct syscallname *name,
 {
 print_syscall_prologue(name);
 print_raw_param(%d, arg0, 0);
-print_flags(fcntl_flags, arg1, 0);
-/*
- * TODO: check flags and print following argument only
- *   when needed.
- */
-print_pointer(arg2, 1);
+switch(arg1) {
+case TARGET_F_DUPFD:
+gemu_log(F_DUPFD,);
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 1);
+break;
+case TARGET_F_GETFD:
+gemu_log(F_GETFD);
+break;
+case TARGET_F_SETFD:
+gemu_log(F_SETFD,);
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 1);
+break;
+case TARGET_F_GETFL:
+gemu_log(F_GETFL);
+break;
+case TARGET_F_SETFL:
+gemu_log(F_SETFL,);
+print_open_flags(arg2, 1);
+break;
+case TARGET_F_GETLK:
+gemu_log(F_GETLK,);
+print_pointer(arg2, 1);
+break;
+case TARGET_F_SETLK:
+gemu_log(F_SETLK,);
+print_pointer(arg2, 1);
+break;
+case TARGET_F_SETLKW:
+gemu_log(F_SETLKW,);
+print_pointer(arg2, 1);
+break;
+case TARGET_F_GETOWN:
+gemu_log(F_GETOWN);
+break;
+case TARGET_F_SETOWN:
+gemu_log(F_SETOWN,);
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 0);
+break;
+case TARGET_F_GETSIG:
+gemu_log(F_GETSIG);
+break;
+case TARGET_F_SETSIG:
+gemu_log(F_SETSIG,);
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 0);
+break;
+#if TARGET_ABI_BITS == 32
+case TARGET_F_GETLK64:
+gemu_log(F_GETLK64,);
+print_pointer(arg2, 1);
+break;
+case TARGET_F_SETLK64:
+gemu_log(F_SETLK64,);
+print_pointer(arg2, 1);
+break;
+case TARGET_F_SETLKW64:
+gemu_log(F_SETLKW64,);
+print_pointer(arg2, 1);
+break;
+#endif
+case TARGET_F_SETLEASE:
+gemu_log(F_SETLEASE,);
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 0);
+break;
+case TARGET_F_GETLEASE:
+gemu_log(F_GETLEASE);
+break;
+case TARGET_F_DUPFD_CLOEXEC:
+gemu_log(F_DUPFD_CLOEXEC,);
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 1);
+break;
+case TARGET_F_NOTIFY:
+gemu_log(F_NOTIFY,);
+print_raw_param(TARGET_ABI_FMT_ld, arg2, 0);
+break;
+default:
+print_raw_param(TARGET_ABI_FMT_ld, arg1, 0);
+print_pointer(arg2, 1);
+break;
+}
 print_syscall_epilogue(name);
 }
 #define print_fcntl64   print_fcntl
-- 
1.8.1.2

[Qemu-devel] [PATCH 5/7] linux-user: correct semctl() and shmctl()

From: Laurent Vivier laur...@vivier.eu

The parameter union semun of semctl() is not a value
but a pointer to the value.

Moreover, all fields of target_su must be swapped (if needed).

The third argument of shmctl is a pointer.

WITHOUT this patch:

$ ipcs

kernel not configured for shared memory

qemu: uncaught target signal 11 (Segmentation fault) - core dumped

WITH this patch:

$ ipcs

-- Shared Memory Segments 
keyshmid  owner  perms  bytes  nattch status
0x4e545030 0  root  60096 1
0x4e545031 32769  root  60096 1
0x4e545032 65538  root  66696 1
0x4e545033 98307  root  66696 1
0x47505344 131076 root  6668240   1
0x3c81b7f5 163845 laurent   6664096   0
0x 729513990  laurent   600393216 2  dest
0x 729546759  laurent   600393216 2  dest
0x 1879179273 laurent   600393216 2  dest

-- Semaphore Arrays 
keysemid  owner  perms  nsems
0x3c81b7f6 32768  laurent   6661
0x1c44ac47 6586369laurent   6001

-- Message Queues 
keymsqid  owner  perms  used-bytes   messages
0x1c44ac45 458752 laurent60000
0x1c44ac46 491521 laurent60000

Signed-off-by: Laurent Vivier laur...@vivier.eu
Signed-off-by: Riku Voipio riku.voi...@linaro.org
---
 linux-user/syscall.c | 56 
 1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c7fcfc0..7f12563 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2652,8 +2652,9 @@ static inline abi_long host_to_target_semarray(int semid, 
abi_ulong target_addr,
 }
 
 static inline abi_long do_semctl(int semid, int semnum, int cmd,
- union target_semun target_su)
+ abi_ulong ptr)
 {
+union target_semun *target_su;
 union semun arg;
 struct semid_ds dsarg;
 unsigned short *array = NULL;
@@ -2662,43 +2663,55 @@ static inline abi_long do_semctl(int semid, int semnum, 
int cmd,
 abi_long err;
 cmd = 0xff;
 
+if (!lock_user_struct(VERIFY_READ, target_su, ptr, 1)) {
+return -TARGET_EFAULT;
+}
 switch( cmd ) {
case GETVAL:
case SETVAL:
-arg.val = tswap32(target_su.val);
+arg.val = tswap32(target_su-val);
 ret = get_errno(semctl(semid, semnum, cmd, arg));
-target_su.val = tswap32(arg.val);
+target_su-val = tswap32(arg.val);
 break;
case GETALL:
case SETALL:
-err = target_to_host_semarray(semid, array, target_su.array);
-if (err)
-return err;
+err = target_to_host_semarray(semid, array,
+  tswapal(target_su-array));
+if (err) {
+ret = err;
+break;
+}
 arg.array = array;
 ret = get_errno(semctl(semid, semnum, cmd, arg));
-err = host_to_target_semarray(semid, target_su.array, array);
-if (err)
-return err;
+err = host_to_target_semarray(semid, tswapal(target_su-array),
+  array);
+if (err) {
+ret = err;
+}
 break;
case IPC_STAT:
case IPC_SET:
case SEM_STAT:
-err = target_to_host_semid_ds(dsarg, target_su.buf);
-if (err)
-return err;
+err = target_to_host_semid_ds(dsarg, tswapal(target_su-buf));
+if (err) {
+ret = err;
+break;
+}
 arg.buf = dsarg;
 ret = get_errno(semctl(semid, semnum, cmd, arg));
-err = host_to_target_semid_ds(target_su.buf, dsarg);
-if (err)
-return err;
+err = host_to_target_semid_ds(tswapal(target_su-buf), dsarg);
+if (err) {
+ret = err;
+}
 break;
case IPC_INFO:
case SEM_INFO:
 arg.__buf = seminfo;
 ret = get_errno(semctl(semid, semnum, cmd, arg));
-err = host_to_target_seminfo(target_su.__buf, seminfo);
-if (err)
-return err;
+err = host_to_target_seminfo(tswapal(target_su-__buf), seminfo);
+if (err) {
+ret = err;
+}
 break;
case IPC_RMID:
case GETPID:
@@ -2707,6 +2720,7 @@ static inline abi_long do_semctl(int semid, int semnum, 
int cmd,
 ret = get_errno(semctl(semid, semnum, cmd, NULL));
 break;
 }
+

[Qemu-devel] [PATCH 6/7] linux-user: Support setgroups syscall with no groups

From: Dillon Amburgey dill...@dillona.com

Signed-off-by: Dillon Amburgey dill...@dillona.com
Reviewed-by: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Riku Voipio riku.voi...@linaro.org
---
 linux-user/syscall.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 7f12563..e0c71fb 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7694,18 +7694,20 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 {
 int gidsetsize = arg1;
 target_id *target_grouplist;
-gid_t *grouplist;
+gid_t *grouplist = NULL;
 int i;
-
-grouplist = alloca(gidsetsize * sizeof(gid_t));
-target_grouplist = lock_user(VERIFY_READ, arg2, gidsetsize * 2, 1);
-if (!target_grouplist) {
-ret = -TARGET_EFAULT;
-goto fail;
+if (gidsetsize) {
+grouplist = alloca(gidsetsize * sizeof(gid_t));
+target_grouplist = lock_user(VERIFY_READ, arg2, gidsetsize * 
2, 1);
+if (!target_grouplist) {
+ret = -TARGET_EFAULT;
+goto fail;
+}
+for (i = 0; i  gidsetsize; i++) {
+grouplist[i] = low2highgid(tswapid(target_grouplist[i]));
+}
+unlock_user(target_grouplist, arg2, 0);
 }
-for(i = 0;i  gidsetsize; i++)
-grouplist[i] = low2highgid(tswapid(target_grouplist[i]));
-unlock_user(target_grouplist, arg2, 0);
 ret = get_errno(setgroups(gidsetsize, grouplist));
 }
 break;
-- 
1.8.1.2

[Qemu-devel] [PATCH 3/7] linux-user/syscall.c: handle FUTEX_WAIT_BITSET in do_futex

From: John Rigby john.ri...@linaro.org

Upstream libc has recently changed to start using
FUTEX_WAIT_BITSET instead of FUTEX_WAIT and this
is causing do_futex to return -TARGET_ENOSYS.

Pass bitset in val3 to sys_futex which will be
ignored by kernel for the FUTEX_WAIT case.

Signed-off-by: John Rigby john.ri...@linaro.org
Signed-off-by: Riku Voipio riku.voi...@linaro.org
---
 linux-user/syscall.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 19630ea..c7fcfc0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4922,6 +4922,7 @@ static int do_futex(target_ulong uaddr, int op, int val, 
target_ulong timeout,
 #endif
 switch (base_op) {
 case FUTEX_WAIT:
+case FUTEX_WAIT_BITSET:
 if (timeout) {
 pts = ts;
 target_to_host_timespec(pts, timeout);
@@ -4929,7 +4930,7 @@ static int do_futex(target_ulong uaddr, int op, int val, 
target_ulong timeout,
 pts = NULL;
 }
 return get_errno(sys_futex(g2h(uaddr), op, tswap32(val),
- pts, NULL, 0));
+ pts, NULL, val3));
 case FUTEX_WAKE:
 return get_errno(sys_futex(g2h(uaddr), op, val, NULL, NULL, 0));
 case FUTEX_FD:
-- 
1.8.1.2

[Qemu-devel] [PATCH 7/7] linux-user: Add more sparc syscall numbers

From: Dillon Amburgey dill...@dillona.com

Signed-off-by: Dillon Amburgey dill...@dillona.com
Signed-off-by: Riku Voipio riku.voi...@linaro.org
---
 linux-user/sparc/syscall_nr.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/sparc/syscall_nr.h b/linux-user/sparc/syscall_nr.h
index 061711c..534e6e9 100644
--- a/linux-user/sparc/syscall_nr.h
+++ b/linux-user/sparc/syscall_nr.h
@@ -200,6 +200,8 @@
 #define TARGET_NR__newselect 230 /* Linux Specific 
 */
 #define TARGET_NR_time   231 /* Linux Specific 
 */
 #define TARGET_NR_stime  233 /* Linux Specific 
 */
+#define TARGET_NR_statfs64   234 /* Linux Specific 
 */
+#define TARGET_NR_fstatfs64  235 /* Linux Specific 
 */
 #define TARGET_NR__llseek236 /* Linux Specific 
 */
 #define TARGET_NR_mlock  237
 #define TARGET_NR_munlock238
-- 
1.8.1.2

Re: [Qemu-devel] [PATCH V7 0/5] Send the gratuitous by guest

On Thu, Mar 07, 2013 at 06:13:41PM +0800, Jason Wang wrote:
 On 03/07/2013 06:04 PM, Michael S. Tsirkin wrote:
  On Thu, Mar 07, 2013 at 04:23:46PM +0800, Jason Wang wrote:
  This series tries to let guest instead of qemu to send the gratuitous 
  packets
  after migration when guest is capable of doing this. This is needed since 
  it's
  impossible for qemu to keep track of all configurations (e.g 802.1Q) and 
  mac
  addresses (more than one mac address may be used by guest). So qemu can't 
  build
  gratuitous packets for all those configurations properly. The only 
  solution is
  let guest driver who knew all needed information to do this.
 
  The series first introduces a new runstate which just tracks the state 
  when the
  migration is finished and guest is about to start. And then we can just 
  trying
  to notify the guest to send the GARP after changing from this state to
  running. A model specific announcing method were also also introduced to 
  let
  each kinds of nic do its own notification. When there's no such method 
  register
  for the nic, the old style of sending RARP were kept. And the last two 
  patches
  implemented the virtio-net method of notification.
  Do we want to retry SELF_ANNOUNCE_ROUNDS?
 
 Yes, we do the announcement several times like in the past.
  Changes from V6:
  - introduce a new runstate instead of using a global variable check the 
  state
 
  Changes from V5:
  - use a global variable to decide whether an announcement is needed after 
  migration
  - align with virtio spec and let guest ack the announcement notification 
  through
control vq instead of config status writing
 
  Changes from V4:
  - keep the old behavior that send the gratuitous packets only after 
  migration
  I wonder why it's a sane thing to do. How about simply sending the event 
  after load?
 
 The aim is to limit the change of the behaviour to focus on migration.
 We may also need this after cont,

Hmm why after cont?

 and then maybe we can just do this
 unconditionally in vm_start().

OK but then the new infrastructure we are adding will be dead code,
won't it?

Can we do this simply using a post load hook for now?


  - decide whether to send gratuitous packets by previous runstate instead 
  of a dedicated parameter
  - check virtio_net_started() instead of VIRTIO_NET_S_LINK_UP before issue 
  the config update interrupt
  - move VIRTIO_NET_S_ANNOUNCE to 0x100 and supress guest config write to RO 
  bits
  - cleanups suggested by Michael
 
  Tested with migration within 802.1Q.
 
  Jason Wang (5):
runstate: introduce prelaunch-migrate state
net: announce self after vm is started
net: model specific announcing support
virtio-net: notify guest to annouce itself
virtio-net: compat guest announce
 
   hw/pc.h   |6 +-
   hw/virtio-net.c   |   30 ++
   hw/virtio-net.h   |   15 ++-
   include/net/net.h |2 ++
   migration.c   |4 +---
   qapi-schema.json  |5 -
   savevm.c  |8 ++--
   vl.c  |8 +++-
   8 files changed, 69 insertions(+), 9 deletions(-)

Re: [Qemu-devel] [PATCH] coroutine: use AioContext for CoQueue BH

On Wed, Mar 06, 2013 at 04:41:01PM +0100, Kevin Wolf wrote:
 Am 06.03.2013 um 15:53 hat Stefan Hajnoczi geschrieben:
  CoQueue uses a BH to awake coroutines that were made ready to run again
  using qemu_co_queue_next() or qemu_co_queue_restart_all().  The BH
  currently runs in the iothread AioContext and would break coroutines
  that run in a different AioContext.
  
  This is a slightly tricky problem because the lifetime of the BH exceeds
  that of the CoQueue.  This means coroutines can be awoken after CoQueue
  itself has been freed.
 
 Does this really happen in practice? If so, that sounds like a bug to
 me.

I didn't audit the callers.  It seems reasonable that a CoQueue is
allowed to go out of scope once qemu_co_queue_empty() returns true.  I
don't see what is buggy about that.

  Finally, I don't want to move unlock_queue and unlock_bh into
  AioContext.  That would break encapsulation - AioContext isn't supposed
  to know about CoQueue.
 
 So what you would need here is AioContext local storage. I wonder if
 this will stay a requirement unique to CoQueues when AioContexts gain
 wider use.

I'm not convinced we need AioContext local storage.

We could go all the way and make AioContext* thread-local.  Then we can
figure out the current AioContext* or NULL, if none, at any location in
the code.

Then CoQueue (and others?) could attach their resources to AioContext
local storage.

But let's avoid it if there are clean solutions because implicit state
like globals or thread-locals makes code harder to understand.

Stefan

Re: [Qemu-devel] [PATCH 1/7] block: only force IO completion in .bdrv_truncate if we are shrinking

Am 07.03.2013 um 11:00 schrieb Kevin Wolf kw...@redhat.com:

Am 07.03.2013 um 10:25 hat Peter Lieven geschrieben:

Am 07.03.2013 um 10:22 schrieb Kevin Wolf kw...@redhat.com:

Am 07.03.2013 um 10:16 hat Peter Lieven geschrieben:
If bs-growable is 1 for all drivers, whats the fix status of
CVE-2008-0928? This
flag was introduced as a fix for this problem.

bdrv_check_byte_request() does nothing useful if bs-growable is 1.

Sorry, I have to admin I am little confused by what is happening in
bdrv_open().

However, what I can say is that bs-growable is 1 for an iSCSI backed
harddrive and I wonder how this can happen if bdrv_file_open is not used
for
opening it because that is the only place where bs-growable is set to 1.

It is used for the iscsi driver. You have a raw BDS (growable == false)
on top of an iscsi one (growable == true).

Ok, but growable == true is wrong for the iSCSI driver isn`t it?

I guess it depends on your definition. If you say that growable includes
the capability of growing the image, then yes, it's wrong. If you only
interpret it as permission to write beyond EOF (if the driver supports
that), then it's right even though any such attempt will result in an
error.

Practically speaking, the difference is between -EIO returned from
bdrv_check_byte_request() and -EIO from iscsi_aio_write16_cb(), i.e.
the result is the same.

Yes, but there are many broken storage implementations outside which might react
freaky if there is a read/write beyond the LUN size or with negative offset.
The check_request would block such requests earlier protecting the
infrastructure.
I have a case open with the vendor of a storage system where I am able to crash
the storage sending illegal requests. I personally would feel more comfortable
if illegal requests where just not possible.

Peter

Re: [Qemu-devel] [PATCH V7 0/5] Send the gratuitous by guest

On 03/07/2013 06:25 PM, Michael S. Tsirkin wrote:
 On Thu, Mar 07, 2013 at 06:13:41PM +0800, Jason Wang wrote:
 On 03/07/2013 06:04 PM, Michael S. Tsirkin wrote:
 On Thu, Mar 07, 2013 at 04:23:46PM +0800, Jason Wang wrote:
 This series tries to let guest instead of qemu to send the gratuitous 
 packets
 after migration when guest is capable of doing this. This is needed since 
 it's
 impossible for qemu to keep track of all configurations (e.g 802.1Q) and 
 mac
 addresses (more than one mac address may be used by guest). So qemu can't 
 build
 gratuitous packets for all those configurations properly. The only 
 solution is
 let guest driver who knew all needed information to do this.

 The series first introduces a new runstate which just tracks the state 
 when the
 migration is finished and guest is about to start. And then we can just 
 trying
 to notify the guest to send the GARP after changing from this state to
 running. A model specific announcing method were also also introduced to 
 let
 each kinds of nic do its own notification. When there's no such method 
 register
 for the nic, the old style of sending RARP were kept. And the last two 
 patches
 implemented the virtio-net method of notification.
 Do we want to retry SELF_ANNOUNCE_ROUNDS?
 Yes, we do the announcement several times like in the past.
 Changes from V6:
 - introduce a new runstate instead of using a global variable check the 
 state

 Changes from V5:
 - use a global variable to decide whether an announcement is needed after 
 migration
 - align with virtio spec and let guest ack the announcement notification 
 through
   control vq instead of config status writing

 Changes from V4:
 - keep the old behavior that send the gratuitous packets only after 
 migration
 I wonder why it's a sane thing to do. How about simply sending the event 
 after load?
 The aim is to limit the change of the behaviour to focus on migration.
 We may also need this after cont,
 Hmm why after cont?

If we stop the vm for a long period, the mac will be missed in the
forward table of the bridge also.
 and then maybe we can just do this
 unconditionally in vm_start().
 OK but then the new infrastructure we are adding will be dead code,
 won't it?

If we do this, there's no need to introduce a new state then.

 Can we do this simply using a post load hook for now?

Maybe not, this means we may want to inject an interrupt to guest when
vm is not running.
 - decide whether to send gratuitous packets by previous runstate instead 
 of a dedicated parameter
 - check virtio_net_started() instead of VIRTIO_NET_S_LINK_UP before issue 
 the config update interrupt
 - move VIRTIO_NET_S_ANNOUNCE to 0x100 and supress guest config write to RO 
 bits
 - cleanups suggested by Michael

 Tested with migration within 802.1Q.

 Jason Wang (5):
   runstate: introduce prelaunch-migrate state
   net: announce self after vm is started
   net: model specific announcing support
   virtio-net: notify guest to annouce itself
   virtio-net: compat guest announce

  hw/pc.h   |6 +-
  hw/virtio-net.c   |   30 ++
  hw/virtio-net.h   |   15 ++-
  include/net/net.h |2 ++
  migration.c   |4 +---
  qapi-schema.json  |5 -
  savevm.c  |8 ++--
  vl.c  |8 +++-
  8 files changed, 69 insertions(+), 9 deletions(-)

Re: [Qemu-devel] [RFC] s390: dump-guest-memory implementation for s390x arch.

2013-03-07 Thread Alexander Graf


On 28.02.2013, at 10:39, Jens Freimann wrote:

 From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 
 dump-guest-memory QEMU monitor command didn't work for s390 architecture.
 The result of the command is supposed to be ELF format crash-readable
 dump.
 In order to implement this, the arch-specific part of dump-guest-memory
 was added:
 target-s390x/arch_dump.c contains the whole set of function for writing
 Elf note sections of all types for s390x.
 target-s390x/arch_memory_mapping.c contains stub functions, current
 patch places all guest physical memory into the dump.
 
 Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com
 
 ---
 configure  |   4 +-
 include/elf.h  |   6 +
 target-s390x/Makefile.objs |   2 +-
 target-s390x/arch_dump.c   | 240 +
 target-s390x/arch_memory_mapping.c |  26 
 5 files changed, 275 insertions(+), 3 deletions(-)
 create mode 100644 target-s390x/arch_dump.c
 create mode 100644 target-s390x/arch_memory_mapping.c
 
 diff --git a/configure b/configure
 index 19738ac..938be14 100755
 --- a/configure
 +++ b/configure
 @@ -4146,7 +4146,7 @@ case $target_arch2 in
 fi
 esac
 case $target_arch2 in
 -  i386|x86_64)
 +  i386|x86_64|s390x)
 echo CONFIG_HAVE_GET_MEMORY_MAPPING=y  $config_target_mak
 esac
 if test $target_arch2 = ppc64 -a $fdt = yes; then
 @@ -4159,7 +4159,7 @@ if test $target_softmmu = yes ; then
   echo CONFIG_SOFTMMU=y  $config_target_mak
   echo LIBS+=$libs_softmmu $target_libs_softmmu  $config_target_mak
   case $target_arch2 in
 -i386|x86_64)
 +i386|x86_64|s390x)
   echo CONFIG_HAVE_CORE_DUMP=y  $config_target_mak
   esac
 fi
 diff --git a/include/elf.h b/include/elf.h
 index a21ea53..ba4b3a7 100644
 --- a/include/elf.h
 +++ b/include/elf.h
 @@ -1219,11 +1219,17 @@ typedef struct elf64_shdr {
 
 /* Notes used in ET_CORE */
 #define NT_PRSTATUS   1
 +#define NT_FPREGSET 2
 #define NT_PRFPREG2
 #define NT_PRPSINFO   3
 #define NT_TASKSTRUCT 4
 #define NT_AUXV   6
 #define NT_PRXFPREG 0x46e62b7f  /* copied from 
 gdb5.1/include/elf/common.h */
 +#define NT_S390_PREFIX  0x305   /* s390 prefix register */
 +#define NT_S390_CTRS0x304   /* s390 control registers */
 +#define NT_S390_TODPREG 0x303   /* s390 TOD programmable register */
 +#define NT_S390_TODCMP  0x302   /* s390 TOD clock comparator 
 register */
 +#define NT_S390_TIMER   0x301   /* s390 timer register */
 
 
 /* Note header in a PT_NOTE section */
 diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
 index 4e63417..82c79c4 100644
 --- a/target-s390x/Makefile.objs
 +++ b/target-s390x/Makefile.objs
 @@ -1,4 +1,4 @@
 obj-y += translate.o helper.o cpu.o interrupt.o
 obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
 -obj-$(CONFIG_SOFTMMU) += ioinst.o
 +obj-$(CONFIG_SOFTMMU) += ioinst.o arch_dump.o arch_memory_mapping.o
 obj-$(CONFIG_KVM) += kvm.o
 diff --git a/target-s390x/arch_dump.c b/target-s390x/arch_dump.c
 new file mode 100644
 index 000..3098cd1
 --- /dev/null
 +++ b/target-s390x/arch_dump.c
 @@ -0,0 +1,240 @@
 +/*
 + * writing ELF notes for s390x arch
 + *
 + *  
 + * Copyright IBM Corp. 2012
 + *
 + * Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include cpu.h
 +#include elf.h
 +#include exec/cpu-all.h
 +#include sysemu/dump.h
 +#include sysemu/kvm.h
 +
 +
 +struct s390x_user_regs_struct {
 +uint64_tpsw[2];
 +uint64_tgprs[16];
 +uint32_tacrs[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_user_regs_struct s390x_user_regs;
 +
 +struct s390x_elf_prstatus_struct {
 +uint8_t pad1[32];
 +uint32_t pid;
 +uint8_t pad2[76];
 +s390x_user_regs regs;
 +uint8_t pad3[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_elf_prstatus_struct s390x_elf_prstatus;
 +
 +struct s390x_elf_fpregset_struct {
 +uint32_tfpc;
 +uint32_tpad;
 +uint64_tfprs[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_elf_fpregset_struct s390x_elf_fpregset;
 +
 +typedef struct note_struct {
 +Elf64_Nhdr hdr;
 +char name[5];
 +char pad3[3];
 + union {
 +s390x_elf_prstatus prstatus;
 +s390x_elf_fpregset fpregset;
 +uint32_t prefix;
 +uint64_t timer;
 +uint64_t todcmp;
 +uint32_t todpreg;
 +uint64_t ctrs[16];
 +} contents;
 +} QEMU_PACKED note_t;
 +
 +static int s390x_write_elf64_prstatus(note_t *note, CPUArchState *env)
 +{
 +int i;
 +s390x_user_regs *regs;
 +
 +note-hdr.n_type = cpu_to_be32(NT_PRSTATUS);
 +
 +regs = (note-contents.prstatus.regs);
 +

Re: [Qemu-devel] [PATCH V7 0/5] Send the gratuitous by guest

On Thu, Mar 07, 2013 at 06:33:30PM +0800, Jason Wang wrote:
 On 03/07/2013 06:25 PM, Michael S. Tsirkin wrote:
  On Thu, Mar 07, 2013 at 06:13:41PM +0800, Jason Wang wrote:
  On 03/07/2013 06:04 PM, Michael S. Tsirkin wrote:
  On Thu, Mar 07, 2013 at 04:23:46PM +0800, Jason Wang wrote:
  This series tries to let guest instead of qemu to send the gratuitous 
  packets
  after migration when guest is capable of doing this. This is needed 
  since it's
  impossible for qemu to keep track of all configurations (e.g 802.1Q) and 
  mac
  addresses (more than one mac address may be used by guest). So qemu 
  can't build
  gratuitous packets for all those configurations properly. The only 
  solution is
  let guest driver who knew all needed information to do this.
 
  The series first introduces a new runstate which just tracks the state 
  when the
  migration is finished and guest is about to start. And then we can just 
  trying
  to notify the guest to send the GARP after changing from this state to
  running. A model specific announcing method were also also introduced to 
  let
  each kinds of nic do its own notification. When there's no such method 
  register
  for the nic, the old style of sending RARP were kept. And the last two 
  patches
  implemented the virtio-net method of notification.
  Do we want to retry SELF_ANNOUNCE_ROUNDS?
  Yes, we do the announcement several times like in the past.
  Changes from V6:
  - introduce a new runstate instead of using a global variable check the 
  state
 
  Changes from V5:
  - use a global variable to decide whether an announcement is needed 
  after migration
  - align with virtio spec and let guest ack the announcement notification 
  through
control vq instead of config status writing
 
  Changes from V4:
  - keep the old behavior that send the gratuitous packets only after 
  migration
  I wonder why it's a sane thing to do. How about simply sending the event 
  after load?
  The aim is to limit the change of the behaviour to focus on migration.
  We may also need this after cont,
  Hmm why after cont?
 
 If we stop the vm for a long period, the mac will be missed in the
 forward table of the bridge also.

Hmm okay, needs some thought.

  and then maybe we can just do this
  unconditionally in vm_start().
  OK but then the new infrastructure we are adding will be dead code,
  won't it?
 
 If we do this, there's no need to introduce a new state then.
 
  Can we do this simply using a post load hook for now?
 
 Maybe not, this means we may want to inject an interrupt to guest when
 vm is not running.

What I'm suggesting is basically:
- set some per device flag on load
- announce based on vmstart if flag is set

We can drop the flag later if we want it on every vmstart.

  - decide whether to send gratuitous packets by previous runstate instead 
  of a dedicated parameter
  - check virtio_net_started() instead of VIRTIO_NET_S_LINK_UP before 
  issue the config update interrupt
  - move VIRTIO_NET_S_ANNOUNCE to 0x100 and supress guest config write to 
  RO bits
  - cleanups suggested by Michael
 
  Tested with migration within 802.1Q.
 
  Jason Wang (5):
runstate: introduce prelaunch-migrate state
net: announce self after vm is started
net: model specific announcing support
virtio-net: notify guest to annouce itself
virtio-net: compat guest announce
 
   hw/pc.h   |6 +-
   hw/virtio-net.c   |   30 ++
   hw/virtio-net.h   |   15 ++-
   include/net/net.h |2 ++
   migration.c   |4 +---
   qapi-schema.json  |5 -
   savevm.c  |8 ++--
   vl.c  |8 +++-
   8 files changed, 69 insertions(+), 9 deletions(-)

Re: [Qemu-devel] qemu-img do not verify encrytption

2013-03-07 Thread yue-kvm

hi stefan:


[root@kvm ~]# qemu-img -V
qemu-img version 0.12.1, Copyright (c) 2004-2008 Fabrice Bellard


OS centos 6.3
thanks








At 2013-03-07 17:56:50,Stefan Hajnoczi stefa...@gmail.com wrote:
On Thu, Mar 07, 2013 at 02:49:30PM +0800, yue-kvm wrote:
 i create qcow2 format image with -o encryption, and assign password.
 
 but when i qemu-img info encryt.qcow2 ,  it display encryt.qcow2 info no 
 master  what i input, even just hit enter.
 the 'Password Authentication ' of qemu-img may be invalid.

Don't worry, your data is encrypted.  The qemu-img info password check
is unnecessary because qcow2 metadata is unencrypted.

I didn't get a password prompt from qemu-img info on an encrypted qcow2
file using qemu.git/master.

Which version of qemu-img are you using?  Perhaps the unnecessary prompt
has been fixed in a later version.

Stefan

Re: [Qemu-devel] qemu-img do not verify encrytption

2013-03-07 Thread Wei-Ren Chen

On Thu, Mar 07, 2013 at 06:57:19PM +0800, yue-kvm wrote:
 hi stefan:
 
 [root@kvm ~]# qemu-img -V
 qemu-img version 0.12.1, Copyright (c) 2004-2008 Fabrice Bellard

  Hrm..., 0.12 is too old, why don't you use newer version?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

Re: [Qemu-devel] [PATCH 01/19] linux-user: stack_base is now mandatory on all targets

2013-03-07 Thread Laurent Desnogues

On Wed, Feb 8, 2012 at 10:46 AM, Laurent Desnogues
laurent.desnog...@gmail.com wrote:
 On Fri, Feb 3, 2012 at 3:49 PM,  riku.voi...@linaro.org wrote:
 From: Riku Voipio riku.voi...@linaro.org

 Signed-off-by: Riku Voipio riku.voi...@linaro.org
 ---
  linux-user/qemu.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/linux-user/qemu.h b/linux-user/qemu.h
 index 55ad9d8..30e2abd 100644
 --- a/linux-user/qemu.h
 +++ b/linux-user/qemu.h
 @@ -123,10 +123,10 @@ typedef struct TaskState {
  #endif
  #if defined(TARGET_ARM) || defined(TARGET_M68K) || defined(TARGET_UNICORE32)
 /* Extra fields for semihosted binaries.  */
 -uint32_t stack_base;
 uint32_t heap_base;
 uint32_t heap_limit;
  #endif
 +uint32_t stack_base;

 Shouldn't this be abi_ulong instead of uint32_t?

Ping...

Yes it's more than a year old, but if Aarch64 support with semihosting
ever comes to life :)

Laurent

 int used; /* non zero if used */
 struct image_info *info;
 struct linux_binprm *bprm;

 Laurent

Re: [Qemu-devel] problems with freeBSD

2013-03-07 Thread Gerd Hoffmann

  Hi,

 Probably works, but never appeared in a separate release:
 
 commit 3588185b8396eb97fd9efd41c2b97775465f67c4
 Author: Gerd Hoffmann kra...@redhat.com
 Date:   Mon Jan 21 09:17:16 2013 +0100
 
 seabios: update to 1.7.2 release

Built with gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3) (rhel-6).

 commit 5f876756c57c15f5e14d4136fc432b74f05f082b
 Author: Anthony Liguori aligu...@us.ibm.com
 Date:   Wed Feb 6 05:12:06 2013 -0600
 
 bios: recompile BIOS

 gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC)

 commit 5c75fb10029c5fd1e705a6ef5d698fbea06c7a33
 Author: Gerd Hoffmann kra...@redhat.com
 Date:   Thu Feb 28 09:18:56 2013 +0100
 
 update seabios to 1.7.2.1

Built with gcc (GCC) 4.7.2 20121015 (Red Hat 4.7.2-5)
(rhel-6 devtoolkit-1.1) as gcc 4.4 builds are too big and bumb seabios
size from 128k to 256k.  Which was the reason for anthonys rebuild.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v11 1/3] iov: Factor out hexdumper

2013-03-07 Thread Peter Crosthwaite

Ping!

Any issues here? Peter wanted to give this list time so it missed the
last arm-devs.

Regards,
Peter

On Wed, Feb 27, 2013 at 3:17 PM, Peter Crosthwaite
peter.crosthwa...@xilinx.com wrote:
 Factor out the hexdumper functionality from iov for all to use. Useful for
 creating verbose debug printfery that dumps packet data.

 Signed-off-by: Peter Crosthwaite peter.crosthwa...@xilinx.com
 ---
 changed from v10:
 Added Gerd and Red Hat to (c)
 reworked iov hexdumper to use iov_to_buf (Gerd Review)
 changed from v9:
 changed original source info in header to point to Gerds hexdump commit to 
 iov.c
 Moved header prototype to qemu-common.h (MJT review)
 Added brief comment in header for hexdump()

  include/qemu-common.h |6 ++
  util/Makefile.objs|1 +
  util/hexdump.c|   37 +
  util/iov.c|   36 +++-
  4 files changed, 55 insertions(+), 25 deletions(-)
  create mode 100644 util/hexdump.c

 diff --git a/include/qemu-common.h b/include/qemu-common.h
 index 80016ad..804667a 100644
 --- a/include/qemu-common.h
 +++ b/include/qemu-common.h
 @@ -430,4 +430,10 @@ int64_t pow2floor(int64_t value);
  int uleb128_encode_small(uint8_t *out, uint32_t n);
  int uleb128_decode_small(const uint8_t *in, uint32_t *n);

 +/*
 + * Hexdump a buffer to a file. An optional string prefix is added to every 
 line
 + */
 +
 +void hexdump(const char *buf, FILE *fp, const char *prefix, size_t size);
 +
  #endif
 diff --git a/util/Makefile.objs b/util/Makefile.objs
 index 495a178..068ceac 100644
 --- a/util/Makefile.objs
 +++ b/util/Makefile.objs
 @@ -8,3 +8,4 @@ util-obj-y += error.o qemu-error.o
  util-obj-$(CONFIG_POSIX) += compatfd.o
  util-obj-y += iov.o aes.o qemu-config.o qemu-sockets.o uri.o notify.o
  util-obj-y += qemu-option.o qemu-progress.o
 +util-obj-y += hexdump.o
 diff --git a/util/hexdump.c b/util/hexdump.c
 new file mode 100644
 index 000..0d0efc8
 --- /dev/null
 +++ b/util/hexdump.c
 @@ -0,0 +1,37 @@
 +/*
 + * Helper to hexdump a buffer
 + *
 + * Copyright (c) 2013 Red Hat, Inc.
 + * Copyright (c) 2013 Gerd Hoffmann kra...@redhat.com
 + * Copyright (c) 2013 Peter Crosthwaite peter.crosthwa...@xilinx.com
 + * Copyright (c) 2013 Xilinx, Inc
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + *
 + * Contributions after 2012-01-13 are licensed under the terms of the
 + * GNU GPL, version 2 or (at your option) any later version.
 + */
 +
 +#include qemu-common.h
 +
 +void hexdump(const char *buf, FILE *fp, const char *prefix, size_t size)
 +{
 +unsigned int b;
 +
 +for (b = 0; b  size; b++) {
 +if ((b % 16) == 0) {
 +fprintf(fp, %s: %04x:, prefix, b);
 +}
 +if ((b % 4) == 0) {
 +fprintf(fp,  );
 +}
 +fprintf(fp,  %02x, (unsigned char)buf[b]);
 +if ((b % 16) == 15) {
 +fprintf(fp, \n);
 +}
 +}
 +if ((b % 16) != 0) {
 +fprintf(fp, \n);
 +}
 +}
 diff --git a/util/iov.c b/util/iov.c
 index fbe675d..9dae318 100644
 --- a/util/iov.c
 +++ b/util/iov.c
 @@ -201,32 +201,18 @@ ssize_t iov_send_recv(int sockfd, struct iovec *iov, 
 unsigned iov_cnt,
  void iov_hexdump(const struct iovec *iov, const unsigned int iov_cnt,
   FILE *fp, const char *prefix, size_t limit)
  {
 -unsigned int i, v, b;
 -uint8_t *c;
 -
 -c = iov[0].iov_base;
 -for (i = 0, v = 0, b = 0; b  limit; i++, b++) {
 -if (i == iov[v].iov_len) {
 -i = 0; v++;
 -if (v == iov_cnt) {
 -break;
 -}
 -c = iov[v].iov_base;
 -}
 -if ((b % 16) == 0) {
 -fprintf(fp, %s: %04x:, prefix, b);
 -}
 -if ((b % 4) == 0) {
 -fprintf(fp,  );
 -}
 -fprintf(fp,  %02x, c[i]);
 -if ((b % 16) == 15) {
 -fprintf(fp, \n);
 -}
 -}
 -if ((b % 16) != 0) {
 -fprintf(fp, \n);
 +int v;
 +size_t size = 0;
 +char *buf;
 +
 +for (v = 0; v  iov_cnt; v++) {
 +size += iov[v].iov_len;
  }
 +size = size  limit ? limit : size;
 +buf = g_malloc(size);
 +iov_to_buf(iov, iov_cnt, 0, buf, size);
 +hexdump(buf, fp, prefix, size);
 +g_free(buf);
  }

  unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
 --
 1.7.0.4

Re: [Qemu-devel] [RFC] s390: dump-guest-memory implementation for s390x arch.

2013-03-07 Thread Christian Borntraeger

On 07/03/13 11:49, Alexander Graf wrote:
 
 On 28.02.2013, at 10:39, Jens Freimann wrote:
 
 From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com

 dump-guest-memory QEMU monitor command didn't work for s390 architecture.
 The result of the command is supposed to be ELF format crash-readable
 dump.
 In order to implement this, the arch-specific part of dump-guest-memory
 was added:
 target-s390x/arch_dump.c contains the whole set of function for writing
 Elf note sections of all types for s390x.
 target-s390x/arch_memory_mapping.c contains stub functions, current
 patch places all guest physical memory into the dump.

 Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com

 ---
 configure  |   4 +-
 include/elf.h  |   6 +
 target-s390x/Makefile.objs |   2 +-
 target-s390x/arch_dump.c   | 240 
 +
 target-s390x/arch_memory_mapping.c |  26 
 5 files changed, 275 insertions(+), 3 deletions(-)
 create mode 100644 target-s390x/arch_dump.c
 create mode 100644 target-s390x/arch_memory_mapping.c

 diff --git a/configure b/configure
 index 19738ac..938be14 100755
 --- a/configure
 +++ b/configure
 @@ -4146,7 +4146,7 @@ case $target_arch2 in
 fi
 esac
 case $target_arch2 in
 -  i386|x86_64)
 +  i386|x86_64|s390x)
 echo CONFIG_HAVE_GET_MEMORY_MAPPING=y  $config_target_mak
 esac
 if test $target_arch2 = ppc64 -a $fdt = yes; then
 @@ -4159,7 +4159,7 @@ if test $target_softmmu = yes ; then
   echo CONFIG_SOFTMMU=y  $config_target_mak
   echo LIBS+=$libs_softmmu $target_libs_softmmu  $config_target_mak
   case $target_arch2 in
 -i386|x86_64)
 +i386|x86_64|s390x)
   echo CONFIG_HAVE_CORE_DUMP=y  $config_target_mak
   esac
 fi
 diff --git a/include/elf.h b/include/elf.h
 index a21ea53..ba4b3a7 100644
 --- a/include/elf.h
 +++ b/include/elf.h
 @@ -1219,11 +1219,17 @@ typedef struct elf64_shdr {

 /* Notes used in ET_CORE */
 #define NT_PRSTATUS  1
 +#define NT_FPREGSET 2
 #define NT_PRFPREG   2
 #define NT_PRPSINFO  3
 #define NT_TASKSTRUCT4
 #define NT_AUXV  6
 #define NT_PRXFPREG 0x46e62b7f  /* copied from 
 gdb5.1/include/elf/common.h */
 +#define NT_S390_PREFIX  0x305   /* s390 prefix register */
 +#define NT_S390_CTRS0x304   /* s390 control registers */
 +#define NT_S390_TODPREG 0x303   /* s390 TOD programmable register */
 +#define NT_S390_TODCMP  0x302   /* s390 TOD clock comparator 
 register */
 +#define NT_S390_TIMER   0x301   /* s390 timer register */


 /* Note header in a PT_NOTE section */
 diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
 index 4e63417..82c79c4 100644
 --- a/target-s390x/Makefile.objs
 +++ b/target-s390x/Makefile.objs
 @@ -1,4 +1,4 @@
 obj-y += translate.o helper.o cpu.o interrupt.o
 obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
 -obj-$(CONFIG_SOFTMMU) += ioinst.o
 +obj-$(CONFIG_SOFTMMU) += ioinst.o arch_dump.o arch_memory_mapping.o
 obj-$(CONFIG_KVM) += kvm.o
 diff --git a/target-s390x/arch_dump.c b/target-s390x/arch_dump.c
 new file mode 100644
 index 000..3098cd1
 --- /dev/null
 +++ b/target-s390x/arch_dump.c
 @@ -0,0 +1,240 @@
 +/*
 + * writing ELF notes for s390x arch
 + *
 + *  
 + * Copyright IBM Corp. 2012
 + *
 + * Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include cpu.h
 +#include elf.h
 +#include exec/cpu-all.h
 +#include sysemu/dump.h
 +#include sysemu/kvm.h
 +
 +
 +struct s390x_user_regs_struct {
 +uint64_tpsw[2];
 +uint64_tgprs[16];
 +uint32_tacrs[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_user_regs_struct s390x_user_regs;
 +
 +struct s390x_elf_prstatus_struct {
 +uint8_t pad1[32];
 +uint32_t pid;
 +uint8_t pad2[76];
 +s390x_user_regs regs;
 +uint8_t pad3[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_elf_prstatus_struct s390x_elf_prstatus;
 +
 +struct s390x_elf_fpregset_struct {
 +uint32_tfpc;
 +uint32_tpad;
 +uint64_tfprs[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_elf_fpregset_struct s390x_elf_fpregset;
 +
 +typedef struct note_struct {
 +Elf64_Nhdr hdr;
 +char name[5];
 +char pad3[3];
 +union {
 +s390x_elf_prstatus prstatus;
 +s390x_elf_fpregset fpregset;
 +uint32_t prefix;
 +uint64_t timer;
 +uint64_t todcmp;
 +uint32_t todpreg;
 +uint64_t ctrs[16];
 +} contents;
 +} QEMU_PACKED note_t;
 +
 +static int s390x_write_elf64_prstatus(note_t *note, CPUArchState *env)
 +{
 +int i;
 +s390x_user_regs *regs;
 +
 +note-hdr.n_type = cpu_to_be32(NT_PRSTATUS);
 +
 +regs

Re: [Qemu-devel] [RFC] s390: dump-guest-memory implementation for s390x arch.

2013-03-07 Thread Alexander Graf


On 07.03.2013, at 12:21, Christian Borntraeger wrote:

 On 07/03/13 11:49, Alexander Graf wrote:
 
 On 28.02.2013, at 10:39, Jens Freimann wrote:
 
 From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 
 dump-guest-memory QEMU monitor command didn't work for s390 architecture.
 The result of the command is supposed to be ELF format crash-readable
 dump.
 In order to implement this, the arch-specific part of dump-guest-memory
 was added:
 target-s390x/arch_dump.c contains the whole set of function for writing
 Elf note sections of all types for s390x.
 target-s390x/arch_memory_mapping.c contains stub functions, current
 patch places all guest physical memory into the dump.
 
 Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com
 
 ---
 configure  |   4 +-
 include/elf.h  |   6 +
 target-s390x/Makefile.objs |   2 +-
 target-s390x/arch_dump.c   | 240 
 +
 target-s390x/arch_memory_mapping.c |  26 
 5 files changed, 275 insertions(+), 3 deletions(-)
 create mode 100644 target-s390x/arch_dump.c
 create mode 100644 target-s390x/arch_memory_mapping.c
 
 diff --git a/configure b/configure
 index 19738ac..938be14 100755
 --- a/configure
 +++ b/configure
 @@ -4146,7 +4146,7 @@ case $target_arch2 in
fi
 esac
 case $target_arch2 in
 -  i386|x86_64)
 +  i386|x86_64|s390x)
echo CONFIG_HAVE_GET_MEMORY_MAPPING=y  $config_target_mak
 esac
 if test $target_arch2 = ppc64 -a $fdt = yes; then
 @@ -4159,7 +4159,7 @@ if test $target_softmmu = yes ; then
  echo CONFIG_SOFTMMU=y  $config_target_mak
  echo LIBS+=$libs_softmmu $target_libs_softmmu  $config_target_mak
  case $target_arch2 in
 -i386|x86_64)
 +i386|x86_64|s390x)
  echo CONFIG_HAVE_CORE_DUMP=y  $config_target_mak
  esac
 fi
 diff --git a/include/elf.h b/include/elf.h
 index a21ea53..ba4b3a7 100644
 --- a/include/elf.h
 +++ b/include/elf.h
 @@ -1219,11 +1219,17 @@ typedef struct elf64_shdr {
 
 /* Notes used in ET_CORE */
 #define NT_PRSTATUS 1
 +#define NT_FPREGSET 2
 #define NT_PRFPREG  2
 #define NT_PRPSINFO 3
 #define NT_TASKSTRUCT   4
 #define NT_AUXV 6
 #define NT_PRXFPREG 0x46e62b7f  /* copied from 
 gdb5.1/include/elf/common.h */
 +#define NT_S390_PREFIX  0x305   /* s390 prefix register */
 +#define NT_S390_CTRS0x304   /* s390 control registers */
 +#define NT_S390_TODPREG 0x303   /* s390 TOD programmable register 
 */
 +#define NT_S390_TODCMP  0x302   /* s390 TOD clock comparator 
 register */
 +#define NT_S390_TIMER   0x301   /* s390 timer register */
 
 
 /* Note header in a PT_NOTE section */
 diff --git a/target-s390x/Makefile.objs b/target-s390x/Makefile.objs
 index 4e63417..82c79c4 100644
 --- a/target-s390x/Makefile.objs
 +++ b/target-s390x/Makefile.objs
 @@ -1,4 +1,4 @@
 obj-y += translate.o helper.o cpu.o interrupt.o
 obj-y += int_helper.o fpu_helper.o cc_helper.o mem_helper.o misc_helper.o
 -obj-$(CONFIG_SOFTMMU) += ioinst.o
 +obj-$(CONFIG_SOFTMMU) += ioinst.o arch_dump.o arch_memory_mapping.o
 obj-$(CONFIG_KVM) += kvm.o
 diff --git a/target-s390x/arch_dump.c b/target-s390x/arch_dump.c
 new file mode 100644
 index 000..3098cd1
 --- /dev/null
 +++ b/target-s390x/arch_dump.c
 @@ -0,0 +1,240 @@
 +/*
 + * writing ELF notes for s390x arch
 + *
 + *  
 + * Copyright IBM Corp. 2012
 + *
 + * Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or 
 later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include cpu.h
 +#include elf.h
 +#include exec/cpu-all.h
 +#include sysemu/dump.h
 +#include sysemu/kvm.h
 +
 +
 +struct s390x_user_regs_struct {
 +uint64_tpsw[2];
 +uint64_tgprs[16];
 +uint32_tacrs[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_user_regs_struct s390x_user_regs;
 +
 +struct s390x_elf_prstatus_struct {
 +uint8_t pad1[32];
 +uint32_t pid;
 +uint8_t pad2[76];
 +s390x_user_regs regs;
 +uint8_t pad3[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_elf_prstatus_struct s390x_elf_prstatus;
 +
 +struct s390x_elf_fpregset_struct {
 +uint32_tfpc;
 +uint32_tpad;
 +uint64_tfprs[16];
 +} QEMU_PACKED;
 +
 +typedef struct s390x_elf_fpregset_struct s390x_elf_fpregset;
 +
 +typedef struct note_struct {
 +Elf64_Nhdr hdr;
 +char name[5];
 +char pad3[3];
 +   union {
 +s390x_elf_prstatus prstatus;
 +s390x_elf_fpregset fpregset;
 +uint32_t prefix;
 +uint64_t timer;
 +uint64_t todcmp;
 +uint32_t todpreg;
 +uint64_t ctrs[16];
 +} contents;
 +} QEMU_PACKED note_t;
 +
 +static int s390x_write_elf64_prstatus(note_t *note, CPUArchState *env)
 +{
 +int i;
 +s390x_user_regs *regs;
 +
 +

Re: [Qemu-devel] [RFC] s390: dump-guest-memory implementation for s390x arch.

2013-03-07 Thread Christian Borntraeger

 diff --git a/target-s390x/arch_memory_mapping.c 
 b/target-s390x/arch_memory_mapping.c
 new file mode 100644
 index 000..3dad3b9
 --- /dev/null
 +++ b/target-s390x/arch_memory_mapping.c
 @@ -0,0 +1,26 @@
 +/*
 + * s390x memory mapping
 + *
 + * Copyright IBM Corp. 2012
 + *
 + * Authors:
 + * Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or 
 later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include cpu.h
 +#include exec/cpu-all.h
 +#include sysemu/memory_mapping.h
 +
 +int cpu_get_memory_mapping(MemoryMappingList *list, CPUArchState *env)
 +{
 +return 0;
 +}
 +
 +bool cpu_paging_enabled(CPUArchState *env)
 +{
 +return false;

 Why? :)

 To make things compile :-)
 Why no paging? The idea was to have some code that allows getting an ELF 
 dump of a Linux kernel.
 All existing tooling for s390 expects real memory dumps. 

 Actually having CONFIG_HAVE_GET_MEMORY_MAPPING=n would be a sane thing todo, 
 but IIRC the common
 code in dump.c references some functions that are only available with 
 CONFIG_HAVE_GET_MEMORY_MAPPING=y. 

 /home/cborntra/REPOS/qemu/dump.c:84: undefined reference to 
 `memory_mapping_list_free'
 dump.o: In function `dump_init':
 /home/cborntra/REPOS/qemu/dump.c:753: undefined reference to 
 `memory_mapping_list_init'
 /home/cborntra/REPOS/qemu/dump.c:757: undefined reference to 
 `qemu_get_guest_simple_memory_mapping'
 /home/cborntra/REPOS/qemu/dump.c:761: undefined reference to 
 `memory_mapping_filter'
 
 It might be better to just fix those places and not claim that you can fetch 
 memory mappings then. QEMU has all the knowledge of s390 virtual memory you 
 would need to implement it. I can see how you wouldn't need it for core dumps 
 of the kernel, but it's confusing to claim dump support for VM and then say 
 you don't.
 
 
 Alex

So you suggest to implementing stubs for those 4 functions in 
memory_mapping-stub.c and see if dumping
of a kernel still works with CONFIG_HAVE_GET_MEMORY_MAPPING=n, right?

Christian

Re: [Qemu-devel] [RFC] s390: dump-guest-memory implementation for s390x arch.

2013-03-07 Thread Alexander Graf


On 07.03.2013, at 12:30, Christian Borntraeger wrote:

 diff --git a/target-s390x/arch_memory_mapping.c 
 b/target-s390x/arch_memory_mapping.c
 new file mode 100644
 index 000..3dad3b9
 --- /dev/null
 +++ b/target-s390x/arch_memory_mapping.c
 @@ -0,0 +1,26 @@
 +/*
 + * s390x memory mapping
 + *
 + * Copyright IBM Corp. 2012
 + *
 + * Authors:
 + * Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or 
 later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include cpu.h
 +#include exec/cpu-all.h
 +#include sysemu/memory_mapping.h
 +
 +int cpu_get_memory_mapping(MemoryMappingList *list, CPUArchState *env)
 +{
 +return 0;
 +}
 +
 +bool cpu_paging_enabled(CPUArchState *env)
 +{
 +return false;
 
 Why? :)
 
 To make things compile :-)
 Why no paging? The idea was to have some code that allows getting an ELF 
 dump of a Linux kernel.
 All existing tooling for s390 expects real memory dumps. 
 
 Actually having CONFIG_HAVE_GET_MEMORY_MAPPING=n would be a sane thing 
 todo, but IIRC the common
 code in dump.c references some functions that are only available with 
 CONFIG_HAVE_GET_MEMORY_MAPPING=y. 
 
 /home/cborntra/REPOS/qemu/dump.c:84: undefined reference to 
 `memory_mapping_list_free'
 dump.o: In function `dump_init':
 /home/cborntra/REPOS/qemu/dump.c:753: undefined reference to 
 `memory_mapping_list_init'
 /home/cborntra/REPOS/qemu/dump.c:757: undefined reference to 
 `qemu_get_guest_simple_memory_mapping'
 /home/cborntra/REPOS/qemu/dump.c:761: undefined reference to 
 `memory_mapping_filter'
 
 It might be better to just fix those places and not claim that you can fetch 
 memory mappings then. QEMU has all the knowledge of s390 virtual memory you 
 would need to implement it. I can see how you wouldn't need it for core 
 dumps of the kernel, but it's confusing to claim dump support for VM and 
 then say you don't.
 
 
 Alex
 
 So you suggest to implementing stubs for those 4 functions in 
 memory_mapping-stub.c and see if dumping
 of a kernel still works with CONFIG_HAVE_GET_MEMORY_MAPPING=n, right?

Anything that gets you rolling without CONFIG_HAVE_GET_MEMORY_MAPPING :)


Alex

Re: [Qemu-devel] [Qemu-stable] [SeaBIOS] problems with freeBSD

2013-03-07 Thread Gerd Hoffmann

  Hi,

 Would qemu consider using those blobs rather than different developers
 using their distro toolchain to build up a random commit ID. I say
 random only because often qemu releases ship with a non-release
 seabios.

It happened (well, the snapshot thing).  But that isn't our preferred
model, it was just bad release cycle coordination and we'll try to avoid
doing it again.  There is reason why we have a 1.7.2 stable branch now.

The binary blobs are just a convenience thing because rebuilding the
firmware might not be that easy.  Not so much for virtualization where
host + firmware are the same arch.  But for emulation, where you need
the -- say -- ppc slof on a x86 machine, and the binaries shipped allow
people to get started without having to install (or compile) a cross
compiler first.

It is certainly not required to use blobs shipped.  In fact the release
tarballs include all the firmware submodules, so you can build the
firmware yourself if you want.  I've created roms/Makefile so you can
easily rebuild the (x86) firmware.  seabios is there, vgabios is there,
patches for ipxe are in flight.

 Just a note, or an alternative opinion, so to say.  In Debian, we have
 a social contract which, among other things, ensures that the binaries
 you, as a user, get, comes with source which you can modify on the
 system you installed.  This requires, for example, that all binaries
 shipped are actually built from the corresponding source, and that
 no blobs from whatever other sources are used, ever.

That is perfectly fine.  How to you handle ppc firmware for x86 hosts
btw?  Build on ppc buildhost and ship as noarch package?  Or do you do
cross compiler builds, so that users can patch+rebuild it on their x86
host too?

 We don't even ship any upstream blobs in the debian qemu _source_
 package: we repack upstream qemu.tar.gz by removing these blobs.

That's a bit over the top for my taste as the release tarballs include
both source and blobs.  Although it might be the debian release is just
a bit too old for that, not fully sure with which release anthony
started to include the firmware submodules, might be it was after 1.1

cheers,
  Gerd

Re: [Qemu-devel] problems with freeBSD

2013-03-07 Thread Laszlo Ersek

On 03/07/13 09:43, Aurelien Jarno wrote:

 I did a git bisect to find the commit fixing the issue. Then, as I was
 not believing the result, I tried the following sequence a dozen of
 times (for some unknown reasons the FreeBSD install CD doesn't exhibit
 the issue, so I used the Debian GNU/kFreeBSD installer):
 
 | mkdir qemu-freebsd-bug
 | cd qemu-freebsd-bug
 |
 | wget 
 http://ftp.debian.org/debian/dists/squeeze/main/installer-kfreebsd-amd64/current/images/netboot/mini.iso
  
 |
 | git clone git://git.qemu.org/qemu.git
 | cd qemu
 | git checkout -b stable-1.4 v1.4.0
 | ./configure --target-list=x86_64-softmmu
 | make
 | cd ..
 |
 | git clone git://git.seabios.org/seabios.git
 | cd seabios
 | git checkout -b 1.7.2-stable origin/1.7.2-stable
 | git reset --hard 4219149ad2b783abfa61e80e9e9f6910db0c76c9
 | make
 | cp out/bios.bin ../qemu/pc-bios
 | cd..
 |
 | # debian-installer boots correctly 
 | ./qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cdrom mini.iso
 |
 | cd seabios
 | git reset --hard d75c22fcb6521dad11428b65789d92f89675c600 
 | git clean -fdx
 | make
 | cp out/bios.bin ../qemu/pc-bios
 | cd ..
 |
 | # debian-installer fails to boot
 | ./qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cdrom mini.iso 
 
 
 Maybe I am doing something wrong or there is a bug in my toolchain
 (Debian Sid). It would be nice if someone could try to reproduce that on
 another distro/system.
 

Can you save the out/ directory from both builds and diff -ur them
(maybe just the *.lds files)?

I'm noticing that pathnames are embedded in some ELF section names (I
hope this sentence makes sense), and when you build at d75c22fc, those
pathnames contain dot-dot (..), ie. two section name separators next
to each other. Maybe that's not good; no idea.

Laszlo

Re: [Qemu-devel] [PATCH 2/3] hw/virtio-net.c: set config size using host features

2013-03-07 Thread Christian Borntraeger

Ping.
Anthony, Jesse,

how is this supposed to work?


Christian
On 05/03/13 18:03, Christian Borntraeger wrote:
 On 05/03/13 17:48, Alexander Graf wrote:
 On 02/06/2013 12:47 AM, Jesse Larrew wrote:
 Currently, the config size for virtio devices is hard coded. When a new
 feature is added that changes the config size, drivers that assume a static
 config size will break. For purposes of backward compatibility, there needs
 to be a way to inform drivers of the config size needed to accommodate the
 set of features enabled.

 Signed-off-by: Jesse Larrewjlar...@linux.vnet.ibm.com

 The following patch gets my s390 virtio guest working again, but I doubt 
 it's the right fix.

 What is the expected dependency chain of feature calls?


 Alex

 diff --git a/hw/s390x/s390-virtio-bus.c b/hw/s390x/s390-virtio-bus.c
 index 089ed92..81be971 100644
 --- a/hw/s390x/s390-virtio-bus.c
 +++ b/hw/s390x/s390-virtio-bus.c
 @@ -154,7 +154,7 @@ static int s390_virtio_net_init(VirtIOS390Device *dev)
  VirtIODevice *vdev;

  vdev = virtio_net_init((DeviceState *)dev, dev-nic, dev-net,
 -   dev-host_features);
 +   dev-host_features | (1  VIRTIO_NET_F_MAC));
  if (!vdev) {
  return -1;
  }


 
 Actually this goes back to
 
 commit 1e89ad5b00ba0426d4e949c9e6ce2926c15b81b7
 Author: Anthony Liguori aligu...@us.ibm.com
 Date:   Tue Feb 5 17:47:15 2013 -0600
 
 virtio-net: pass host features to virtio_net_init
 
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com
 
 virtio-s390 calls  virtio_net_init  before it actually queries the 
 dev-features into
 the host_features. virtio-ccw does the same, but it does not BUG. (Its still 
 wrong IMHO)
 
 Same for virtio-pci:
 
 
 static int virtio_net_init_pci(PCIDevice *pci_dev)
 {
 VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 VirtIODevice *vdev;
 
 vdev = virtio_net_init(pci_dev-qdev, proxy-nic, proxy-net,
proxy-host_features);   --- use host_feature
 
 vdev-nvectors = proxy-nvectors;
 virtio_init_pci(proxy, vdev);   - actually gets host_feature 
 (!)
 [..]
 
 
 Good that my old rusty virtio-ccw transport has lots of BUG_ONS :-)

Re: [Qemu-devel] [PATCH 28/28] hw/sd.c: add SD card save/load support

2013-03-07 Thread Igor Mitsyanko

 On 03/06/2013 10:52 PM, Peter Maydell wrote:

On 7 March 2013 02:31, Michael Walle mich...@walle.cc
mich...@walle.cc wrote:

 Sorry for digging out such an old thread :) but this patch introduced a memory
corruption, see below.

 CC'ing Igor as the author of the patch...


 Am Dienstag 30 Oktober 2012, 09:44:24 schrieb Peter Maydell:

 From: Igor Mitsyanko i.mitsya...@gmail.com i.mitsya...@gmail.com

This patch updates SD card model to support save/load of card's state.

+static const VMStateDescription sd_vmstate = {
+.name = sd-card,
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(mode, SDState),
+VMSTATE_INT32(state, SDState),
+VMSTATE_UINT8_ARRAY(cid, SDState, 16),
+VMSTATE_UINT8_ARRAY(csd, SDState, 16),
+VMSTATE_UINT16(rca, SDState),
+VMSTATE_UINT32(card_status, SDState),
+VMSTATE_PARTIAL_BUFFER(sd_status, SDState, 1),
+VMSTATE_UINT32(vhs, SDState),
+VMSTATE_BITMAP(wp_groups, SDState, 0, wpgrps_size),
+VMSTATE_UINT32(blk_len, SDState),
+VMSTATE_UINT32(erase_start, SDState),
+VMSTATE_UINT32(erase_end, SDState),
+VMSTATE_UINT8_ARRAY(pwd, SDState, 16),
+VMSTATE_UINT32(pwd_len, SDState),
+VMSTATE_UINT8_ARRAY(function_group, SDState, 6),
+VMSTATE_UINT8(current_cmd, SDState),
+VMSTATE_BOOL(expecting_acmd, SDState),
+VMSTATE_UINT32(blk_written, SDState),
+VMSTATE_UINT64(data_start, SDState),
+VMSTATE_UINT32(data_offset, SDState),
+VMSTATE_UINT8_ARRAY(data, SDState, 512),
+VMSTATE_BUFFER_UNSAFE(buf, SDState, 1, 512),

 buf is dynamically allocated in the sd_init(), see also the SDState:

struct SDState {
[...]
uint8_t *buf;

bool enable;
};


 Agreed, VMSTATE_BUFFER_UNSAFE() is for buffers that are inline
in the struct, not for buffers that the struct only points to.
I guess we want one of the VMSTATE_VARRAY_* types instead.

thanks
-- PMM


Oops, sorry. It needs VMS_POINTER flag apparently. Looks like hw/onenand.c
has the same problem with it's otp variable?
I can think of two good ways to fix it:

1. Use VMSTATE_BUFFER_UNSAFE_INFO instead. Just pass a custom VMStateInfo
to it where we will dereference address and get a proper data pointer.
hw/pci/pci.c does this, for example.

2. Introduce a new vmstate macro VMSTATE_BUFFER_POINTER_UNSAFE, which will
be the same as VMSTATE_BUFFER_UNSAFE,
but with an extra VMS_POINTER flag. The best option in my opinion, it also
could be reused for hw/onenand.c

Re: [Qemu-devel] [PATCH] sysbus: add no_user for devices using mmio or IRQ or GPIO

Peter Maydell peter.mayd...@linaro.org writes:

 On 7 March 2013 16:48, Markus Armbruster arm...@redhat.com wrote:
 Can we make some progress towards something that makes more sense?

 First step: reasons for marking a device no_user.

 From a user point of view, I think there's just one: -device/device_add
 cannot possibly result in a working device.  Coherent enough semantics
 for no_user, in my opinion, but I readily concede it's not particularly
 useful for maintaining it as infrastructure and device models evolve.

 Ideally it would be nice to move to a model where the error message
 was the device you've created has some required connections which
 haven't been wired up, and then for some devices you'd always get
 that error [until we implemented syntax for doing the wiring] and
 for some you'd only get it if you messed up the command line switches.

Sounds good to me.

 For that, we need to drill down into reasons for cannot possibly work.
 Here are two, please add more:

 * Can't make required connections

 * Resource collision with board

   The device must connect to some fixed resource, but the board already
   connects something to it.  Without no_user, this should result in an
   error message of sorts, which is much better than the silent breakage
   above.  Whether the message makes any sense to the user is a different
   question.

 Do we have any concrete examples of this? I don't think devices should
 be making connections to resources themselves. If the only things wiring
 up devices are (a) the board models and (b) anything the user says on
 the command line, then the conflict only happens if the user says
 -device foo,bar=0x42 and bar 0x42 is in use; in that case the message
 will make sense to them.

Not exactly what you asked for, but here goes anyway:

$ qemu-system-x86_64 -nodefaults -S -vnc :0 -monitor stdio -device 
isa-fdc,id=foo
QEMU 1.4.50 monitor - type 'help' for more information
(qemu) info qtree
bus: main-system-bus
[...]
bus: isa.0
  type ISA
  dev: isa-fdc, id foo
iobase = 0x3f0
irq = 6
dma = 2
driveA = null
driveB = null
bootindexA = -1
bootindexB = -1
check_media_rate = on
isa irq 6
  dev: isa-fdc, id 
iobase = 0x3f0
irq = 6
dma = 2
driveA = null
driveB = null
bootindexA = -1
bootindexB = -1
check_media_rate = on
isa irq 6
[...]
$ qemu-system-x86_64 -nodefaults -S -vnc :0 -monitor stdio -device 
q35-pcihost
Segmentation fault (core dumped)

 I think a third case for no-user is this device is actually an
 abstract base class (eg arm_gic_common), which we could deal with
 by checking the TypeInfo instead.

Yes.

 Case four: we really don't expect anybody to be trying to wire this
 up dynamically, which would apply to things like the on-cpu peripherals
 for some ARM cores. There it is really just an attempt at being friendly
 by cutting down the length of the devices list.

Yes.  Case-by-case decision, I guess.

 Speaking of friendlyness, it might be helpful if the '-devices help'
 list was somehow more structured than a single long list and if
 more devices had description text?

Oh yes.  I wanted to do that, but when Anthony started to dig up qdev, I
hastened to get out of the way.

  [and I don't think this device
 can be added via the monitor but not the command line
 counts as consistent or coherent...]

 no_user applies equally to -device and device_add.

 In the codebase as it stands, it applies only to -device.
 I agree that we should be consistent here, which we could do
 by applying Christian's patch or some variation to make device_add
 honour no_user. (Or by removing no_user altogether :-))

Actually, it appears to apply only to help now!

git-bisect blames this one:

commit 18b6dade8c0799c48f5c5e124b8c407cd5e22e96
Author: Anthony Liguori aligu...@us.ibm.com
Date:   Fri Dec 9 12:08:01 2011 -0600

qdev: refactor device creation to allow bus_info to be set only in class

As we use class_init to set class members, DeviceInfo no longer holds this
information.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com

Review fail, testing fail :(

[Qemu-devel] [PATCH v2 0/7] threadpool: support multiple ThreadPools

This patch series changes the global thread pool to a one ThreadPool per
AioContext model.  We still only use the main loop AioContext so in practice
there is just one ThreadPool.  But this opens the door to refactoring the block
layer (which depends on ThreadPool) so block devices can be accessed outside
the global mutex in the future.

ThreadPool is tightly bound to an AioContext because it uses an EventNotifier
to signal work completion.  Completed work items are reaped and their callback
functions are invoked from the EventNotifier read handler (executing under
AioContext).

It might be possible to record the AioContext for the completion callback on a
per-request basis and continuing to use a global pool of worker threads.  After
discussing thread pool models with Paolo I have been convinced that it is
simpler and more scalable to have one ThreadPool per AioContext instead.
Therefore this series implements the 1:1 approach.  For details on previous
thread pool model discussion, see:
http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg03987.html

The final patch was previously separate but I have included it because it
depends on qemu_get_aio_context().  It is unrelated to ThreadPool but the patch
reviewers are the same in both instances, so I combined the series.

At the end of this series block/raw-posix.c and block/raw-win32.c are aware of
the ThreadPool they submit work to.  The next step after this series is to
associate BlockDriverState with an AioContext so that the block layer can run
outside the global main loop.

v2:
 * Always find AioContext, don't split if (ctx) cases [Paolo]
 * Introduce bdrv_get_aio_context() [Paolo]
 * Add CoQueue AioContext patch since it depends on qemu_get_aio_context()

Stefan Hajnoczi (7):
  main-loop: add qemu_get_aio_context()
  threadpool: move globals into struct ThreadPool
  threadpool: add thread_pool_new() and thread_pool_free()
  aio: add a ThreadPool instance to AioContext
  block: add bdrv_get_aio_context()
  threadpool: drop global thread pool
  coroutine: use AioContext for CoQueue BH

 async.c |  11 ++
 block.c |   6 ++
 block/raw-posix.c   |   8 +-
 block/raw-win32.c   |   4 +-
 include/block/aio.h |   6 ++
 include/block/block_int.h   |   7 ++
 include/block/coroutine.h   |   1 +
 include/block/thread-pool.h |  15 ++-
 include/qemu/main-loop.h|   5 +
 main-loop.c |   5 +
 qemu-coroutine-lock.c   |  55 ++
 tests/test-thread-pool.c|  44 
 thread-pool.c   | 243 
 trace-events|   4 +-
 14 files changed, 276 insertions(+), 138 deletions(-)

-- 
1.8.1.4

[Qemu-devel] [PATCH v2 1/7] main-loop: add qemu_get_aio_context()

It is very useful to get the main loop AioContext, which is a static
variable in main-loop.c.

I'm not sure whether qemu_get_aio_context() will be necessary in the
future once devices focus on using their own AioContext instead of the
main loop AioContext, but for now it allows us to refactor code to
support multiple AioContext while actually passing the main loop
AioContext.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/qemu/main-loop.h | 5 +
 main-loop.c  | 5 +
 2 files changed, 10 insertions(+)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 0995288..6f0200a 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -82,6 +82,11 @@ int qemu_init_main_loop(void);
 int main_loop_wait(int nonblocking);
 
 /**
+ * qemu_get_aio_context: Return the main loop's AioContext
+ */
+AioContext *qemu_get_aio_context(void);
+
+/**
  * qemu_notify_event: Force processing of pending events.
  *
  * Similar to signaling a condition variable, qemu_notify_event forces
diff --git a/main-loop.c b/main-loop.c
index 8c9b58c..eb80ff3 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -109,6 +109,11 @@ static int qemu_signal_init(void)
 
 static AioContext *qemu_aio_context;
 
+AioContext *qemu_get_aio_context(void)
+{
+return qemu_aio_context;
+}
+
 void qemu_notify_event(void)
 {
 if (!qemu_aio_context) {
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 2/7] threadpool: move globals into struct ThreadPool

Move global variables into a struct so multiple thread pools can be
supported in the future.

This patch does not change thread-pool.h interfaces.  There is still a
global thread pool and it is not yet possible to create/destroy
individual thread pools.  Moving the variables into a struct first makes
later patches easier to review.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 thread-pool.c | 190 +-
 trace-events  |   4 +-
 2 files changed, 112 insertions(+), 82 deletions(-)

diff --git a/thread-pool.c b/thread-pool.c
index e3ca64d..a0aecd0 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -24,7 +24,9 @@
 #include qemu/event_notifier.h
 #include block/thread-pool.h
 
-static void do_spawn_thread(void);
+typedef struct ThreadPool ThreadPool;
+
+static void do_spawn_thread(ThreadPool *pool);
 
 typedef struct ThreadPoolElement ThreadPoolElement;
 
@@ -37,6 +39,7 @@ enum ThreadState {
 
 struct ThreadPoolElement {
 BlockDriverAIOCB common;
+ThreadPool *pool;
 ThreadPoolFunc *func;
 void *arg;
 
@@ -54,49 +57,56 @@ struct ThreadPoolElement {
 QLIST_ENTRY(ThreadPoolElement) all;
 };
 
-static EventNotifier notifier;
-static QemuMutex lock;
-static QemuCond check_cancel;
-static QemuSemaphore sem;
-static int max_threads = 64;
-static QEMUBH *new_thread_bh;
-
-/* The following variables are protected by the global mutex.  */
-static QLIST_HEAD(, ThreadPoolElement) head;
-
-/* The following variables are protected by lock.  */
-static QTAILQ_HEAD(, ThreadPoolElement) request_list;
-static int cur_threads;
-static int idle_threads;
-static int new_threads; /* backlog of threads we need to create */
-static int pending_threads; /* threads created but not running yet */
-static int pending_cancellations; /* whether we need a cond_broadcast */
-
-static void *worker_thread(void *unused)
+struct ThreadPool {
+EventNotifier notifier;
+QemuMutex lock;
+QemuCond check_cancel;
+QemuSemaphore sem;
+int max_threads;
+QEMUBH *new_thread_bh;
+
+/* The following variables are only accessed from one AioContext. */
+QLIST_HEAD(, ThreadPoolElement) head;
+
+/* The following variables are protected by lock.  */
+QTAILQ_HEAD(, ThreadPoolElement) request_list;
+int cur_threads;
+int idle_threads;
+int new_threads; /* backlog of threads we need to create */
+int pending_threads; /* threads created but not running yet */
+int pending_cancellations; /* whether we need a cond_broadcast */
+};
+
+/* Currently there is only one thread pool instance. */
+static ThreadPool global_pool;
+
+static void *worker_thread(void *opaque)
 {
-qemu_mutex_lock(lock);
-pending_threads--;
-do_spawn_thread();
+ThreadPool *pool = opaque;
+
+qemu_mutex_lock(pool-lock);
+pool-pending_threads--;
+do_spawn_thread(pool);
 
 while (1) {
 ThreadPoolElement *req;
 int ret;
 
 do {
-idle_threads++;
-qemu_mutex_unlock(lock);
-ret = qemu_sem_timedwait(sem, 1);
-qemu_mutex_lock(lock);
-idle_threads--;
-} while (ret == -1  !QTAILQ_EMPTY(request_list));
+pool-idle_threads++;
+qemu_mutex_unlock(pool-lock);
+ret = qemu_sem_timedwait(pool-sem, 1);
+qemu_mutex_lock(pool-lock);
+pool-idle_threads--;
+} while (ret == -1  !QTAILQ_EMPTY(pool-request_list));
 if (ret == -1) {
 break;
 }
 
-req = QTAILQ_FIRST(request_list);
-QTAILQ_REMOVE(request_list, req, reqs);
+req = QTAILQ_FIRST(pool-request_list);
+QTAILQ_REMOVE(pool-request_list, req, reqs);
 req-state = THREAD_ACTIVE;
-qemu_mutex_unlock(lock);
+qemu_mutex_unlock(pool-lock);
 
 ret = req-func(req-arg);
 
@@ -105,45 +115,47 @@ static void *worker_thread(void *unused)
 smp_wmb();
 req-state = THREAD_DONE;
 
-qemu_mutex_lock(lock);
-if (pending_cancellations) {
-qemu_cond_broadcast(check_cancel);
+qemu_mutex_lock(pool-lock);
+if (pool-pending_cancellations) {
+qemu_cond_broadcast(pool-check_cancel);
 }
 
-event_notifier_set(notifier);
+event_notifier_set(pool-notifier);
 }
 
-cur_threads--;
-qemu_mutex_unlock(lock);
+pool-cur_threads--;
+qemu_mutex_unlock(pool-lock);
 return NULL;
 }
 
-static void do_spawn_thread(void)
+static void do_spawn_thread(ThreadPool *pool)
 {
 QemuThread t;
 
 /* Runs with lock taken.  */
-if (!new_threads) {
+if (!pool-new_threads) {
 return;
 }
 
-new_threads--;
-pending_threads++;
+pool-new_threads--;
+pool-pending_threads++;
 
-qemu_thread_create(t, worker_thread, NULL, QEMU_THREAD_DETACHED);
+qemu_thread_create(t, worker_thread, pool, QEMU_THREAD_DETACHED);
 }
 
 static void

[Qemu-devel] [PATCH v2 5/7] block: add bdrv_get_aio_context()

For now bdrv_get_aio_context() is just a stub that calls
qemu_aio_get_context() since the block layer is currently tied to the
main loop AioContext.

Add the stub now so that the block layer can begin accessing its
AioContext.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block.c   | 6 ++
 include/block/block_int.h | 7 +++
 2 files changed, 13 insertions(+)

diff --git a/block.c b/block.c
index 124a9eb..0e5cd01 100644
--- a/block.c
+++ b/block.c
@@ -4638,3 +4638,9 @@ out:
 bdrv_delete(bs);
 }
 }
+
+AioContext *bdrv_get_aio_context(BlockDriverState *bs)
+{
+/* Currently BlockDriverState always uses the main loop AioContext */
+return qemu_get_aio_context();
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index eaad53e..966f7fd 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,13 @@ int get_tmp_filename(char *filename, int size);
 void bdrv_set_io_limits(BlockDriverState *bs,
 BlockIOLimit *io_limits);
 
+/**
+ * bdrv_get_aio_context:
+ *
+ * Returns: the currently bound #AioContext
+ */
+AioContext *bdrv_get_aio_context(BlockDriverState *bs);
+
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
 #endif
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 3/7] threadpool: add thread_pool_new() and thread_pool_free()

ThreadPool is tied to an AioContext through its event notifier, which
dictates in which AioContext the work item's callback function will be
invoked.

In order to support multiple AioContexts we need to support multiple
ThreadPool instances.

This patch adds the new/free functions.  The free function deserves
special attention because it quiesces remaining worker threads.  This
requires a new condition variable and a stopping flag to let workers
know they should terminate once idle.

We never needed to do this before since the global threadpool was not
explicitly destroyed until process termination.

Also stash the AioContext pointer in ThreadPool so that we can call
aio_set_event_notifier() in thread_pool_free().  We didn't need to hold
onto AioContext previously since there was no free function.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/block/thread-pool.h |  5 +
 thread-pool.c   | 52 +
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h
index 200703e..e1453c6 100644
--- a/include/block/thread-pool.h
+++ b/include/block/thread-pool.h
@@ -26,6 +26,11 @@
 
 typedef int ThreadPoolFunc(void *opaque);
 
+typedef struct ThreadPool ThreadPool;
+
+ThreadPool *thread_pool_new(struct AioContext *ctx);
+void thread_pool_free(ThreadPool *pool);
+
 BlockDriverAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, void *arg,
  BlockDriverCompletionFunc *cb, void *opaque);
 int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg);
diff --git a/thread-pool.c b/thread-pool.c
index a0aecd0..d1e4570 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -24,8 +24,6 @@
 #include qemu/event_notifier.h
 #include block/thread-pool.h
 
-typedef struct ThreadPool ThreadPool;
-
 static void do_spawn_thread(ThreadPool *pool);
 
 typedef struct ThreadPoolElement ThreadPoolElement;
@@ -59,8 +57,10 @@ struct ThreadPoolElement {
 
 struct ThreadPool {
 EventNotifier notifier;
+AioContext *ctx;
 QemuMutex lock;
 QemuCond check_cancel;
+QemuCond worker_stopped;
 QemuSemaphore sem;
 int max_threads;
 QEMUBH *new_thread_bh;
@@ -75,6 +75,7 @@ struct ThreadPool {
 int new_threads; /* backlog of threads we need to create */
 int pending_threads; /* threads created but not running yet */
 int pending_cancellations; /* whether we need a cond_broadcast */
+bool stopping;
 };
 
 /* Currently there is only one thread pool instance. */
@@ -88,7 +89,7 @@ static void *worker_thread(void *opaque)
 pool-pending_threads--;
 do_spawn_thread(pool);
 
-while (1) {
+while (!pool-stopping) {
 ThreadPoolElement *req;
 int ret;
 
@@ -99,7 +100,7 @@ static void *worker_thread(void *opaque)
 qemu_mutex_lock(pool-lock);
 pool-idle_threads--;
 } while (ret == -1  !QTAILQ_EMPTY(pool-request_list));
-if (ret == -1) {
+if (ret == -1 || pool-stopping) {
 break;
 }
 
@@ -124,6 +125,7 @@ static void *worker_thread(void *opaque)
 }
 
 pool-cur_threads--;
+qemu_cond_signal(pool-worker_stopped);
 qemu_mutex_unlock(pool-lock);
 return NULL;
 }
@@ -298,8 +300,10 @@ static void thread_pool_init_one(ThreadPool *pool, 
AioContext *ctx)
 
 memset(pool, 0, sizeof(*pool));
 event_notifier_init(pool-notifier, false);
+pool-ctx = ctx;
 qemu_mutex_init(pool-lock);
 qemu_cond_init(pool-check_cancel);
+qemu_cond_init(pool-worker_stopped);
 qemu_sem_init(pool-sem, 0);
 pool-max_threads = 64;
 pool-new_thread_bh = aio_bh_new(ctx, spawn_thread_bh_fn, pool);
@@ -311,6 +315,46 @@ static void thread_pool_init_one(ThreadPool *pool, 
AioContext *ctx)
thread_pool_active);
 }
 
+ThreadPool *thread_pool_new(AioContext *ctx)
+{
+ThreadPool *pool = g_new(ThreadPool, 1);
+thread_pool_init_one(pool, ctx);
+return pool;
+}
+
+void thread_pool_free(ThreadPool *pool)
+{
+if (!pool) {
+return;
+}
+
+assert(QLIST_EMPTY(pool-head));
+
+qemu_mutex_lock(pool-lock);
+
+/* Stop new threads from spawning */
+qemu_bh_delete(pool-new_thread_bh);
+pool-cur_threads -= pool-new_threads;
+pool-new_threads = 0;
+
+/* Wait for worker threads to terminate */
+pool-stopping = true;
+while (pool-cur_threads  0) {
+qemu_sem_post(pool-sem);
+qemu_cond_wait(pool-worker_stopped, pool-lock);
+}
+
+qemu_mutex_unlock(pool-lock);
+
+aio_set_event_notifier(pool-ctx, pool-notifier, NULL, NULL);
+qemu_sem_destroy(pool-sem);
+qemu_cond_destroy(pool-check_cancel);
+qemu_cond_destroy(pool-worker_stopped);
+qemu_mutex_destroy(pool-lock);
+event_notifier_cleanup(pool-notifier);
+g_free(pool);
+}
+
 static void thread_pool_init(void)
 {
 thread_pool_init_one(global_pool, NULL);
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 6/7] threadpool: drop global thread pool

Now that each AioContext has a ThreadPool and the main loop AioContext
can be fetched with bdrv_get_aio_context(), we can eliminate the concept
of a global thread pool from thread-pool.c.

The submit functions must take a ThreadPool* argument.

block/raw-posix.c and block/raw-win32.c use
aio_get_thread_pool(bdrv_get_aio_context(bs)) to fetch the main loop's
ThreadPool.

tests/test-thread-pool.c must be updated to reflect the new
thread_pool_submit() function prototypes.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com

use bdrv_get_aio_context

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/raw-posix.c   |  8 ++--
 block/raw-win32.c   |  4 +++-
 include/block/thread-pool.h | 10 ++
 tests/test-thread-pool.c| 44 +---
 thread-pool.c   | 23 +++
 5 files changed, 43 insertions(+), 46 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 4dfdf98..8a3cdbc 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -750,6 +750,7 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, 
int fd,
 BlockDriverCompletionFunc *cb, void *opaque, int type)
 {
 RawPosixAIOData *acb = g_slice_new(RawPosixAIOData);
+ThreadPool *pool;
 
 acb-bs = bs;
 acb-aio_type = type;
@@ -763,7 +764,8 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, 
int fd,
 acb-aio_offset = sector_num * 512;
 
 trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
-return thread_pool_submit_aio(aio_worker, acb, cb, opaque);
+pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
 static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
@@ -1413,6 +1415,7 @@ static BlockDriverAIOCB *hdev_aio_ioctl(BlockDriverState 
*bs,
 {
 BDRVRawState *s = bs-opaque;
 RawPosixAIOData *acb;
+ThreadPool *pool;
 
 if (fd_open(bs)  0)
 return NULL;
@@ -1424,7 +1427,8 @@ static BlockDriverAIOCB *hdev_aio_ioctl(BlockDriverState 
*bs,
 acb-aio_offset = 0;
 acb-aio_ioctl_buf = buf;
 acb-aio_ioctl_cmd = req;
-return thread_pool_submit_aio(aio_worker, acb, cb, opaque);
+pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
 #elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
diff --git a/block/raw-win32.c b/block/raw-win32.c
index b89ac19..18e0068 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -144,6 +144,7 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, 
HANDLE hfile,
 BlockDriverCompletionFunc *cb, void *opaque, int type)
 {
 RawWin32AIOData *acb = g_slice_new(RawWin32AIOData);
+ThreadPool *pool;
 
 acb-bs = bs;
 acb-hfile = hfile;
@@ -157,7 +158,8 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, 
HANDLE hfile,
 acb-aio_offset = sector_num * 512;
 
 trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
-return thread_pool_submit_aio(aio_worker, acb, cb, opaque);
+pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
 int qemu_ftruncate64(int fd, int64_t length)
diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h
index e1453c6..32afcdd 100644
--- a/include/block/thread-pool.h
+++ b/include/block/thread-pool.h
@@ -31,9 +31,11 @@ typedef struct ThreadPool ThreadPool;
 ThreadPool *thread_pool_new(struct AioContext *ctx);
 void thread_pool_free(ThreadPool *pool);
 
-BlockDriverAIOCB *thread_pool_submit_aio(ThreadPoolFunc *func, void *arg,
- BlockDriverCompletionFunc *cb, void *opaque);
-int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg);
-void thread_pool_submit(ThreadPoolFunc *func, void *arg);
+BlockDriverAIOCB *thread_pool_submit_aio(ThreadPool *pool,
+ThreadPoolFunc *func, void *arg,
+BlockDriverCompletionFunc *cb, void *opaque);
+int coroutine_fn thread_pool_submit_co(ThreadPool *pool,
+ThreadPoolFunc *func, void *arg);
+void thread_pool_submit(ThreadPool *pool, ThreadPoolFunc *func, void *arg);
 
 #endif
diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
index 9998e03..22915aa 100644
--- a/tests/test-thread-pool.c
+++ b/tests/test-thread-pool.c
@@ -4,6 +4,8 @@
 #include block/thread-pool.h
 #include block/block.h
 
+static AioContext *ctx;
+static ThreadPool *pool;
 static int active;
 
 typedef struct {
@@ -38,19 +40,10 @@ static void done_cb(void *opaque, int ret)
 active--;
 }
 
-/* A non-blocking poll of the main AIO context (we cannot use aio_poll
- * because we do not know the AioContext).
- */
-static void qemu_aio_wait_nonblocking(void)
-{
-qemu_notify_event();
-qemu_aio_wait();
-}
-
 /* Wait until all aio and bh activity has finished */
 static void qemu_aio_wait_all(void)
 {
-while (qemu_aio_wait()) {
+

[Qemu-devel] [PATCH v2 4/7] aio: add a ThreadPool instance to AioContext

This patch adds a ThreadPool to AioContext.  It's possible that some
AioContext instances will never use the ThreadPool, so defer creation
until aio_get_thread_pool().

The reason why AioContext should have the ThreadPool is because the
ThreadPool is bound to a AioContext instance where the work item's
callback function is invoked.  It doesn't make sense to keep the
ThreadPool pointer anywhere other than AioContext.  For example,
block/raw-posix.c can get its AioContext's ThreadPool and submit work.

Special note about headers: I used struct ThreadPool in aio.h because
there is a circular dependency if aio.h includes thread-pool.h.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 async.c | 11 +++
 include/block/aio.h |  6 ++
 2 files changed, 17 insertions(+)

diff --git a/async.c b/async.c
index f2d47ba..90fe906 100644
--- a/async.c
+++ b/async.c
@@ -24,6 +24,7 @@
 
 #include qemu-common.h
 #include block/aio.h
+#include block/thread-pool.h
 #include qemu/main-loop.h
 
 /***/
@@ -172,6 +173,7 @@ aio_ctx_finalize(GSource *source)
 {
 AioContext *ctx = (AioContext *) source;
 
+thread_pool_free(ctx-thread_pool);
 aio_set_event_notifier(ctx, ctx-notifier, NULL, NULL);
 event_notifier_cleanup(ctx-notifier);
 g_array_free(ctx-pollfds, TRUE);
@@ -190,6 +192,14 @@ GSource *aio_get_g_source(AioContext *ctx)
 return ctx-source;
 }
 
+ThreadPool *aio_get_thread_pool(AioContext *ctx)
+{
+if (!ctx-thread_pool) {
+ctx-thread_pool = thread_pool_new(ctx);
+}
+return ctx-thread_pool;
+}
+
 void aio_notify(AioContext *ctx)
 {
 event_notifier_set(ctx-notifier);
@@ -200,6 +210,7 @@ AioContext *aio_context_new(void)
 AioContext *ctx;
 ctx = (AioContext *) g_source_new(aio_source_funcs, sizeof(AioContext));
 ctx-pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
+ctx-thread_pool = NULL;
 event_notifier_init(ctx-notifier, false);
 aio_set_event_notifier(ctx, ctx-notifier, 
(EventNotifierHandler *)
diff --git a/include/block/aio.h b/include/block/aio.h
index 5b54d38..1836793 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -66,6 +66,9 @@ typedef struct AioContext {
 
 /* GPollFDs for aio_poll() */
 GArray *pollfds;
+
+/* Thread pool for performing work and receiving completion callbacks */
+struct ThreadPool *thread_pool;
 } AioContext;
 
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
@@ -223,6 +226,9 @@ void aio_set_event_notifier(AioContext *ctx,
  */
 GSource *aio_get_g_source(AioContext *ctx);
 
+/* Return the ThreadPool bound to this AioContext */
+struct ThreadPool *aio_get_thread_pool(AioContext *ctx);
+
 /* Functions to operate on the main QEMU AioContext.  */
 
 bool qemu_aio_wait(void);
-- 
1.8.1.4

[Qemu-devel] [PATCH v2 7/7] coroutine: use AioContext for CoQueue BH

CoQueue uses a BH to awake coroutines that were made ready to run again
using qemu_co_queue_next() or qemu_co_queue_restart_all().  The BH
currently runs in the iothread AioContext and would break coroutines
that run in a different AioContext.

This is a slightly tricky problem because the lifetime of the BH exceeds
that of the CoQueue.  This means coroutines can be awoken after CoQueue
itself has been freed.  Also, there is no qemu_co_queue_destroy()
function which we could use to handle freeing resources.

Introducing qemu_co_queue_destroy() has a ripple effect of requiring us
to also add qemu_co_mutex_destroy() and qemu_co_rwlock_destroy(), as
well as updating all callers.  Avoid doing that.

We also cannot switch from BH to GIdle function because aio_poll() does
not dispatch GIdle functions.  (GIdle functions make memory management
slightly easier because they free themselves.)

Finally, I don't want to move unlock_queue and unlock_bh into
AioContext.  That would break encapsulation - AioContext isn't supposed
to know about CoQueue.

This patch implements a different solution: each qemu_co_queue_next() or
qemu_co_queue_restart_all() call creates a new BH and list of coroutines
to wake up.  Callers tend to invoke qemu_co_queue_next() and
qemu_co_queue_restart_all() occasionally after blocking I/O, so creating
a new BH for each call shouldn't be massively inefficient.

Note that this patch does not add an interface for specifying the
AioContext.  That is left to future patches which will convert CoQueue,
CoMutex, and CoRwlock to expose AioContext.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 include/block/coroutine.h |  1 +
 qemu-coroutine-lock.c | 55 ---
 2 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/include/block/coroutine.h b/include/block/coroutine.h
index c31fae3..a978162 100644
--- a/include/block/coroutine.h
+++ b/include/block/coroutine.h
@@ -104,6 +104,7 @@ bool qemu_in_coroutine(void);
  */
 typedef struct CoQueue {
 QTAILQ_HEAD(, Coroutine) entries;
+AioContext *ctx;
 } CoQueue;
 
 /**
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index 97ef01c..86efe1f 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -29,28 +29,36 @@
 #include block/aio.h
 #include trace.h
 
-static QTAILQ_HEAD(, Coroutine) unlock_bh_queue =
-QTAILQ_HEAD_INITIALIZER(unlock_bh_queue);
-static QEMUBH* unlock_bh;
+/* Coroutines are awoken from a BH to allow the current coroutine to complete
+ * its flow of execution.  The BH may run after the CoQueue has been destroyed,
+ * so keep BH data in a separate heap-allocated struct.
+ */
+typedef struct {
+QEMUBH *bh;
+QTAILQ_HEAD(, Coroutine) entries;
+} CoQueueNextData;
 
 static void qemu_co_queue_next_bh(void *opaque)
 {
+CoQueueNextData *data = opaque;
 Coroutine *next;
 
 trace_qemu_co_queue_next_bh();
-while ((next = QTAILQ_FIRST(unlock_bh_queue))) {
-QTAILQ_REMOVE(unlock_bh_queue, next, co_queue_next);
+while ((next = QTAILQ_FIRST(data-entries))) {
+QTAILQ_REMOVE(data-entries, next, co_queue_next);
 qemu_coroutine_enter(next, NULL);
 }
+
+qemu_bh_delete(data-bh);
+g_slice_free(CoQueueNextData, data);
 }
 
 void qemu_co_queue_init(CoQueue *queue)
 {
 QTAILQ_INIT(queue-entries);
 
-if (!unlock_bh) {
-unlock_bh = qemu_bh_new(qemu_co_queue_next_bh, NULL);
-}
+/* This will be exposed to callers once there are multiple AioContexts */
+queue-ctx = qemu_get_aio_context();
 }
 
 void coroutine_fn qemu_co_queue_wait(CoQueue *queue)
@@ -69,26 +77,39 @@ void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue 
*queue)
 assert(qemu_in_coroutine());
 }
 
-bool qemu_co_queue_next(CoQueue *queue)
+static bool qemu_co_queue_do_restart(CoQueue *queue, bool single)
 {
 Coroutine *next;
+CoQueueNextData *data;
+
+if (QTAILQ_EMPTY(queue-entries)) {
+return false;
+}
 
-next = QTAILQ_FIRST(queue-entries);
-if (next) {
+data = g_slice_new(CoQueueNextData);
+data-bh = aio_bh_new(queue-ctx, qemu_co_queue_next_bh, data);
+QTAILQ_INIT(data-entries);
+qemu_bh_schedule(data-bh);
+
+while ((next = QTAILQ_FIRST(queue-entries)) != NULL) {
 QTAILQ_REMOVE(queue-entries, next, co_queue_next);
-QTAILQ_INSERT_TAIL(unlock_bh_queue, next, co_queue_next);
+QTAILQ_INSERT_TAIL(data-entries, next, co_queue_next);
 trace_qemu_co_queue_next(next);
-qemu_bh_schedule(unlock_bh);
+if (single) {
+break;
+}
 }
+return true;
+}
 
-return (next != NULL);
+bool qemu_co_queue_next(CoQueue *queue)
+{
+return qemu_co_queue_do_restart(queue, true);
 }
 
 void qemu_co_queue_restart_all(CoQueue *queue)
 {
-while (qemu_co_queue_next(queue)) {
-/* Do nothing */
-}
+qemu_co_queue_do_restart(queue, false);
 }
 
 bool qemu_co_queue_empty(CoQueue *queue)
-- 
1.8.1.4

Re: [Qemu-devel] [PATCH v13 5/5] VMXNET3 device implementation

Am 06.03.2013 08:21, schrieb Dmitry Fleytman:
 Signed-off-by: Dmitry Fleytman dmi...@daynix.com
 Signed-off-by: Yan Vugenfirer y...@daynix.com
 ---
  default-configs/pci.mak |1 +
  hw/Makefile.objs|1 +
  hw/pci/pci.h|1 +
  hw/vmxnet3.c| 2460 
 +++
  hw/vmxnet3.h|  760 +++
  5 files changed, 3223 insertions(+)
  create mode 100644 hw/vmxnet3.c
  create mode 100644 hw/vmxnet3.h
 
 diff --git a/default-configs/pci.mak b/default-configs/pci.mak
 index ee2d18d..ce56d58 100644
 --- a/default-configs/pci.mak
 +++ b/default-configs/pci.mak
 @@ -13,6 +13,7 @@ CONFIG_LSI_SCSI_PCI=y
  CONFIG_MEGASAS_SCSI_PCI=y
  CONFIG_RTL8139_PCI=y
  CONFIG_E1000_PCI=y
 +CONFIG_VMXNET3_PCI=y
  CONFIG_IDE_CORE=y
  CONFIG_IDE_QDEV=y
  CONFIG_IDE_PCI=y
 diff --git a/hw/Makefile.objs b/hw/Makefile.objs
 index 14922cb..026aff6 100644
 --- a/hw/Makefile.objs
 +++ b/hw/Makefile.objs
 @@ -120,6 +120,7 @@ common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
  common-obj-$(CONFIG_E1000_PCI) += e1000.o
  common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
  common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet_tx_pkt.o vmxnet_rx_pkt.o
 +common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
  
  common-obj-$(CONFIG_SMC91C111) += smc91c111.o
  common-obj-$(CONFIG_LAN9118) += lan9118.o
 diff --git a/hw/pci/pci.h b/hw/pci/pci.h
 index f340fe5..3beb70b 100644
 --- a/hw/pci/pci.h
 +++ b/hw/pci/pci.h
 @@ -60,6 +60,7 @@
  #define PCI_DEVICE_ID_VMWARE_NET 0x0720
  #define PCI_DEVICE_ID_VMWARE_SCSI0x0730
  #define PCI_DEVICE_ID_VMWARE_IDE 0x1729
 +#define PCI_DEVICE_ID_VMWARE_VMXNET3 0x07B0
  
  /* Intel (0x8086) */
  #define PCI_DEVICE_ID_INTEL_82551IT  0x1209
 diff --git a/hw/vmxnet3.c b/hw/vmxnet3.c
 new file mode 100644
 index 000..75b7181
 --- /dev/null
 +++ b/hw/vmxnet3.c
 @@ -0,0 +1,2460 @@
 +/*
 + * QEMU VMWARE VMXNET3 paravirtual NIC
 + *
 + * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
 + *
 + * Developed by Daynix Computing LTD (http://www.daynix.com)
 + *
 + * Authors:
 + * Dmitry Fleytman dmi...@daynix.com
 + * Tamir Shomer tam...@daynix.com
 + * Yan Vugenfirer y...@daynix.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include hw.h
 +#include pci/pci.h
 +#include net/net.h
 +#include virtio-net.h
 +#include net/tap.h
 +#include net/checksum.h
 +#include sysemu/sysemu.h
 +#include qemu-common.h
 +#include qemu/bswap.h
 +#include pci/msix.h
 +#include pci/msi.h
 +
 +#include vmxnet3.h
 +#include vmxnet_debug.h
 +#include vmware_utils.h
 +#include vmxnet_tx_pkt.h
 +#include vmxnet_rx_pkt.h
 +
 +#define PCI_DEVICE_ID_VMWARE_VMXNET3_REVISION 0x1
 +#define VMXNET3_MSIX_BAR_SIZE 0x2000
 +
 +#define VMXNET3_BAR0_IDX  (0)
 +#define VMXNET3_BAR1_IDX  (1)
 +#define VMXNET3_MSIX_BAR_IDX  (2)
 +
 +#define VMXNET3_OFF_MSIX_TABLE (0x000)
 +#define VMXNET3_OFF_MSIX_PBA   (0x800)
 +
 +/* Link speed in Mbps should be shifted by 16 */
 +#define VMXNET3_LINK_SPEED  (1000  16)
 +
 +/* Link status: 1 - up, 0 - down. */
 +#define VMXNET3_LINK_STATUS_UP  0x1
 +
 +/* Least significant bit should be set for revision and version */
 +#define VMXNET3_DEVICE_VERSION0x1
 +#define VMXNET3_DEVICE_REVISION   0x1
 +
 +/* Macros for rings descriptors access */
 +#define VMXNET3_READ_TX_QUEUE_DESCR8(dpa, field) \
 +(vmw_shmem_ld8(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field)))
 +
 +#define VMXNET3_WRITE_TX_QUEUE_DESCR8(dpa, field, value) \
 +(vmw_shmem_st8(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field, value)))
 +
 +#define VMXNET3_READ_TX_QUEUE_DESCR32(dpa, field) \
 +(vmw_shmem_ld32(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field)))
 +
 +#define VMXNET3_WRITE_TX_QUEUE_DESCR32(dpa, field, value) \
 +(vmw_shmem_st32(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field), 
 value))
 +
 +#define VMXNET3_READ_TX_QUEUE_DESCR64(dpa, field) \
 +(vmw_shmem_ld64(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field)))
 +
 +#define VMXNET3_WRITE_TX_QUEUE_DESCR64(dpa, field, value) \
 +(vmw_shmem_st64(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field), 
 value))
 +
 +#define VMXNET3_READ_RX_QUEUE_DESCR64(dpa, field) \
 +(vmw_shmem_ld64(dpa + offsetof(struct Vmxnet3_RxQueueDesc, field)))
 +
 +#define VMXNET3_READ_RX_QUEUE_DESCR32(dpa, field) \
 +(vmw_shmem_ld32(dpa + offsetof(struct Vmxnet3_RxQueueDesc, field)))
 +
 +#define VMXNET3_WRITE_RX_QUEUE_DESCR64(dpa, field, value) \
 +(vmw_shmem_st64(dpa + offsetof(struct Vmxnet3_RxQueueDesc, field), 
 value))
 +
 +#define VMXNET3_WRITE_RX_QUEUE_DESCR8(dpa, field, value) \
 +(vmw_shmem_st8(dpa + offsetof(struct Vmxnet3_RxQueueDesc, field), value))
 +
 +/* Macros for guest driver shared area access */
 +#define VMXNET3_READ_DRV_SHARED64(shpa, field) \
 +(vmw_shmem_ld64(shpa + offsetof(struct Vmxnet3_DriverShared, field)))
 +
 +#define

Re: [Qemu-devel] [PATCHv3] qdev: DEVICE_DELETED event

Am 07.03.2013 10:49, schrieb Michael S. Tsirkin:
 libvirt has a long-standing bug: when removing the device,
 it can request removal but does not know when the
 removal completes. Add an event so we can fix this in a robust way.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Changes from v2:
 - move event toward the end of device_unparent,
   so that parents are reported after their children,
   as suggested by Paolo
 Changes from v1:
 - move to device_unparent
 - address comments by Andreas and Eric

Reviewed-by: Andreas Färber afaer...@suse.de

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [Qemu-stable] [SeaBIOS] problems with freeBSD

2013-03-07 Thread Michael Tokarev

07.03.2013 15:56, Gerd Hoffmann wrote:

 Just a note, or an alternative opinion, so to say.  In Debian, we have
 a social contract which, among other things, ensures that the binaries
 you, as a user, get, comes with source which you can modify on the
 system you installed.  This requires, for example, that all binaries
 shipped are actually built from the corresponding source, and that
 no blobs from whatever other sources are used, ever.
 
 That is perfectly fine.  How to you handle ppc firmware for x86 hosts
 btw?  Build on ppc buildhost and ship as noarch package?  Or do you do
 cross compiler builds, so that users can patch+rebuild it on their x86
 host too?

Foreign firmware handling is an unsolved issue.  Yes, basically,
it is built on the corresponding build machine (eg, openbios-ppc is
built on a ppc machine) and shipped as noarch package.  The same
is for X86 seabios/vgabios/etc stuff actually.

So technically this is a violation in some way.  However, that same
ppc firmware actually can be rebuilt in a (qemu) virtual machine with
the same debian release installed.

That's why I say it is basically unsolved issue -- the solution
isn't actuall a solution but a workaround instead.

For added fun, currently we ship optionroms from qemu in seabios
source package, exactly because of this reason - seabios is what
gets built on x86 only and is distributed on as noarch binary,
while qemu is built on all architectures.

And for another fun, we've several broken qemu architectures in
Debian at the moment -- s390 and ppc64 are missing firmwares
exactly because of the difficulties building them, despite the
fact they're shipping in upstream qemu tarball.

 We don't even ship any upstream blobs in the debian qemu _source_
 package: we repack upstream qemu.tar.gz by removing these blobs.
 
 That's a bit over the top for my taste as the release tarballs include
 both source and blobs.

Yes this is a bit extremistic, so to say.  But this is the same
base principle of Debian: we should ensure and demonstrate it all
is buildable.  Just mere presence of a blob in a tarball is already
suspicious :)

   Although it might be the debian release is just
 a bit too old for that, not fully sure with which release anthony
 started to include the firmware submodules, might be it was after 1.1

Nope, qemu started including sources before that.  But it doesn't
matter really, it is just the way how debian works, nothing to
do with qemu really.  Especially since most of these submodules
are already packaged in Debian separately (be it because the
qemu needs or due to other means).

Thanks,

/mjt

Re: [Qemu-devel] [PATCH] qdev: DEVICE_DELETED event

Am 07.03.2013 11:07, schrieb Michael S. Tsirkin:
 On Thu, Mar 07, 2013 at 10:55:23AM +0100, Markus Armbruster wrote:
 Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Mar 06, 2013 at 02:57:22PM +0100, Andreas Färber wrote:
 Am 06.03.2013 14:00, schrieb Michael S. Tsirkin:
 libvirt has a long-standing bug: when removing the device,
 it can request removal but does not know when does the
 removal complete. Add an event so we can fix this in a robust way.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 Sounds like a good idea to me. :)

 [...]
 diff --git a/hw/qdev.c b/hw/qdev.c
 index 689cd54..f30d251 100644
 --- a/hw/qdev.c
 +++ b/hw/qdev.c
 @@ -29,6 +29,7 @@
  #include sysemu/sysemu.h
  #include qapi/error.h
  #include qapi/visitor.h
 +#include qapi/qmp/qjson.h
  
  int qdev_hotplug = 0;
  static bool qdev_hot_added = false;
 @@ -267,6 +268,11 @@ void qdev_init_nofail(DeviceState *dev)
  /* Unlink device from bus and free the structure.  */
  void qdev_free(DeviceState *dev)
  {
 +if (dev-id) {
 +QObject *data = qobject_from_jsonf({ 'device': %s }, dev-id);
 +monitor_protocol_event(QEVENT_DEVICE_DELETED, data);
 +qobject_decref(data);
 +}
  object_unparent(OBJECT(dev));
  }
  

 I'm pretty sure this is the wrong place to fire the notification. We
 should rather do this when the device is actually deleted - which
 qdev_free() does *not* actually guarantee, as criticized in the s390x
 and unref'ing contexts.
 I would suggest to place your code into device_unparent() instead.

 Another thing to consider is what data to pass to the event: Not all
 devices have an ID.

 If they don't they were not created by management so management is
 probably not interested in them being removed.

 We could always add a 'path' key later if this assumption
 proves incorrect.

 In old qdev, ID was all we had, because paths were busted.  Thus,
 management had no choice but use IDs.

 If I understand modern qdev correctly, we got a canonical path.  Old
 APIs like device_del still accept only ID.  Should new APIs still be
 designed that way?  Or should they always accept / provide the canonical
 path, plus optional ID for convenience?
 
 What are advantages of exposing the path to users in this way?
 Looks like maintainance hassle without real benefits?

Anthony had rejected earlier QOM patches by Paolo related to qdev id,
saying it was deprecated in favor of those QOM paths.

 We should still have a canonical path when we fire
 this event in either qdev_free() or in device_unparent() before the if
 (dev-parent_bus) block though. That would be a question for Anthony,
 not having a use case for the event I am indifferent there.

 Further, thinking of objects such as virtio-rng backends or future
 blockdev/chardev objects, might it make sense to turn this into a
 generic object deletion event rather than a device event?

 Andreas

 Backend deletion doesn't normally have guest interaction right?
 So why do we need an event?

 We need an event because device_del may send its reply before it
 completes the job.

 device_del does that when it deletion needs to interact with the guest,
 which can take unbounded time.

 Conversely, we don't need an event when a QMP always completes the job
 (as far as observable by the QMP client) before it sends its reply.  Off
 hand, I can't see why backend deletion would do anything else.

 I'm always reluctant to abstract when there are fewer than two
 different, concrete things to abstract from.  Right now, we got just
 one: device models.

Not quite: It's about unparenting hook and object deletion, which are
both not limited to devices.

But if the ID based approach gets accepted by Anthony then we can still
introduce an OBJECT_DELETED event once someone needs it.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH v2] migration: Improve QMP documentation

Juan Quintela quint...@redhat.com writes:

 v2:
 Add Markus sugestions:

This needs to go below the --- line.

 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  qmp-commands.hx | 50 ++
  1 file changed, 30 insertions(+), 20 deletions(-)

 diff --git a/qmp-commands.hx b/qmp-commands.hx
 index 95022e2..a2c0c6b 100644
 --- a/qmp-commands.hx
 +++ b/qmp-commands.hx
 @@ -644,7 +644,7 @@ EQMP

  SQMP
  migrate-set-cache-size
 --
 +--

  Set cache size to be used by XBZRLE migration, the cache size will be rounded
  down to the nearest power of 2
 @@ -667,7 +667,7 @@ EQMP

  SQMP
  query-migrate-cache-size
 --
 +

  Show cache size to be used by XBZRLE migration

 @@ -2438,25 +2438,35 @@ The main json-object contains the following:
  total amount in ms for downtime that was calculated on
   the last bitmap round (json-int)
  - ram: only present if status is active, it is a json-object with the
 -  following RAM information (in bytes):
 - - transferred: amount transferred (json-int)
 - - remaining: amount remaining (json-int)
 - - total: total (json-int)
 - - duplicate: number of duplicated pages (json-int)
 - - normal : number of normal pages transferred (json-int)
 - - normal-bytes : number of normal bytes transferred (json-int)
 +  following RAM information:
 + - transferred: amount transferred in bytes (json-int)
 + - remaining: amount remaining to transfer in bytes (json-int)
 + - total: total amount of memory in bytes (json-int)
 + - duplicate: number of pages filled entirely with the same
 +byte (json-int)
 + These are sent over the wire much more efficiently.

Tab damage.

 + - normal : number of whole pages transfered.  I.e. they
 +were not sent as duplicate or xbzrle pages (json-int)
 + - normal-bytes : number of bytes transferred in whole
 +pages. This is just normal pages times size of one page,
 +but this way upper levels don't need to care about page
 +size (json-int)
  - disk: only present if status is active and it is a block migration,
 -  it is a json-object with the following disk information (in bytes):
 - - transferred: amount transferred (json-int)
 - - remaining: amount remaining (json-int)
 - - total: total (json-int)
 +  it is a json-object with the following disk information:
 + - transferred: amount transferred in bytes (json-int)
 + - remaining: amount remaining to transfer in bytes json-int)
 + - total: total disk size in bytes (json-int)
  - xbzrle-cache: only present if XBZRLE is active.
It is a json-object with the following XBZRLE information:
 - - cache-size: XBZRLE cache size
 - - bytes: total XBZRLE bytes transferred

Suggest

 - bytes: number of bytes transferred for XBZRLE compressed pages


 + - cache-size: XBZRLE cache size in bytes
 + - bytes: total XBZRLE bytes transferred as xbzrle pages
   - pages: number of XBZRLE compressed pages
 - - cache-miss: number of cache misses
 - - overflow: number of XBZRLE overflows
 + - cache-miss: number of XBRZRLE page cache misses
 + - overflow: number of times XBZRLE overflows.  This means
 +   that the XBZRLE encoding was bigger than just sent the
 +   whole page, and then we sent the whole page instead (as as
 +   normal page).
 +
  Examples:

  1. Before the first migration
[...]

[Qemu-devel] [PATCHv2] migration: move ram migration support

Move RAM migration code from arch_init to savevm-ram.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Note: this is on top of Juan's pull request

Changes from v1:
- renamed source file, rebased on top of migration.next as
  suggested by Paolo

 Makefile.target |   2 +-
 arch_init.c | 763 -
 savevm-ram.c| 804 
 3 files changed, 805 insertions(+), 764 deletions(-)
 create mode 100644 savevm-ram.c

diff --git a/Makefile.target b/Makefile.target
index ca657b3..54bc21b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -108,7 +108,7 @@ CONFIG_NO_XEN = $(if $(subst n,,$(CONFIG_XEN)),n,y)
 CONFIG_NO_GET_MEMORY_MAPPING = $(if $(subst 
n,,$(CONFIG_HAVE_GET_MEMORY_MAPPING)),n,y)
 CONFIG_NO_CORE_DUMP = $(if $(subst n,,$(CONFIG_HAVE_CORE_DUMP)),n,y)
 
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o savevm-ram.o
 obj-y += qtest.o
 obj-y += hw/
 obj-$(CONFIG_KVM) += kvm-all.o
diff --git a/arch_init.c b/arch_init.c
index 98e2bc6..9943ed4 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -31,20 +31,15 @@
 #include config.h
 #include monitor/monitor.h
 #include sysemu/sysemu.h
-#include qemu/bitops.h
-#include qemu/bitmap.h
 #include sysemu/arch_init.h
 #include audio/audio.h
 #include hw/pc.h
 #include hw/pci/pci.h
 #include hw/audiodev.h
 #include sysemu/kvm.h
-#include migration/migration.h
 #include exec/gdbstub.h
 #include hw/smbios.h
-#include exec/address-spaces.h
 #include hw/pcspk.h
-#include migration/page_cache.h
 #include qemu/config-file.h
 #include qmp-commands.h
 #include trace.h
@@ -103,38 +98,6 @@ int graphic_depth = 15;
 
 const uint32_t arch_type = QEMU_ARCH;
 
-/***/
-/* ram save/restore */
-
-#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS  0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-#define RAM_SAVE_FLAG_XBZRLE   0x40
-
-#ifdef __ALTIVEC__
-#include altivec.h
-#define VECTYPEvector unsigned char
-#define SPLAT(p)   vec_splat(vec_ld(0, p), 0)
-#define ALL_EQ(v1, v2) vec_all_eq(v1, v2)
-/* altivec.h may redefine the bool macro as vector type.
- * Reset it to POSIX semantics. */
-#undef bool
-#define bool _Bool
-#elif defined __SSE2__
-#include emmintrin.h
-#define VECTYPE__m128i
-#define SPLAT(p)   _mm_set1_epi8(*(p))
-#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0x)
-#else
-#define VECTYPEunsigned long
-#define SPLAT(p)   (*(p) * (~0UL / 255))
-#define ALL_EQ(v1, v2) ((v1) == (v2))
-#endif
-
-
 static struct defconfig_file {
 const char *filename;
 /* Indicates it is an user config file (disabled by -no-user-config) */
@@ -145,7 +108,6 @@ static struct defconfig_file {
 { NULL }, /* end of list */
 };
 
-
 int qemu_read_default_config_files(bool userconfig)
 {
 int ret;
@@ -164,731 +126,6 @@ int qemu_read_default_config_files(bool userconfig)
 return 0;
 }
 
-static int is_dup_page(uint8_t *page)
-{
-VECTYPE *p = (VECTYPE *)page;
-VECTYPE val = SPLAT(page);
-int i;
-
-for (i = 0; i  TARGET_PAGE_SIZE / sizeof(VECTYPE); i++) {
-if (!ALL_EQ(val, p[i])) {
-return 0;
-}
-}
-
-return 1;
-}
-
-/* struct contains XBZRLE cache and a static page
-   used by the compression */
-static struct {
-/* buffer used for XBZRLE encoding */
-uint8_t *encoded_buf;
-/* buffer for storing page content */
-uint8_t *current_buf;
-/* buffer used for XBZRLE decoding */
-uint8_t *decoded_buf;
-/* Cache for XBZRLE */
-PageCache *cache;
-} XBZRLE = {
-.encoded_buf = NULL,
-.current_buf = NULL,
-.decoded_buf = NULL,
-.cache = NULL,
-};
-
-
-int64_t xbzrle_cache_resize(int64_t new_size)
-{
-if (XBZRLE.cache != NULL) {
-return cache_resize(XBZRLE.cache, new_size / TARGET_PAGE_SIZE) *
-TARGET_PAGE_SIZE;
-}
-return pow2floor(new_size);
-}
-
-/* accounting for migration statistics */
-typedef struct AccountingInfo {
-uint64_t dup_pages;
-uint64_t norm_pages;
-uint64_t iterations;
-uint64_t xbzrle_bytes;
-uint64_t xbzrle_pages;
-uint64_t xbzrle_cache_miss;
-uint64_t xbzrle_overflows;
-} AccountingInfo;
-
-static AccountingInfo acct_info;
-
-static void acct_clear(void)
-{
-memset(acct_info, 0, sizeof(acct_info));
-}
-
-uint64_t dup_mig_bytes_transferred(void)
-{
-return acct_info.dup_pages * TARGET_PAGE_SIZE;
-}
-
-uint64_t dup_mig_pages_transferred(void)
-{
-return acct_info.dup_pages;
-}
-
-uint64_t norm_mig_bytes_transferred(void)
-{
-return acct_info.norm_pages * TARGET_PAGE_SIZE;
-}
-
-uint64_t norm_mig_pages_transferred(void)
-{
-return

Re: [Qemu-devel] [PATCH 3/7] block: vhdx header for the QEMU support of VHDX images

On Wed, Mar 06, 2013 at 09:47:41AM -0500, Jeff Cody wrote:
 +#define VHDX_ZERO_MGIC 0x6F72657A /* 'zero' */

s/MGIC/MAGIC/

 +typedef struct QEMU_PACKED vhdx_log_zero_descriptor {
 +uint32_tzero_signature; /* zero in ASCII */
 +uint32_treserver;

s/reserver/reserved/ ?

Re: [Qemu-devel] propose to implement ower device

2013-03-07 Thread Wenchao Xia


于 2013-3-7 11:40, li guang 写道:

Hi, Anthony and all

By now all devices of QEMU do not have much more
power management consideration, for example, if
system do suspend, it will call all registered notifiers,
this was loosely required, and the code to do power management
state transition seems just do 'ugly emulation', rather than be
conscious with whole system devices, same condition with reset(it
has been embedded in DeviceClass, good!),
shutdown, in real world, commonly all devices' power are controlled
by a power chip, then all power sequence can be done just
issue commands to this chip.
so, I come across an idea to implement qdev'ed power device, and
make all qdev struct of devices aware of self power management(add
reset, suspend, shutdown ... filed for DeviceClass), this will
bring tidy power management, and the emulation will more like what
happened in real world.
Hope I've expressed my idea clearly, if you have any questions, let me
know, if get your permission, I'll step ahead to do this power
chip emulation work.

Thanks!




Hi, Guang
  it seems the mail was sent as a child mail of Default guest RAM
size. The topic is interesting, suggest resend it.

--
Best Regards

Wenchao Xia

Re: [Qemu-devel] [PATCH 3/7] block: vhdx header for the QEMU support of VHDX images

2013-03-07 Thread Jeff Cody

On Thu, Mar 07, 2013 at 02:15:42PM +0100, Stefan Hajnoczi wrote:
 On Wed, Mar 06, 2013 at 09:47:41AM -0500, Jeff Cody wrote:
  +#define VHDX_ZERO_MGIC 0x6F72657A /* 'zero' */
 
 s/MGIC/MAGIC/
 
  +typedef struct QEMU_PACKED vhdx_log_zero_descriptor {
  +uint32_tzero_signature; /* zero in ASCII */
  +uint32_treserver;
 
 s/reserver/reserved/ ?

Yes, thanks (on both counts).

Re: [Qemu-devel] [PATCH v13 5/5] VMXNET3 device implementation

2013-03-07 Thread Dmitry Fleytman

Thanks, Andreas

Now I see.
We'll resubmit our patches soon.

Dmitry.


On Thu, Mar 7, 2013 at 2:59 PM, Andreas Färber afaer...@suse.de wrote:

 Am 06.03.2013 08:21, schrieb Dmitry Fleytman:
  Signed-off-by: Dmitry Fleytman dmi...@daynix.com
  Signed-off-by: Yan Vugenfirer y...@daynix.com
  ---
   default-configs/pci.mak |1 +
   hw/Makefile.objs|1 +
   hw/pci/pci.h|1 +
   hw/vmxnet3.c| 2460
 +++
   hw/vmxnet3.h|  760 +++
   5 files changed, 3223 insertions(+)
   create mode 100644 hw/vmxnet3.c
   create mode 100644 hw/vmxnet3.h
 
  diff --git a/default-configs/pci.mak b/default-configs/pci.mak
  index ee2d18d..ce56d58 100644
  --- a/default-configs/pci.mak
  +++ b/default-configs/pci.mak
  @@ -13,6 +13,7 @@ CONFIG_LSI_SCSI_PCI=y
   CONFIG_MEGASAS_SCSI_PCI=y
   CONFIG_RTL8139_PCI=y
   CONFIG_E1000_PCI=y
  +CONFIG_VMXNET3_PCI=y
   CONFIG_IDE_CORE=y
   CONFIG_IDE_QDEV=y
   CONFIG_IDE_PCI=y
  diff --git a/hw/Makefile.objs b/hw/Makefile.objs
  index 14922cb..026aff6 100644
  --- a/hw/Makefile.objs
  +++ b/hw/Makefile.objs
  @@ -120,6 +120,7 @@ common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
   common-obj-$(CONFIG_E1000_PCI) += e1000.o
   common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
   common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet_tx_pkt.o vmxnet_rx_pkt.o
  +common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
 
   common-obj-$(CONFIG_SMC91C111) += smc91c111.o
   common-obj-$(CONFIG_LAN9118) += lan9118.o
  diff --git a/hw/pci/pci.h b/hw/pci/pci.h
  index f340fe5..3beb70b 100644
  --- a/hw/pci/pci.h
  +++ b/hw/pci/pci.h
  @@ -60,6 +60,7 @@
   #define PCI_DEVICE_ID_VMWARE_NET 0x0720
   #define PCI_DEVICE_ID_VMWARE_SCSI0x0730
   #define PCI_DEVICE_ID_VMWARE_IDE 0x1729
  +#define PCI_DEVICE_ID_VMWARE_VMXNET3 0x07B0
 
   /* Intel (0x8086) */
   #define PCI_DEVICE_ID_INTEL_82551IT  0x1209
  diff --git a/hw/vmxnet3.c b/hw/vmxnet3.c
  new file mode 100644
  index 000..75b7181
  --- /dev/null
  +++ b/hw/vmxnet3.c
  @@ -0,0 +1,2460 @@
  +/*
  + * QEMU VMWARE VMXNET3 paravirtual NIC
  + *
  + * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
  + *
  + * Developed by Daynix Computing LTD (http://www.daynix.com)
  + *
  + * Authors:
  + * Dmitry Fleytman dmi...@daynix.com
  + * Tamir Shomer tam...@daynix.com
  + * Yan Vugenfirer y...@daynix.com
  + *
  + * This work is licensed under the terms of the GNU GPL, version 2 or
 later.
  + * See the COPYING file in the top-level directory.
  + *
  + */
  +
  +#include hw.h
  +#include pci/pci.h
  +#include net/net.h
  +#include virtio-net.h
  +#include net/tap.h
  +#include net/checksum.h
  +#include sysemu/sysemu.h
  +#include qemu-common.h
  +#include qemu/bswap.h
  +#include pci/msix.h
  +#include pci/msi.h
  +
  +#include vmxnet3.h
  +#include vmxnet_debug.h
  +#include vmware_utils.h
  +#include vmxnet_tx_pkt.h
  +#include vmxnet_rx_pkt.h
  +
  +#define PCI_DEVICE_ID_VMWARE_VMXNET3_REVISION 0x1
  +#define VMXNET3_MSIX_BAR_SIZE 0x2000
  +
  +#define VMXNET3_BAR0_IDX  (0)
  +#define VMXNET3_BAR1_IDX  (1)
  +#define VMXNET3_MSIX_BAR_IDX  (2)
  +
  +#define VMXNET3_OFF_MSIX_TABLE (0x000)
  +#define VMXNET3_OFF_MSIX_PBA   (0x800)
  +
  +/* Link speed in Mbps should be shifted by 16 */
  +#define VMXNET3_LINK_SPEED  (1000  16)
  +
  +/* Link status: 1 - up, 0 - down. */
  +#define VMXNET3_LINK_STATUS_UP  0x1
  +
  +/* Least significant bit should be set for revision and version */
  +#define VMXNET3_DEVICE_VERSION0x1
  +#define VMXNET3_DEVICE_REVISION   0x1
  +
  +/* Macros for rings descriptors access */
  +#define VMXNET3_READ_TX_QUEUE_DESCR8(dpa, field) \
  +(vmw_shmem_ld8(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field)))
  +
  +#define VMXNET3_WRITE_TX_QUEUE_DESCR8(dpa, field, value) \
  +(vmw_shmem_st8(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field,
 value)))
  +
  +#define VMXNET3_READ_TX_QUEUE_DESCR32(dpa, field) \
  +(vmw_shmem_ld32(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field)))
  +
  +#define VMXNET3_WRITE_TX_QUEUE_DESCR32(dpa, field, value) \
  +(vmw_shmem_st32(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field),
 value))
  +
  +#define VMXNET3_READ_TX_QUEUE_DESCR64(dpa, field) \
  +(vmw_shmem_ld64(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field)))
  +
  +#define VMXNET3_WRITE_TX_QUEUE_DESCR64(dpa, field, value) \
  +(vmw_shmem_st64(dpa + offsetof(struct Vmxnet3_TxQueueDesc, field),
 value))
  +
  +#define VMXNET3_READ_RX_QUEUE_DESCR64(dpa, field) \
  +(vmw_shmem_ld64(dpa + offsetof(struct Vmxnet3_RxQueueDesc, field)))
  +
  +#define VMXNET3_READ_RX_QUEUE_DESCR32(dpa, field) \
  +(vmw_shmem_ld32(dpa + offsetof(struct Vmxnet3_RxQueueDesc, field)))
  +
  +#define VMXNET3_WRITE_RX_QUEUE_DESCR64(dpa, field, value) \
  +(vmw_shmem_st64(dpa + offsetof(struct Vmxnet3_RxQueueDesc, field),
 value))
  +
  +#define VMXNET3_WRITE_RX_QUEUE_DESCR8(dpa, field, value) \
  +

Re: [Qemu-devel] problems with freeBSD

2013-03-07 Thread Aurelien Jarno

On Thu, Mar 07, 2013 at 01:16:54PM +0100, Laszlo Ersek wrote:
 On 03/07/13 09:43, Aurelien Jarno wrote:
 
  I did a git bisect to find the commit fixing the issue. Then, as I was
  not believing the result, I tried the following sequence a dozen of
  times (for some unknown reasons the FreeBSD install CD doesn't exhibit
  the issue, so I used the Debian GNU/kFreeBSD installer):
  
  | mkdir qemu-freebsd-bug
  | cd qemu-freebsd-bug
  |
  | wget 
  http://ftp.debian.org/debian/dists/squeeze/main/installer-kfreebsd-amd64/current/images/netboot/mini.iso
   
  |
  | git clone git://git.qemu.org/qemu.git
  | cd qemu
  | git checkout -b stable-1.4 v1.4.0
  | ./configure --target-list=x86_64-softmmu
  | make
  | cd ..
  |
  | git clone git://git.seabios.org/seabios.git
  | cd seabios
  | git checkout -b 1.7.2-stable origin/1.7.2-stable
  | git reset --hard 4219149ad2b783abfa61e80e9e9f6910db0c76c9
  | make
  | cp out/bios.bin ../qemu/pc-bios
  | cd..
  |
  | # debian-installer boots correctly 
  | ./qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cdrom mini.iso
  |
  | cd seabios
  | git reset --hard d75c22fcb6521dad11428b65789d92f89675c600 
  | git clean -fdx
  | make
  | cp out/bios.bin ../qemu/pc-bios
  | cd ..
  |
  | # debian-installer fails to boot
  | ./qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cdrom mini.iso 
  
  
  Maybe I am doing something wrong or there is a bug in my toolchain
  (Debian Sid). It would be nice if someone could try to reproduce that on
  another distro/system.
  
 
 Can you save the out/ directory from both builds and diff -ur them
 (maybe just the *.lds files)?

Please find it attached.

 I'm noticing that pathnames are embedded in some ELF section names (I
 hope this sentence makes sense), and when you build at d75c22fc, those
 pathnames contain dot-dot (..), ie. two section name separators next
 to each other. Maybe that's not good; no idea.
 

It's clearly the case, but I also don't know if it is a good solution or
not. What is sure is that it also changes the address of the text
section in bios.bin.elf.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


seabios.qemu-freebsd-bug.diff.xz
Description: Binary data

Re: [Qemu-devel] [PATCH 01/10] qdev: add qdev property for bool type

Am 25.02.2013 02:03, schrieb Igor Mammedov:
 Signed-off-by: Igor Mammedov imamm...@redhat.com

I vaguely remember having written something like this long time ago for
ISA and for Vasilis... looks good except for minor nits.

 ---
  hw/qdev-properties.c |   33 +
  hw/qdev-properties.h |   10 ++
  2 files changed, 43 insertions(+), 0 deletions(-)
 
 diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
 index a8a31f5..16ac814 100644
 --- a/hw/qdev-properties.c
 +++ b/hw/qdev-properties.c
 @@ -106,6 +106,39 @@ PropertyInfo qdev_prop_bit = {
  .set   = set_bit,
  };
  
 +/* --- bool --- */
 +
 +static void get_bool(Object *obj, Visitor *v, void *opaque,
 + const char *name, Error **errp)
 +{
 +DeviceState *dev = DEVICE(obj);
 +Property *prop = opaque;
 +bool *ptr = qdev_get_prop_ptr(dev, prop);
 +
 +visit_type_bool(v, ptr, name, errp);
 +}
 +
 +static void set_bool(Object *obj, Visitor *v, void *opaque,
 +   const char *name, Error **errp)

Indentation is off.

 +{
 +DeviceState *dev = DEVICE(obj);
 +Property *prop = opaque;
 +bool *ptr = qdev_get_prop_ptr(dev, prop);
 +
 +if (dev-realized) {
 +error_setg(errp, Insufficient permission to perform this 
 operation);

Should we be more specific here?
E.g., Property '%s' cannot be changed once realized.

Andreas

 +return;
 +}
 +
 +visit_type_bool(v, ptr, name, errp);
 +}
 +
 +PropertyInfo qdev_prop_bool = {
 +.name  = boolean,
 +.get   = get_bool,
 +.set   = set_bool,
 +};
 +
  /* --- 8bit integer --- */
  
  static void get_uint8(Object *obj, Visitor *v, void *opaque,
 diff --git a/hw/qdev-properties.h b/hw/qdev-properties.h
 index 20c67f3..3915f7c 100644
 --- a/hw/qdev-properties.h
 +++ b/hw/qdev-properties.h
 @@ -6,6 +6,7 @@
  /*** qdev-properties.c ***/
  
  extern PropertyInfo qdev_prop_bit;
 +extern PropertyInfo qdev_prop_bool;
  extern PropertyInfo qdev_prop_uint8;
  extern PropertyInfo qdev_prop_uint16;
  extern PropertyInfo qdev_prop_uint32;
 @@ -51,6 +52,15 @@ extern PropertyInfo qdev_prop_pci_host_devaddr;
  .defval= (bool)_defval,  \
  }
  
 +#define DEFINE_PROP_BOOL(_name, _state, _field, _defval) {   \
 +.name  = (_name),\
 +.info  = (qdev_prop_bool),  \
 +.offset= offsetof(_state, _field)\
 ++ type_check(bool, typeof_field(_state, _field)),\
 +.qtype = QTYPE_QBOOL,\
 +.defval= (bool)_defval,  \
 +}
 +
  #define DEFINE_PROP_UINT8(_n, _s, _f, _d)   \
  DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
  #define DEFINE_PROP_UINT16(_n, _s, _f, _d)  \
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] problems with freeBSD

2013-03-07 Thread Kevin O'Connor

On Thu, Mar 07, 2013 at 09:43:04AM +0100, Aurelien Jarno wrote:
 On Wed, Mar 06, 2013 at 07:53:51PM -0500, Kevin O'Connor wrote:
  That change is definitely just build related - I don't see how it
  could impact the final SeaBIOS binary.  How did you conclude that this
  commit is what fixes the issue?
  
 
 I did a git bisect to find the commit fixing the issue. Then, as I was
 not believing the result, I tried the following sequence a dozen of
 times (for some unknown reasons the FreeBSD install CD doesn't exhibit
 the issue, so I used the Debian GNU/kFreeBSD installer):

Thanks.  I'll take a look at this tonight.  If you get a chance first,
could you try adding make distclean; make clean into the seabios
steps?

-Kevin

Re: [Qemu-devel] [PATCH] qdev: DEVICE_DELETED event

Andreas Färber afaer...@suse.de writes:

 Am 07.03.2013 11:07, schrieb Michael S. Tsirkin:
 On Thu, Mar 07, 2013 at 10:55:23AM +0100, Markus Armbruster wrote:
 Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Mar 06, 2013 at 02:57:22PM +0100, Andreas Färber wrote:
 Am 06.03.2013 14:00, schrieb Michael S. Tsirkin:
 libvirt has a long-standing bug: when removing the device,
 it can request removal but does not know when does the
 removal complete. Add an event so we can fix this in a robust way.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 Sounds like a good idea to me. :)

 [...]
 diff --git a/hw/qdev.c b/hw/qdev.c
 index 689cd54..f30d251 100644
 --- a/hw/qdev.c
 +++ b/hw/qdev.c
 @@ -29,6 +29,7 @@
  #include sysemu/sysemu.h
  #include qapi/error.h
  #include qapi/visitor.h
 +#include qapi/qmp/qjson.h
  
  int qdev_hotplug = 0;
  static bool qdev_hot_added = false;
 @@ -267,6 +268,11 @@ void qdev_init_nofail(DeviceState *dev)
  /* Unlink device from bus and free the structure.  */
  void qdev_free(DeviceState *dev)
  {
 +if (dev-id) {
 +QObject *data = qobject_from_jsonf({ 'device': %s }, dev-id);
 +monitor_protocol_event(QEVENT_DEVICE_DELETED, data);
 +qobject_decref(data);
 +}
  object_unparent(OBJECT(dev));
  }
  

 I'm pretty sure this is the wrong place to fire the notification. We
 should rather do this when the device is actually deleted - which
 qdev_free() does *not* actually guarantee, as criticized in the s390x
 and unref'ing contexts.
 I would suggest to place your code into device_unparent() instead.

 Another thing to consider is what data to pass to the event: Not all
 devices have an ID.

 If they don't they were not created by management so management is
 probably not interested in them being removed.

 We could always add a 'path' key later if this assumption
 proves incorrect.

 In old qdev, ID was all we had, because paths were busted.  Thus,
 management had no choice but use IDs.

 If I understand modern qdev correctly, we got a canonical path.  Old
 APIs like device_del still accept only ID.  Should new APIs still be
 designed that way?  Or should they always accept / provide the canonical
 path, plus optional ID for convenience?
 
 What are advantages of exposing the path to users in this way?

The path is the device's canonical name.  Canonical means path:device is
1:1.  Path always works.  Qdev ID only works when the user assigned one.

Funny case: board creates a hot-pluggable device by default (thus no
qdev ID), guest ejects it, what do you put into the event?  Your code
simply doesn't emit one.

You could blame the user; after all he could've used -nodefaults, and
added the device himself, with an ID.

I blame your design instead, which needlessly complicates the event's
semantics: it gets emitted only for devices with a qdev ID.  Which you
neglected to document clearly, by the way.

If you put the path into the event, you can emit it always, which is
simpler.  Feel free to throw in the qdev ID.

 Looks like maintainance hassle without real benefits?

I can't see path being a greater maintenance hassle than ID.

 Anthony had rejected earlier QOM patches by Paolo related to qdev id,
 saying it was deprecated in favor of those QOM paths.

More reason to put the path into the event, not just the qdev ID.

 We should still have a canonical path when we fire
 this event in either qdev_free() or in device_unparent() before the if
 (dev-parent_bus) block though. That would be a question for Anthony,
 not having a use case for the event I am indifferent there.

 Further, thinking of objects such as virtio-rng backends or future
 blockdev/chardev objects, might it make sense to turn this into a
 generic object deletion event rather than a device event?

 Andreas

 Backend deletion doesn't normally have guest interaction right?
 So why do we need an event?

 We need an event because device_del may send its reply before it
 completes the job.

 device_del does that when it deletion needs to interact with the guest,
 which can take unbounded time.

 Conversely, we don't need an event when a QMP always completes the job
 (as far as observable by the QMP client) before it sends its reply.  Off
 hand, I can't see why backend deletion would do anything else.

 I'm always reluctant to abstract when there are fewer than two
 different, concrete things to abstract from.  Right now, we got just
 one: device models.

 Not quite: It's about unparenting hook and object deletion, which are
 both not limited to devices.

Yes, the implementation of the event happens to be sit outside qdev land
in object land.

On a conceptual level, the event makes sense only for objects that can
go away asynchronously, i.e. not in response to a QMP command and before
that QMP command sends its reply.  The only such objects behaving like
that are device models, as far as I know (unless we widen the scope to
multiple monitors, where one monitor doesn't know what the other is

Re: [Qemu-devel] [PATCH 2/2] input: introduce keyboard handler list

Gerd Hoffmann kra...@redhat.com writes:

 Add a linked list of keyboard handlers.  Added handlers will go
 to the head of the list.  Removed handlers will be zapped from
 the list.  The head of the list will be used for events.

 This fixes the keyboard-dead-after-usb-kbd-unplug issue, key events
 will be re-routed to the ps/2 kbd instead of being discarded.

Patterned after the mouse code, which already has such a list.

For mice, we have a monitor command mouse_set to select the active one.
Useful for keyboards, too?  I dimly recall somebody trying something
like that in the past.

 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---
  hw/hid.c |4 ++--
  hw/hid.h |1 +
  include/ui/console.h |6 --
  ui/input.c   |   37 +
  4 files changed, 32 insertions(+), 16 deletions(-)

 diff --git a/hw/hid.c b/hw/hid.c
 index 89b5415..6be00ec 100644
 --- a/hw/hid.c
 +++ b/hw/hid.c
 @@ -415,7 +415,7 @@ void hid_free(HIDState *hs)
  {
  switch (hs-kind) {
  case HID_KEYBOARD:
 -qemu_remove_kbd_event_handler();
 +qemu_remove_kbd_event_handler(hs-kbd.eh_entry);
  break;
  case HID_MOUSE:
  case HID_TABLET:
 @@ -431,7 +431,7 @@ void hid_init(HIDState *hs, int kind, HIDEventFunc event)
  hs-event = event;
  
  if (hs-kind == HID_KEYBOARD) {
 -qemu_add_kbd_event_handler(hid_keyboard_event, hs);
 +hs-kbd.eh_entry = qemu_add_kbd_event_handler(hid_keyboard_event, 
 hs);
  } else if (hs-kind == HID_MOUSE) {
  hs-ptr.eh_entry = qemu_add_mouse_event_handler(hid_pointer_event, 
 hs,
  0, QEMU HID Mouse);
 diff --git a/hw/hid.h b/hw/hid.h
 index 56c71ed..2567879 100644
 --- a/hw/hid.h
 +++ b/hw/hid.h
 @@ -31,6 +31,7 @@ typedef struct HIDKeyboardState {
  uint8_t leds;
  uint8_t key[16];
  int32_t keys;
 +QEMUPutKbdEntry *eh_entry;
  } HIDKeyboardState;
  
  struct HIDState {
 diff --git a/include/ui/console.h b/include/ui/console.h
 index ce5a550..fd39d94 100644
 --- a/include/ui/console.h
 +++ b/include/ui/console.h
 @@ -28,10 +28,12 @@ typedef void QEMUPutLEDEvent(void *opaque, int ledstate);
  typedef void QEMUPutMouseEvent(void *opaque, int dx, int dy, int dz, int 
 buttons_state);
  
  typedef struct QEMUPutMouseEntry QEMUPutMouseEntry;
 +typedef struct QEMUPutKbdEntry QEMUPutKbdEntry;
  typedef struct QEMUPutLEDEntry QEMUPutLEDEntry;
  
 -void qemu_add_kbd_event_handler(QEMUPutKBDEvent *func, void *opaque);
 -void qemu_remove_kbd_event_handler(void);
 +QEMUPutKbdEntry *qemu_add_kbd_event_handler(QEMUPutKBDEvent *func,
 +void *opaque);
 +void qemu_remove_kbd_event_handler(QEMUPutKbdEntry *entry);
  QEMUPutMouseEntry *qemu_add_mouse_event_handler(QEMUPutMouseEvent *func,
  void *opaque, int absolute,
  const char *name);
 diff --git a/ui/input.c b/ui/input.c
 index 87a23df..59d0578 100644
 --- a/ui/input.c
 +++ b/ui/input.c
 @@ -41,18 +41,25 @@ struct QEMUPutMouseEntry {
  QTAILQ_ENTRY(QEMUPutMouseEntry) node;
  };
  
 +struct QEMUPutKbdEntry {
 +QEMUPutLEDEvent *put_kbd;

Sure it's not QEMUPutKbdEvent?

 +void *opaque;
 +QTAILQ_ENTRY(QEMUPutKbdEntry) next;
 +};
 +
  struct QEMUPutLEDEntry {
  QEMUPutLEDEvent *put_led;
  void *opaque;
  QTAILQ_ENTRY(QEMUPutLEDEntry) next;
  };
  
 -static QEMUPutKBDEvent *qemu_put_kbd_event;
 -static void *qemu_put_kbd_event_opaque;
 -static QTAILQ_HEAD(, QEMUPutLEDEntry) led_handlers = 
 QTAILQ_HEAD_INITIALIZER(led_handlers);
 +static QTAILQ_HEAD(, QEMUPutLEDEntry) led_handlers =
 +QTAILQ_HEAD_INITIALIZER(led_handlers);

Unrelated style cleanup.  Tolerable.

 +static QTAILQ_HEAD(, QEMUPutKbdEntry) kbd_handlers =
 +QTAILQ_HEAD_INITIALIZER(kbd_handlers);
  static QTAILQ_HEAD(, QEMUPutMouseEntry) mouse_handlers =
  QTAILQ_HEAD_INITIALIZER(mouse_handlers);
 -static NotifierList mouse_mode_notifiers = 
 +static NotifierList mouse_mode_notifiers =
  NOTIFIER_LIST_INITIALIZER(mouse_mode_notifiers);

Unrelated whitespace cleanup.  Tolerable.

  
  static const int key_defs[] = {
 @@ -306,16 +313,20 @@ void qmp_send_key(KeyValueList *keys, bool 
 has_hold_time, int64_t hold_time,
 muldiv64(get_ticks_per_sec(), hold_time, 1000));
  }
  
 -void qemu_add_kbd_event_handler(QEMUPutKBDEvent *func, void *opaque)
 +QEMUPutKbdEntry *qemu_add_kbd_event_handler(QEMUPutKBDEvent *func, void 
 *opaque)
  {
 -qemu_put_kbd_event_opaque = opaque;
 -qemu_put_kbd_event = func;
 +QEMUPutKbdEntry *entry;
 +
 +entry = g_malloc0(sizeof(QEMUPutKbdEntry));
 +entry-put_kbd = func;
 +entry-opaque = opaque;
 +QTAILQ_INSERT_HEAD(kbd_handlers, entry, next);
 +return entry;
  }
  
 -void qemu_remove_kbd_event_handler(void)
 +void

Re: [Qemu-devel] [PATCH 1/2] input: make QEMUPutLEDEntry + QEMUPutMouseEntry private

Gerd Hoffmann kra...@redhat.com writes:

 There is no need for anybody outside ui/input.c to access the
 struct elements.  Move the definitions, leaving only the typedefs
 in the header files.

No-brainer (assuming it compiles).

Reviewed-by: Markus Armbruster arm...@redhat.com

Re: [Qemu-devel] [PATCH 4/7] block: initial VHDX driver support framework - supports open and probe

On Wed, Mar 06, 2013 at 09:48:11AM -0500, Jeff Cody wrote:
 +#define leguid_to_cpus(guid) do { \
 +le32_to_cpus((guid)-data1); \
 +le16_to_cpus((guid)-data2); \
 +le16_to_cpus((guid)-data3); } while (0)

This should be a function.  Please avoid macros.

 +static const ms_guid logical_sector_guid = {.data1 = 0x8141bf1d,
 +.data2 = 0xa96f,
 +.data3 = 0x4709,
 +   .data4 = { 0xba, 0x47, 0xf2, 0x33,

Indentation

 +  0xa8, 0xfa, 0xab, 
 0x5f} };
 +
 +static const ms_guid phys_sector_guid =  { .data1 = 0xcda348c7,
 +   .data2 = 0x445d,
 +   .data3 = 0x4471,
 +   .data4 = { 0x9c, 0xc9, 0xe9, 0x88,
 +  0x52, 0x51, 0xc5, 
 0x56} };
 +
 +static const ms_guid parent_locator_guid = {.data1 = 0xa8d35f2d,
 +.data2 = 0xb30b,
 +.data3 = 0x454d,
 +   .data4 = { 0xab, 0xf7, 0xd3, 0xd8,

Indentation

 +typedef struct BDRVVHDXState {
 +CoMutex lock;
 +
 +int curr_header;
 +vhdx_header *headers[2];
 +
 +vhdx_region_table_header rt;
 +vhdx_region_table_entry bat_rt; /* region table for the BAT */
 +vhdx_region_table_entry metadata_rt;/* region table for the metadata 
 */
 +vhdx_region_table_entry *unknown_rt;
 +unsigned int unknown_rt_size;

This field isn't used yet but unsigned int for something that has
size in its name is suspicious.  Should it be size_t (memory) or
uint64_t (disk)?

 +static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
 +{
 +if (buf_size = 8  !strncmp((char *)buf, vhdxfile, 8)) {

memcmp() saves you the trouble of casting buf (which should stay const,
BTW).

 +static int vhdx_parse_metadata(BlockDriverState *bs, BDRVVHDXState *s)
 +{
 +int ret = 0;
 +uint8_t *buffer;
 +int offset = 0;
 +int i = 0;
 +uint32_t block_size, sectors_per_block, logical_sector_size;
 +uint64_t chunk_ratio;
 +vhdx_metadata_table_entry md_entry;
 +
 +buffer = g_malloc(VHDX_METADATA_TABLE_MAX_SIZE);

Please use qemu_blockalign() to avoid bounce buffers in case the memory
is not 512-byte aligned.  Use qemu_vfree() instead of g_free().

This applies to all I/O buffers that the block driver allocates.

 +
 +ret = bdrv_pread(bs-file, s-metadata_rt.file_offset, buffer,
 + VHDX_METADATA_TABLE_MAX_SIZE);
 +if (ret  0) {
 +goto fail_no_free;
 +}
 +memcpy(s-metadata_hdr, buffer, sizeof(s-metadata_hdr));
 +offset += sizeof(s-metadata_hdr);
 +
 +le64_to_cpus(s-metadata_hdr.signature);
 +le16_to_cpus(s-metadata_hdr.reserved);
 +le16_to_cpus(s-metadata_hdr.entry_count);
 +
 +if (s-metadata_hdr.signature != VHDX_METADATA_MAGIC) {
 +ret = -EINVAL;
 +goto fail_no_free;
 +}
 +
 +s-metadata_entries.present = 0;
 +
 +for (i = 0; i  s-metadata_hdr.entry_count; i++) {
 +memcpy(md_entry, buffer+offset, sizeof(md_entry));

Read beyond end of buffer if metadata_hdr.entry_count is wrong.

We cannot trust the image file - it can be corrupt or malicious.  Checks
are required for everything that can be validated.  QEMU cannot do
anything that would cause a crash or memory corruption on invalid input.

 +ret = bdrv_pread(bs-file,
 + s-metadata_entries.file_parameters_entry.offset
 + + s-metadata_rt.file_offset,
 + s-params,
 + sizeof(s-params));

Missing error code path when ret  0.

 +/* determine virtual disk size, logical sector size,
 + * and phys sector size */
 +
 +ret = bdrv_pread(bs-file,
 + s-metadata_entries.virtual_disk_size_entry.offset
 +   + s-metadata_rt.file_offset,
 + s-virtual_disk_size,
 + sizeof(uint64_t));
 +if (ret  0) {
 +goto exit;
 +}
 +ret = bdrv_pread(bs-file,
 + s-metadata_entries.logical_sector_size_entry.offset
 + + s-metadata_rt.file_offset,
 + s-logical_sector_size,
 + sizeof(uint32_t));
 +if (ret  0) {
 +goto exit;
 +}
 +ret = bdrv_pread(bs-file,
 + s-metadata_entries.phys_sector_size_entry.offset
 +  + s-metadata_rt.file_offset,
 + s-physical_sector_size,
 + sizeof(uint32_t));
 +if (ret  0) {
 +goto exit;
 +}
 +
 +le64_to_cpus(s-virtual_disk_size);
 +

[Qemu-devel] Windows doesn't like MSI/MSI-X

2013-03-07 Thread Hannes Reinecke


Hi all,

recently I've tried to teach megasas MSI/MSI-X. While it works 
perfectly under Linux, Windows refuses to.


With really strange symptoms:
Windows Vista will BSOD when both MSI/MSI-X registers are present, 
and Windows 7 will hang as Windows (apparently) thinks MSI/MSI-X is 
enabled, whereas qemu doesn't and uses INTx.

So the Windows 7 guest will never see any interrupts.

The _really_ odd thing is that when I remove the MSI-X capability 
Windows will fall back to INTx and everything works.


Even more curious is that from the logs Windows will only ever write 
zeros into the MSI/MSI-X config registers.

Which makes me wonder what's going on there.

As I'm not sure if that's my fault I was wondering if anybody every 
succeeded in getting AHCI to use MSI under Windows.

Any pointers?

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Re: [Qemu-devel] [PATCH 6/7] block: add header update capability for VHDX images.