[Qemu-devel] [Bug 1178101] [NEW] Could not enable gtk UI on build for Windows target
Public bug reported:

$ ${QEMU_SRC_DIR}/configure --prefix=${BIN_ROOT} --cross-prefix=${HOST_TRIPLET}- --extra-cflags=-I${BIN_ROOT}/include --extra-ldflags=-L${BIN_ROOT}/lib --enable-gtk --disable-xen
ERROR: User requested feature gtk
configure was not able to find it

$ cat config.log
# QEMU configure log Thu May 9 13:50:40 CST 2013
# Configured with: '/home/cauchy/vcs/git/qemu/configure' '--prefix=/home/cauchy/w32' '--cross-prefix=i686-w64-mingw32-' '--extra-cflags=-I/home/cauchy/w32/include' '--extra-ldflags=-L/home/cauchy/w32/lib' '--enable-gtk' '--disable-xen'
# i686-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -c -o /tmp/qemu-conf--18025-.o /tmp/qemu-conf--18025-.c
/tmp/qemu-conf--18025-.c:2:2: error: #error __linux__ not defined
 #error __linux__ not defined
  ^
i686-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -c -o /tmp/qemu-conf--18025-.o /tmp/qemu-conf--18025-.c
i686-w64-mingw32-gcc -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -c -o /tmp/qemu-conf--18025-.o /tmp/qemu-conf--18025-.c
i686-w64-mingw32-gcc -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -o /tmp/qemu-conf--18025-.exe /tmp/qemu-conf--18025-.c -g -L/home/cauchy/w32/lib -liberty
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -c -o /tmp/qemu-conf--18025-.o /tmp/qemu-conf--18025-.c
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -Werror -Winitializer-overrides -o /tmp/qemu-conf--18025-.exe /tmp/qemu-conf--18025-.c -m32 -g -L/home/cauchy/w32/lib
i686-w64-mingw32-gcc: error: unrecognized command line option ‘-Winitializer-overrides’
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -Werror -Wendif-labels -o /tmp/qemu-conf--18025-.exe /tmp/qemu-conf--18025-.c -m32 -g -L/home/cauchy/w32/lib
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -Wendif-labels -Werror -Wmissing-include-dirs -o /tmp/qemu-conf--18025-.exe /tmp/qemu-conf--18025-.c -m32 -g -L/home/cauchy/w32/lib
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -Wendif-labels -Wmissing-include-dirs -Werror -Wempty-body -o /tmp/qemu-conf--18025-.exe /tmp/qemu-conf--18025-.c -m32 -g -L/home/cauchy/w32/lib
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -Wendif-labels -Wmissing-include-dirs -Wempty-body -Werror -Wnested-externs -o /tmp/qemu-conf--18025-.exe /tmp/qemu-conf--18025-.c -m32 -g -L/home/cauchy/w32/lib
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -I/home/cauchy/w32/include -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Werror -Wformat-security -o /tmp/qemu-conf--18025-.exe /tmp/qemu-conf--18025-.c -m32 -g -L/home/cauchy/w32/lib
i686-w64-mingw32-gcc -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501
Re: [Qemu-devel] [libvirt]virtio serial device problem
On 2013-05-08 23:53, fred.kon...@greensocs.com wrote: On 05/07/2013 07:50 PM, Paolo Bonzini wrote: On 07/05/2013 09:20, Li Zhang wrote: Hi all, Hi, When we use the latest version of QEMU to build ovirt, we get this error reported from libvirt. What QEMU commit is this? b3e6d591b05538056d665572f3e3bbfb3cbb70e7 This commit is from 05/29, no? There were issues with that. But it should be fixed. Do you still have the command-line issue with the latest git? See commit 80270a19685dd20eda017b0360c743b3e3ed6f57 Hi Fred, This patch is to change the bus so that it can be compatible with the old version, right? But I saw that the current name is still different from the old version. The current name is: virtio-serial-bus0.0 The old version is: virtio-serial0.0 Is it possible to change it back to the old name? Thanks. :) --Li Thanks, Fred It might have been fixed already. Hm. From what I see, it is all correct from the qemu side; the problem is in libvirt, which does not know about virtio-pci-bus yet. Paolo qemu-system-ppc64: -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm: Bus 'virtio-serial0.0' is full qemu-system-ppc64: -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm: Bus 'virtio-serial0.0' not found Libvirt helps create the QEMU command line and puts the virtserialport device on bus virtio-serial0.0. For the latest version of QEMU, the bus type has changed.
(qemu) info qtree bus: main-system-bus type System dev: spapr-pci-host-bridge, id index = 0 buid = 0x8002000 liobn = 0x8000 mem_win_addr = 0x100a000 mem_win_size = 0x2000 io_win_addr = 0x1008000 io_win_size = 0x1 msi_win_addr = 0x1009000 irq 0 bus: pci type PCI dev: virtio-serial-pci, id virtio-serial0 ioeventfd = on vectors = 2 class = 0x780 indirect_desc = on event_idx = on max_ports = 31 addr = 03.0 romfile = null rombar = 1 multifunction = off command_serr_enable = on class Class 0780, addr 00:03.0, pci id 1af4:1003 (sub 1af4:0003) bar 0: i/o at 0x [0x1e] bar 1: mem at 0x [0xffe] bus: virtio-serial0.0 type virtio-pci-bus dev: virtio-serial-device, id max_ports = 31 bus: virtio-serial-bus.0 type virtio-serial-bus dev: virtserialport, id channel1 chardev = charchannel1 nr = 2 name = org.qemu.guest_agent.0 port 2, guest off, host off, throttle off dev: virtserialport, id channel0 chardev = charchannel0 nr = 1 name = com.redhat.rhevm.vdsm port 1, guest off, host off, throttle off But we tried to replace virtio-serial0.0 with virtio-serial-bus.0, SLOF crashes. It still doesn't work at all. Does anyone know how to use virtserialport in QEMU command line? If configuration is changed in QEMU, libvirt also needs to change it accordingly. Thanks. :) --Li -- Alexey Kardashevskiy IBM OzLabs, LTC Team e-mail: a...@au1.ibm.com notes: Alexey Kardashevskiy/Australia/IBM
[Qemu-devel] [Bug 1178107] [NEW] qemu-system-*.exe -cpu ? (or -M ?) exit silently
Public bug reported: For example, 'qemu-system-arm -cpu ?' on a Linux host gives me the available CPU list: Available CPUs: arm1026 arm1136 arm1136-r2 ... But on a Windows host, I got nothing: C:\opt\qemu-1.5.0-rc0-win64> qemu-system-arm -cpu ? C:\opt\qemu-1.5.0-rc0-win64> echo %ERRORLEVEL% 0 ** Affects: qemu Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1178107 Title: qemu-system-*.exe -cpu ? (or -M ?) exit silently Status in QEMU: New Bug description: For example, 'qemu-system-arm -cpu ?' on a Linux host gives me the available CPU list: Available CPUs: arm1026 arm1136 arm1136-r2 ... But on a Windows host, I got nothing: C:\opt\qemu-1.5.0-rc0-win64> qemu-system-arm -cpu ? C:\opt\qemu-1.5.0-rc0-win64> echo %ERRORLEVEL% 0 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1178107/+subscriptions
Re: [Qemu-devel] [Qemu-trivial] [PATCH 2/2] remove needless semicolon
08.05.2013 17:25, Anthony Liguori wrote: Michael Tokarev m...@tls.msk.ru writes: 08.05.2013 13:46, Trival wrote: Signed-off-by: Trival triv...@linux.vnet.ibm.com Something went wrong in sending this. This is not a valid SoB. So, do we not accept it? Should I revert it in the trivial-patches-next ? Thanks, /mjt
Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
On Thu, Apr 18, 2013 at 6:03 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:

+header.cluster_bits = ffs(cluster_size) - 1;
+if (header.cluster_bits < MIN_CLUSTER_BITS ||
+    header.cluster_bits > MAX_CLUSTER_BITS ||
+    (1 << header.cluster_bits) != cluster_size) {
+    error_report(
+        "Cluster size must be a power of two between %d and %dk",
+        1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
+    return -EINVAL;
+}
+
+ header.header_size = MAX(cluster_size, DEFAULT_HEADER_SIZE);

Indentation.

+if (backing_filename) {
+    header.backing_offset = sizeof(header);
+    header.backing_size = strlen(backing_filename);
+
+    if (!backing_fmt) {
+        backing_bs = bdrv_new("image");
+        ret = bdrv_open(backing_bs, backing_filename, NULL,
+                        BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
+        if (ret < 0) {
+            return ret;

backing_bs is leaked.

+ret = bdrv_file_open(&bs, filename, NULL, BDRV_O_RDWR);
+if (ret < 0) {
+    return ret;
+}
+snprintf(header.backing_fmt, sizeof(header.backing_fmt), "%s",
+         backing_fmt ? backing_fmt : "");
+snprintf(header.image_fmt, sizeof(header.image_fmt), "%s",
+         image_format ? image_format : "raw");

snprintf() doesn't have the semantics in the add-cow specification:

44 - 59: backing file format
    Format of backing file. It will be filled with 0 if backing file name offset is 0. If backing file name offset is non-empty, it must be non-empty. It is coded in free-form ASCII, and is not NUL-terminated. Zero padded on the right.
60 - 75: image file format
    Format of image file. It must be non-empty. It is coded in free-form ASCII, and is not NUL-terminated. Zero padded on the right.

strncpy() does the zero padding and doesn't NUL-terminate if the max buffer size is used.

+if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
+    snprintf(bs->backing_format, sizeof(bs->backing_format),
+             "%s", s->header.backing_fmt);

s->header.backing_fmt is not NUL-terminated, so using snprintf() is inappropriate (could it read beyond the end of .backing_fmt?).

+}
+
+if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
+    s->header.cluster_bits > MAX_CLUSTER_BITS) {
+    ret = -EINVAL;
+    goto fail;
+}
+
+s->cluster_size = 1 << s->header.cluster_bits;
+if (s->header.header_size != MAX(s->cluster_size, DEFAULT_HEADER_SIZE)) {
+    char buf[64];
+    snprintf(buf, sizeof(buf), "Header size: %d",

%u or PRIu32 since header_size is uint32_t. This avoids compiler or code scanner warnings.

+s->image_hd = bdrv_new("");
+ret = bdrv_open(s->image_hd, image_filename, NULL, flags,
+                bdrv_find_format(s->header.image_fmt));

Cannot use image_fmt as a string since it is not NUL-terminated.

+static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
+                                          int64_t sector_num,
+                                          int remaining_sectors,
+                                          QEMUIOVector *qiov)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int ret = 0, i;
+    QEMUIOVector hd_qiov;
+    uint8_t *table;
+    uint64_t offset;
+    int mask = s->cluster_sectors - 1;
+    int cluster_mask = s->cluster_size - 1;
+
+    qemu_co_mutex_lock(&s->lock);
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    ret = bdrv_co_writev(s->image_hd, sector_num,
+                         remaining_sectors, qiov);

All writes are serialized. This means write performance will be very poor for multi-threaded workloads. qcow2 tracks allocating writes and allows them to execute at the same time if they do not overlap clusters.

+
+    if (ret < 0) {
+        goto fail;
+    }
+    if ((s->header.compat_features & ACOW_F_ALL_ALLOCATED) == 0) {
+        /* Copy content of unmodified sectors */
+        if (!is_cluster_head(sector_num, s->cluster_sectors) &&
+            !is_allocated(bs, sector_num)) {
+            ret = copy_sectors(bs, sector_num & ~mask, sector_num);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        if (!is_cluster_tail(sector_num + remaining_sectors - 1,
+                             s->cluster_sectors) &&
+            !is_allocated(bs, sector_num + remaining_sectors - 1)) {
+            ret = copy_sectors(bs, sector_num + remaining_sectors,
+                               ((sector_num + remaining_sectors) | mask) + 1);
+            if (ret < 0) {
+                goto fail;
+            }
+        }

This trashes the cluster when remaining_sectors = 0, sector_num = cluster_sectors, and sector cluster_sectors - 1 is
Re: [Qemu-devel] [PATCH v5 0/5] KVM flash memory support
On 09/05/2013 00:44, Jordan Justen wrote: git://github.com/jljusten/qemu.git kvm-flash-v5 Utilize KVM_CAP_READONLY_MEM to support PC system flash emulation with KVM. v5: * Remove patch to pflash_cfi01 which enabled readonly mode * Adjust kvm code to use KVM READONLY support for ranges that either have the readonly flag set, or for devices with readable set. v4: * With a machine type of isapc, don't mark the BIOS as read-only. isapc + seabios will not boot if the BIOS is read-only. This matches the current behavior of isapc with KVM, which is the only mode under which isapc currently works. v3: * Squash patches 2 & 3 based on Xiao's feedback that what I was calling a 'workaround' in patch 3 was actually what is required by the KVM READONLY memory support. v2: * Remove rom_only from PC_COMPAT_1_4 * Only enable flash when a pflash drive is created. Jordan Justen (5): isapc: Fix non-KVM qemu boot (read/write memory for isapc BIOS) kvm: add kvm_readonly_mem_enabled kvm: support using KVM_MEM_READONLY flag for regions pc_sysfw: allow flash (-pflash) memory to be used with KVM pc_sysfw: change rom_only default to 0 hw/block/pc_sysfw.c | 64 -- hw/i386/pc_piix.c|5 include/hw/i386/pc.h |4 include/sysemu/kvm.h | 10 kvm-all.c| 44 +++--- kvm-stub.c |1 + 6 files changed, 92 insertions(+), 36 deletions(-) Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [libvirt]virtio serial device problem
On 05/09/2013 04:07 PM, Li Zhang wrote: On 2013-05-08 23:53, fred.kon...@greensocs.com wrote: On 05/07/2013 07:50 PM, Paolo Bonzini wrote: On 07/05/2013 09:20, Li Zhang wrote: Hi all, Hi, When we use the latest version of QEMU to build ovirt, we get this error reported from libvirt. What QEMU commit is this? b3e6d591b05538056d665572f3e3bbfb3cbb70e7 This commit is from 05/29, no? There were issues with that. But it should be fixed. Do you still have the command-line issue with the latest git? See commit 80270a19685dd20eda017b0360c743b3e3ed6f57 Hi Fred, This patch is to change the bus so that it can be compatible with the old version, right? But I saw that the current name is still different from the old version. The current name is: virtio-serial-bus0.0 The old version is: virtio-serial0.0 Is it possible to change it back to the old name? This is what the most recent qemu produces: bus: pci type PCI dev: virtio-serial-pci, id virtio-serial0 ioeventfd = on vectors = 2 class = 0x780 indirect_desc = on event_idx = on max_ports = 31 addr = 03.0 romfile = null rombar = 1 multifunction = off command_serr_enable = on class Class 0780, addr 00:03.0, pci id 1af4:1003 (sub 1af4:0003) bar 0: i/o at 0x [0x1e] bar 1: mem at 0x [0xffe] bus: virtio-bus type virtio-pci-bus dev: virtio-serial-device, id max_ports = 31 bus: virtio-serial0.0 type virtio-serial-bus dev: virtserialport, id channel0 chardev = charchannel0 nr = 1 name = com.redhat.rhevm.vdsm port 1, guest off, host off, throttle off The device layout is new, the bus name is old - virtio-serial0.0, everything should be ok now. Thanks. :) --Li Thanks, Fred It might have been fixed already. Hm. From what I see, it is all correct from the qemu side; the problem is in libvirt, which does not know about virtio-pci-bus yet.
Paolo qemu-system-ppc64: -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm: Bus 'virtio-serial0.0' is full qemu-system-ppc64: -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm: Bus 'virtio-serial0.0' not found Libvirt helps create QEMU command line and put virtserialport device to bus virtio-serial0.0. For latest version of QEMU, the bus type is changed. (qemu) info qtree bus: main-system-bus type System dev: spapr-pci-host-bridge, id index = 0 buid = 0x8002000 liobn = 0x8000 mem_win_addr = 0x100a000 mem_win_size = 0x2000 io_win_addr = 0x1008000 io_win_size = 0x1 msi_win_addr = 0x1009000 irq 0 bus: pci type PCI dev: virtio-serial-pci, id virtio-serial0 ioeventfd = on vectors = 2 class = 0x780 indirect_desc = on event_idx = on max_ports = 31 addr = 03.0 romfile = null rombar = 1 multifunction = off command_serr_enable = on class Class 0780, addr 00:03.0, pci id 1af4:1003 (sub 1af4:0003) bar 0: i/o at 0x [0x1e] bar 1: mem at 0x [0xffe] bus: virtio-serial0.0 type virtio-pci-bus dev: virtio-serial-device, id max_ports = 31 bus: virtio-serial-bus.0 type virtio-serial-bus dev: virtserialport, id channel1 chardev = charchannel1 nr = 2 name = org.qemu.guest_agent.0 port 2, guest off, host off, throttle off dev: virtserialport, id channel0 chardev = charchannel0 nr = 1 name = com.redhat.rhevm.vdsm port 1, guest off, host off, throttle off But we tried to replace virtio-serial0.0 with virtio-serial-bus.0, SLOF crashes. It still doesn't work at all. Does anyone know how to use virtserialport in QEMU command line? If configuration is changed in QEMU, libvirt also needs to change it accordingly. Thanks. :) --Li -- Alexey Kardashevskiy IBM OzLabs, LTC Team e-mail: a...@au1.ibm.com notes: Alexey Kardashevskiy/Australia/IBM -- Alexey Kardashevskiy IBM OzLabs, LTC Team e-mail: a...@au1.ibm.com notes: Alexey Kardashevskiy/Australia/IBM
Re: [Qemu-devel] [PATCH] curl: fix curl read
On Fri, May 03, 2013 at 04:00:09PM +0800, Fam Zheng wrote:

@@ -391,7 +427,12 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_SIZE,
.help = "Readahead size",
},
-{ /* end of list */ }
+{
+.name = "ssl_no_cert",
+.type = QEMU_OPT_BOOL,
+.help = "SSL certificate check",
+},

This new option should be in a separate patch.

+if (!strncmp(s->url, "http://", strlen("http://")) && !s->accept_range) {
+strncpy(state->errmsg, "Server not supporting range.", CURL_ERROR_SIZE);
+goto out;
+}

This check is unrelated to the API change and should be in a separate patch.

 s->multi = curl_multi_init();
-curl_multi_setopt( s->multi, CURLMOPT_SOCKETDATA, s);
-curl_multi_setopt( s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb );
-curl_multi_do(s);
+if (!s->multi) {
+goto out_noclean;
+}
+curl_multi_setopt(s->multi, CURLMOPT_SOCKETDATA, s);
+curl_multi_setopt(s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb);
+curl_multi_setopt(s->multi, CURLMOPT_TIMERDATA, s);
+curl_multi_setopt(s->multi, CURLMOPT_TIMERFUNCTION, curl_multi_timer_cb);
+curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);

The timeout should be added in a separate patch.

+cache = curl_find_cache(s, aio_base, aio_bytes);
+if (cache) {
+curl_complete_io(s, acb, cache);
+return;
 }

What is the point of the cache? Can you split it into a separate patch?

+/* Try to release some cache */
+while (0 s->cache_quota = 0) {

while 0?
Re: [Qemu-devel] [ANNOUNCE] QEMU 1.5.0-rc1 is now available
On Thu, May 9, 2013 at 6:29 AM, Anthony Liguori aligu...@us.ibm.com wrote: Hi, On behalf of the QEMU Team, I'd like to announce the availability of the second release candidate for the QEMU 1.5 release. This release is meant for testing purposes and should not be used in a production environment. http://wiki.qemu.org/download/qemu-1.5.0-rc1.tar.bz2 You can help improve the quality of the QEMU 1.5 release by testing this release and reporting bugs on Launchpad: https://bugs.launchpad.net/qemu/ The release plan for the 1.5 release is available at: http://wiki.qemu.org/Planning/1.5 Please add entries to the ChangeLog for the 1.5 release below: http://wiki.qemu.org/ChangeLog/Next The following changes have been made since 1.5.0-rc0: - virtio: properly validate address before accessing config (Jason Wang) - virtio-pci: fix level interrupts (Michael S. Tsirkin) - PPC: Fix rldcl (Alexander Graf) - PPC: Depend behavior of cmp instructions only on instruction encoding (Alexander Graf) - target-mips: fix incorrect behaviour for INSV (Petar Jovanovic) - target-mips: add missing check_dspr2 for multiply instructions (Petar Jovanovic) - qemu-iotests: fix 017 018 for vmdk (Fam Zheng) - qemu-iotests: exclude vmdk and qcow from 043 (Fam Zheng) - qemu-iotests: exclude vmdk for test 042 (Fam Zheng) - qtest/ide-test: Test short and long PRDTs (Kevin Wolf) - qtest/ide-test: Add simple DMA read/write test case (Kevin Wolf) - qtest: Add IDE test case (Kevin Wolf) - libqos/pci: Enable bus mastering (Kevin Wolf) - ide: Reset BMIDEA bit when the bus master is stopped (Kevin Wolf) - de_DE.po: Add missing leading spaces (Kevin Wolf) - ahci: Don't allow creating slave drives (Kevin Wolf) Regards, Anthony Liguori For convenience, I have uploaded qemu-build-dependency-r1.zip, qemu-1.5.0-rc1-win32.7z and qemu-1.5.0-rc1-win64.7z: https://code.google.com/p/i18n-zh/downloads/list Regards, Dongsheng
Re: [Qemu-devel] [Bug 1176366] [NEW] TCPIP not working on qemu 1.4.50 (master)
On Sat, May 04, 2013 at 04:13:19PM -, TC1988 wrote: Whenever I try to enable TCP/IP in the guest OS (in this case NT 3.1), it crashes the whole emulator. It still crashes with either the ne2000 ISA, ne2000 PCI, or PCnet card; a screenshot is attached below. Please use git-bisect(1) to identify the commit that broke networking. http://git-scm.com/book/en/Git-Tools-Debugging-with-Git#Binary-Search https://www.kernel.org/pub/software/scm/git/docs/git-bisect.html Stefan
Re: [Qemu-devel] [PATCH for-1.5] virtio-pci: bugfix
On Mon, May 06, 2013 at 06:00:27PM +0300, Michael S. Tsirkin wrote: mask notifiers are never called without msix, so devices with backend masking like vhost don't work. Call mask notifiers explicitly at startup/cleanup to make it work. Signed-off-by: Michael S. Tsirkin m...@redhat.com Tested-by: Alexander Graf ag...@suse.de --- hw/virtio/virtio-pci.c | 4 1 file changed, 4 insertions(+) Please choose a descriptive commit message, not just bugfix. Stefan
Re: [Qemu-devel] [PATCH] curl: fix curl read
On Thu, 05/09 08:41, Stefan Hajnoczi wrote: On Fri, May 03, 2013 at 04:00:09PM +0800, Fam Zheng wrote:

@@ -391,7 +427,12 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_SIZE,
.help = "Readahead size",
},
-{ /* end of list */ }
+{
+.name = "ssl_no_cert",
+.type = QEMU_OPT_BOOL,
+.help = "SSL certificate check",
+},

This new option should be in a separate patch.

OK. I'll try to split them into patches.

+if (!strncmp(s->url, "http://", strlen("http://")) && !s->accept_range) {
+strncpy(state->errmsg, "Server not supporting range.", CURL_ERROR_SIZE);
+goto out;
+}

This check is unrelated to the API change and should be in a separate patch.

 s->multi = curl_multi_init();
-curl_multi_setopt( s->multi, CURLMOPT_SOCKETDATA, s);
-curl_multi_setopt( s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb );
-curl_multi_do(s);
+if (!s->multi) {
+goto out_noclean;
+}
+curl_multi_setopt(s->multi, CURLMOPT_SOCKETDATA, s);
+curl_multi_setopt(s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb);
+curl_multi_setopt(s->multi, CURLMOPT_TIMERDATA, s);
+curl_multi_setopt(s->multi, CURLMOPT_TIMERFUNCTION, curl_multi_timer_cb);
+curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);

The timeout should be added in a separate patch.

+cache = curl_find_cache(s, aio_base, aio_bytes);
+if (cache) {
+curl_complete_io(s, acb, cache);
+return;
 }

What is the point of the cache? Can you split it into a separate patch?

The cache is for prefetch. Data is fetched in 256k chunks using libcurl and stored in the cache to serve future I/O requests, reducing overall HTTP request overhead.

-- Fam
Re: [Qemu-devel] [Qemu-trivial] [PATCH 2/2] remove needless semicolon
On Thu, May 09, 2013 at 10:23:09AM +0400, Michael Tokarev wrote: 08.05.2013 17:25, Anthony Liguori wrote: Michael Tokarev m...@tls.msk.ru writes: 08.05.2013 13:46, Trival wrote: Signed-off-by: Trival triv...@linux.vnet.ibm.com I think wdongxu probably wants HIS name and email go here. wdongxu, can you resend your patches with: Signed-off-by: your real name wdon...@linux.vnet.ibm.com ? Something went wrong in sending this. This is not a valid SoB. So, do we not accept it? Should I revert it in the trivial-patches-next ? Thanks, /mjt
Re: [Qemu-devel] [libvirt]virtio serial device problem
On 2013-05-09 14:31, Alexey Kardashevskiy wrote: On 05/09/2013 04:07 PM, Li Zhang wrote: On 2013-05-08 23:53, fred.kon...@greensocs.com wrote: On 05/07/2013 07:50 PM, Paolo Bonzini wrote: On 07/05/2013 09:20, Li Zhang wrote: Hi all, Hi, When we use the latest version of QEMU to build ovirt, we get this error reported from libvirt. What QEMU commit is this? b3e6d591b05538056d665572f3e3bbfb3cbb70e7 This commit is from 05/29, no? There were issues with that. But it should be fixed. Do you still have the command-line issue with the latest git? See commit 80270a19685dd20eda017b0360c743b3e3ed6f57 Hi Fred, This patch is to change the bus so that it can be compatible with the old version, right? But I saw that the current name is still different from the old version. The current name is: virtio-serial-bus0.0 The old version is: virtio-serial0.0 Is it possible to change it back to the old name? This is what the most recent qemu produces: bus: pci type PCI dev: virtio-serial-pci, id virtio-serial0 ioeventfd = on vectors = 2 class = 0x780 indirect_desc = on event_idx = on max_ports = 31 addr = 03.0 romfile = null rombar = 1 multifunction = off command_serr_enable = on class Class 0780, addr 00:03.0, pci id 1af4:1003 (sub 1af4:0003) bar 0: i/o at 0x [0x1e] bar 1: mem at 0x [0xffe] bus: virtio-bus type virtio-pci-bus dev: virtio-serial-device, id max_ports = 31 bus: virtio-serial0.0 type virtio-serial-bus dev: virtserialport, id channel0 chardev = charchannel0 nr = 1 name = com.redhat.rhevm.vdsm port 1, guest off, host off, throttle off The device layout is new, the bus name is old - virtio-serial0.0, everything should be ok now. Alexey, thanks. It seems that both of them can be recognized. :) Thanks. :) --Li Thanks, Fred It might have been fixed already. Hm. From what I see, it is all correct from the qemu side; the problem is in libvirt, which does not know about virtio-pci-bus yet.
Paolo qemu-system-ppc64: -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm: Bus 'virtio-serial0.0' is full qemu-system-ppc64: -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm: Bus 'virtio-serial0.0' not found Libvirt helps create QEMU command line and put virtserialport device to bus virtio-serial0.0. For latest version of QEMU, the bus type is changed. (qemu) info qtree bus: main-system-bus type System dev: spapr-pci-host-bridge, id index = 0 buid = 0x8002000 liobn = 0x8000 mem_win_addr = 0x100a000 mem_win_size = 0x2000 io_win_addr = 0x1008000 io_win_size = 0x1 msi_win_addr = 0x1009000 irq 0 bus: pci type PCI dev: virtio-serial-pci, id virtio-serial0 ioeventfd = on vectors = 2 class = 0x780 indirect_desc = on event_idx = on max_ports = 31 addr = 03.0 romfile = null rombar = 1 multifunction = off command_serr_enable = on class Class 0780, addr 00:03.0, pci id 1af4:1003 (sub 1af4:0003) bar 0: i/o at 0x [0x1e] bar 1: mem at 0x [0xffe] bus: virtio-serial0.0 type virtio-pci-bus dev: virtio-serial-device, id max_ports = 31 bus: virtio-serial-bus.0 type virtio-serial-bus dev: virtserialport, id channel1 chardev = charchannel1 nr = 2 name = org.qemu.guest_agent.0 port 2, guest off, host off, throttle off dev: virtserialport, id channel0 chardev = charchannel0 nr = 1 name = com.redhat.rhevm.vdsm port 1, guest off, host off, throttle off But we tried to replace virtio-serial0.0 with virtio-serial-bus.0, SLOF crashes. It still doesn't work at all. Does anyone know how to use virtserialport in QEMU command line? If configuration is changed in QEMU, libvirt also needs to change it accordingly. Thanks. :) --Li -- Alexey Kardashevskiy IBM OzLabs, LTC Team e-mail: a...@au1.ibm.com notes: Alexey Kardashevskiy/Australia/IBM
Re: [Qemu-devel] [Qemu-trivial] [PATCH 2/2] remove needless semicolon
On 9 May 2013 08:08, Hu Tao hu...@cn.fujitsu.com wrote: On Thu, May 09, 2013 at 10:23:09AM +0400, Michael Tokarev wrote: 08.05.2013 17:25, Anthony Liguori wrote: Michael Tokarev m...@tls.msk.ru writes: 08.05.2013 13:46, Trival wrote: Signed-off-by: Trival triv...@linux.vnet.ibm.com I think wdongxu probably wants HIS name and email go here. wdongxu, can you resend your patches with: Signed-off-by: your real name wdon...@linux.vnet.ibm.com ? You should make sure you fix the From: line as well as the Signed-off-by:, please. thanks -- PMM
Re: [Qemu-devel] Few Qs related to using qemu-nbd for exporting snapshot
On Wed, May 08, 2013 at 10:30:29AM +0530, Deepak C Shetty wrote: I am looking at using qemu-nbd to export an existing snapshot to a Backup virtual appliance (VA) and had the following Qs in that context...

1) Exporting an image using a unix socket (-k option) vs using --connect=/dev/nbd0: which one is better / preferred? The idea being that once exported, the disk will be hotplugged into the Backup VA, which will then use it as a src for the backup operation. --connect=/dev/nbd0 requires that the host kernel ships the nbd.ko driver. RHEL doesn't support nbd.ko; other distros may or may not. Why are you using qemu-nbd at all if the backup appliance is on the same host? Would it be possible to attach the image file directly to the virtual appliance?

2) If I use the --connect option, it requires the nbd kernel module to be loaded, and I also get a nice --disconnect way of terminating the nbd export. Using the -k option, qemu-nbd doesn't need the nbd kernel module, but the cmdline blocks and sending SIGTERM is the only way to terminate the nbd export... Is there a better way to terminate qemu-nbd when using the -k option? If not, I assume SIGTERM would gracefully shut down qemu-nbd? SIGTERM stops qemu-nbd properly.

3) I was not able to get qemu-nbd running without the -t (persistent) option. Ideally, running without -t would be suited, since there is only one client/connection per nbd export and the nbd server would nicely shut itself down... but I get the below error. Server: $ qemu-nbd -k /tmp/nbds1 ./debian_lenny_i386_standard.qcow2 Client: $ qemu-system-x86_64 -m 1024 -drive file=nbd:unix:/tmp/nbds1 connect(unix:/tmp/nbds1): No such file or directory qemu-system-x86_64: -drive file=nbd:unix:/tmp/nbds1: could not open disk image nbd:unix:/tmp/nbds1: No such file or directory nbds1 is indeed there... srwxrwxr-x. 1 dpkshetty dpkshetty 0 May 3 14:24 /tmp/nbds1 and then the server returns to the shell.. echo $? is 0 Try -drive file=nbd:...,format=raw to prevent format probing.
Recent QEMU versions avoid reopening the image file here. Even this may not help if QEMU probes the disk geometry by opening and inspecting the MBR. Kevin can explain the current state of probing better than me. Stefan
Re: [Qemu-devel] qemu-img problem when create a file larger than fs's size
On Wed, May 08, 2013 at 01:18:17PM +0800, yuxh wrote: I have to consult you about a qemu-img problem. Is it reasonable to create a file which is larger than the available size of the fs with the qemu-img cmd? When I use qemu-img to create a file which is larger than the available size of the fs, the creation completes successfully. However when I use this file in a guest as the guest's disk, and write beyond the size the host file can provide, the guest is paused by qemu-kvm or libvirt, seemingly in an infinite loop where the guest just can't be used unless I detach the disk from the guest or destroy the guest. You can change the ENOSPC policy with -drive werror=,rerror=. See the QEMU man page. The default behavior is to pause the guest so the host admin can free up or add space. Then the guest can be continued - this will retry the I/O. But you can also tell QEMU to pass the error through to the guest using the -drive werror=,rerror= options. Stefan
Re: [Qemu-devel] [Qemu-trivial] [PATCH 2/2] remove needless semicolon
On 2013/5/9 15:08, Hu Tao wrote: On Thu, May 09, 2013 at 10:23:09AM +0400, Michael Tokarev wrote: 08.05.2013 17:25, Anthony Liguori wrote: Michael Tokarev m...@tls.msk.ru writes: 08.05.2013 13:46, Trival wrote: Signed-off-by: Trival triv...@linux.vnet.ibm.com I think wdongxu probably wants HIS name and email go here. wdongxu, can you resend your patches with: Sorry for the inconvenience, will re-send the patches. Signed-off-by: your real name wdon...@linux.vnet.ibm.com ? Something went wrong in sending this. This is not a valid SoB. So, do we not accept it? Should I revert it in the trivial-patches-next ? Thanks, /mjt
[Qemu-devel] [PATCH 1/2] clean unnecessary code: don't check g_strdup arg for NULL
Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com --- util/uri.c | 2 +- vl.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/util/uri.c b/util/uri.c index 4238729..e348c17 100644 --- a/util/uri.c +++ b/util/uri.c @@ -2162,7 +2162,7 @@ query_params_append (struct QueryParams *ps, } ps->p[ps->n].name = g_strdup(name); -ps->p[ps->n].value = value ? g_strdup(value) : NULL; +ps->p[ps->n].value = g_strdup(value); ps->p[ps->n].ignore = 0; ps->n++; diff --git a/vl.c b/vl.c index 6e6225f..be0a93c 100644 --- a/vl.c +++ b/vl.c @@ -1215,7 +1215,7 @@ void add_boot_device_path(int32_t bootindex, DeviceState *dev, node = g_malloc0(sizeof(FWBootEntry)); node->bootindex = bootindex; -node->suffix = suffix ? g_strdup(suffix) : NULL; +node->suffix = g_strdup(suffix); node->dev = dev; QTAILQ_FOREACH(i, &fw_boot_order, link) { -- 1.7.11.7
[Qemu-devel] [PATCH 2/2] remove double semicolons
Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com --- block/nbd.c | 2 +- fsdev/virtfs-proxy-helper.c | 4 ++-- hw/9pfs/virtio-9p-local.c | 2 +- hw/i386/pc_q35.c| 2 +- hw/intc/imx_avic.c | 2 +- hw/usb/host-linux.c | 4 ++-- qga/channel-win32.c | 2 +- 7 files changed, 9 insertions(+), 9 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index fab114b..30e3b78 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -609,7 +609,7 @@ static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } request.type = NBD_CMD_TRIM; -request.from = sector_num * 512;; +request.from = sector_num * 512; request.len = nb_sectors * 512; nbd_coroutine_start(s, request); diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c index 36f6616..713a7b2 100644 --- a/fsdev/virtfs-proxy-helper.c +++ b/fsdev/virtfs-proxy-helper.c @@ -248,7 +248,7 @@ static int send_fd(int sockfd, int fd) static int send_status(int sockfd, struct iovec *iovec, int status) { ProxyHeader header; -int retval, msg_size;; +int retval, msg_size; if (status < 0) { header.type = T_ERROR; @@ -381,7 +381,7 @@ static int send_response(int sock, struct iovec *iovec, int size) proxy_marshal(iovec, 0, "dd", header.type, header.size); retval = socket_write(sock, iovec->iov_base, header.size + PROXY_HDR_SZ); if (retval < 0) { -return retval;; +return retval; } return 0; } diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c index be898ec..6ece6f7 100644 --- a/hw/9pfs/virtio-9p-local.c +++ b/hw/9pfs/virtio-9p-local.c @@ -878,7 +878,7 @@ static int local_remove(FsContext *ctx, const char *path) * Now remove the name from parent directory * .virtfs_metadata directory */ -err = remove(local_mapped_attr_path(ctx, path, buffer));; +err = remove(local_mapped_attr_path(ctx, path, buffer)); if (err < 0 && errno != ENOENT) { /* * We didn't had the .virtfs_metadata file.
May be file created diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 4160e2b..6825380 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -128,7 +128,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args) q35_host->mch.ram_memory = ram_memory; q35_host->mch.pci_address_space = pci_memory; q35_host->mch.system_memory = get_system_memory(); -q35_host->mch.address_space_io = get_system_io();; +q35_host->mch.address_space_io = get_system_io(); q35_host->mch.below_4g_mem_size = below_4g_mem_size; q35_host->mch.above_4g_mem_size = above_4g_mem_size; /* pci */ diff --git a/hw/intc/imx_avic.c b/hw/intc/imx_avic.c index 4e280b6..ff45dcd 100644 --- a/hw/intc/imx_avic.c +++ b/hw/intc/imx_avic.c @@ -370,7 +370,7 @@ static void imx_avic_reset(DeviceState *dev) static int imx_avic_init(SysBusDevice *dev) { -IMXAVICState *s = FROM_SYSBUS(IMXAVICState, dev);; +IMXAVICState *s = FROM_SYSBUS(IMXAVICState, dev); memory_region_init_io(&s->iomem, &imx_avic_ops, s, "imx_avic", 0x1000); sysbus_init_mmio(dev, &s->iomem); diff --git a/hw/usb/host-linux.c b/hw/usb/host-linux.c index 8994668..ca09a89 100644 --- a/hw/usb/host-linux.c +++ b/hw/usb/host-linux.c @@ -651,7 +651,7 @@ static void usb_host_handle_reset(USBDevice *dev) trace_usb_host_reset(s->bus_num, s->addr); -usb_host_do_reset(s);; +usb_host_do_reset(s); usb_host_claim_interfaces(s, 0); usb_linux_update_endp_table(s); @@ -1429,7 +1429,7 @@ static void usb_host_exit_notifier(struct Notifier *n, void *data) usb_host_release_port(s); if (s->fd != -1) { -usb_host_do_reset(s);; +usb_host_do_reset(s); } } diff --git a/qga/channel-win32.c b/qga/channel-win32.c index 7ed98d7..8a303f3 100644 --- a/qga/channel-win32.c +++ b/qga/channel-win32.c @@ -268,7 +268,7 @@ static GIOStatus ga_channel_write(GAChannel *c, const char *buf, size_t size, GIOStatus ga_channel_write_all(GAChannel *c, const char *buf, size_t size) { -GIOStatus status = G_IO_STATUS_NORMAL;; +GIOStatus status = G_IO_STATUS_NORMAL; size_t count; while (size) { -- 1.7.11.7
Re: [Qemu-devel] [PATCH 0/6] Drop the irredeemably racy cpu_unlink_tb()
[Rehashing a relatively old thread. It is available, in particular, at http://thread.gmane.org/gmane.comp.emulators.qemu/196629] 22.02.2013 22:09, Peter Maydell wrote: This patch series gets rid of cpu_unlink_tb(), which is irredeemably racy, since it modifies the TB graph with no locking from other threads, signal handlers, etc etc. (The signal handler case is why you can't just fix this with more locks.) Instead we take the much simpler approach of setting a flag for the CPU when we want it to stop executing TBs, and generate code to check the flag at the start of every TB. The raciness is easiest to provoke with multithreaded linux-user guests but it is I think also a risk in system emulation mode. This fixes the crashes seen in LP:668799; however there are another class of crashes described in LP:1098729 which stem from the fact that in linux-user with a multithreaded guest all threads will use and modify the same global TCG data structures (including the generated code buffer) without any kind of locking. This means that multithreaded guest binaries are still in the unsupported category. Now that Debian Wheezy has been released with qemu 1.2, users have started to file bug reports about multithreaded apps crashing. So, while I understand these are unsupported as per above, I think it is better to fix as much as we can, and allow some more use cases, than to document that it does not work. (And yes, I also understand it's better to fix that before a release, but oh well). So I tried to backport the series to the 1.2 branch, to see how it goes. But there were many changes since 1.2, so it ended up requiring quite some changes -- not exactly huge and complicated, but I need some help or an additional pair (or two) of eyes to ensure the sanity of the resulting code. The backported patchset is smaller than the original.
Andreas Färber (1): cpu: Introduce ENV_OFFSET macros This change isn't needed in 1.2, because all CPU state is in a single place there; it hasn't been split between env and cpu states. Peter Maydell (5): tcg: Document tcg_qemu_tb_exec() and provide constants for low bit uses Just a minor context difference in the header, no questions. cpu-exec: wrap tcg_qemu_tb_exec() in a fn to restore the PC This needed a back merge of env+cpu states back to cpu. Maybe we should somehow revisit the whole concept of the two, because it's sorta fun: at some point all functions have been converted to accept `cpu' instead of `env', but in many places the first thing a function does is to get the `env' pointer out of `cpu'. The backport of this change reverts this piggy-backing and uses `env' everywhere consistently again. Handle CPU interrupts by inline checking of a flag The main patch in the series. The same simplification from `cpu' back to `env' as above. I had to add the `tcg_exit_req' field into the CPU_COMMON macro in cpu-defs.h, instead of adding it to the CPUState struct in include/qom/cpu.h. Plus the corresponding changes in gen-icount.h -- that's where ENV_OFFSET was used, but due to the same env to cpu split, it is not needed anymore. My main question is actually about this place; I don't exactly understand how the code generation works, so am not sure if I backported it correctly. Plus minor code shuffling - for one, bits in translate-all.c were in exec.c before, so I had to modify it there. translate-all.c: Remove cpu_unlink_tb() That's easy, but again the bits being removed are in exec.c in 1.2, not in translate-all.c (so the resulting backported patch is now misnamed). Technically this patch isn't needed for a backport, since the function(s) are really unused, but gcc generates a warning about an unused static function and the compilation fails due to -Werror.
gen-icount.h: Rename gen_icount_start/end to gen_tb_start/end And I didn't try to backport this one, since this is just a mechanical s/icount/tb/g without any code changes. Now, the resulting thing compiles (ha!), but I'm not really sure how to test it. I ran a few random apps using qemu-i386 and qemu-arm, and they appear to work. If all goes well, I think this series needs to be included in whatever -stable qemu series are in use. I'll post the 4 resulting patches in reply to this message. Thank you! /mjt
[Qemu-devel] [PATCH 2/4] cpu-exec: wrap tcg_qemu_tb_exec() in a fn to restore the PC
From: Peter Maydell peter.mayd...@linaro.org If tcg_qemu_tb_exec() returns a value whose low bits don't indicate a link to an indexed next TB, this means that the TB execution never started (eg because the instruction counter hit zero). In this case the guest PC has to be reset to the address of the start of the TB. Refactor the cpu-exec code to make all tcg_qemu_tb_exec() calls pass through a wrapper function which does this restoration if necessary. Note that the apparent change in cpu_exec_nocache() from calling cpu_pc_from_tb() with the old TB to calling it with the TB returned by do_tcg_qemu_tb_exec() is safe, because in the nocache case we can guarantee that the TB we try to execute is not linked to any others, so the only possible returned TB is the one we started at. That is, we should arguably previously have included in cpu_exec_nocache() an assert((next_tb & ~TB_EXIT_MASK) == tb), since the API requires restore from next_tb but we were using tb. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 77211379d73ea0c89c0b5bb6eee74b17cb06f9a8) Conflicts: cpu-exec.c Signed-off-by: Michael Tokarev m...@tls.msk.ru --- cpu-exec.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 18da91e..edef121 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -51,12 +51,26 @@ void cpu_resume_from_signal(CPUArchState *env, void *puc) } #endif +/* Execute a TB, and fix up the CPU state afterwards if necessary */ +static inline tcg_target_ulong cpu_tb_exec(CPUArchState *env, uint8_t *tb_ptr) +{ +tcg_target_ulong next_tb = tcg_qemu_tb_exec(env, tb_ptr); +if ((next_tb & TB_EXIT_MASK) > TB_EXIT_IDX1) { +/* We didn't start executing this TB (eg because the instruction + * counter hit zero); we must restore the guest PC to the address + * of the start of the TB.
+ */ +TranslationBlock *tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK); +cpu_pc_from_tb(env, tb); +} +return next_tb; +} + /* Execute the code without caching the generated code. An interpreter could be used if available. */ static void cpu_exec_nocache(CPUArchState *env, int max_cycles, TranslationBlock *orig_tb) { -tcg_target_ulong next_tb; TranslationBlock *tb; /* Should never happen. @@ -68,14 +82,8 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles, max_cycles); env->current_tb = tb; /* execute the generated code */ -next_tb = tcg_qemu_tb_exec(env, tb->tc_ptr); +cpu_tb_exec(env, tb->tc_ptr); env->current_tb = NULL; - -if ((next_tb & TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { -/* Restore PC. This may happen if async event occurs before - the TB starts executing. */ -cpu_pc_from_tb(env, tb); -} tb_phys_invalidate(tb, -1); tb_free(tb); } @@ -569,13 +577,11 @@ int cpu_exec(CPUArchState *env) if (likely(!env->exit_request)) { tc_ptr = tb->tc_ptr; /* execute the generated code */ -next_tb = tcg_qemu_tb_exec(env, tc_ptr); +next_tb = cpu_tb_exec(env, tc_ptr); if ((next_tb & TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { /* Instruction counter expired. */ int insns_left; tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK); -/* Restore PC. */ -cpu_pc_from_tb(env, tb); insns_left = env->icount_decr.u32; if (env->icount_extra && insns_left >= 0) { /* Refill decrementer and continue execution. */ -- 1.7.10.4
Re: [Qemu-devel] [PATCH 1/2] clean unnecessary code: don't check g_strdup arg for NULL
09.05.2013 11:53, Dong Xu Wang wrote: Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com Thank you Dong, (re)applied both patches to the trivial patch queue. /mjt
[Qemu-devel] [PATCH 3/4] Handle CPU interrupts by inline checking of a flag
From: Peter Maydell peter.mayd...@linaro.org Fix some of the nasty TCG race conditions and crashes by implementing cpu_exit() as setting a flag which is checked at the start of each TB. This avoids crashes if a thread or signal handler calls cpu_exit() while the execution thread is itself modifying the TB graph (which may happen in system emulation mode as well as in linux-user mode with a multithreaded guest binary). This fixes the crashes seen in LP:668799; however there are another class of crashes described in LP:1098729 which stem from the fact that in linux-user with a multithreaded guest all threads will use and modify the same global TCG data structures (including the generated code buffer) without any kind of locking. This means that multithreaded guest binaries are still in the unsupported category. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 378df4b23753a11be650af7664ca76bc75cb9f01) Conflicts: cpu-exec.c exec.c include/qom/cpu.h translate-all.c (original code was in exec.c, changed there) Signed-off-by: Michael Tokarev m...@tls.msk.ru --- cpu-defs.h |1 + cpu-exec.c | 25 - exec.c |6 +++--- gen-icount.h | 11 +++ tcg/tcg.h|5 + 5 files changed, 44 insertions(+), 4 deletions(-) diff --git a/cpu-defs.h b/cpu-defs.h index f49e950..75c8e40 100644 --- a/cpu-defs.h +++ b/cpu-defs.h @@ -173,6 +173,7 @@ typedef struct CPUWatchpoint { uint32_t halted; /* Nonzero if the CPU is in suspend state */ \ uint32_t interrupt_request; \ volatile sig_atomic_t exit_request; \ +volatile sig_atomic_t tcg_exit_req; \ CPU_COMMON_TLB \ struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE]; \ /* buffer for temporaries in the code generator */ \ diff --git a/cpu-exec.c b/cpu-exec.c index edef121..3049e46 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -63,6 +63,12 @@ static inline tcg_target_ulong cpu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
TranslationBlock *tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK); cpu_pc_from_tb(env, tb); } +if ((next_tb & TB_EXIT_MASK) == TB_EXIT_REQUESTED) { +/* We were asked to stop executing TBs (probably a pending + * interrupt. We've now stopped, so clear the flag. + */ +env->tcg_exit_req = 0; +} return next_tb; } @@ -578,7 +584,20 @@ int cpu_exec(CPUArchState *env) tc_ptr = tb->tc_ptr; /* execute the generated code */ next_tb = cpu_tb_exec(env, tc_ptr); -if ((next_tb & TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { +switch (next_tb & TB_EXIT_MASK) { +case TB_EXIT_REQUESTED: +/* Something asked us to stop executing + * chained TBs; just continue round the main + * loop. Whatever requested the exit will also + * have set something else (eg exit_request or + * interrupt_request) which we will handle + * next time around the loop. + */ +tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK); +next_tb = 0; +break; +case TB_EXIT_ICOUNT_EXPIRED: +{ /* Instruction counter expired. */ int insns_left; tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK); @@ -602,6 +621,10 @@ int cpu_exec(CPUArchState *env) next_tb = 0; cpu_loop_exit(env); } +break; +} +default: +break; } } env->current_tb = NULL; diff --git a/exec.c b/exec.c index 0a67f07..54a37b5 100644 --- a/exec.c +++ b/exec.c @@ -1756,7 +1756,7 @@ static void tcg_handle_interrupt(CPUArchState *env, int mask) cpu_abort(env, "Raised interrupt while not in I/O function"); } } else { -cpu_unlink_tb(env); +env->tcg_exit_req = 1; } } @@ -1767,7 +1767,7 @@ CPUInterruptHandler cpu_interrupt_handler = tcg_handle_interrupt; void cpu_interrupt(CPUArchState *env, int mask) { env->interrupt_request |= mask; -cpu_unlink_tb(env); +env->tcg_exit_req = 1; } #endif /* CONFIG_USER_ONLY */ @@ -1779,7 +1779,7 @@ void cpu_reset_interrupt(CPUArchState *env, int mask) void cpu_exit(CPUArchState
[Qemu-devel] [PATCH 1/4] tcg: Document tcg_qemu_tb_exec() and provide constants for low bit uses
From: Peter Maydell peter.mayd...@linaro.org Document tcg_qemu_tb_exec(). In particular, its return value is a combination of a pointer to the next translation block and some extra information in the low two bits. Provide some #defines for the values passed in these bits to improve code clarity. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 0980011b4f66482d2733ab2dd0f2f61747772c6b) Conflicts: tcg/tcg.h Signed-off-by: Michael Tokarev m...@tls.msk.ru --- cpu-exec.c |9 + gen-icount.h |2 +- tcg/tcg.h| 44 +++- 3 files changed, 49 insertions(+), 6 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 6db32cd..18da91e 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -71,7 +71,7 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles, next_tb = tcg_qemu_tb_exec(env, tb->tc_ptr); env->current_tb = NULL; -if ((next_tb & 3) == 2) { +if ((next_tb & TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { /* Restore PC. This may happen if async event occurs before the TB starts executing. */ cpu_pc_from_tb(env, tb); @@ -555,7 +555,8 @@ int cpu_exec(CPUArchState *env) spans two pages, we cannot safely do a direct jump. */ if (next_tb != 0 && tb->page_addr[1] == -1) { -tb_add_jump((TranslationBlock *)(next_tb & ~3), next_tb & 3, tb); +tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK), +next_tb & TB_EXIT_MASK, tb); } spin_unlock(&tb_lock); @@ -569,10 +570,10 @@ int cpu_exec(CPUArchState *env) tc_ptr = tb->tc_ptr; /* execute the generated code */ next_tb = tcg_qemu_tb_exec(env, tc_ptr); -if ((next_tb & 3) == 2) { +if ((next_tb & TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { /* Instruction counter expired. */ int insns_left; -tb = (TranslationBlock *)(next_tb & ~3); +tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK); /* Restore PC.
*/ cpu_pc_from_tb(env, tb); insns_left = env->icount_decr.u32; diff --git a/gen-icount.h b/gen-icount.h index 430cb44..3d5c131 100644 --- a/gen-icount.h +++ b/gen-icount.h @@ -29,7 +29,7 @@ static void gen_icount_end(TranslationBlock *tb, int num_insns) if (use_icount) { *icount_arg = num_insns; gen_set_label(icount_label); -tcg_gen_exit_tb((tcg_target_long)tb + 2); +tcg_gen_exit_tb((tcg_target_long)tb + TB_EXIT_ICOUNT_EXPIRED); } } diff --git a/tcg/tcg.h b/tcg/tcg.h index a83bddd..af06eb5 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -581,7 +581,49 @@ TCGv_i64 tcg_const_local_i64(int64_t val); extern uint8_t code_gen_prologue[]; -/* TCG targets may use a different definition of tcg_qemu_tb_exec. */ +/** + * tcg_qemu_tb_exec: + * @env: CPUArchState * for the CPU + * @tb_ptr: address of generated code for the TB to execute + * + * Start executing code from a given translation block. + * Where translation blocks have been linked, execution + * may proceed from the given TB into successive ones. + * Control eventually returns only when some action is needed + * from the top-level loop: either control must pass to a TB + * which has not yet been directly linked, or an asynchronous + * event such as an interrupt needs handling. + * + * The return value is a pointer to the next TB to execute + * (if known; otherwise zero). This pointer is assumed to be + * 4-aligned, and the bottom two bits are used to return further + * information: + * 0, 1: the link between this TB and the next is via the specified + *TB index (0 or 1). That is, we left the TB via (the equivalent + *of) "goto_tb <index>". The main loop uses this to determine + *how to link the TB just executed to the next. + * 2:we are using instruction counting code generation, and we + *did not start executing this TB because the instruction counter + *would hit zero midway through it.
In this case the next-TB pointer + *returned is the TB we were about to execute, and the caller must + *arrange to execute the remaining count of instructions. + * + * If the bottom two bits indicate an exit-via-index then the CPU + * state is correctly synchronised and ready for execution of the next + * TB (and in particular the guest PC is the address to execute next). + * Otherwise, we gave up on execution of this TB before it started, and + * the caller must fix up the CPU state by calling cpu_pc_from_tb() + * with the next-TB pointer we return. + * + * Note that TCG targets may
[Qemu-devel] [PATCH 4/4] translate-all.c: Remove cpu_unlink_tb()
From: Peter Maydell peter.mayd...@linaro.org The (unsafe) function cpu_unlink_tb() is now unused, so we can simply remove it and any code that was only used by it. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 3a808cc407744c30daa7470b5f191cde1fbc1aae) Conflicts: translate-all.c (original code was in exec.c, so the patch title is now misleading) Signed-off-by: Michael Tokarev m...@tls.msk.ru --- exec.c | 67 1 file changed, 67 deletions(-) diff --git a/exec.c b/exec.c index 54a37b5..fb72d74 100644 --- a/exec.c +++ b/exec.c @@ -1421,53 +1421,6 @@ TranslationBlock *tb_find_pc(uintptr_t tc_ptr) return tbs[m_max]; } -static void tb_reset_jump_recursive(TranslationBlock *tb); - -static inline void tb_reset_jump_recursive2(TranslationBlock *tb, int n) -{ -TranslationBlock *tb1, *tb_next, **ptb; -unsigned int n1; - -tb1 = tb->jmp_next[n]; -if (tb1 != NULL) { -/* find head of list */ -for(;;) { -n1 = (uintptr_t)tb1 & 3; -tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3); -if (n1 == 2) -break; -tb1 = tb1->jmp_next[n1]; -} -/* we are now sure now that tb jumps to tb1 */ -tb_next = tb1; - -/* remove tb from the jmp_first list */ -ptb = &tb_next->jmp_first; -for(;;) { -tb1 = *ptb; -n1 = (uintptr_t)tb1 & 3; -tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3); -if (n1 == n && tb1 == tb) -break; -ptb = &tb1->jmp_next[n1]; -} -*ptb = tb->jmp_next[n]; -tb->jmp_next[n] = NULL; - -/* suppress the jump to next tb in generated code */ -tb_reset_jump(tb, n); - -/* suppress jumps in the tb on which we could have jumped */ -tb_reset_jump_recursive(tb_next); -} -} - -static void tb_reset_jump_recursive(TranslationBlock *tb) -{ -tb_reset_jump_recursive2(tb, 0); -tb_reset_jump_recursive2(tb, 1); -} - #if defined(TARGET_HAS_ICE) #if defined(CONFIG_USER_ONLY) static void breakpoint_invalidate(CPUArchState *env, target_ulong pc) @@ -1711,26 +1664,6 @@ void cpu_set_log_filename(const char
*filename) cpu_set_log(loglevel); } -static void cpu_unlink_tb(CPUArchState *env) -{ -/* FIXME: TB unchaining isn't SMP safe. For now just ignore the - problem and hope the cpu will stop of its own accord. For userspace - emulation this often isn't actually as bad as it sounds. Often - signals are used primarily to interrupt blocking syscalls. */ -TranslationBlock *tb; -static spinlock_t interrupt_lock = SPIN_LOCK_UNLOCKED; - -spin_lock(&interrupt_lock); -tb = env->current_tb; -/* if the cpu is currently executing code, we must unlink it and - all the potentially executing TB */ -if (tb) { -env->current_tb = NULL; -tb_reset_jump_recursive(tb); -} -spin_unlock(&interrupt_lock); -} - #ifndef CONFIG_USER_ONLY /* mask must never be zero, except for A20 change call */ static void tcg_handle_interrupt(CPUArchState *env, int mask) -- 1.7.10.4
Re: [Qemu-devel] qemu-img problem when create a file larger than fs's size
On 05/09/2013 03:44 PM, Stefan Hajnoczi wrote: On Wed, May 08, 2013 at 01:18:17PM +0800, yuxh wrote: I have to consult you about a qemu-img problem. Is it reasonable to create a file which is larger than the available size of the fs with the qemu-img cmd? When I use qemu-img to create a file which is larger than the available size of the fs, the creation completes successfully. However when I use this file in a guest as the guest's disk, and write beyond the size the host file can provide, the guest is paused by qemu-kvm or libvirt, seemingly in an infinite loop where the guest just can't be used unless I detach the disk from the guest or destroy the guest. You can change the ENOSPC policy with -drive werror=,rerror=. See the QEMU man page. The default behavior is to pause the guest so the host admin can free up or add space. Then the guest can be continued - this will retry the I/O. But you can also tell QEMU to pass the error through to the guest using the -drive werror=,rerror= options. Stefan Oh, I understand now. Thank you so much for your explanation. Yu
Re: [Qemu-devel] [PATCH 1/2] Vring: vring's listener's priority should higher than kvm
On Thu, May 09, 2013 at 08:40:21AM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Host threads which handle vring should have higher MemoryListener priority than kvm. For the current code, take the following scenario: kvm_region_add() runs earlier, before vhost_region_add(); then in the guest, vring's desc[i] can refer to addressX in the new region known by the guest. But vhost does not know this new region yet, and the vring handler will fail. Is there a concrete scenario where this happens? I can think of situations like the ioeventfd being readable before vhost/hostmem is populated. But I don't see how that's related to the priority of kvm_region_add(). Stefan
Re: [Qemu-devel] [PATCH 2/2] mem: highlight the listener's priority as enum
On Thu, May 09, 2013 at 08:40:22AM +0800, Liu Ping Fan wrote: diff --git a/include/exec/memory.h b/include/exec/memory.h index 9e88320..77e0d40 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -192,6 +192,16 @@ struct MemoryRegionSection { typedef struct MemoryListener MemoryListener; +/* The list of priority, ex, vhost should have higher priority (less num) than + * kvm, ie PRI_VRING < PRI_HYPV s/PRI_HYPV/PRI_HYPERV/ + */ +typedef enum { +PRI_DEFAULT = 0, +PRI_CORE = 1, +PRI_VRING = 9, +PRI_HYPERV = 10, HYPERV == hypervisor? Easy to confuse with Microsoft Hyper-V. Please use PRI_ACCEL or PRI_HYPERVISOR.
Re: [Qemu-devel] [Qemu-trivial] [PATCH] target-mips: add missing check_dspr2 for multiply instructions
08.05.2013 18:09, Petar Jovanovic wrote: From: Petar Jovanovic petar.jovano...@imgtec.com The emulator needs to check in hflags if the DSP unit has been turned off before it generates code for MUL_PH, MUL_S_PH, MULQ_S_W, and MULQ_RS_W. Signed-off-by: Petar Jovanovic petar.jovano...@imgtec.com --- target-mips/translate.c |1 + 1 file changed, 1 insertion(+) diff --git a/target-mips/translate.c b/target-mips/translate.c index b7f8203..0a53203 100644 --- a/target-mips/translate.c +++ b/target-mips/translate.c @@ -13400,6 +13400,7 @@ static void gen_mipsdsp_multiply(DisasContext *ctx, uint32_t op1, uint32_t op2, /* OPC_MULT_G_2E, OPC_ADDUH_QB_DSP, OPC_MUL_PH_DSP have * the same mask and op1. */ case OPC_MULT_G_2E: +check_dspr2(ctx); switch (op2) { case OPC_MUL_PH: gen_helper_mul_ph(cpu_gpr[ret], v1_t, v2_t, cpu_env); FWIW. While the patch has been applied by Aurelien, I've a small comment about the trivialness of patches, having in mind this example. This appears to be a trivial patch by the look of it, that is, it is just a one-liner adding a simple instruction. But nevertheless, it is NOT suitable for qemu-trivial, because it needs to be verified by the subsystem maintainer who understands all the possible implications. The patch changes core logic which may have unexpected consequences in unexpected places; it may lead to wrong code generation and bugs, and other fun side effects. In that way it is not trivial, and really needs to be sent to subsystem maintainers (and to qemu-devel@), not to the trivial patch queue. Of not-so-trivial stuff I can consider, for example, fixes in command line processing which do not break backward compatibility (if it isn't an outright bug like a misspelt option name) -- but even there, it might be a good idea to discuss it on qemu-devel anyway. In other words, if you know some subsystem and to you some thing looks really trivial, it isn't necessarily the same for someone who does not have the same knowledge about that subsystem.. ;) Thank you!
/mjt
Re: [Qemu-devel] [PATCH 1/2] Vring: vring's listener's priority should higher than kvm
On Thu, May 09, 2013 at 08:40:21AM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Host threads which handle vring should have higher MemoryListener priority than kvm. For the current code, take the following scenario: kvm_region_add() runs earlier, before vhost_region_add(); then in the guest, vring's desc[i] can refer to addressX in the new region known by the guest. But vhost does not know this new region yet, and the vring handler will fail. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com Is this seen in testing, or are you describing a theoretical scenario? Please make this clear in the commit log. --- hw/virtio/dataplane/hostmem.c |2 +- hw/virtio/vhost.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/virtio/dataplane/hostmem.c b/hw/virtio/dataplane/hostmem.c index 37292ff..67cbce1 100644 --- a/hw/virtio/dataplane/hostmem.c +++ b/hw/virtio/dataplane/hostmem.c @@ -158,7 +158,7 @@ void hostmem_init(HostMem *hostmem) .eventfd_del = hostmem_listener_eventfd_dummy, .coalesced_mmio_add = hostmem_listener_coalesced_mmio_dummy, .coalesced_mmio_del = hostmem_listener_coalesced_mmio_dummy, -.priority = 10, +.priority = 9, }; memory_listener_register(&hostmem->listener, &address_space_memory); diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index fbabf99..91c313b 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -856,7 +856,7 @@ int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath, .log_global_stop = vhost_log_global_stop, .eventfd_add = vhost_eventfd_add, .eventfd_del = vhost_eventfd_del, -.priority = 10 +.priority = 9 }; hdev->mem = g_malloc0(offsetof(struct vhost_memory, regions)); hdev->n_mem_sections = 0; -- 1.7.4.4
Re: [Qemu-devel] [PATCH 1/2] Vring: the vring listener's priority should be higher than kvm's
On Thu, May 9, 2013 at 4:44 PM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, May 09, 2013 at 08:40:21AM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Host threads which handle the vring should have a higher MemoryListener priority than kvm. In the current code, take the following scenario: kvm_region_add() runs earlier than vhost_region_add(); then, in the guest, the vring's desc[i] can refer to addressX in the new region known to the guest. But vhost does not know this new region yet, and the vring handler will fail. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com Is this seen in testing, or are you describing a theoretical scenario? Please make this clear in the commit log. A theoretical scenario. I think the vcpu thread and vhost are async, so we need this method to sync them. --- hw/virtio/dataplane/hostmem.c |2 +- hw/virtio/vhost.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/virtio/dataplane/hostmem.c b/hw/virtio/dataplane/hostmem.c index 37292ff..67cbce1 100644 --- a/hw/virtio/dataplane/hostmem.c +++ b/hw/virtio/dataplane/hostmem.c @@ -158,7 +158,7 @@ void hostmem_init(HostMem *hostmem) .eventfd_del = hostmem_listener_eventfd_dummy, .coalesced_mmio_add = hostmem_listener_coalesced_mmio_dummy, .coalesced_mmio_del = hostmem_listener_coalesced_mmio_dummy, -.priority = 10, +.priority = 9, }; memory_listener_register(&hostmem->listener, &address_space_memory); diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index fbabf99..91c313b 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -856,7 +856,7 @@ int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath, .log_global_stop = vhost_log_global_stop, .eventfd_add = vhost_eventfd_add, .eventfd_del = vhost_eventfd_del, -.priority = 10 +.priority = 9 }; hdev->mem = g_malloc0(offsetof(struct vhost_memory, regions)); hdev->n_mem_sections = 0; -- 1.7.4.4
Re: [Qemu-devel] [PATCH 1/2] Vring: the vring listener's priority should be higher than kvm's
On Thu, May 9, 2013 at 4:30 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, May 09, 2013 at 08:40:21AM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Host threads which handle the vring should have a higher MemoryListener priority than kvm. In the current code, take the following scenario: kvm_region_add() runs earlier than vhost_region_add(); then, in the guest, the vring's desc[i] can refer to addressX in the new region known to the guest. But vhost does not know this new region yet, and the vring handler will fail. Is there a concrete scenario where this happens? I can think of situations like the ioeventfd being readable before vhost/hostmem is populated. But I don't see how that's related to the priority of kvm_region_add(). For kvm, i.e. in the guest, vring_desc.addr can point to a chunk of data in the newly added memory and kick vhost. vhost has not added this new region, so its local lookup table cannot translate this new address, and the vring handler will fail. If vhost's priority is higher than kvm's, then it will know about this new address earlier than kvm does. Stefan
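For readers following the priority argument in this thread: QEMU keeps its MemoryListeners sorted by priority, and region_add callbacks fire in ascending-priority order, so lowering vhost/hostmem from 10 to 9 makes them learn about a new region before kvm does. The toy model below illustrates just that ordering rule; it is a sketch, not QEMU's actual listener code (the struct layout and function names are invented for the example).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of MemoryListener ordering: listeners are kept sorted by
 * ascending priority, and region_add-style callbacks run front to back,
 * so a lower priority value means "notified earlier". */
typedef struct Listener {
    const char *name;
    int priority;              /* lower value: region_add runs earlier */
    struct Listener *next;
} Listener;

static Listener *listeners;

static void listener_register(Listener *l)
{
    /* Insert keeping the list sorted by ascending priority. */
    Listener **p = &listeners;
    while (*p && (*p)->priority <= l->priority) {
        p = &(*p)->next;
    }
    l->next = *p;
    *p = l;
}

/* Name of the listener whose region_add callback would fire first. */
static const char *first_region_add(void)
{
    return listeners ? listeners->name : NULL;
}
```

With vhost at priority 9 and kvm at 10, first_region_add() reports vhost, which is exactly the property the patch relies on.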
Re: [Qemu-devel] [PATCH 0/6] Drop the irredeemably racy cpu_unlink_tb()
On 9 May 2013 09:09, Michael Tokarev m...@tls.msk.ru wrote: Now, the resulting thing compiles (ha!), but I'm not really sure how to test it. I ran a few random apps using qemu-i386 and qemu-arm, it appears to work. You need to test TCG system emulation too, and in particular something with multiple guest CPU cores. -- PMM
Re: [Qemu-devel] [PATCH 2/2] mem: highlight the listener's priority as enum
On Thu, May 9, 2013 at 4:31 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, May 09, 2013 at 08:40:22AM +0800, Liu Ping Fan wrote: diff --git a/include/exec/memory.h b/include/exec/memory.h index 9e88320..77e0d40 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -192,6 +192,16 @@ struct MemoryRegionSection { typedef struct MemoryListener MemoryListener; +/* The list of priorities, ex, vhost should have a higher priority (lower num) than + * kvm, ie PRI_VRING < PRI_HYPV s/PRI_HYPV/PRI_HYPERV/ Will fix. + */ +typedef enum { +PRI_DEFAULT = 0, +PRI_CORE = 1, +PRI_VRING = 9, +PRI_HYPERV = 10, HYPERV == hypervisor? Easy to confuse with Microsoft Hyper-V. Please use PRI_ACCEL or PRI_HYPERVISOR. Ok. Regards, Pingfan
Re: [Qemu-devel] [PATCH 2/2] mem: highlight the listener's priority as enum
On 9 May 2013 01:40, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com It will make the priority prominent when a new listener is added. All the priority values are kept unchanged, except for vhost and hostmem (changes introduced by the previous patch). Any chance of something somewhere which explains why these various things care about the order and thus need these particular priorities? (ie what are the dependencies between the listeners?) thanks -- PMM
Re: [Qemu-devel] [PATCH 2/2] mem: highlight the listener's priority as enum
On Thu, May 9, 2013 at 5:21 PM, Peter Maydell peter.mayd...@linaro.org wrote: On 9 May 2013 01:40, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com It will make the priority prominent when a new listener is added. All the priority values are kept unchanged, except for vhost and hostmem (changes introduced by the previous patch). Any chance of something somewhere which explains why these various things care about the order and thus need these particular priorities? (ie what are the dependencies between the listeners?) Sorry, I have not dug so deeply, so I leave the original values as they are. As far as I know, the most prominent pair is kvm and the core. If a region is removed from the core earlier than from kvm, then cpu_physical_memory_rw() can use an address that is still valid in kvm but invalid for the radix tree, and cause an error. thanks -- PMM
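Putting the thread's feedback together, the enum with Stefan's rename applied would read roughly as follows. The numeric values are the ones from the patch; the PRI_HYPERVISOR spelling is only the reviewer's suggestion, not a merged name.

```c
#include <assert.h>

/* Proposed named listener priorities (values taken from the patch;
 * PRI_HYPERVISOR is the reviewer-suggested rename of PRI_HYPERV). */
typedef enum {
    PRI_DEFAULT    = 0,
    PRI_CORE       = 1,
    PRI_VRING      = 9,   /* vhost/hostmem: must see regions before kvm */
    PRI_HYPERVISOR = 10,  /* kvm, xen */
} ListenerPriority;
```

The point of the enum is simply that the ordering constraints (PRI_VRING < PRI_HYPERVISOR, PRI_DEFAULT < PRI_CORE) become visible at the definition site instead of being scattered as magic numbers.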
[Qemu-devel] [PATCH] qxl: Call spice_qxl_driver_unload from qxl_enter_vga_mode
From: Hans de Goede hdego...@redhat.com With a SPICE_DISPLAY_CAP_MONITORS_CONFIG capable client, the client needs to know what part of the primary to use for each monitor. If the guest driver does not support this, the server sends messages to the client for a single monitor spanning the entire primary. As soon as the guest calls spice_qxl_monitors_config_async once, the server sets the red_worker driver_has_monitors_config flag and stops doing this. This is a problem when the driver gets unloaded, for example after a reboot or when switching to a text vc with usermode mode-setting under Linux. To reproduce this, start a multi-mon capable Linux guest which uses usermode mode-setting and then, once X has started, switch to a text vc. Note how the client window not only does not resize; if you try to resize it manually you always keep black borders, since the aspect is wrong. This patch calls a new spice-server method called spice_qxl_driver_unload which clears the driver_has_monitors_config flag inside the server, thereby fixing this. Signed-off-by: Hans de Goede hdego...@redhat.com Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/display/qxl.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/display/qxl.c b/hw/display/qxl.c index 2d49e9a..c475cb1 100644 --- a/hw/display/qxl.c +++ b/hw/display/qxl.c @@ -1077,6 +1077,9 @@ static void qxl_enter_vga_mode(PCIQXLDevice *d) return; } trace_qxl_enter_vga_mode(d->id); +#if SPICE_SERVER_VERSION >= 0x000c03 /* release 0.12.3 */ +spice_qxl_driver_unload(&d->ssd.qxl); +#endif qemu_spice_create_host_primary(&d->ssd); d->mode = QXL_MODE_VGA; vga_dirty_log_start(&d->vga); -- 1.7.9.7
[Qemu-devel] [PULL for-1.5] spice: bugfix queue
Hi, Tiny spice patch queue, featuring a single qxl bugfix. please pull, Gerd The following changes since commit 47ec15cdd44877e553ed0bd0a16aea8a295dad62: Update version for 1.5.0-rc1 (2013-05-08 15:54:47 -0500) are available in the git repository at: git://anongit.freedesktop.org/spice/qemu spice.v70 for you to fetch changes up to 0a2b5e3a7899b40d05d7c6c1c41eb4e64dd2ed4b: qxl: Call spice_qxl_driver_unload from qxl_enter_vga_mode (2013-05-09 11:46:53 +0200) Hans de Goede (1): qxl: Call spice_qxl_driver_unload from qxl_enter_vga_mode hw/display/qxl.c |3 +++ 1 file changed, 3 insertions(+)
Re: [Qemu-devel] [PATCH 0/6] Drop the irredeemably racy cpu_unlink_tb()
09.05.2013 13:01, Peter Maydell wrote: On 9 May 2013 09:09, Michael Tokarev m...@tls.msk.ru wrote: Now, the resulting thing compiles (ha!), but I'm not really sure how to test it. I ran a few random apps using qemu-i386 and qemu-arm, and it appears to work. You need to test TCG system emulation too, and in particular something with multiple guest CPU cores. As suggested on IRC, I ran a proposed omp testcase (in LP:668799) in an armel chroot created by the same patched qemu-arm-static, with 6 and 60 threads, and tried running a few java binaries I've found -- that works well, from creating the system to building and running testcases. I also tried the same with qemu-system-i386 (without kvm), installing a system using the patched qemu and running multi-threaded apps in it. At least I don't see any obvious new breakages due to my backport, and I can definitely go further than with unpatched qemu. So this is at least promising, so far... ;) Thanks, /mjt
Re: [Qemu-devel] [PATCH 0/6] Drop the irredeemably racy cpu_unlink_tb()
On 09.05.2013 10:09, Michael Tokarev wrote: 22.02.2013 22:09, Peter Maydell wrote: cpu-exec: wrap tcg_qemu_tb_exec() in a fn to restore the PC This needed a back merge of env+cpu states back to cpu. Maybe we should somehow revisit the whole concept of the two, because it's sorta fun: at some point all functions have been converted to accept `cpu' instead of `env', but in many places the first thing a function does is to get the `env' pointer out of `cpu'. The concept is really easy: There is so much CPU code around that for many years no one dared to touch it, ;) so changes need to be done incrementally - not only to identify any fallout! If one function is converted to no longer rely on env, then rather likely some caller of it still needs to convert from env -> cpu. Once that caller is converted too, the conversion keeps moving outwards until the only remaining env functions are TCG-related. env access from a specific *CPU type is cheap, thus only talking about common code here. You will find more background on the big QOM CPUState part X series. CPU_COMMON is definitely not the way for the future, it should be okay for backporting though in this case. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
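Andreas's remark that env access from a specific CPU type is cheap comes down to the two states living in one allocation, so converting between them is pure pointer arithmetic. A schematic sketch of that relationship (field names and layout are invented for illustration, not QEMU's exact structures at any given commit):

```c
#include <assert.h>
#include <stddef.h>

/* Schematic env <-> cpu relationship: the target-specific CPUArchState
 * is embedded in the common CPUState, so either pointer can be derived
 * from the other without any lookup or indirection. */
typedef struct CPUArchState { int dummy_reg; } CPUArchState;
typedef struct CPUState {
    int cpu_index;
    CPUArchState env;          /* target-specific part */
} CPUState;

static inline CPUState *env_to_cpu(CPUArchState *env)
{
    /* container_of-style conversion: subtract the member offset. */
    return (CPUState *)((char *)env - offsetof(CPUState, env));
}

static inline CPUArchState *cpu_to_env(CPUState *cpu)
{
    return &cpu->env;
}
```

This is why incremental conversion is viable: a function converted to take `cpu` can still hand its unconverted callees an `env` for free, and vice versa.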
[Qemu-devel] virsh edit failed to take effect on KVM
Hi all, we use the command virsh edit to modify the VM configuration information online on the KVM platform (libvirt-1.0.0 and qemu-1.4), but it does not take effect after a reboot. However, it works fine on the Xen platform. For example, a VM is running with the following configuration information: ... <os> <type arch='x86_64'>hvm</type> <boot dev='hd'/> <bootmenu enable='yes'/> </os> ... use the command virsh edit to modify it: ... <os> <type arch='x86_64'>hvm</type> <boot dev='cdrom'/> <bootmenu enable='yes'/> </os> ... With this change, the VM is expected to start from cdrom when we execute the command virsh reboot. But in fact the modification does not take effect; the VM still starts from hd. Well, it does take effect if I use the commands virsh shutdown and virsh start instead of virsh reboot. We are wondering if there are any other ways to make the online modification take effect. What is the next step going on with the command virsh edit on the KVM platform? Any ideas? Thanks! -Gonglei
Re: [Qemu-devel] [PATCH 6/8] qapi: add native list coverage for visitor serialization tests
On 05/09/13 01:33, Michael Roth wrote: +case PTYPE_NUMBER: { +numberList *ptr; +char *double1, *double2; +if (cur_head) { +ptr = cur_head; +cur_head = ptr->next; +} else { +cur_head = ptr = pl_copy.value.numbers; +} +/* we serialize with %f for our reference visitors, so rather than + * fuzzy * floating math to test equality, just compare the + * formatted values + */ I think this comment block has been copied from elsewhere in this file, indented more deeply and re-filled. There's an asterisk in the comment body now. +double1 = g_malloc0(calc_float_string_storage(pt->value.number)); +double2 = g_malloc0(calc_float_string_storage(ptr->value)); +g_assert_cmpstr(double1, ==, double2); Are you comparing empty strings? Space is allocated and zeroed, but I can't see where the values are actually formatted. (Same holds for the original instance of this code, test_primitives().) +g_free(double1); +g_free(double2); +break; +} Thanks, Laszlo
Re: [Qemu-devel] virsh edit failed to take effect on KVM
On 05/09/13 13:42, Gonglei (Arei) wrote: Hi all, we use the command virsh edit to modify the VM configuration information online on the KVM platform (libvirt-1.0.0 and qemu-1.4), but it does not take effect after a reboot. However, it works fine on the Xen platform. For example, a VM is running with the following configuration information: ... <os> <type arch='x86_64'>hvm</type> <boot dev='hd'/> <bootmenu enable='yes'/> </os> ... use the command virsh edit to modify it: ... <os> <type arch='x86_64'>hvm</type> <boot dev='cdrom'/> <bootmenu enable='yes'/> </os> ... With this change, the VM is expected to start from cdrom when we execute the command virsh reboot. But in fact the modification does not take effect; the VM still starts from hd. Well, it does take effect if I use the commands virsh shutdown and virsh start instead of virsh reboot. We are wondering if there are any other ways to make the online modification take effect. Under Xen, xm reboot (or, more likely, xl reboot recently), which is what I expect virsh reboot to translate to, always creates a brand new domain. Therefore domain config changes take effect. Under KVM the same VM instance continues to run, it just goes through an emulated reset. Laszlo
Re: [Qemu-devel] [PATCH for 1.5] tcg/optimize: fix setcond2 optimization
On 2013-05-08 15:42, Aurelien Jarno wrote: When setcond2 is rewritten into setcond, the state of the destination temp should be reset, so that a copy of the previous value is not used instead of the result. Reported-by: Michael Tokarev m...@tls.msk.ru Cc: Richard Henderson r...@twiddle.net Signed-off-by: Aurelien Jarno aurel...@aurel32.net Reviewed-by: Richard Henderson r...@twiddle.net r~
Re: [Qemu-devel] [PATCH 0/8] qapi: add support for lists of native types
On 05/09/13 01:33, Michael Roth wrote: These patches apply on top of qemu.git master, and can also be obtained from: git://github.com/mdroth/qemu.git qapi-native-lists Sending this now since a number of series have popped up in the past that wanted this, and Amos has some pending patches (query-mac-tables) that rely on this as well. These patches add support for specifying lists of native qapi types (int/bool/str/number) like so: { 'type': 'foo', 'data': { 'bar': ['int'] }} for a 'bar' field that is a list of type 'int', { 'type': 'foo2', 'data': { 'bar2': ['str'] }} for a 'bar2' field that is a list of type 'str', and so on. This uses linked list types for the native C representations, just as we do for complex schema-defined types. In the future we may add schema annotations of some sort to specify a more natural/efficient array type for the C representations, but this should serve the majority of uses-cases for now. Makefile |6 +- qapi-schema-test.json |8 ++ scripts/qapi-types.py | 44 ++- scripts/qapi-visit.py | 36 - scripts/qapi.py| 21 +++ tests/test-qmp-input-visitor.c | 181 + tests/test-qmp-output-visitor.c| 172 tests/test-visitor-serialization.c | 256 +--- 8 files changed, 692 insertions(+), 32 deletions(-) Two notes: - the remark I made for 6/8 (comparing empty strings), - for 7/8 and 8/8: the format specification %3.4f is not very useful. 3 is the field width, 4 is the precision, and the latter means for %f the number of digits printed after the radix character. It's not useful to specify a smaller field width (which covers the entire output string) than precision here. Other than these nothing pokes me in the eye. Laszlo
Re: [Qemu-devel] [PATCH 6/8] qapi: add native list coverage for visitor serialization tests
On Thu, May 09, 2013 at 02:31:03PM +0200, Laszlo Ersek wrote: On 05/09/13 01:33, Michael Roth wrote: +case PTYPE_NUMBER: { +numberList *ptr; +char *double1, *double2; +if (cur_head) { +ptr = cur_head; +cur_head = ptr->next; +} else { +cur_head = ptr = pl_copy.value.numbers; +} +/* we serialize with %f for our reference visitors, so rather than + * fuzzy * floating math to test equality, just compare the + * formatted values + */ I think this comment block has been copied from elsewhere in this file, indented more deeply and re-filled. There's an asterisk in the comment body now. Yes :) Will fix it on the next pass. +double1 = g_malloc0(calc_float_string_storage(pt->value.number)); +double2 = g_malloc0(calc_float_string_storage(ptr->value)); +g_assert_cmpstr(double1, ==, double2); Are you comparing empty strings? Space is allocated and zeroed, but I can't see where the values are actually formatted. Well, that's embarrassing... (Same holds for the original instance of this code, test_primitives().) but at least I can blame it on the fact that I copied it from here (we'll just ignore who the original author was :) Will add a patch on the next pass to fix the original instance beforehand. Thanks for the catch! +g_free(double1); +g_free(double2); +break; +} Thanks, Laszlo
Re: [Qemu-devel] [PATCH 0/8] qapi: add support for lists of native types
On Thu, May 09, 2013 at 03:31:08PM +0200, Laszlo Ersek wrote: On 05/09/13 01:33, Michael Roth wrote: These patches apply on top of qemu.git master, and can also be obtained from: git://github.com/mdroth/qemu.git qapi-native-lists Sending this now since a number of series have popped up in the past that wanted this, and Amos has some pending patches (query-mac-tables) that rely on this as well. These patches add support for specifying lists of native qapi types (int/bool/str/number) like so: { 'type': 'foo', 'data': { 'bar': ['int'] }} for a 'bar' field that is a list of type 'int', { 'type': 'foo2', 'data': { 'bar2': ['str'] }} for a 'bar2' field that is a list of type 'str', and so on. This uses linked list types for the native C representations, just as we do for complex schema-defined types. In the future we may add schema annotations of some sort to specify a more natural/efficient array type for the C representations, but this should serve the majority of uses-cases for now. Makefile |6 +- qapi-schema-test.json |8 ++ scripts/qapi-types.py | 44 ++- scripts/qapi-visit.py | 36 - scripts/qapi.py| 21 +++ tests/test-qmp-input-visitor.c | 181 + tests/test-qmp-output-visitor.c| 172 tests/test-visitor-serialization.c | 256 +--- 8 files changed, 692 insertions(+), 32 deletions(-) Two notes: - the remark I made for 6/8 (comparing empty strings), - for 7/8 and 8/8: the format specification %3.4f is not very useful. 3 is the field width, 4 is the precision, and the latter means for %f the number of digits printed after the radix character. It's not useful to specify a smaller field width (which covers the entire output string) than precision here. Yup, that's a mistake. The field width isn't useful at all for what I was intending actually (to constrain the whole portion of float so that the fractional portion wouldn't get truncated by snprintf and make the test less likely to catch re-encoding issues). 
I think just using %.4f should suffice since we have plenty of extra buffer space for the values we're working with (max being 32 / 3) so I'll do that for the next pass. Other than these nothing pokes me in the eye. Laszlo
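The agreed-upon fix, restated in isolation: actually format both doubles before comparing, so that the string comparison really is the fuzzy equality the comment promises. This is a sketch of the idea with a fixed %.4f specifier and an assumed 64-byte buffer, not the test suite's literal code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compare two doubles via their "%.4f" renderings, so values that agree
 * to four decimal places compare equal.  The missing step in the
 * original code was precisely the two snprintf() calls below: the
 * buffers were allocated and zeroed but never written to, so the
 * assertion compared two empty strings. */
static int doubles_equal_formatted(double a, double b)
{
    char buf_a[64], buf_b[64];
    snprintf(buf_a, sizeof(buf_a), "%.4f", a);
    snprintf(buf_b, sizeof(buf_b), "%.4f", b);
    return strcmp(buf_a, buf_b) == 0;
}
```

Note the deliberate fuzziness: 1.00001 and 1.00002 both render as "1.0000" and therefore compare equal, which is the intended tolerance for round-tripped values.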
Re: [Qemu-devel] virsh edit failed to take effect on KVM
On 05/09/2013 05:42 AM, Gonglei (Arei) wrote: Hi all, we use the command virsh edit to modify the VM configuration information online on KVM Platform(libvirt-1.0.0 and qemu-1.4), but it does not take effect after reboot. However, it works fine on Xen Platform. This sort of question belongs best on libvirt-us...@redhat.com (libvirt-list-request is not the right list address, and your question isn't quite relevant to qemu-devel). With the changing, the VM is expected to start from cdrom, when execute the command virsh reboot. 'virsh reboot' triggers a software reboot (reusing the same existing qemu process); but 'virsh edit' only affects what will take place on a hard boot (creation of a new qemu process). But the fact is that the modify does not take effect, the VM is still start from hd. Well, it will take effect if I use command virsh shutdown and virsh start instesad of virsh reboot. Correct, because that sequence spawns a new qemu process. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
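In practice, then, the sequence that makes an edited configuration take effect is the one Gonglei already found (the domain name vm1 is a placeholder):

```shell
virsh edit vm1        # e.g. change <boot dev='hd'/> to <boot dev='cdrom'/>
virsh shutdown vm1    # ends the current qemu process
virsh start vm1       # spawns a new qemu process, re-reading the XML
```

virsh reboot, by contrast, resets the existing qemu process, which never re-reads the XML.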
Re: [Qemu-devel] [Xen-devel] [RFC PATCH] vga: Start supporting resolution not multiple of 16 correctly.
On Fri, Mar 15, 2013 at 06:14:30PM +, Frediano Ziglio wrote: Modern notebooks support the 1366x768 resolution. This resolution width is not a multiple of 16, causing some problems. Qemu VGA emulation requires the width resolution to be a multiple of 8. The VNC implementation requires the width resolution to be a multiple of 16. This patch removes these limits. It was tested with a Windows machine with standard vga and 1366x768 as the resolution. I had to update vgabios as the version in qemu (pc-bios/vgabios-stdvga.bin) is quite old. I also had to add some patches on top of VGABIOS 0.7a to add some new resolutions. I have some doubts about this patch - are other UIs (sdl, cocoa, qxl) happy if the resolution is not a multiple of 16? - the scanline is computed exactly without any alignment (so 1366 at 8 bit is 1366 bytes) while getting vesa information from a laptop it seems to use some kind of alignment (it became 0x580 which is 1408 bytes). Perhaps I should change both VGABIOS and Qemu to apply this alignment? Any plans to re-send the patch with Stefano's comments addressed?
Thanks, -- Pasi Signed-off-by: Frediano Ziglio frediano.zig...@citrix.com --- hw/vga.c |2 +- ui/vnc.c | 27 +-- 2 files changed, 14 insertions(+), 15 deletions(-) diff --git a/hw/vga.c b/hw/vga.c index 1caf23d..d229f06 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -651,7 +651,7 @@ void vbe_ioport_write_data(void *opaque, uint32_t addr, uint32_t val) } break; case VBE_DISPI_INDEX_XRES: -if ((val = VBE_DISPI_MAX_XRES) ((val 7) == 0)) { +if ((val = VBE_DISPI_MAX_XRES) ((val 1) == 0)) { s-vbe_regs[s-vbe_index] = val; } break; diff --git a/ui/vnc.c b/ui/vnc.c index ff4e2ae..328d14d 100644 --- a/ui/vnc.c +++ b/ui/vnc.c @@ -907,26 +907,27 @@ static int vnc_update_client(VncState *vs, int has_dirty) for (y = 0; y height; y++) { int x; int last_x = -1; -for (x = 0; x width / 16; x++) { -if (test_and_clear_bit(x, vs-dirty[y])) { +for (x = 0; x width; x += 16) { +if (test_and_clear_bit(x/16, vs-dirty[y])) { if (last_x == -1) { last_x = x; } } else { if (last_x != -1) { -int h = find_and_clear_dirty_height(vs, y, last_x, x, +int h = find_and_clear_dirty_height(vs, y, last_x/16, x/16, height); -n += vnc_job_add_rect(job, last_x * 16, y, - (x - last_x) * 16, h); +n += vnc_job_add_rect(job, last_x, y, + (x - last_x), h); } last_x = -1; } } if (last_x != -1) { -int h = find_and_clear_dirty_height(vs, y, last_x, x, height); -n += vnc_job_add_rect(job, last_x * 16, y, - (x - last_x) * 16, h); +int h = find_and_clear_dirty_height(vs, y, last_x/16, x/16, height); +if (x width) x = width; +n += vnc_job_add_rect(job, last_x, y, + (x - last_x), h); } } @@ -1771,7 +1772,7 @@ static void framebuffer_update_request(VncState *vs, int incremental, int w, int h) { int i; -const size_t width = ds_get_width(vs-ds) / 16; +const size_t width = (ds_get_width(vs-ds)+15) / 16; if (y_position ds_get_height(vs-ds)) y_position = ds_get_height(vs-ds); @@ -2595,10 +2596,6 @@ static int vnc_refresh_server_surface(VncDisplay *vd) * Check and copy modified bits from guest to server surface. 
* Update server dirty map. */ -cmp_bytes = 64; -if (cmp_bytes vnc_server_fb_stride(vd)) { -cmp_bytes = vnc_server_fb_stride(vd); -} if (vd-guest.format != VNC_SERVER_FB_FORMAT) { int width = pixman_image_get_width(vd-server); tmpbuf = qemu_pixman_linebuf_create(VNC_SERVER_FB_FORMAT, width); @@ -2619,8 +2616,10 @@ static int vnc_refresh_server_surface(VncDisplay *vd) } server_ptr = server_row; -for (x = 0; x + 15 width; +cmp_bytes = 64; +for (x = 0; x width; x += 16, guest_ptr += cmp_bytes, server_ptr += cmp_bytes) { +if (width - x 16) cmp_bytes = 4 * (width - x); if (!test_and_clear_bit((x / 16), vd-guest.dirty[y])) continue; if (memcmp(server_ptr, guest_ptr, cmp_bytes) == 0) -- 1.7.10.4
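The arithmetic in the final hunk of the vnc.c change is the crux of the non-multiple-of-16 support: the server compares the guest and server surfaces in 16-pixel tiles, and the last tile of an odd-width line must be clamped to the remaining pixels. The helper below restates that arithmetic in isolation; the 4-bytes-per-pixel figure matches the patch's VNC server surface format, but the helper itself is illustrative, not QEMU code.

```c
#include <assert.h>

/* Byte count to memcmp() for the 16-pixel tile starting at pixel x of a
 * line that is `width` pixels wide, at 4 bytes per pixel.  A full tile
 * is 64 bytes; the last tile of a width that is not a multiple of 16 is
 * clamped to the pixels that remain. */
static int tile_cmp_bytes(int width, int x)
{
    int cmp_bytes = 64;              /* 16 pixels * 4 bytes */
    if (width - x < 16) {
        cmp_bytes = 4 * (width - x);
    }
    return cmp_bytes;
}
```

For the 1366-wide mode under discussion, the 86th tile starts at pixel 1360 and covers only the 6 remaining pixels (24 bytes) instead of a full 64-byte tile.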
Re: [Qemu-devel] [PATCH] qga: set umask 0077 when daemonizing (CVE-2013-2007)
On 5/7/2013 at 05:47 AM, Anthony Liguori aligu...@us.ibm.com wrote: From: Laszlo Ersek ler...@redhat.com The qemu guest agent creates a bunch of files with insecure permissions when started in daemon mode. For example: -rw-rw-rw- 1 root root /var/log/qemu-ga.log -rw-rw-rw- 1 root root /var/run/qga.state -rw-rw-rw- 1 root root /var/log/qga-fsfreeze-hook.log In addition, at least all files created with the guest-file-open QMP command, and all files created with shell output redirection (or otherwise) by utilities invoked by the fsfreeze hook script are affected. For now mask all file mode bits for group and others in become_daemon(). Temporarily, for compatibility reasons, stick with the 0666 file-mode in case of files newly created by the guest-file-open QMP call. Do so without changing the umask temporarily. Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Anthony Liguori aligu...@us.ibm.com --- qga/commands-posix.c | 123 +-- qga/main.c | 2 +- 2 files changed, 120 insertions(+), 5 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index 3b5c536..04c6951 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -18,6 +18,9 @@ #include unistd.h #include errno.h #include fcntl.h +#include stdio.h +#include string.h +#include sys/stat.h #include inttypes.h #include qga/guest-agent-core.h #include qga-qmp-commands.h @@ -237,9 +240,122 @@ static GuestFileHandle *guest_file_handle_find(int64_t id, Error **err) return NULL; } +typedef const char * const ccpc; + +/* http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html */ +static const struct { +ccpc *forms; +int oflag_base; +} guest_file_open_modes[] = { +{ (ccpc[]){ r, rb, NULL }, O_RDONLY }, +{ (ccpc[]){ w, wb, NULL }, O_WRONLY | O_CREAT | O_TRUNC }, +{ (ccpc[]){ a, ab, NULL }, O_WRONLY | O_CREAT | O_APPEND }, +{ (ccpc[]){ r+, rb+, r+b, NULL }, O_RDWR }, +{ (ccpc[]){ w+, wb+, w+b, NULL }, O_RDWR | O_CREAT | O_TRUNC }, +{ (ccpc[]){ a+, ab+, a+b, NULL }, O_RDWR | O_CREAT 
| O_APPEND } +}; + +static int +find_open_flag(const char *mode_str, Error **err) +{ +unsigned mode; + +for (mode = 0; mode ARRAY_SIZE(guest_file_open_modes); ++mode) { +ccpc *form; + +form = guest_file_open_modes[mode].forms; +while (*form != NULL strcmp(*form, mode_str) != 0) { +++form; +} +if (*form != NULL) { +break; +} +} + +if (mode == ARRAY_SIZE(guest_file_open_modes)) { +error_setg(err, invalid file open mode '%s', mode_str); +return -1; +} +return guest_file_open_modes[mode].oflag_base | O_NOCTTY | O_NONBLOCK; +} + +#define DEFAULT_NEW_FILE_MODE (S_IRUSR | S_IWUSR | \ + S_IRGRP | S_IWGRP | \ + S_IROTH | S_IWOTH) + +static FILE * +safe_open_or_create(const char *path, const char *mode, Error **err) +{ +Error *local_err = NULL; +int oflag; + +oflag = find_open_flag(mode, local_err); +if (local_err == NULL) { +int fd; + +/* If the caller wants / allows creation of a new file, we implement it + * with a two step process: open() + (open() / fchmod()). + * + * First we insist on creating the file exclusively as a new file. If + * that succeeds, we're free to set any file-mode bits on it. (The + * motivation is that we want to set those file-mode bits independently + * of the current umask.) + * + * If the exclusive creation fails because the file already exists + * (EEXIST is not possible for any other reason), we just attempt to + * open the file, but in this case we won't be allowed to change the + * file-mode bits on the preexistent file. + * + * The pathname should never disappear between the two open()s in + * practice. If it happens, then someone very likely tried to race us. + * In this case just go ahead and report the ENOENT from the second + * open() to the caller. + * + * If the caller wants to open a preexistent file, then the first + * open() is decisive and its third argument is ignored, and the second + * open() and the fchmod() are never called. + */ +fd = open(path, oflag | ((oflag O_CREAT) ? 
O_EXCL : 0), 0); +if (fd == -1 errno == EEXIST) { +oflag = ~(unsigned)O_CREAT; +fd = open(path, oflag); +} + +if (fd == -1) { +error_setg_errno(local_err, errno, failed to open file '%s' + (mode: '%s'), path, mode); +
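Stripped of the error-reporting plumbing, the two-step create-or-open at the heart of the quoted patch looks like this. The function name and signature here are simplified stand-ins for the patch's safe_open_or_create(), which also reports failures through an Error **; the O_EXCL trick itself is exactly the one the patch comment describes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to create the file exclusively so a brand-new file's mode can be
 * set regardless of the umask; if it already exists (EEXIST), fall back
 * to opening it without O_CREAT, leaving its mode bits alone. */
static int open_or_create(const char *path, int oflag, mode_t new_mode)
{
    int fd = open(path, oflag | ((oflag & O_CREAT) ? O_EXCL : 0), 0);
    if (fd == -1 && errno == EEXIST) {
        oflag &= ~(unsigned)O_CREAT;
        fd = open(path, oflag);
    } else if (fd != -1 && (oflag & O_CREAT)) {
        fchmod(fd, new_mode);   /* independent of the process umask */
    }
    return fd;
}
```

The exclusive-create step is what closes the CVE: a file that already existed keeps its owner's mode bits, while a genuinely new file gets exactly the mode the agent chooses, never one widened by an over-permissive umask.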
Re: [Qemu-devel] [RFC PATCH 0/8] MemoryRegion and FlatView refcounting, replace hostmem with memory_region_find
On 09/05/2013 02:53, liu ping fan wrote: On Wed, May 8, 2013 at 11:44 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 08/05/2013 08:20, liu ping fan wrote: On Mon, May 6, 2013 at 10:25 PM, Paolo Bonzini pbonz...@redhat.com wrote: Hi, this is an alternative approach to refactoring of dataplane's HostMem code. Here, I take Ping Fan's idea of RCU-style updating of the region list and apply it to the AddressSpace's FlatView. With this In fact, I am worried about the priority of the MemoryListener; if it is true, then we should drop the RCU-style idea. You mean in hostmem, or in general as in this patch? Note that this patch releases the old FlatView at the end of all MemoryListener operations. Both in hostmem and this patch; they both broke the original design of the MemoryListener, see the notes for priority in the code. I think both hostmem and this patch are fine. The hypervisor is never involved, all accesses go through the old FlatView and regions cannot disappear thanks to ref/unref. In fact, we need _more_ RCU-style updates, not less. For BQL-less dispatch, address space mapping/translation can race against the MemoryListeners in exec.c. To fix this, phys_sections and AddressSpaceDispatch need to be reference counted and RCU-ified as well. Paolo I have sent out 2 patches to highlight this issue, and have CC'd you and Stefanha. Regards, Pingfan Paolo Also if it is true, there is already a bug with the hostmem listener. It should use region_del, not region_nop, to reconstruct the local view. But just let me think about it more deeply. Regards, Pingfan change, dataplane can simply use memory_region_find instead of hostmem. This is a somewhat larger change, but I prefer it for two reasons. 1) it splits the task of adding BQL-less memory dispatch in two parts, tackling memory_region_find first (which is simpler because locking is left to the caller). 2) HostMem duplicates a lot of the FlatView logic, and adding the RCU-style update in FlatView benefits everyone.
The missing ingredients here are: 1) remember and unreference the MemoryRegions that are used in a vring entry. In order to implement this, it is probably simpler to change vring.c to use virtio.c's VirtQueueElement data structure. We want something like that anyway in order to support migration. 2) add an owner field to MemoryRegion, and set it for all MemoryRegions for hot-unpluggable devices. In this series, ref/unref are stubs. For simplicity I based the patches on my IOMMU rebase. I placed the tree at git://github.com/bonzini/qemu.git, branch iommu. Paolo Paolo Bonzini (8): memory: add ref/unref calls exec: check MRU in qemu_ram_addr_from_host memory: return MemoryRegion from qemu_ram_addr_from_host memory: ref/unref memory across address_space_map/unmap memory: access FlatView from a local variable memory: use a new FlatView pointer on every topology update memory: add reference counting to FlatView dataplane: replace hostmem with memory_region_find exec.c| 63 +--- hw/core/loader.c |1 + hw/display/exynos4210_fimd.c |6 + hw/display/framebuffer.c | 10 +- hw/i386/kvm/ioapic.c |2 + hw/i386/kvmvapic.c|1 + hw/misc/vfio.c|2 + hw/virtio/dataplane/Makefile.objs |2 +- hw/virtio/dataplane/hostmem.c | 176 - hw/virtio/dataplane/vring.c | 56 +-- hw/virtio/vhost.c |2 + hw/virtio/virtio-balloon.c|1 + hw/xen/xen_pt.c |4 + include/exec/cpu-common.h |2 +- include/exec/memory.h |9 ++ include/hw/virtio/dataplane/hostmem.h | 57 --- include/hw/virtio/dataplane/vring.h |3 +- kvm-all.c |2 + memory.c | 142 +- target-arm/kvm.c |2 + target-i386/kvm.c |4 +- target-sparc/mmu_helper.c |1 + xen-all.c |2 + 23 files changed, 253 insertions(+), 297 deletions(-) delete mode 100644 hw/virtio/dataplane/hostmem.c delete mode 100644 include/hw/virtio/dataplane/hostmem.h
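The RCU-style FlatView handling the series describes can be reduced, for intuition, to a reference-counted pointer swap. The sketch below is single-threaded and omits the atomics and grace periods real RCU needs; it only shows why an in-flight reader keeps the old view alive across a topology update, which is the property both hostmem and this series rely on.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, single-threaded model of RCU-style view replacement:
 * readers pin the current view with a reference, writers publish a
 * brand-new view and drop the previous one, and an old view is only
 * freed once its last reader has put its reference. */
typedef struct FlatView {
    int ref;
} FlatView;

static FlatView *current_view;

static FlatView *view_get(void)            /* reader side */
{
    current_view->ref++;
    return current_view;
}

static void view_put(FlatView *v)
{
    if (--v->ref == 0) {
        free(v);
    }
}

static void view_publish(FlatView *new_view)   /* writer side */
{
    FlatView *old = current_view;
    new_view->ref = 1;                 /* the global pointer's reference */
    current_view = new_view;
    if (old) {
        view_put(old);                 /* old view dies with its last reader */
    }
}
```

A reader that grabbed the view before a topology update keeps dispatching against a consistent (if momentarily stale) snapshot instead of crashing on a half-updated one; the real code must additionally make the pointer swap and refcounts atomic.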
Re: [Qemu-devel] [PATCH 9/9] Make monitor command 'dump-guest-memory' dump in kdump-compressed format
On 05/08/2013 11:27 PM, Zhang Xiaohe wrote: On 05/09/2013 01:16, Eric Blake wrote: On 05/08/2013 02:50 AM, qiaonuohan wrote: Thanks for your suggestion. I will fix it like: { 'enum': 'DumpCompressionFormat', 'data': [ 'zlib', 'lzo', 'snappy' ] } Since zlib is treated as the default compression format, 'uncompressed' won't be an option. No, I was serious that you need to provide 'uncompressed' as an explicit enum value. It is very annoying to toggle between four states (three compression formats and a fourth state of no compression) when the fourth is available only by omitting a parameter. The default MUST be 'uncompressed' for backwards-compatibility, not 'zlib'. We'd like to make sure that we understand you precisely. The definition is like below: { 'enum': 'DumpGuestMemoryFormat', 'data': [ 'uncompressed', 'zlib', 'lzo', 'snappy' ] } { 'command': 'dump-guest-memory', 'data': { 'paging': 'bool', 'protocol': 'str', '*begin': 'int', '*length': 'int', '*format': 'DumpCompressionFormat' } } Closer - make sure you use the same type name in both places (the enum name 'DumpGuestMemoryFormat' is slightly nicer than the command's use of 'DumpCompressionFormat'). 'format' is optional: 1. when 'format' is not specified, the vmcore will be in ELF format. 2. when 'format' is specified and its value is 'uncompressed', the vmcore will be in ELF format as well. 3. when 'format' is specified and its value is 'zlib', 'lzo' or 'snappy', the vmcore will be in kdump-compressed format. Correct. Or you could use the name 'elf' instead of 'uncompressed', if that makes more sense - as long as the enum calls out the names you are supporting. If this is what you suggest, then I don't think it is necessary to add 'uncompressed'. The backwards-compatibility is assured by case 1, in which the interface is exactly the same as before. So why do we add another parameter to do the same thing again? Because it is nicer to apps to allow them to explicitly specify the default.
Making 'format' optional allows older apps to still work, but for newer apps, it is easier to ALWAYS supply the format argument than it is to make the generation of the format argument conditional on whether the default behavior is desired. Trust me - I'm reviewing this as a potential user of your interface. When I ask for a fourth enum value, it's because I want to use it. When you keep coming back and complaining that you don't want to provide the fourth enum value because it is redundant, I feel like you aren't paying attention to your user base. I also think that implementation-wise, it will be easier to write your code if you have an enum supplying all four possibilities, since it is easier to write code that defaults a missing argument to a known enum value, and then have the rest of your code deal with enum values, than it is to write code that has to check everywhere whether the argument is missing (for one value) vs. one of three other enum values. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
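Putting the review comments above together, the schema would end up looking something like the following. This is only a sketch of the shape under discussion - the same type name used in both places, with an explicit value for the uncompressed/ELF default - and the actually merged definition may differ:

```
{ 'enum': 'DumpGuestMemoryFormat',
  'data': [ 'elf', 'zlib', 'lzo', 'snappy' ] }

{ 'command': 'dump-guest-memory',
  'data': { 'paging': 'bool', 'protocol': 'str',
            '*begin': 'int', '*length': 'int',
            '*format': 'DumpGuestMemoryFormat' } }
```

With this shape, omitting '*format' and passing 'elf' behave identically, which preserves backwards compatibility while letting new apps always supply the argument.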
[Qemu-devel] [RFC v2] virtio-balloon: automatic ballooning support
Automatic ballooning consists of dynamically adjusting the guest's balloon according to memory pressure in the host and in the guest. This commit implements the host side of automatic ballooning, which basically consists of:

1. Registering with the memory.pressure_level API (from the Linux memory controller cgroup) for the MEDIUM pressure event. This is a new feature starting with Linux kernel 3.10. For more information, please check Documentation/cgroups/memory.txt in the Linux kernel sources.

2. On MEDIUM pressure event reception, QEMU asks the guest kernel to inflate the balloon by 16MB.

3. This is only done if the guest negotiates VIRTIO_BALLOON_F_AUTO_BALLOON, which means the guest kernel's virtio-balloon driver also supports automatic ballooning.

Automatic deflate is performed by the guest. Here are some numbers. The test case is to run 35 VMs (1G of RAM each) in parallel doing a kernel build. The host has 32GB of RAM and 16GB of swap. SWAP IN and SWAP OUT correspond to the number of pages swapped in and swapped out, respectively.

Auto-ballooning disabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    634      930980   1588522
2    610      627422   1362174
3    649      1079847  1616367
4    543      953289   1635379
5    642      913237   1514000

Auto-ballooning enabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    629      901      12537
2    624      981      18506
3    626      573      9085
4    631      2250     42534
5    627      1610     20808

FIXMEs/TODOs:

- Should we have a lower limit for guest memory? Otherwise it can reach 0 if too many events are received
- Or maybe we should rate-limit events?
- It seems that events are being lost when too many of them are sent at the same time on a busy host - Allow this to be dynamically enabled by mngt Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- o You can find my test script here: http://repo.or.cz/w/qemu/qmp-unstable.git/blob/refs/heads/balloon/auto-ballooning/memcg/rfc:/scripts/autob-test o You can find the guest driver counterpart code at: http://repo.or.cz/w/linux-2.6/luiz-linux-2.6.git/shortlog/refs/heads/virtio-balloon/auto-deflate/rfc o To play with automatic ballooning, do the following: 1. You'll need 3.9+ for the host kernel 2. Get the guest kernel bits from: git://repo.or.cz/linux-2.6/luiz-linux-2.6.git virtio-balloon/auto-deflate/rfc 3. Apply this patch to QEMU 4. Enable the balloon device in qemu with: -device virtio-balloon-pci,auto-balloon=true 5. Generate memory pressure in the host, or put QEMU in a memcg cgroup with limited memory. Watch the VM memory going down 6. Generate pressure in the guest to see it going up again (say, a kernel build with -j16) hw/virtio/virtio-balloon.c | 162 + hw/virtio/virtio-pci.c | 5 ++ hw/virtio/virtio-pci.h | 1 + include/hw/virtio/virtio-balloon.h | 15 4 files changed, 183 insertions(+) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index d669756..4b23360 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -31,6 +31,12 @@ #include hw/virtio/virtio-bus.h +void virtio_balloon_set_conf(DeviceState *dev, VirtIOBalloonConf *bconf) +{ +VirtIOBalloon *s = VIRTIO_BALLOON(dev); +memcpy((s-bconf), bconf, sizeof(struct VirtIOBalloonConf)); +} + static void balloon_page(void *addr, int deflate) { #if defined(__linux__) @@ -279,9 +285,21 @@ static void virtio_balloon_set_config(VirtIODevice *vdev, } } +static bool auto_balloon_enabled(const VirtIOBalloon *s) +{ +return s-bconf.auto_balloon; +} + static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f) { +VirtIOBalloon *s = VIRTIO_BALLOON(vdev); + f |= (1 
<< VIRTIO_BALLOON_F_STATS_VQ);
+
+if (auto_balloon_enabled(s)) {
+f |= (1 << VIRTIO_BALLOON_F_AUTO_BALLOON);
+}
+
 return f;
 }
@@ -336,6 +354,141 @@ static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id)
 return 0;
 }
+static int open_sysfile(const char *path, const char *file, mode_t mode)
+{
+char *p;
+int fd;
+
+p = g_strjoin("/", path, file, NULL);
+fd = qemu_open(p, mode);
+if (fd < 0) {
+error_report("balloon: can't open '%s': %s", p, strerror(errno));
+}
+
+g_free(p);
+return fd;
+}
+
+static int write_fd(int fd, const char *fmt, ...)
+{
+va_list ap;
+char *str;
+int ret;
+
+va_start(ap, fmt);
+str = g_strdup_vprintf(fmt, ap);
+va_end(ap);
+
+do {
+ret = write(fd, str, strlen(str));
+} while (ret < 0 && errno == EINTR);
+
+if (ret < 0) {
+error_report("balloon: write failed: %s", strerror(errno));
+}
+
+g_free(str);
+return ret;
+}
+
+static bool guest_supports_auto_balloon(const VirtIOBalloon *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+return vdev->guest_features & (1
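The memcg registration in step 1 of the commit message works by writing a control string into the cgroup's cgroup.event_control file. A minimal sketch of building that string follows; the descriptor values are placeholders - a real implementation first obtains them from eventfd(2) and from opening the cgroup's memory.pressure_level file - so this shows only the wire format, not the full registration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the memcg (cgroup v1) pressure_level registration: the
 * line written into cgroup.event_control has the form
 *     "<eventfd> <memory.pressure_level fd> <level>"
 * where <level> is "low", "medium", or "critical".  The fd numbers
 * used in the usage example are illustrative only. */
static int format_event_control(char *buf, size_t len,
                                int event_fd, int pressure_fd,
                                const char *level)
{
    return snprintf(buf, len, "%d %d %s", event_fd, pressure_fd, level);
}
```

After writing this line, each MEDIUM pressure notification arrives as an 8-byte counter read from the eventfd, which is where QEMU would then ask the guest to inflate the balloon.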
Re: [Qemu-devel] [PATCH 1/2] Vring: vring's listener's priority should higher than kvm
On 09/05/2013 10:54, liu ping fan wrote: On Thu, May 9, 2013 at 4:44 PM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, May 09, 2013 at 08:40:21AM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Host threads which handle the vring should have higher MemoryListener priority than kvm. With the current code, take the following scenario: kvm_region_add() runs before vhost_region_add(); then, in the guest, vring's desc[i] can refer to addressX in the new region known by the guest. But vhost does not know this new region yet, and the vring handler will fail. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com Is this seen in testing, or are you describing a theoretical scenario? Please make this clear in the commit log. A theoretical scenario. I think the vcpu thread and vhost are async, so we need this method to sync. But why should this matter for hostmem? It doesn't communicate in any way with the hypervisor. Paolo
Re: [Qemu-devel] [PATCH V18 5/6] add-cow file format core code.
On Thu, May 09, 2013 at 02:24:20PM +0800, Dong Xu Wang wrote: On Thu, Apr 18, 2013 at 6:03 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, Apr 10, 2013 at 04:11:52PM +0800, Dong Xu Wang wrote:

+static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
+{
+BDRVAddCowState *s = bs->opaque;
+int ret;
+
+qemu_co_mutex_lock(&s->lock);
+if (s->bitmap_cache) {
+ret = block_cache_flush(bs, s->bitmap_cache);
+if (ret < 0) {
+return ret;
+}
+}
+ret = bdrv_flush(s->image_hd);

This is the wrong way around. We must flush image_hd first so that valid data is on disk. Then we can flush bitmap_cache to mark the clusters allocated. Beyond explicit flushes you also need to make sure that image_hd is flushed *before* bitmap_cache tables are written out (e.g. cache eviction when the cache becomes full). It seems this code is missing. Hi Stefan, how can I make sure the image_hd flush operation has completed before I write the bitmap_cache tables? Are there any functions I can use? You could extend the block cache like this: block_cache_add_dependency_func(BlockCache *c, int (*dependency_fn)(BlockCache *c, void *opaque), void *opaque); Then add-cow would register a custom dependency function that flushes image_hd. A hackier way of achieving the same thing is to instantiate an image_hd_cache and set a dependency from the bitmap_cache on the image_hd_cache. The image_hd_cache actually has no cache entries, but it uses image_hd and will flush it. Stefan
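The dependency hook Stefan proposes can be sketched as follows: the cache runs a registered callback before writing any table out, and refuses to write if the callback (for add-cow, a flush of image_hd) fails. All names here are hypothetical - this is the proposed extension, not existing QEMU code - and `demo_flush` is a stub standing in for the real image_hd flush.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BlockCache BlockCache;
typedef int (*DependencyFunc)(BlockCache *c, void *opaque);

struct BlockCache {
    DependencyFunc dependency_fn;  /* runs before tables hit the disk */
    void *dep_opaque;
    int tables_written;            /* stand-in for real table writeback */
};

/* Register a callback that must succeed before any table writeback. */
static void block_cache_add_dependency_func(BlockCache *c,
                                            DependencyFunc fn, void *opaque)
{
    c->dependency_fn = fn;
    c->dep_opaque = opaque;
}

/* Write one cache entry out, honoring the dependency first.  This is
 * the ordering rule from the review: data (image_hd) must be durable
 * before the bitmap table that marks it allocated is written. */
static int block_cache_entry_flush(BlockCache *c)
{
    if (c->dependency_fn) {
        int ret = c->dependency_fn(c, c->dep_opaque);
        if (ret < 0) {
            return ret;            /* data not durable: don't write table */
        }
    }
    c->tables_written++;
    return 0;
}

/* Stub dependency standing in for bdrv_flush(image_hd). */
static int demo_flush(BlockCache *c, void *opaque)
{
    (void)c;
    (*(int *)opaque)++;
    return 0;
}
```

The key property is that the ordering holds on every writeback path - explicit flush and cache eviction alike - because the hook lives inside the cache rather than in the caller.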
Re: [Qemu-devel] [PATCH] curl: fix curl read
On Thu, May 09, 2013 at 02:56:50PM +0800, Fam Zheng wrote: On Thu, 05/09 08:41, Stefan Hajnoczi wrote: On Fri, May 03, 2013 at 04:00:09PM +0800, Fam Zheng wrote:

+cache = curl_find_cache(s, aio_base, aio_bytes);
+if (cache) {
+curl_complete_io(s, acb, cache);
+return;
 }

What is the point of the cache? Can you split it into a separate patch? The cache is for prefetch. Data is fetched in 256k chunks using libcurl and stored in the cache to fill future I/O requests, reducing overall HTTP request overhead. That sounds like a performance optimization which is not part of fixing broken block/curl.c :). But why bother doing this - the guest kernel uses read-ahead anyway to fetch more data than the userspace read(2) requested? This cache amplifies the read-ahead and potentially wastes memory (e.g. in a random-read workload). Let the guest decide how much read-ahead is appropriate. Stefan
Re: [Qemu-devel] [PATCH 1/2] Vring: vring's listener's priority should higher than kvm
On Thu, May 09, 2013 at 05:00:20PM +0800, liu ping fan wrote: On Thu, May 9, 2013 at 4:30 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Thu, May 09, 2013 at 08:40:21AM +0800, Liu Ping Fan wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Host threads which handle the vring should have higher MemoryListener priority than kvm. With the current code, take the following scenario: kvm_region_add() runs before vhost_region_add(); then, in the guest, vring's desc[i] can refer to addressX in the new region known by the guest. But vhost does not know this new region yet, and the vring handler will fail. Is there a concrete scenario where this happens? I can think of situations like the ioeventfd being readable before vhost/hostmem is populated. But I don't see how that's related to the priority of kvm_region_add(). For kvm, i.e., in the guest, vring_desc.addr can point to a chunk of data in the newly added memory, and kick vhost. vhost has not added this new region, so its local lookup table cannot translate this new address, and the vring handler will fail. If vhost's priority is higher than kvm's, then it will know this new address earlier than kvm. Isn't the real solution to ensure that the memory API is up-to-date before we notify the guest of memory hotplug? I still don't see a kvm vs vhost race. I see a guest vs vhost race which priority doesn't fix. Stefan
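The priority mechanism being argued about can be modeled in a few lines: listeners sit in a list sorted by priority, and region_add callbacks fire in list order, so a listener sorted earlier sees a new region first. This is a toy standalone model with made-up names (`MemoryListenerSim` and friends), not QEMU's memory.c; in QEMU, "add" callbacks run in ascending priority order and "del" callbacks in descending order.

```c
#include <assert.h>
#include <stddef.h>

typedef struct MemoryListenerSim {
    int priority;
    void (*region_add)(struct MemoryListenerSim *l);
    struct MemoryListenerSim *next;
} MemoryListenerSim;

static MemoryListenerSim *listeners;
static int call_order[4], ncalls;

/* Insert keeping the list sorted by ascending priority. */
static void listener_register(MemoryListenerSim *l)
{
    MemoryListenerSim **p = &listeners;
    while (*p && (*p)->priority <= l->priority) {
        p = &(*p)->next;
    }
    l->next = *p;
    *p = l;
}

/* Notify all listeners of a new region, in ascending priority order. */
static void region_add_all(void)
{
    for (MemoryListenerSim *l = listeners; l; l = l->next) {
        l->region_add(l);
    }
}

/* Callback that records the order in which listeners were notified. */
static void record_add(MemoryListenerSim *l)
{
    call_order[ncalls++] = l->priority;
}
```

In this model, giving the vring/vhost listener a numerically lower priority than kvm's makes it learn about a hotplugged region first - which is exactly the ordering the patch relies on, and exactly what Stefan argues does not close the guest-vs-vhost race.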
[Qemu-devel] [PATCH for-1.4.y 0/4] Drop the irredeemably racy cpu_unlink_tb()
This is a backport of the patch series by Peter Maydell, as found at http://thread.gmane.org/gmane.comp.emulators.qemu/196629 , to the 1.4.y stable tree, very similar to what has been done for the 1.1 tree. The difference is in context and because some code was moved between exec.c and translate-all.c, but basically it is still the same series.

Peter Maydell (4):
  tcg: Document tcg_qemu_tb_exec() and provide constants for low bit uses
  cpu-exec: wrap tcg_qemu_tb_exec() in a fn to restore the PC
  Handle CPU interrupts by inline checking of a flag
  translate-all.c: Remove cpu_unlink_tb()

 cpu-exec.c                | 58 ++-
 exec.c                    |  2 +-
 include/exec/cpu-defs.h   |  1 +
 include/exec/gen-icount.h | 13 +++-
 tcg/tcg.h                 | 49 +-
 translate-all.c           | 73 ++---
 6 files changed, 108 insertions(+), 88 deletions(-)

-- 
1.7.10.4
[Qemu-devel] [PATCH for-1.4.y 2/4] cpu-exec: wrap tcg_qemu_tb_exec() in a fn to restore the PC
From: Peter Maydell peter.mayd...@linaro.org If tcg_qemu_tb_exec() returns a value whose low bits don't indicate a link to an indexed next TB, this means that the TB execution never started (eg because the instruction counter hit zero). In this case the guest PC has to be reset to the address of the start of the TB. Refactor the cpu-exec code to make all tcg_qemu_tb_exec() calls pass through a wrapper function which does this restoration if necessary. Note that the apparent change in cpu_exec_nocache() from calling cpu_pc_from_tb() with the old TB to calling it with the TB returned by do_tcg_qemu_tb_exec() is safe, because in the nocache case we can guarantee that the TB we try to execute is not linked to any others, so the only possible returned TB is the one we started at. That is, we should arguably previously have included in cpu_exec_nocache() an assert(next_tb ~TB_EXIT_MASK) == tb), since the API requires restore from next_tb but we were using tb. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 77211379d73ea0c89c0b5bb6eee74b17cb06f9a8) Conflicts: cpu-exec.c Signed-off-by: Michael Tokarev m...@tls.msk.ru --- cpu-exec.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 797e11a..4ffae22 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -51,12 +51,26 @@ void cpu_resume_from_signal(CPUArchState *env, void *puc) } #endif +/* Execute a TB, and fix up the CPU state afterwards if necessary */ +static inline tcg_target_ulong cpu_tb_exec(CPUArchState *env, uint8_t *tb_ptr) +{ +tcg_target_ulong next_tb = tcg_qemu_tb_exec(env, tb_ptr); +if ((next_tb TB_EXIT_MASK) TB_EXIT_IDX1) { +/* We didn't start executing this TB (eg because the instruction + * counter hit zero); we must restore the guest PC to the address + * of the start of the TB. 
+ */ +TranslationBlock *tb = (TranslationBlock *)(next_tb ~TB_EXIT_MASK); +cpu_pc_from_tb(env, tb); +} +return next_tb; +} + /* Execute the code without caching the generated code. An interpreter could be used if available. */ static void cpu_exec_nocache(CPUArchState *env, int max_cycles, TranslationBlock *orig_tb) { -tcg_target_ulong next_tb; TranslationBlock *tb; /* Should never happen. @@ -68,14 +82,8 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles, max_cycles); env-current_tb = tb; /* execute the generated code */ -next_tb = tcg_qemu_tb_exec(env, tb-tc_ptr); +cpu_tb_exec(env, tb-tc_ptr); env-current_tb = NULL; - -if ((next_tb TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { -/* Restore PC. This may happen if async event occurs before - the TB starts executing. */ -cpu_pc_from_tb(env, tb); -} tb_phys_invalidate(tb, -1); tb_free(tb); } @@ -597,13 +605,11 @@ int cpu_exec(CPUArchState *env) if (likely(!env-exit_request)) { tc_ptr = tb-tc_ptr; /* execute the generated code */ -next_tb = tcg_qemu_tb_exec(env, tc_ptr); +next_tb = cpu_tb_exec(env, tc_ptr); if ((next_tb TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { /* Instruction counter expired. */ int insns_left; tb = (TranslationBlock *)(next_tb ~TB_EXIT_MASK); -/* Restore PC. */ -cpu_pc_from_tb(env, tb); insns_left = env-icount_decr.u32; if (env-icount_extra insns_left = 0) { /* Refill decrementer and continue execution. */ -- 1.7.10.4
[Qemu-devel] [PATCH for-1.4.y 1/4] tcg: Document tcg_qemu_tb_exec() and provide constants for low bit uses
From: Peter Maydell peter.mayd...@linaro.org Document tcg_qemu_tb_exec(). In particular, its return value is a combination of a pointer to the next translation block and some extra information in the low two bits. Provide some #defines for the values passed in these bits to improve code clarity. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 0980011b4f66482d2733ab2dd0f2f61747772c6b) Conflicts: tcg/tcg.h Signed-off-by: Michael Tokarev m...@tls.msk.ru --- cpu-exec.c|9 + include/exec/gen-icount.h |2 +- tcg/tcg.h | 44 +++- 3 files changed, 49 insertions(+), 6 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 19ebb4a..797e11a 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -71,7 +71,7 @@ static void cpu_exec_nocache(CPUArchState *env, int max_cycles, next_tb = tcg_qemu_tb_exec(env, tb-tc_ptr); env-current_tb = NULL; -if ((next_tb 3) == 2) { +if ((next_tb TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { /* Restore PC. This may happen if async event occurs before the TB starts executing. */ cpu_pc_from_tb(env, tb); @@ -583,7 +583,8 @@ int cpu_exec(CPUArchState *env) spans two pages, we cannot safely do a direct jump. */ if (next_tb != 0 tb-page_addr[1] == -1) { -tb_add_jump((TranslationBlock *)(next_tb ~3), next_tb 3, tb); +tb_add_jump((TranslationBlock *)(next_tb ~TB_EXIT_MASK), +next_tb TB_EXIT_MASK, tb); } spin_unlock(tb_lock); @@ -597,10 +598,10 @@ int cpu_exec(CPUArchState *env) tc_ptr = tb-tc_ptr; /* execute the generated code */ next_tb = tcg_qemu_tb_exec(env, tc_ptr); -if ((next_tb 3) == 2) { +if ((next_tb TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { /* Instruction counter expired. */ int insns_left; -tb = (TranslationBlock *)(next_tb ~3); +tb = (TranslationBlock *)(next_tb ~TB_EXIT_MASK); /* Restore PC. 
*/ cpu_pc_from_tb(env, tb); insns_left = env-icount_decr.u32; diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h index 8043b3b..c858a73 100644 --- a/include/exec/gen-icount.h +++ b/include/exec/gen-icount.h @@ -32,7 +32,7 @@ static void gen_icount_end(TranslationBlock *tb, int num_insns) if (use_icount) { *icount_arg = num_insns; gen_set_label(icount_label); -tcg_gen_exit_tb((tcg_target_long)tb + 2); +tcg_gen_exit_tb((tcg_target_long)tb + TB_EXIT_ICOUNT_EXPIRED); } } diff --git a/tcg/tcg.h b/tcg/tcg.h index a427972..10eb3f4 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -660,7 +660,49 @@ TCGv_i64 tcg_const_local_i64(int64_t val); extern uint8_t *code_gen_prologue; -/* TCG targets may use a different definition of tcg_qemu_tb_exec. */ +/** + * tcg_qemu_tb_exec: + * @env: CPUArchState * for the CPU + * @tb_ptr: address of generated code for the TB to execute + * + * Start executing code from a given translation block. + * Where translation blocks have been linked, execution + * may proceed from the given TB into successive ones. + * Control eventually returns only when some action is needed + * from the top-level loop: either control must pass to a TB + * which has not yet been directly linked, or an asynchronous + * event such as an interrupt needs handling. + * + * The return value is a pointer to the next TB to execute + * (if known; otherwise zero). This pointer is assumed to be + * 4-aligned, and the bottom two bits are used to return further + * information: + * 0, 1: the link between this TB and the next is via the specified + *TB index (0 or 1). That is, we left the TB via (the equivalent + *of) goto_tb index. The main loop uses this to determine + *how to link the TB just executed to the next. + * 2:we are using instruction counting code generation, and we + *did not start executing this TB because the instruction counter + *would hit zero midway through it. 
In this case the next-TB pointer + *returned is the TB we were about to execute, and the caller must + *arrange to execute the remaining count of instructions. + * + * If the bottom two bits indicate an exit-via-index then the CPU + * state is correctly synchronised and ready for execution of the next + * TB (and in particular the guest PC is the address to execute next). + * Otherwise, we gave up on execution of this TB before it started, and + * the caller must fix up the CPU state by calling
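The encoding documented above is easy to exercise in isolation. The sketch below mirrors the constants this patch introduces (patch 3/4 of the series later adds a TB_EXIT_REQUESTED value as well) and shows how the main loop separates the TB pointer from the exit reason, relying on TBs being at least 4-aligned:

```c
#include <assert.h>
#include <stdint.h>

/* Constants matching the tcg_qemu_tb_exec() return-value encoding:
 * the low two bits carry the exit reason, the rest is the TB pointer. */
#define TB_EXIT_MASK            3
#define TB_EXIT_IDX0            0
#define TB_EXIT_IDX1            1
#define TB_EXIT_ICOUNT_EXPIRED  2

/* Recover the TranslationBlock pointer by masking off the low bits. */
static uintptr_t tb_ptr_of(uintptr_t next_tb)
{
    return next_tb & ~(uintptr_t)TB_EXIT_MASK;
}

/* Recover the exit reason carried in the low two bits. */
static unsigned tb_exit_reason(uintptr_t next_tb)
{
    return next_tb & TB_EXIT_MASK;
}
```

This is why `tcg_gen_exit_tb((tcg_target_long)tb + TB_EXIT_ICOUNT_EXPIRED)` in gen-icount.h works: adding the reason into the low bits of an aligned pointer is the same as OR-ing it in.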
[Qemu-devel] [PATCH for-1.4.y 3/4] Handle CPU interrupts by inline checking of a flag
Fix some of the nasty TCG race conditions and crashes by implementing cpu_exit() as setting a flag which is checked at the start of each TB. This avoids crashes if a thread or signal handler calls cpu_exit() while the execution thread is itself modifying the TB graph (which may happen in system emulation mode as well as in linux-user mode with a multithreaded guest binary). This fixes the crashes seen in LP:668799; however there are another class of crashes described in LP:1098729 which stem from the fact that in linux-user with a multithreaded guest all threads will use and modify the same global TCG date structures (including the generated code buffer) without any kind of locking. This means that multithreaded guest binaries are still in the unsupported category. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 378df4b23753a11be650af7664ca76bc75cb9f01) Conflicts: exec.c include/qom/cpu.h translate-all.c include/exec/gen-icount.h Signed-off-by: Michael Tokarev m...@tls.msk.ru Conflicts: cpu-exec.c --- cpu-exec.c| 25 - exec.c|2 +- include/exec/cpu-defs.h |1 + include/exec/gen-icount.h | 11 +++ tcg/tcg.h |5 + translate-all.c |4 ++-- 6 files changed, 44 insertions(+), 4 deletions(-) diff --git a/cpu-exec.c b/cpu-exec.c index 4ffae22..1c6af24 100644 --- a/cpu-exec.c +++ b/cpu-exec.c @@ -63,6 +63,12 @@ static inline tcg_target_ulong cpu_tb_exec(CPUArchState *env, uint8_t *tb_ptr) TranslationBlock *tb = (TranslationBlock *)(next_tb ~TB_EXIT_MASK); cpu_pc_from_tb(env, tb); } +if ((next_tb TB_EXIT_MASK) == TB_EXIT_REQUESTED) { +/* We were asked to stop executing TBs (probably a pending + * interrupt. We've now stopped, so clear the flag. 
+ */ +env-tcg_exit_req = 0; +} return next_tb; } @@ -606,7 +612,20 @@ int cpu_exec(CPUArchState *env) tc_ptr = tb-tc_ptr; /* execute the generated code */ next_tb = cpu_tb_exec(env, tc_ptr); -if ((next_tb TB_EXIT_MASK) == TB_EXIT_ICOUNT_EXPIRED) { +switch (next_tb TB_EXIT_MASK) { +case TB_EXIT_REQUESTED: +/* Something asked us to stop executing + * chained TBs; just continue round the main + * loop. Whatever requested the exit will also + * have set something else (eg exit_request or + * interrupt_request) which we will handle + * next time around the loop. + */ +tb = (TranslationBlock *)(next_tb ~TB_EXIT_MASK); +next_tb = 0; +break; +case TB_EXIT_ICOUNT_EXPIRED: +{ /* Instruction counter expired. */ int insns_left; tb = (TranslationBlock *)(next_tb ~TB_EXIT_MASK); @@ -630,6 +649,10 @@ int cpu_exec(CPUArchState *env) next_tb = 0; cpu_loop_exit(env); } +break; +} +default: +break; } } env-current_tb = NULL; diff --git a/exec.c b/exec.c index b85508b..371713a 100644 --- a/exec.c +++ b/exec.c @@ -493,7 +493,7 @@ void cpu_reset_interrupt(CPUArchState *env, int mask) void cpu_exit(CPUArchState *env) { env-exit_request = 1; -cpu_unlink_tb(env); +env-tcg_exit_req = 1; } void cpu_abort(CPUArchState *env, const char *fmt, ...) diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h index 2911b9f..07fce69 100644 --- a/include/exec/cpu-defs.h +++ b/include/exec/cpu-defs.h @@ -161,6 +161,7 @@ typedef struct CPUWatchpoint { uint32_t halted; /* Nonzero if the CPU is in suspend state */ \ uint32_t interrupt_request; \ volatile sig_atomic_t exit_request; \ +volatile sig_atomic_t tcg_exit_req; \ CPU_COMMON_TLB \ struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE]; \ /* buffer for temporaries in the code generator */ \ diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h index c858a73..f45f975 100644 --- a/include/exec/gen-icount.h +++ b/include/exec/gen-icount.h @@ -7,10 +7,18 @@ static TCGArg *icount_arg; static int icount_label; +static int exitreq_label;
[Qemu-devel] [PATCH for-1.4.y 4/4] translate-all.c: Remove cpu_unlink_tb()
From: Peter Maydell peter.mayd...@linaro.org The (unsafe) function cpu_unlink_tb() is now unused, so we can simply remove it and any code that was only used by it. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Blue Swirl blauwir...@gmail.com (cherry picked from commit 3a808cc407744c30daa7470b5f191cde1fbc1aae) Conflicts: translate-all.c Signed-off-by: Michael Tokarev m...@tls.msk.ru --- translate-all.c | 69 --- 1 file changed, 69 deletions(-) diff --git a/translate-all.c b/translate-all.c index 1288b2a..ba4d3f6 100644 --- a/translate-all.c +++ b/translate-all.c @@ -1350,55 +1350,6 @@ static TranslationBlock *tb_find_pc(uintptr_t tc_ptr) return tbs[m_max]; } -static void tb_reset_jump_recursive(TranslationBlock *tb); - -static inline void tb_reset_jump_recursive2(TranslationBlock *tb, int n) -{ -TranslationBlock *tb1, *tb_next, **ptb; -unsigned int n1; - -tb1 = tb-jmp_next[n]; -if (tb1 != NULL) { -/* find head of list */ -for (;;) { -n1 = (uintptr_t)tb1 3; -tb1 = (TranslationBlock *)((uintptr_t)tb1 ~3); -if (n1 == 2) { -break; -} -tb1 = tb1-jmp_next[n1]; -} -/* we are now sure now that tb jumps to tb1 */ -tb_next = tb1; - -/* remove tb from the jmp_first list */ -ptb = tb_next-jmp_first; -for (;;) { -tb1 = *ptb; -n1 = (uintptr_t)tb1 3; -tb1 = (TranslationBlock *)((uintptr_t)tb1 ~3); -if (n1 == n tb1 == tb) { -break; -} -ptb = tb1-jmp_next[n1]; -} -*ptb = tb-jmp_next[n]; -tb-jmp_next[n] = NULL; - -/* suppress the jump to next tb in generated code */ -tb_reset_jump(tb, n); - -/* suppress jumps in the tb on which we could have jumped */ -tb_reset_jump_recursive(tb_next); -} -} - -static void tb_reset_jump_recursive(TranslationBlock *tb) -{ -tb_reset_jump_recursive2(tb, 0); -tb_reset_jump_recursive2(tb, 1); -} - #if defined(TARGET_HAS_ICE) !defined(CONFIG_USER_ONLY) void tb_invalidate_phys_addr(hwaddr addr) { @@ -1417,26 +1368,6 @@ void tb_invalidate_phys_addr(hwaddr addr) } #endif /* TARGET_HAS_ICE 
!defined(CONFIG_USER_ONLY) */ -void cpu_unlink_tb(CPUArchState *env) -{ -/* FIXME: TB unchaining isn't SMP safe. For now just ignore the - problem and hope the cpu will stop of its own accord. For userspace - emulation this often isn't actually as bad as it sounds. Often - signals are used primarily to interrupt blocking syscalls. */ -TranslationBlock *tb; -static spinlock_t interrupt_lock = SPIN_LOCK_UNLOCKED; - -spin_lock(interrupt_lock); -tb = env-current_tb; -/* if the cpu is currently executing code, we must unlink it and - all the potentially executing TB */ -if (tb) { -env-current_tb = NULL; -tb_reset_jump_recursive(tb); -} -spin_unlock(interrupt_lock); -} - void tb_check_watchpoint(CPUArchState *env) { TranslationBlock *tb; -- 1.7.10.4
Re: [Qemu-devel] [PATCH for-1.4.y 3/4] Handle CPU interrupts by inline checking of a flag
09.05.2013 19:30, Michael Tokarev wrote: Fix some of the nasty TCG race conditions and crashes by implementing ... Somehow this went out without a proper authorship line, dunno why. I can resend if needed, adding: From: Peter Maydell peter.mayd...@linaro.org Thanks, /mjt
[Qemu-devel] [Bug 1100843] Re: Live Migration Causes Performance Issues
** Changed in: linux (Ubuntu) Status: Incomplete = Confirmed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1100843 Title: Live Migration Causes Performance Issues Status in QEMU: New Status in “linux” package in Ubuntu: Confirmed Status in “qemu-kvm” package in Ubuntu: Triaged Bug description: I have 2 physical hosts running Ubuntu Precise. With 1.0+noroms- 0ubuntu14.7 and qemu-kvm 1.2.0+noroms-0ubuntu7 (source from quantal, built for Precise with pbuilder.) I attempted to build qemu-1.3.0 debs from source to test, but libvirt seems to have an issue with it that I haven't been able to track down yet. I'm seeing a performance degradation after live migration on Precise, but not Lucid. These hosts are managed by libvirt (tested both 0.9.8-2ubuntu17 and 1.0.0-0ubuntu4) in conjunction with OpenNebula. I don't seem to have this problem with lucid guests (running a number of standard kernels, 3.2.5 mainline and backported linux- image-3.2.0-35-generic as well.) I first noticed this problem with phoronix doing compilation tests, and then tried lmbench where even simple calls experience performance degradation. I've attempted to post to the kvm mailing list, but so far the only suggestion was it may be related to transparent hugepages not being used after migration, but this didn't pan out. 
Someone else has a similar problem here - http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592 qemu command line example: /usr/bin/kvm -name one-2 -S -M pc-1.2 -cpu Westmere -enable-kvm -m 73728 -smp 16,sockets=2,cores=8,threads=1 -uuid f89e31a4-4945-c12c-6544-149ba0746c2f -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/one//datastores/0/2/disk.0,if=none,id=drive-virtio- disk0,format=raw,cache=none -device virtio-blk- pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio- disk0,bootindex=1 -drive file=/var/lib/one//datastores/0/2/disk.1,if=none,id=drive- ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive =drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net- pci,netdev=hostnet0,id=net0,mac=02:00:0a:64:02:fe,bus=pci.0,addr=0x3 -vnc 0.0.0.0:2,password -vga cirrus -incoming tcp:0.0.0.0:49155 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 Disk backend is LVM running on SAN via FC connection (using symlink from /var/lib/one/datastores/0/2/disk.0 above) ubuntu-12.04 - first boot == Simple syscall: 0.0527 microseconds Simple read: 0.1143 microseconds Simple write: 0.0953 microseconds Simple open/close: 1.0432 microseconds Using phoronix pts/compuational ImageMagick - 31.54s Linux Kernel 3.1 - 43.91s Mplayer - 30.49s PHP - 22.25s ubuntu-12.04 - post live migration == Simple syscall: 0.0621 microseconds Simple read: 0.2485 microseconds Simple write: 0.2252 microseconds Simple open/close: 1.4626 microseconds Using phoronix pts/compilation ImageMagick - 43.29s Linux Kernel 3.1 - 76.67s Mplayer - 45.41s PHP - 29.1s I don't have phoronix results for 10.04 handy, but they were within 1% of each other... 
ubuntu-10.04 - first boot == Simple syscall: 0.0524 microseconds Simple read: 0.1135 microseconds Simple write: 0.0972 microseconds Simple open/close: 1.1261 microseconds ubuntu-10.04 - post live migration == Simple syscall: 0.0526 microseconds Simple read: 0.1075 microseconds Simple write: 0.0951 microseconds Simple open/close: 1.0413 microseconds To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1100843/+subscriptions
[Qemu-devel] [PATCH] vfio-pci: Add support for MSI affinity
When MSI is accelerated through KVM the vectors are only programmed when the guest first enables MSI support. Subsequent writes to the vector address or data fields are ignored. Unfortunately that means we're ignoring updates done to adjust SMP affinity of the vectors. MSI SMP affinity already works in non-KVM mode because the address and data fields are read from their backing store on each interrupt. This patch stores the MSIMessage programmed into KVM so that we can determine when changes are made and update the routes.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

For qemu-1.6

 hw/misc/vfio.c | 47 ---
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 693a9ff..67a1580 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -102,6 +102,7 @@ typedef struct VFIOINTx {
 typedef struct VFIOMSIVector {
     EventNotifier interrupt; /* eventfd triggered on interrupt */
     struct VFIODevice *vdev; /* back pointer to device */
+    MSIMessage msg; /* cache the MSI message so we know when it changes */
     int virq; /* KVM irqchip route for QEMU bypass */
     bool use;
 } VFIOMSIVector;
@@ -776,7 +777,6 @@ retry:
     vdev->msi_vectors = g_malloc0(vdev->nr_vectors * sizeof(VFIOMSIVector));

     for (i = 0; i < vdev->nr_vectors; i++) {
-        MSIMessage msg;
         VFIOMSIVector *vector = &vdev->msi_vectors[i];

         vector->vdev = vdev;
@@ -786,13 +786,13 @@ retry:
             error_report("vfio: Error: event_notifier_init failed");
         }

-        msg = msi_get_message(&vdev->pdev, i);
+        vector->msg = msi_get_message(&vdev->pdev, i);

         /*
          * Attempt to enable route through KVM irqchip,
          * default to userspace handling if unavailable.
          */
-        vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
+        vector->virq = kvm_irqchip_add_msi_route(kvm_state, vector->msg);
         if (vector->virq < 0 ||
             kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
                                            vector->virq) < 0) {
@@ -898,6 +898,33 @@ static void vfio_disable_msi(VFIODevice *vdev)
             vdev->host.bus, vdev->host.slot, vdev->host.function);
 }

+static void vfio_update_msi(VFIODevice *vdev)
+{
+    int i;
+
+    for (i = 0; i < vdev->nr_vectors; i++) {
+        VFIOMSIVector *vector = &vdev->msi_vectors[i];
+        MSIMessage msg;
+
+        if (!vector->use || vector->virq < 0) {
+            continue;
+        }
+
+        msg = msi_get_message(&vdev->pdev, i);
+
+        if (msg.address != vector->msg.address ||
+            msg.data != vector->msg.data) {
+
+            DPRINTF("%s(%04x:%02x:%02x.%x) MSI vector %d changed\n",
+                    __func__, vdev->host.domain, vdev->host.bus,
+                    vdev->host.slot, vdev->host.function, i);
+
+            kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg);
+            vector->msg = msg;
+        }
+    }
+}
+
 /*
  * IO Port/MMIO - Beware of the endians, VFIO is always little endian
  */
@@ -1850,10 +1877,16 @@ static void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,

         is_enabled = msi_enabled(pdev);

-        if (!was_enabled && is_enabled) {
-            vfio_enable_msi(vdev);
-        } else if (was_enabled && !is_enabled) {
-            vfio_disable_msi(vdev);
+        if (!was_enabled) {
+            if (is_enabled) {
+                vfio_enable_msi(vdev);
+            }
+        } else {
+            if (!is_enabled) {
+                vfio_disable_msi(vdev);
+            } else {
+                vfio_update_msi(vdev);
+            }
         }
     } else if (pdev->cap_present & QEMU_PCI_CAP_MSIX &&
         ranges_overlap(addr, len, pdev->msix_cap, MSIX_CAP_LENGTH)) {
[Qemu-devel] [PATCH 0/2] pci-assign: MSI affinity support
I posted these about 6 months ago and Jan felt we should implement MSI notifiers like we have for MSI-X. That still hasn't happened. This is a pretty trivial addition to pci-assign and could easily be ported to MSI notifiers if anyone cares enough down the road to implement them. I don't feel that this, or the vfio patch that adds the same, warrants that kind of infrastructure. I expect these are for qemu-1.6.

Thanks,

Alex

---

Alex Williamson (2):
      pci-assign: Refactor MSI virq array
      pci-assign: Add MSI affinity support

 hw/i386/kvm/pci-assign.c | 78 +++---
 1 file changed, 53 insertions(+), 25 deletions(-)
[Qemu-devel] [PATCH 1/2] pci-assign: Refactor MSI virq array
Convert the msi_virq array into a struct array so we can easily add a place to track the MSIMessage for each vector.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/i386/kvm/pci-assign.c | 51 +++---
 1 file changed, 26 insertions(+), 25 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index c1e08ec..0f83a4c 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -131,8 +131,10 @@ typedef struct AssignedDevice {
     } cap;
     uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
     uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
-    int msi_virq_nr;
-    int *msi_virq;
+    int msi_nr;
+    struct {
+        int virq;
+    } *msi;
     MSIXTableEntry *msix_table;
     hwaddr msix_table_addr;
     uint16_t msix_max;
@@ -689,19 +691,18 @@ again:
     return 0;
 }

-static void free_msi_virqs(AssignedDevice *dev)
+static void free_msi(AssignedDevice *dev)
 {
     int i;

-    for (i = 0; i < dev->msi_virq_nr; i++) {
-        if (dev->msi_virq[i] >= 0) {
-            kvm_irqchip_release_virq(kvm_state, dev->msi_virq[i]);
-            dev->msi_virq[i] = -1;
+    for (i = 0; i < dev->msi_nr; i++) {
+        if (dev->msi[i].virq >= 0) {
+            kvm_irqchip_release_virq(kvm_state, dev->msi[i].virq);
         }
     }
-    g_free(dev->msi_virq);
-    dev->msi_virq = NULL;
-    dev->msi_virq_nr = 0;
+    g_free(dev->msi);
+    dev->msi = NULL;
+    dev->msi_nr = 0;
 }

 static void free_assigned_device(AssignedDevice *dev)
@@ -756,7 +757,7 @@ static void free_assigned_device(AssignedDevice *dev)
         close(dev->real_device.config_fd);
     }

-    free_msi_virqs(dev);
+    free_msi(dev);
 }

 static void assign_failed_examine(AssignedDevice *dev)
@@ -995,7 +996,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
             perror("assigned_dev_update_msi: deassign irq");
         }

-        free_msi_virqs(assigned_dev);
+        free_msi(assigned_dev);

         assigned_dev->assigned_irq_type = ASSIGNED_IRQ_NONE;
         pci_device_set_intx_routing_notifier(pci_dev, NULL);
@@ -1011,9 +1012,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
             return;
         }

-        assigned_dev->msi_virq = g_malloc(sizeof(*assigned_dev->msi_virq));
-        assigned_dev->msi_virq_nr = 1;
-        assigned_dev->msi_virq[0] = virq;
+        assigned_dev->msi = g_malloc(sizeof(*assigned_dev->msi));
+        assigned_dev->msi_nr = 1;
+        assigned_dev->msi[0].virq = virq;
         if (kvm_device_msi_assign(kvm_state, assigned_dev->dev_id, virq) < 0) {
             perror("assigned_dev_update_msi: kvm_device_msi_assign");
         }
@@ -1074,14 +1075,14 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         return r;
     }

-    free_msi_virqs(adev);
+    free_msi(adev);

-    adev->msi_virq_nr = adev->msix_max;
-    adev->msi_virq = g_malloc(adev->msix_max * sizeof(*adev->msi_virq));
+    adev->msi_nr = adev->msix_max;
+    adev->msi = g_malloc(adev->msix_max * sizeof(*adev->msi));

     entry = adev->msix_table;
     for (i = 0; i < adev->msix_max; i++, entry++) {
-        adev->msi_virq[i] = -1;
+        adev->msi[i].virq = -1;

         if (assigned_dev_msix_skipped(entry)) {
             continue;
@@ -1093,13 +1094,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
         if (r < 0) {
             return r;
         }
-        adev->msi_virq[i] = r;
+        adev->msi[i].virq = r;

         DEBUG("MSI-X vector %d, gsi %d, addr %08x_%08x, data %08x\n", i,
               r, entry->addr_hi, entry->addr_lo, entry->data);

         r = kvm_device_msix_set_vector(kvm_state, adev->dev_id, i,
-                                       adev->msi_virq[i]);
+                                       adev->msi[i].virq);
         if (r) {
             error_report("fail to set MSI-X entry! %s", strerror(-r));
             break;
@@ -1127,7 +1128,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
             perror("assigned_dev_update_msix: deassign irq");
         }

-        free_msi_virqs(assigned_dev);
+        free_msi(assigned_dev);

         assigned_dev->assigned_irq_type = ASSIGNED_IRQ_NONE;
         pci_device_set_intx_routing_notifier(pci_dev, NULL);
@@ -1139,7 +1140,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
         return;
     }

-    if (assigned_dev->msi_virq_nr > 0) {
+    if (assigned_dev->msi_nr > 0) {
         if (kvm_device_msix_assign(kvm_state, assigned_dev->dev_id) < 0) {
             perror("assigned_dev_enable_msix: assign irq");
             return;
@@ -1567,7 +1568,7 @@ static void assigned_dev_msix_mmio_write(void *opaque, hwaddr addr,
         } else if (assigned_dev_msix_masked(&orig) &&
                    !assigned_dev_msix_masked(entry)) {
             /* Vector unmasked */
-            if (i >= adev->msi_virq_nr || adev->msi_virq[i] < 0) {
+            if (i >= adev->msi_nr || adev->msi[i].virq < 0) {
[Qemu-devel] [PATCH 2/2] pci-assign: Add MSI affinity support
Track the last MSIMessage programmed so we can determine when it has changed and update the routing to the guest.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/i386/kvm/pci-assign.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 0f83a4c..e09265b 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -134,6 +134,7 @@ typedef struct AssignedDevice {
     int msi_nr;
     struct {
         int virq;
+        MSIMessage msg;
     } *msi;
     MSIXTableEntry *msix_table;
     hwaddr msix_table_addr;
@@ -1015,6 +1016,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
         assigned_dev->msi = g_malloc(sizeof(*assigned_dev->msi));
         assigned_dev->msi_nr = 1;
         assigned_dev->msi[0].virq = virq;
+        assigned_dev->msi[0].msg = msg;
         if (kvm_device_msi_assign(kvm_state, assigned_dev->dev_id, virq) < 0) {
             perror("assigned_dev_update_msi: kvm_device_msi_assign");
         }
@@ -1027,6 +1029,27 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
     }
 }

+static void assigned_dev_update_msi_msg(PCIDevice *pci_dev)
+{
+    AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
+    uint8_t ctrl_byte = pci_get_byte(pci_dev->config + pci_dev->msi_cap +
+                                     PCI_MSI_FLAGS);
+    MSIMessage msg;
+
+    if (assigned_dev->assigned_irq_type != ASSIGNED_IRQ_MSI ||
+        !(ctrl_byte & PCI_MSI_FLAGS_ENABLE)) {
+        return;
+    }
+
+    msg = msi_get_message(pci_dev, 0);
+
+    if (msg.address != assigned_dev->msi[0].msg.address ||
+        msg.data != assigned_dev->msi[0].msg.data) {
+        kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi[0].virq, msg);
+        assigned_dev->msi[0].msg = msg;
+    }
+}
+
 static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
 {
     return (entry->ctrl & cpu_to_le32(0x1)) != 0;
@@ -1202,6 +1225,10 @@ static void assigned_dev_pci_write_config(PCIDevice *pci_dev, uint32_t address,
         if (range_covers_byte(address, len,
                               pci_dev->msi_cap + PCI_MSI_FLAGS)) {
             assigned_dev_update_msi(pci_dev);
+        } else if (ranges_overlap(address, len,
+                                  pci_dev->msi_cap + PCI_MSI_ADDRESS_LO,
+                                  10 - PCI_MSI_ADDRESS_LO)) {
+            assigned_dev_update_msi_msg(pci_dev);
         }
     }
     if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
Re: [Qemu-devel] (i386) TCG is broken in 1.5-rc0 with guest kernel 3.2?
On 05/08/2013 02:03 AM, Michael Tokarev wrote:

I just tried 1.5-rc0 (32bit x86 userspace), and guest kernel crashes in an interesting way right at init time.

  qemu-system-x86_64 -net none -kernel /boot/vmlinuz-3.2-amd64 -append console=ttyS0 -serial file:serial

Irritatingly, this works for me, with the current fedora 18 kernel,

x86_64-softmmu/qemu-system-x86_64: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0x3286b0bfb3272b9b3004abe8d0c306f12470d77d, not stripped

  ./x86_64-softmmu/qemu-system-x86_64 -net none -kernel /boot/vmlinuz-3.8.11-200.fc18.x86_64 -append console=ttyS0 -serial stdio

which makes it all the way until it panics for lack of root filesystem. Hopefully this is the problem fixed by Aurelien's tcg/optimize.c patch today (not yet applied to qemu head)?

r~
Re: [Qemu-devel] [PATCH 01/40] memory: assert that PhysPageEntry's ptr does not overflow
On 09/05/2013 05:41, liu ping fan wrote:

On Wed, May 8, 2013 at 12:08 AM, Paolo Bonzini pbonz...@redhat.com wrote:

----- Original Message -----
From: Peter Maydell peter.mayd...@linaro.org
To: Paolo Bonzini pbonz...@redhat.com
Cc: qemu-devel@nongnu.org, a...@ozlabs.ru, jan kiszka jan.kis...@siemens.com, qemul...@gmail.com, stefa...@redhat.com, da...@gibson.dropbear.id.au
Sent: Tuesday, May 7, 2013 17:44:59
Subject: Re: [Qemu-devel] [PATCH 01/40] memory: assert that PhysPageEntry's ptr does not overflow

On 7 May 2013 15:16, Paolo Bonzini pbonz...@redhat.com wrote:

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 exec.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index 19725db..2e5b89a 100644
--- a/exec.c
+++ b/exec.c
@@ -719,6 +719,8 @@ static void destroy_all_mappings(AddressSpaceDispatch *d)

 static uint16_t phys_section_add(MemoryRegionSection *section)
 {
+    assert(phys_sections_nb < TARGET_PAGE_SIZE);
+
     if (phys_sections_nb == phys_sections_nb_alloc) {
         phys_sections_nb_alloc = MAX(phys_sections_nb_alloc * 2, 16);
         phys_sections = g_renew(MemoryRegionSection, phys_sections,

Why is the limit we're asserting not the same as the maximum size that we pass to g_renew() below? That's a minimum size, isn't it?

I'm asserting that the physical section number doesn't overflow into the page, since the TLB entries are stored as a combination of the two.

Could you explain in more detail? Why TARGET_PAGE_SIZE, not 2^15?

Because the TLB entry is the OR of the page address and the phys_section. Look here:

hwaddr memory_region_section_get_iotlb(CPUArchState *env,
                                       MemoryRegionSection *section,
                                       target_ulong vaddr,
                                       hwaddr paddr,
                                       int prot,
                                       target_ulong *address)
{
    hwaddr iotlb;
    CPUWatchpoint *wp;

    if (memory_region_is_ram(section->mr)) {
        /* Normal RAM. */
        iotlb = (memory_region_get_ram_addr(section->mr) & TARGET_PAGE_MASK)
            + memory_region_section_addr(section, paddr);
        if (!section->readonly) {
            iotlb |= phys_section_notdirty;
        } else {
            iotlb |= phys_section_rom;
        }
    } else {
        iotlb = section - phys_sections;
        iotlb += memory_region_section_addr(section, paddr);
    }

where the else could be written better as:

    iotlb = memory_region_section_addr(section, paddr);
    iotlb |= section - phys_sections;

memory_region_section_addr will return a page-aligned value.

Paolo
Re: [Qemu-devel] (i386) TCG is broken in 1.5-rc0 with guest kernel 3.2?
On Thu, May 09, 2013 at 09:43:20AM -0700, Richard Henderson wrote:

On 05/08/2013 02:03 AM, Michael Tokarev wrote:

I just tried 1.5-rc0 (32bit x86 userspace), and guest kernel crashes in an interesting way right at init time.

  qemu-system-x86_64 -net none -kernel /boot/vmlinuz-3.2-amd64 -append console=ttyS0 -serial file:serial

Irritatingly, this works for me, with the current fedora 18 kernel,

x86_64-softmmu/qemu-system-x86_64: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0x3286b0bfb3272b9b3004abe8d0c306f12470d77d, not stripped

  ./x86_64-softmmu/qemu-system-x86_64 -net none -kernel /boot/vmlinuz-3.8.11-200.fc18.x86_64 -append console=ttyS0 -serial stdio

which makes it all the way until it panics for lack of root filesystem. Hopefully this is the problem fixed by Aurelien's tcg/optimize.c patch today (not yet applied to qemu head)?

Yes, it is.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: [Qemu-devel] (i386) TCG is broken in 1.5-rc0 with guest kernel 3.2?
09.05.2013 20:52, Aurelien Jarno wrote:

On Thu, May 09, 2013 at 09:43:20AM -0700, Richard Henderson wrote:

On 05/08/2013 02:03 AM, Michael Tokarev wrote:

I just tried 1.5-rc0 (32bit x86 userspace), and guest kernel crashes in an interesting way right at init time.

  qemu-system-x86_64 -net none -kernel /boot/vmlinuz-3.2-amd64 -append console=ttyS0 -serial file:serial

Irritatingly, this works for me, with the current fedora 18 kernel,

x86_64-softmmu/qemu-system-x86_64: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0x3286b0bfb3272b9b3004abe8d0c306f12470d77d, not stripped

  ./x86_64-softmmu/qemu-system-x86_64 -net none -kernel /boot/vmlinuz-3.8.11-200.fc18.x86_64 -append console=ttyS0 -serial stdio

Which makes it all the way until it panics for lack of root filesystem. Hopefully this is the problem fixed by Aurelien's tcg/optimize.c patch today (not yet applied to qemu head)?

Yes, it is.

Yes, the patch by Aurelien fixes the reported problem for me. Thanks!

/mjt
Re: [Qemu-devel] [PATCH v6 00/11] rdma: migration support
Comments inline. FYI: please CC mrhi...@us.ibm.com, because it helps me know when to scroll through the bazillion qemu-devel emails. I have things separated out into folders and rules, but a direct CC is better =)

On 05/03/2013 07:28 PM, Chegu Vinod wrote:

Hi Michael, I picked up the qemu bits from your github branch and gave it a try. (BTW the setup I was given temporary access to has a pair of MLX's IB QDR cards connected back to back via QSFP cables.) Observed a couple of things and wanted to share... perhaps you may be aware of them already, or perhaps these are unrelated to your specific changes? (Note: I still haven't finished the review of your changes.)

a) x-rdma-pin-all off case

Seems to only work sometimes but fails at other times. Here is an example...

(qemu) rdma: Accepting rdma connection...
rdma: Memory pin all: disabled
rdma: verbs context after listen: 0x56757d50
rdma: dest_connect Source GID: fe80::2:c903:9:53a5, Dest GID: fe80::2:c903:9:5855
rdma: Accepted migration
qemu-system-x86_64: VQ 1 size 0x100 Guest index 0x4d2 inconsistent with Host index 0x4ec: delta 0xffe6
qemu: warning: error while loading state for instance 0x0 of device 'virtio-net'
load of migration failed

Can you give me more details about the configuration of your VM?

b) x-rdma-pin-all on case:

The guest is not resuming on the target host, i.e. the source host's qemu states that migration is complete but the guest is not responsive anymore... (it doesn't seem to have crashed but it's stuck somewhere). Have you seen this behavior before? Any tips on how I could extract additional info?

Is the QEMU monitor still responsive? Can you capture a screenshot of the guest's console to see if there is a panic? What kind of storage is attached to the VM?

Besides the list of noted restrictions/issues around having to pin all of guest memory... if the pinning is done as part of starting the migration, it ends up taking a noticeably long time for larger guests. Wonder whether that should be counted as part of the total migration time?

That's a good question: the pin-all option should not be slowing down your VM too much, as the VM should still be running before the migration_thread() actually kicks in and starts the migration. I need more information on the configuration of your VM, guest operating system, architecture and so forth... and, as before, whether it is QEMU that is unresponsive or the guest that has panicked...

Also the act of pinning all the memory seems to freeze the guest. E.g.: for larger enterprise-sized guests (say 128GB and higher) the guest is frozen for anywhere from nearly a minute (~50 seconds) to multiple minutes as the guest size increases... which imo kind of defeats the purpose of live guest migration.

That's bad =) There must be a bug somewhere. The largest VM I can create on my hardware is ~16GB - so let me give that a try and try to track down the problem.

Would like to hear if you have already thought about any other alternatives to address this issue? For e.g. would it be better to pin all of the guest's memory as part of starting the guest itself? Yes, there are restrictions when we do pinning... but it can help with performance.

For such a large VM, I would definitely recommend pinning because I'm assuming you have enough processors or a large enough application to actually *use* that much memory, which would suggest that even after the bulk phase round of the migration has already completed, your VM is probably going to remain pretty busy. It's just a matter of me tracking down what's causing the freeze and fixing it. I'll look into it right now on my machine.

---

BTW, a different (yet sort of related) topic... recently a patch went into upstream that provided an option to qemu to mlock all of guest memory: https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg03947.html

I had no idea... very interesting.

but when attempting to do the mlock for larger guests a lot of time is spent bringing each page into cache and clearing/zeroing it etc. etc. https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg04161.html

Wow, I didn't know that either. Perhaps this is what's causing the entire QEMU process and its threads to seize up. It may be necessary to run the pinning command *outside* of QEMU's I/O lock in a separate thread if it's really that much overhead. Thanks a lot for pointing this out.

Note: The basic tcp-based live guest migration in the same qemu version still works fine on the same hosts over a pair of non-RDMA 10Gb NICs connected back-to-back.

Acknowledged.
[Qemu-devel] [PATCH] target-mips: clean-up in BIT_INSV
From: Petar Jovanovic petar.jovano...@imgtec.com

This is a small follow-up change to fix incorrect behaviour for INSV. It includes two minor modifications:
- sizefilter is constant so it can be moved inside of the block,
- (int64_t)0x01 is replaced with 1LL for ease of reading.

No functional change.

Signed-off-by: Petar Jovanovic petar.jovano...@imgtec.com
---
 target-mips/dsp_helper.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target-mips/dsp_helper.c b/target-mips/dsp_helper.c
index 9212789..426a3b6 100644
--- a/target-mips/dsp_helper.c
+++ b/target-mips/dsp_helper.c
@@ -2900,11 +2900,12 @@ target_ulong helper_bitrev(target_ulong rt)
     return (target_ulong)rd;
 }

-#define BIT_INSV(name, posfilter, sizefilter, ret_type)                 \
+#define BIT_INSV(name, posfilter, ret_type)                             \
 target_ulong helper_##name(CPUMIPSState *env, target_ulong rs,          \
                            target_ulong rt)                             \
 {                                                                       \
     uint32_t pos, size, msb, lsb;                                       \
+    uint32_t const sizefilter = 0x3F;                                   \
     target_ulong filter;                                                \
     target_ulong temp, temprs, temprt;                                  \
     target_ulong dspc;                                                  \
@@ -2921,7 +2922,7 @@ target_ulong helper_##name(CPUMIPSState *env, target_ulong rs, \
         return rt;                                                      \
     }                                                                   \
                                                                         \
-    filter = ((int64_t)0x01 << size) - 1;                               \
+    filter = (1LL << size) - 1;                                         \
     filter = filter << pos;                                             \
     temprs = (rs << pos) & filter;                                      \
     temprt = rt & ~filter;                                              \
@@ -2930,9 +2931,9 @@ target_ulong helper_##name(CPUMIPSState *env, target_ulong rs, \
     return (target_long)(ret_type)temp;                                 \
 }

-BIT_INSV(insv, 0x1F, 0x3F, int32_t);
+BIT_INSV(insv, 0x1F, int32_t);
 #ifdef TARGET_MIPS64
-BIT_INSV(dinsv, 0x7F, 0x3F, target_long);
+BIT_INSV(dinsv, 0x7F, target_long);
 #endif

 #undef BIT_INSV
-- 
1.7.9.5
[Qemu-devel] [PATCH for 1.5] target-i386 ROR r8/r16 imm instruction fix
Fix EFLAGS corruption by ROR r8/r16 imm instruction located at the end of the TB, similarly to commit 089305ac for the non-immediate case.

Reported-by: Hervé Poussineau hpous...@reactos.org
Cc: Richard Henderson r...@twiddle.net
Signed-off-by: Aurelien Jarno aurel...@aurel32.net
---
 target-i386/translate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 524a0b4..0aeccdb 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -1871,6 +1871,7 @@ static void gen_rot_rm_im(DisasContext *s, int ot, int op1, int op2,
         if (is_right) {
             tcg_gen_shri_tl(cpu_cc_src2, cpu_T[0], mask - 1);
             tcg_gen_shri_tl(cpu_cc_dst, cpu_T[0], mask);
+            tcg_gen_andi_tl(cpu_cc_dst, cpu_cc_dst, 1);
         } else {
             tcg_gen_shri_tl(cpu_cc_src2, cpu_T[0], mask);
             tcg_gen_andi_tl(cpu_cc_dst, cpu_T[0], 1);
-- 
1.7.10.4
Re: [Qemu-devel] [PATCH for 1.5] Revert target-i386: Use movcond to implement rotate flags.
On Wed, May 01, 2013 at 08:06:51AM +0200, Hervé Poussineau wrote:

This commit breaks boot of MS-DOS 6.22, which stops at:
MODE CON CODEPAGE PREPARE=((850) C:\DOS\EGA.CPI)

This reverts part of commit 34d80a55ff8517fd37bcfea5063b9797e2bd9132.

Signed-off-by: Hervé Poussineau hpous...@reactos.org
---
checkpatch.pl complains about this patch, but I preferred to do a simple revert rather than fix code style.

 target-i386/translate.c | 98 ++-
 1 file changed, 45 insertions(+), 53 deletions(-)

The actual bug is the same as the one fixed in commit 089305ac, but for the immediate case. I have just sent a patch to fix the issue.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: [Qemu-devel] [PATCH for 1.5] target-i386 ROR r8/r16 imm instruction fix
On 05/09/2013 10:40 AM, Aurelien Jarno wrote:

Fix EFLAGS corruption by ROR r8/r16 imm instruction located at the end of the TB, similarly to commit 089305ac for the non-immediate case.

Reported-by: Hervé Poussineau hpous...@reactos.org
Cc: Richard Henderson r...@twiddle.net
Signed-off-by: Aurelien Jarno aurel...@aurel32.net
---
 target-i386/translate.c | 1 +
 1 file changed, 1 insertion(+)

Reviewed-by: Richard Henderson r...@twiddle.net

r~
Re: [Qemu-devel] [PATCH] target-mips: clean-up in BIT_INSV
On Thu, May 09, 2013 at 07:28:37PM +0200, Petar Jovanovic wrote:

From: Petar Jovanovic petar.jovano...@imgtec.com

This is a small follow-up change to fix incorrect behaviour for INSV. It includes two minor modifications:
- sizefilter is constant so it can be moved inside of the block,
- (int64_t)0x01 is replaced with 1LL for ease of reading.

No functional change.

Signed-off-by: Petar Jovanovic petar.jovano...@imgtec.com
---
 target-mips/dsp_helper.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

[...]

Thanks, queued for 1.6.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/7] pci: add MPC105 PCI host bridge emulation
On Tue, May 7, 2013 at 5:48 AM, Hervé Poussineau hpous...@reactos.org wrote:

Andreas Färber wrote:

On 06.05.2013 22:57, Hervé Poussineau wrote:

Alexander Graf wrote:

On 05/03/2013 07:57 AM, Hervé Poussineau wrote:

Alexander Graf wrote:

On 02.05.2013, at 22:08, Hervé Poussineau hpous...@reactos.org wrote:

Non-contiguous I/O is not implemented. There is also somewhere a bug in the memory controller, which means that some real firmwares may not detect the correct amount of memory. This can be bypassed by adding '-m 1G' on the command line. Add the x-auto-conf property, to automatically configure the memory controller at startup. This will be required by OpenBIOS, which doesn't know how to do it.

Why not teach it? I'd prefer to see that logic in firmware.

Me too, but I'm not confident enough in my capabilities to do it.

Huh? Why not? Most of the device initialization code in OpenBIOS happens in C, so you don't even have to touch Forth code :).

Autoconfiguration is only in one place of the code, so I think it can be removed easily once OpenBIOS has this logic.

I'd prefer if we could come up with a clean model from the start. It really shouldn't be hard at all.

I thought that for all other usages of OpenBIOS in QEMU, RAM was supposed to be available as soon as the machine was powered on. However, I checked the OpenBIOS code: one of the first things done in arch/ppc/qemu/start.S is to copy the exception vectors. So, should I add code before it to detect the memory controller, detect the RAM size and configure the memory controller?

No. Why? QEMU does not depend on the memory controller being initialized, only the OS might expect some registers to be filled in. So you should look at or add the MPC105 PHB initialization hook in OpenBIOS' PCI code, long after the memory is set up.

OpenBIOS depends on memory being available (at least the first KB/MB) even at its very startup, in arch/ppc/qemu/start.S. PCI initialization code comes much later.

OpenBIOS for Sparc32 and Sparc64 does not use RAM before the RAM size has been read from fw_cfg. A check for the QEMU signature is done and the machine ID is also read before that. See arch/sparc32/entry.S and arch/sparc64/entry.S. It should be easy to change PPC to check the machine ID before accessing RAM.

At boot, the MPC105 datasheet says that the memory controller is not configured, i.e. you must not use RAM in OpenBIOS before PCI initialization.

The memory controller should be initialized much earlier than PCI.

For other PPC targets (mac99, g3beige) using OpenBIOS, RAM is accessible at startup, so that's not a problem for OpenBIOS. So, no, QEMU does not depend on the memory controller being initialized, but OpenBIOS depends on the RAM being accessible way before PCI initialization. I am not speaking of the OS, which might (or might not) expect some registers to be filled in. The x-auto-conf property correctly sets some registers, so that memory is available at startup (like on the mac99 and g3beige emulations).

Hervé
Re: [Qemu-devel] [RFC] reverse execution.
On Tue, May 7, 2013 at 6:27 PM, KONRAD Frédéric fred.kon...@greensocs.com wrote:

Hi,

We are trying to find a way to make reverse execution work with QEMU. Actually, it is possible to debug the guest through the gdbstub; we want to make reverse execution possible with GDB as well.

How we are trying to make that work (basically, without optimisation):

- QEMU takes regular snapshots of the VM: that can be done with the savevm code without optimisation first.
- When the VM is stopped and GDB requests a reverse-step: load the last snapshot and replay to one instruction before the current PC.

There is one issue with that for now (for basic running reverse execution):

- How to stop one instruction before the actual PC.

Add a special translation mode for reverse execution where the next PC is checked after each instruction. Alternatively, you could make temporary snapshots during this mode (first 1s intervals, then 0.1s etc) which could be used to find the location. I think this was discussed briefly earlier on the list, please check the archives.

We thought that using -icount and stopping the guest a little time before the actual position would give us the right behavior (we use a qemu_timer with vm_clock to stop the vm at the right time), but it seems that it is not deterministic, and not reproducible. Is that normal? We don't make any input during the replay, and we thought that it could be caused by some timer interruption, but -icount uses a virtual timer as I understand?

We have two other ideas:

- Using TCI and counting each instruction executed by the processor, then stopping one instruction before the actual position. This seems slower.
- Using single-step to count each instruction, then stop one instruction before the actual position. Would that be better?

For now we can restore the VM from the last snapshot when we do a reverse-step, but we can't stop at the exact position.

Thanks,
Fred
Re: [Qemu-devel] [PATCH] target-mips: clean-up in BIT_INSV
On 9 May 2013 18:28, Petar Jovanovic petar.jovano...@rt-rk.com wrote:

> @@ -2921,7 +2922,7 @@ target_ulong helper_##name(CPUMIPSState *env, target_ulong rs, \
>          return rt;                                                      \
>      }                                                                   \
>                                                                          \
> -    filter = ((int64_t)0x01 << size) - 1;                               \
> +    filter = (1LL << size) - 1;                                         \
>      filter = filter << pos;                                             \
>      temprs = (rs << pos) & filter;                                      \
>      temprt = rt & ~filter;                                              \

This section of code is pretty much hand-coding

    temp = deposit64(rt, pos, size, rs);

isn't it?

-- PMM
Re: [Qemu-devel] sparc-linux-user: Fix missing symbols in .rel/.rela.plt sections
On Wed, May 8, 2013 at 7:18 AM, Michael Tokarev m...@tls.msk.ru wrote:

Ping #2?

06.04.2013 10:08, Michael Tokarev wrote:

This patch was submitted more than a year ago (in Jan-2012). Is it still needed? If yes, why hasn't it been applied?

Well, is it still needed? It still applies cleanly to the current git, with the exception of s|^|ldscripts/| - sparc.ld moved from the top directory to ldscripts/.

(Ref: http://patchwork.ozlabs.org/patch/135267 )

Thanks,
/mjt

10.01.2012 11:38, Aurelien Jarno wrote:

sparc-linux-user: Fix missing symbols in .rel/.rela.plt sections

Fix .rel.plt sections in the output to not only include .rel.plt sections from the input but also the .rel.iplt sections, and to define the hidden symbols __rel_iplt_start and __rel_iplt_end around .rel.iplt, as otherwise we get undefined references to these when linking statically to a multiarch-enabled libc (using STT_GNU_IFUNC).

Cc: Blue Swirl blauwir...@gmail.com
Signed-off-by: Aurelien Jarno aurel...@aurel32.net
---
 sparc.ld | 16 ++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/sparc.ld b/sparc.ld
index 56efe34..cec17c9 100644
--- a/sparc.ld
+++ b/sparc.ld
@@ -37,8 +37,20 @@ SECTIONS
   .rela.fini     : { *(.rela.fini) }
   .rel.bss       : { *(.rel.bss) }
   .rela.bss      : { *(.rela.bss) }
-  .rel.plt       : { *(.rel.plt) }
-  .rela.plt      : { *(.rela.plt) }
+  .rel.plt       :
+  {
+    *(.rel.plt)
+    PROVIDE (__rel_iplt_start = .);
+    *(.rel.iplt)
+    PROVIDE (__rel_iplt_end = .);
+  }
+  .rela.plt      :
+  {
+    *(.rela.plt)
+    PROVIDE (__rela_iplt_start = .);
+    *(.rela.iplt)
+    PROVIDE (__rela_iplt_end = .);
+  }
   .init          : { *(.init) } =0x47ff041f
   .text          :
   {
Re: [Qemu-devel] Reporting Heisenbugs in qemu
On Wed, May 8, 2013 at 10:18 AM, Paolo Bonzini pbonz...@redhat.com wrote: Paolo Bonzini pbonz...@redhat.com writes: I guess that's the register windows. There's only so much you can do to optimize them, and heavily recursive workloads (like Perl, or the RTL half of GCC) pay a hefty price. Two qemu targets stand out for slowness, sparc (32 and 64) and mips (64, don't know about 32). x86 (32 and 64), arm, and ppc run with a slowdown of 30 for my bogus benchmark of GMP configure+make. With FreeBSD x86_64 I see a slowdown of just 13. (My reference system runs FreeBSD, so running FreeBSD under qemu is only fair.) My claimed slowdown factors are affected by kernel, libraries, and unfortunately very much by gcc speed, which vary with target. If the sparc emulation speed is due to register windows, then why does mips seem just as slow? No idea. :) Of all the architectures you listed, MIPS is really the one that I have no inkling of... If a register window shortage is a problem, it should be easy to pretend to have lots of them, right? That can help to avoid trapping to the kernel. But you still have to spill the whole window to the stack on every call to a non-leaf function, which can be expensive even if you do it in the translated code. The kernel wouldn't be able to spill or fill more than 8 windows anyway. But maybe we could implement window cleaning in QEMU, if nobody minds not seeing any CLEANWIN traps. I don't think real HW can clean the windows. Register window access in QEMU could be improved slightly; for example, it may be possible to handle wrapping of the registers with sparse layout and page mapping tricks. Also the translator might be able to keep track of the current CWP and generate direct accesses instead of going through regwptr. Paolo
Re: [Qemu-devel] Profiling sparc64 emulation
On Wed, May 08, 2013 at 11:02:24PM +0200, Artyom Tarasenko wrote: On Wed, May 8, 2013 at 12:57 AM, Aurelien Jarno aurel...@aurel32.net wrote: On Tue, May 07, 2013 at 11:29:20PM +0200, Artyom Tarasenko wrote: On Tue, May 7, 2013 at 1:38 PM, Torbjorn Granlund t...@gmplib.org wrote: The 2nd table of http://gmplib.org/devel/testsystems.html shows all emulated systems I am using, most of which are qemu-based. Do I read it correctly that qemu-system-ppc64 with the slowdown factor of 33 is ~3 times faster than qemu-system-sparc64 with the slowdown factor of 96? Do they both use a Debian Wheezy guest? You have a remark that ppc64 has problems with its clock. Was it taken into account when the slowdown factors were calculated? Clock or not, it should be noted that qemu-system-sparc64 is undoubtedly slower (at least 5 to 10 times) than qemu-system-{arm,ppc,mips,...} on some types of load, like perl scripts. That's interesting. Actually it should be possible to launch perl under user-mode qemu-sparc32plus. Is it possible to launch perl under user-mode qemu-ppc{32,64} too? That would allow us to understand whether the bad performance has to do with TCG or the rest of the system emulation. I haven't done that yet, but I have run perf top while running a perl script (lintian), on both qemu-system-sparc64 and qemu-system-ppc64. The results are quite different: qemu-system-ppc64 - 49,73% perf-10672.map [.] 0x7f7853ab4e0f 13,23% qemu-system-ppc64 [.] cpu_ppc_exec 13,16% libglib-2.0.so.0.3200.4 [.] g_hash_table_lookup 8,18% libglib-2.0.so.0.3200.4 [.] g_str_hash 2,47% qemu-system-ppc64 [.] object_class_dynamic_cast 1,97% qemu-system-ppc64 [.] type_is_ancestor 1,05% libglib-2.0.so.0.3200.4 [.] g_str_equal 0,91% qemu-system-ppc64 [.] ppc_cpu_do_interrupt 0,90% qemu-system-ppc64 [.] object_dynamic_cast_assert 0,79% libc-2.13.so [.] __sigsetjmp 0,62% qemu-system-ppc64 [.] type_get_parent.isra.3 0,58% qemu-system-ppc64 [.] type_get_by_name 0,57% qemu-system-ppc64 [.]
qemu_log_mask
0,54% qemu-system-ppc64 [.] object_dynamic_cast

qemu-system-sparc64
---
17,43% perf-8154.map [.] 0x7f6ac10245c8
10,46% qemu-system-sparc64 [.] tcg_optimize
10,36% qemu-system-sparc64 [.] cpu_sparc_exec
6,35% qemu-system-sparc64 [.] tb_flush_jmp_cache
4,75% qemu-system-sparc64 [.] get_physical_address_data
4,45% qemu-system-sparc64 [.] tcg_liveness_analysis
4,35% qemu-system-sparc64 [.] tcg_reg_alloc_op
2,90% qemu-system-sparc64 [.] tlb_flush_page
2,35% qemu-system-sparc64 [.] disas_sparc_insn
2,28% qemu-system-sparc64 [.] get_physical_address_code
2,21% qemu-system-sparc64 [.] tlb_flush
1,64% qemu-system-sparc64 [.] tcg_out_opc
1,22% qemu-system-sparc64 [.] tcg_out_modrm_sib_offset.constprop.41
1,20% qemu-system-sparc64 [.] helper_ld_asi
1,14% qemu-system-sparc64 [.] gen_intermediate_code_pc
1,04% qemu-system-sparc64 [.] helper_st_asi
1,00% qemu-system-sparc64 [.] object_class_dynamic_cast
0,98% qemu-system-sparc64 [.] tb_find_pc
Re: [Qemu-devel] Profiling sparc64 emulation
On 09/05/2013 20:30, Aurelien Jarno wrote: 13,16% libglib-2.0.so.0.3200.4 [.] g_hash_table_lookup 8,18% libglib-2.0.so.0.3200.4 [.] g_str_hash 2,47% qemu-system-ppc64 [.] object_class_dynamic_cast 1,97% qemu-system-ppc64 [.] type_is_ancestor That's worrisome, but should be easy to fix... can you make a callgraph profile? Paolo
[Qemu-devel] [RFC PATCH v5 2/3] Add 'auto-converge' migration capability
The auto-converge migration capability allows the user to specify whether the live migration sequence should automatically detect and force convergence. Signed-off-by: Chegu Vinod chegu_vi...@hp.com --- include/migration/migration.h |2 ++ migration.c |9 + qapi-schema.json |5 - 3 files changed, 15 insertions(+), 1 deletions(-) diff --git a/include/migration/migration.h b/include/migration/migration.h index e2acec6..ace91b0 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -127,4 +127,6 @@ int migrate_use_xbzrle(void); int64_t migrate_xbzrle_cache_size(void); int64_t xbzrle_cache_resize(int64_t new_size); + +bool migrate_auto_converge(void); #endif diff --git a/migration.c b/migration.c index 3eb0fad..570cee5 100644 --- a/migration.c +++ b/migration.c @@ -474,6 +474,15 @@ void qmp_migrate_set_downtime(double value, Error **errp) max_downtime = (uint64_t)value; } +bool migrate_auto_converge(void) +{ +MigrationState *s; + +s = migrate_get_current(); + +return s->enabled_capabilities[MIGRATION_CAPABILITY_AUTO_CONVERGE]; +} + int migrate_use_xbzrle(void) { MigrationState *s; diff --git a/qapi-schema.json b/qapi-schema.json index 199744a..b33839c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -602,10 +602,13 @@ # This feature allows us to minimize migration traffic for certain work # loads, by sending compressed difference of the pages # +# @auto-converge: Migration supports automatic throttling down of guest +# to force convergence. (since 1.6) +# # Since: 1.2 ## { 'enum': 'MigrationCapability', - 'data': ['xbzrle'] } + 'data': ['xbzrle', 'auto-converge'] } ## # @MigrationCapabilityStatus -- 1.7.1
[Qemu-devel] [RFC PATCH v5 0/3] Throttle-down guest to help with live migration convergence.
Busy enterprise workloads hosted on large-sized VMs tend to dirty memory faster than the transfer rate achieved via live guest migration. Despite some good recent improvements (using dedicated 10Gig NICs between hosts) the live migration does NOT converge. If a user chooses to force convergence of their migration via the new auto-converge migration capability, then this change will auto-detect the lack-of-convergence scenario and trigger a slowdown of the workload by explicitly disallowing the VCPUs from spending much time in the VM context. The migration thread tries to catch up and this eventually leads to convergence in some deterministic amount of time. Yes, it does impact the performance of all the VCPUs, but in my observation that lasts only for a short duration of time, i.e. we end up entering stage 3 (the downtime phase) soon after that. No external trigger is required. Thanks to Juan and Paolo for their useful suggestions. --- Changes from v4: - incorporated feedback from Paolo. - split into 3 patches. Changes from v3: - incorporated feedback from Paolo and Eric - rebased to latest qemu.git Changes from v2: - incorporated feedback from Orit, Juan and Eric - stop the throttling thread at the start of stage 3 - rebased to latest qemu.git Changes from v1: - rebased to latest qemu.git - added auto-converge capability (default off) - suggested by Anthony Liguori & Eric Blake. Signed-off-by: Chegu Vinod chegu_vi...@hp.com Chegu Vinod (3): Introduce async_run_on_cpu() Add 'auto-converge' migration capability Force auto-convergence of live migration arch_init.c | 68 + cpus.c| 29 + include/migration/migration.h |6 +++ include/qemu-common.h |1 + include/qom/cpu.h | 10 ++ migration.c | 10 ++ qapi-schema.json |5 ++- 7 files changed, 128 insertions(+), 1 deletions(-)
[Qemu-devel] [RFC PATCH v5 1/3] Introduce async_run_on_cpu()
Introduce an asynchronous version of run_on_cpu(), i.e. the caller doesn't have to block till the callback routine finishes execution on the target vcpu. Signed-off-by: Chegu Vinod chegu_vi...@hp.com --- cpus.c| 29 + include/qemu-common.h |1 + include/qom/cpu.h | 10 ++ 3 files changed, 40 insertions(+), 0 deletions(-) diff --git a/cpus.c b/cpus.c index c232265..8cd4eab 100644 --- a/cpus.c +++ b/cpus.c @@ -653,6 +653,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data) wi.func = func; wi.data = data; +wi.free = false; if (cpu->queued_work_first == NULL) { cpu->queued_work_first = &wi; } else { @@ -671,6 +672,31 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data) } } +void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data) +{ +struct qemu_work_item *wi; + +if (qemu_cpu_is_self(cpu)) { +func(data); +return; +} + +wi = g_malloc0(sizeof(struct qemu_work_item)); +wi->func = func; +wi->data = data; +wi->free = true; +if (cpu->queued_work_first == NULL) { +cpu->queued_work_first = wi; +} else { +cpu->queued_work_last->next = wi; +} +cpu->queued_work_last = wi; +wi->next = NULL; +wi->done = false; + +qemu_cpu_kick(cpu); +} + static void flush_queued_work(CPUState *cpu) { struct qemu_work_item *wi; @@ -683,6 +709,9 @@ static void flush_queued_work(CPUState *cpu) cpu->queued_work_first = wi->next; wi->func(wi->data); wi->done = true; +if (wi->free) { +g_free(wi); +} } cpu->queued_work_last = NULL; qemu_cond_broadcast(&qemu_work_cond); diff --git a/include/qemu-common.h b/include/qemu-common.h index b399d85..bad6e1f 100644 --- a/include/qemu-common.h +++ b/include/qemu-common.h @@ -286,6 +286,7 @@ struct qemu_work_item { void (*func)(void *data); void *data; int done; +bool free; }; #ifdef CONFIG_USER_ONLY diff --git a/include/qom/cpu.h b/include/qom/cpu.h index 7cd9442..46465e9 100644 --- a/include/qom/cpu.h +++ b/include/qom/cpu.h @@ -265,6 +265,16 @@ bool cpu_is_stopped(CPUState *cpu); void run_on_cpu(CPUState *cpu, void (*func)(void
*data), void *data); /** + * async_run_on_cpu: + * @cpu: The vCPU to run on. + * @func: The function to be executed. + * @data: Data to pass to the function. + * + * Schedules the function @func for execution on the vCPU @cpu asynchronously. + */ +void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data); + +/** * qemu_for_each_cpu: * @func: The function to be executed. * @data: Data to pass to the function. -- 1.7.1
[Qemu-devel] [RFC PATCH v5 3/3] Force auto-convergence of live migration
If a user chooses to turn on the auto-converge migration capability, these changes detect the lack of convergence and throttle down the guest, i.e. force the VCPUs out of the guest for some duration and let the migration thread catch up and help converge. Verified the convergence using the following: - SpecJbb2005 workload running on a 20VCPU/256G guest (~80% busy) - OLTP-like workload running on an 80VCPU/512G guest (~80% busy) Sample results with SpecJbb2005 workload: (migrate speed set to 20Gb and migrate downtime set to 4 seconds). (qemu) info migrate capabilities: xbzrle: off auto-converge: off Migration status: active total time: 1487503 milliseconds expected downtime: 519 milliseconds transferred ram: 383749347 kbytes remaining ram: 2753372 kbytes total ram: 268444224 kbytes duplicate: 65461532 pages skipped: 64901568 pages normal: 95750218 pages normal bytes: 383000872 kbytes dirty pages rate: 67551 pages --- (qemu) info migrate capabilities: xbzrle: off auto-converge: on Migration status: completed total time: 241161 milliseconds downtime: 6373 milliseconds transferred ram: 28235307 kbytes remaining ram: 0 kbytes total ram: 268444224 kbytes duplicate: 64946416 pages skipped: 64903523 pages normal: 7044971 pages normal bytes: 28179884 kbytes Signed-off-by: Chegu Vinod chegu_vi...@hp.com --- arch_init.c | 68 + include/migration/migration.h |4 ++ migration.c |1 + 3 files changed, 73 insertions(+), 0 deletions(-) diff --git a/arch_init.c b/arch_init.c index 49c5dc2..29788d6 100644 --- a/arch_init.c +++ b/arch_init.c @@ -49,6 +49,7 @@ #include "trace.h" #include "exec/cpu-all.h" #include "hw/acpi/acpi.h" +#include "sysemu/cpus.h" #ifdef DEBUG_ARCH_INIT #define DPRINTF(fmt, ...)
\ @@ -104,6 +105,8 @@ int graphic_depth = 15; #endif const uint32_t arch_type = QEMU_ARCH; +static bool mig_throttle_on; + /***/ /* ram save/restore */ @@ -378,8 +381,15 @@ static void migration_bitmap_sync(void) uint64_t num_dirty_pages_init = migration_dirty_pages; MigrationState *s = migrate_get_current(); static int64_t start_time; +static int64_t bytes_xfer_prev; static int64_t num_dirty_pages_period; int64_t end_time; +int64_t bytes_xfer_now; +static int dirty_rate_high_cnt; + +if (!bytes_xfer_prev) { +bytes_xfer_prev = ram_bytes_transferred(); +} if (!start_time) { start_time = qemu_get_clock_ms(rt_clock); @@ -404,6 +414,23 @@ static void migration_bitmap_sync(void) /* more than 1 second = 1000 milliseconds */ if (end_time > start_time + 1000) { +if (migrate_auto_converge()) { +/* The following detection logic can be refined later. For now: + Check to see if the dirtied bytes are 50% more than the approx. + amount of bytes that just got transferred since the last time we + were in this routine. If that happens N times (for now N==5) + we turn on the throttle-down logic */ +bytes_xfer_now = ram_bytes_transferred(); +if (s->dirty_pages_rate && +((num_dirty_pages_period * TARGET_PAGE_SIZE) > +((bytes_xfer_now - bytes_xfer_prev)/2))) { +if (dirty_rate_high_cnt++ > 5) { +DPRINTF("Unable to converge. Throttling down guest\n"); +mig_throttle_on = true; +} + } + bytes_xfer_prev = bytes_xfer_now; +} s->dirty_pages_rate = num_dirty_pages_period * 1000 / (end_time - start_time); s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE; @@ -496,6 +523,15 @@ static int ram_save_block(QEMUFile *f, bool last_stage) return bytes_sent; } +bool throttling_needed(void) +{ +if (!migrate_auto_converge()) { +return false; +} + +return mig_throttle_on; +} + static uint64_t bytes_transferred; static ram_addr_t ram_save_remaining(void) @@ -1098,3 +1134,35 @@ TargetInfo *qmp_query_target(Error **errp) return info; } + +static void mig_delay_vcpu(void) +{ +qemu_mutex_unlock_iothread(); +g_usleep(50*1000); +qemu_mutex_lock_iothread(); +} + +/* Stub used for getting the vcpu out of VM and into qemu via + run_on_cpu() */ +static void mig_kick_cpu(void *opq) +{ +mig_delay_vcpu(); +return; +} + +/* To reduce the dirty rate, explicitly disallow the VCPUs from spending + much time in the VM. The migration thread will try to catch up. + Workload will experience a performance drop. +*/ +void migration_throttle_down(void) +{ +if (throttling_needed()) { +CPUArchState *penv = first_cpu; +while (penv) { +qemu_mutex_lock_iothread(); +async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL); +
Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convergence of live migration
On Thu, 9 May 2013 12:43:20 -0700 Chegu Vinod chegu_vi...@hp.com wrote: If a user chooses to turn on the auto-converge migration capability these changes detect the lack of convergence and throttle down the guest. i.e. force the VCPUs out of the guest for some duration and let the migration thread catch up and help converge. [...] +void migration_throttle_down(void) +{ +if (throttling_needed()) { +CPUArchState *penv = first_cpu; +while (penv) { +qemu_mutex_lock_iothread(); +async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL); +qemu_mutex_unlock_iothread(); +penv = penv->next_cpu; could you replace the open-coded loop with qemu_for_each_cpu()? +} +} +} -- Regards, Igor
Re: [Qemu-devel] Profiling sparc64 emulation
On Thu, May 9, 2013 at 8:30 PM, Aurelien Jarno aurel...@aurel32.net wrote: On Wed, May 08, 2013 at 11:02:24PM +0200, Artyom Tarasenko wrote: On Wed, May 8, 2013 at 12:57 AM, Aurelien Jarno aurel...@aurel32.net wrote: On Tue, May 07, 2013 at 11:29:20PM +0200, Artyom Tarasenko wrote: On Tue, May 7, 2013 at 1:38 PM, Torbjorn Granlund t...@gmplib.org wrote: The 2nd table of http://gmplib.org/devel/testsystems.html shows all emulated systems I am using, most of which are qemu-based. Do I read it correct that qemu-system-ppc64 with the slowdown factor of 33 is ~3 times faster than qemu-system-sparc64 with the slowdown factor of 96 ? Do they both use Debian Wheezy guest? You have a remark that ppc64 has problems with its clock. Was it taken into account when the slowdown factors were calculated? Clock or not, it should be noted that qemu-system-sparc64 is undoubtedly slower (at least 5 to 10 times) than qemu-system-{arm,ppc,mips,...} on some type of load like perl scripts. That's interesting. Actually it should be possible to lauch perl under user mode qemu-sparc32plus. Is it possible to launch perl under user mode qemu-ppc{32,64} too? That would allow to understand whether the bad performance has to do with TCG or the rest of the system emulation. I haven't done that yet, but I have run perf top while running perl script (lintian), on both qemu-system-sparc64 and qemu-system-ppc64. The results are quite different: qemu-system-ppc64 - 49,73% perf-10672.map [.] 0x7f7853ab4e0f 13,23% qemu-system-ppc64[.] cpu_ppc_exec 13,16% libglib-2.0.so.0.3200.4 [.] g_hash_table_lookup 8,18% libglib-2.0.so.0.3200.4 [.] g_str_hash 2,47% qemu-system-ppc64[.] object_class_dynamic_cast 1,97% qemu-system-ppc64[.] type_is_ancestor 1,05% libglib-2.0.so.0.3200.4 [.] g_str_equal 0,91% qemu-system-ppc64[.] ppc_cpu_do_interrupt 0,90% qemu-system-ppc64[.] object_dynamic_cast_assert 0,79% libc-2.13.so [.] __sigsetjmp 0,62% qemu-system-ppc64[.] type_get_parent.isra.3 0,58% qemu-system-ppc64[.] 
type_get_by_name
0,57% qemu-system-ppc64 [.] qemu_log_mask
0,54% qemu-system-ppc64 [.] object_dynamic_cast

qemu-system-sparc64
---
17,43% perf-8154.map [.] 0x7f6ac10245c8
10,46% qemu-system-sparc64 [.] tcg_optimize
10,36% qemu-system-sparc64 [.] cpu_sparc_exec
6,35% qemu-system-sparc64 [.] tb_flush_jmp_cache
4,75% qemu-system-sparc64 [.] get_physical_address_data
4,45% qemu-system-sparc64 [.] tcg_liveness_analysis
4,35% qemu-system-sparc64 [.] tcg_reg_alloc_op
2,90% qemu-system-sparc64 [.] tlb_flush_page
2,35% qemu-system-sparc64 [.] disas_sparc_insn
2,28% qemu-system-sparc64 [.] get_physical_address_code
2,21% qemu-system-sparc64 [.] tlb_flush
1,64% qemu-system-sparc64 [.] tcg_out_opc
1,22% qemu-system-sparc64 [.] tcg_out_modrm_sib_offset.constprop.41
1,20% qemu-system-sparc64 [.] helper_ld_asi
1,14% qemu-system-sparc64 [.] gen_intermediate_code_pc
1,04% qemu-system-sparc64 [.] helper_st_asi
1,00% qemu-system-sparc64 [.] object_class_dynamic_cast
0,98% qemu-system-sparc64 [.] tb_find_pc
0,94% qemu-system-sparc64 [.] get_page_addr_code
0,92% qemu-system-sparc64 [.] tcg_gen_code_search_pc
0,91% qemu-system-sparc64 [.] tlb_set_page
0,83% qemu-system-sparc64 [.] reset_temp
0,82% qemu-system-sparc64 [.] tcg_reg_alloc_start

The perf-.map entries correspond to execution of the translated code. As you can see, that share is a lot lower on sparc, while a lot of smaller code-generation/MMU functions appear. It seems that the optimizations have to be focused on the system part, not the TCG part, at least for now. A quick look at the MMU seems to show a performance issue here, due to the split code/data MMU on SPARC64, while the QEMU TLB is a joint one. As a consequence one can see a lot of ping-pong: setting a given page to read or read/write, then execution, and later read or read/write again. My guess is that it's related to constant tables being in the same page as the code. It should also be noted that tcg_optimize starts to take non-negligible time, in both cases.
The code has grown quite a lot recently, and it might be time to rework it. It's nice to have optimized code, but not if the gain is lower than the optimization time. Is it possible to disable some optimisations, or the whole optimisation pass completely? I see no command-line switches for that.
Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convergence of live migration
On Thu, 9 May 2013 12:43:20 -0700 Chegu Vinod chegu_vi...@hp.com wrote: If a user chooses to turn on the auto-converge migration capability these changes detect the lack of convergence and throttle down the guest. i.e. force the VCPUs out of the guest for some duration and let the migration thread catch up and help converge. [...] + +static void mig_delay_vcpu(void) +{ +qemu_mutex_unlock_iothread(); +g_usleep(50*1000); +qemu_mutex_lock_iothread(); +} + +/* Stub used for getting the vcpu out of VM and into qemu via + run_on_cpu() */ +static void mig_kick_cpu(void *opq) +{ +mig_delay_vcpu(); +return; +} + +/* To reduce the dirty rate explicitly disallow the VCPUs from spending + much time in the VM. The migration thread will try to catch up. + Workload will experience a performance drop. +*/ +void migration_throttle_down(void) +{ +if (throttling_needed()) { +CPUArchState *penv = first_cpu; +while (penv) { +qemu_mutex_lock_iothread(); Locking it here and then unlocking it inside of the queued work doesn't look nice. What exactly are you protecting with this lock? +async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL); +qemu_mutex_unlock_iothread(); +penv = penv->next_cpu; +} +} +} -- Regards, Igor
Re: [Qemu-devel] Profiling sparc64 emulation
On 05/09/2013 01:11 PM, Artyom Tarasenko wrote: Is it possible to disable some optimisations, or the whole optimisation completely? I see no command line switches for that. No command-line switches. See the top lines of tcg/tcg.c for compile-time disabling. r~
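For readers hitting this thread later: the compile-time switches Richard refers to sat near the top of tcg/tcg.c in the QEMU of that era. The exact names below are from memory and should be checked against your tree; commenting one out and rebuilding disables the corresponding pass:

```c
/* Near the top of tcg/tcg.c (QEMU ~1.5); comment out to disable. */
#define USE_LIVENESS_ANALYSIS    /* guards tcg_liveness_analysis() */
#define USE_TCG_OPTIMIZATIONS    /* guards the tcg_optimize() pass  */
```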
Re: [Qemu-devel] [PATCH v6 00/11] rdma: migration support
On 5/9/2013 10:20 AM, Michael R. Hines wrote: Comments inline. FYI: please CC mrhi...@us.ibm.com, because it helps me know when to scroll through the bazillion qemu-devel emails. I have things separated out into folders and rules, but a direct CC is better =) Sure, will do. On 05/03/2013 07:28 PM, Chegu Vinod wrote: Hi Michael, I picked up the qemu bits from your github branch and gave it a try. (BTW, the setup I was given temporary access to has a pair of MLX IB QDR cards connected back to back via QSFP cables.) Observed a couple of things and wanted to share... perhaps you may be aware of them already or perhaps these are unrelated to your specific changes? (Note: still haven't finished the review of your changes.) a) x-rdma-pin-all off case Seems to only work sometimes but fails at other times. Here is an example... (qemu) rdma: Accepting rdma connection... rdma: Memory pin all: disabled rdma: verbs context after listen: 0x56757d50 rdma: dest_connect Source GID: fe80::2:c903:9:53a5, Dest GID: fe80::2:c903:9:5855 rdma: Accepted migration qemu-system-x86_64: VQ 1 size 0x100 Guest index 0x4d2 inconsistent with Host index 0x4ec: delta 0xffe6 qemu: warning: error while loading state for instance 0x0 of device 'virtio-net' load of migration failed Can you give me more details about the configuration of your VM? The guest is a 10-VCPU/128GB one... and nothing really that fancy with respect to storage or networking. Hosted on a large Westmere-EX box (the target is a similarly configured Westmere-EX system). There is a shared SAN disk between the two hosts. Both hosts have the 3.9-rc7 kernel that I got at that time from the kvm.git tree. The guest was also running the same kernel. Since I was just trying it out, I was not running any workload either.
On the source host the qemu command line: /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ -cpu host \ -name vm1 \ -m 131072 -smp 10,sockets=1,cores=10,threads=1 \ -mem-path /dev/hugepages \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \ -drive file=/dev/libvirt_lvm3/vm1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \ -monitor stdio \ -net nic,model=virtio,macaddr=52:54:00:71:01:01,netdev=nic-0 \ -netdev tap,id=nic-0,ifname=tap0,script=no,downscript=no,vhost=on \ -vnc :4 On the destination host the command line was the same as the above with the following additional arg... -incoming x-rdma:<static private ipaddr of the IB>:<port #> b) x-rdma-pin-all on case: The guest is not resuming on the target host, i.e. the source host's qemu states that migration is complete but the guest is not responsive anymore... (doesn't seem to have crashed but it's stuck somewhere). Have you seen this behavior before? Any tips on how I could extract additional info? Is the QEMU monitor still responsive? They were responsive. Can you capture a screenshot of the guest's console to see if there is a panic? No panic on the guest's console :( What kind of storage is attached to the VM? Simple virtio disk hosted on a SAN disk (see the qemu command line). Besides the list of noted restrictions/issues around having to pin all of guest memory, if the pinning is done as part of starting the migration it ends up taking a noticeably long time for larger guests. Wonder whether that should be counted as part of the total migration time? That's a good question: the pin-all option should not be slowing down your VM too much, as the VM should still be running before the migration_thread() actually kicks in and starts the migration. Well, I had hoped that it would not have any serious impacts, but it ended up freezing the guest...
I need more information on the configuration of your VM, guest operating system, architecture, and so forth... Pl. see above. And similarly as before, whether or not QEMU is unresponsive or whether it's the guest that panicked... The guest just freezes... it doesn't panic when this pinning is in progress (i.e. after I set the capability and start the migration). After the pinning completes, the guest continues to run and the migration continues... till it completes (as per the source host's qemu)... but I never see it resume on the target host. Also, the act of pinning all the memory seems to freeze the guest. e.g.: for larger enterprise-sized guests (say 128GB and higher) the guest is frozen anywhere from nearly a minute (~50 seconds) to multiple minutes as the guest size increases... which imo kind of defeats the purpose of live guest migration. That's bad =) There must be a bug somewhere. The largest VM I can create on my hardware is ~16GB, so let me give that a try and try to track down the problem. Ok. Perhaps running a simple test inside the guest can help observe any scheduling
Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convergence of live migration
On 5/9/2013 1:05 PM, Igor Mammedov wrote: On Thu, 9 May 2013 12:43:20 -0700 Chegu Vinod chegu_vi...@hp.com wrote: If a user chooses to turn on the auto-converge migration capability these changes detect the lack of convergence and throttle down the guest. i.e. force the VCPUs out of the guest for some duration and let the migration thread catch up and help converge. [...] +void migration_throttle_down(void) +{ +if (throttling_needed()) { +CPUArchState *penv = first_cpu; +while (penv) { +qemu_mutex_lock_iothread(); +async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL); +qemu_mutex_unlock_iothread(); +penv = penv->next_cpu; could you replace the open-coded loop with qemu_for_each_cpu()? Yes, will try to replace it in the next version. Vinod +} +} +}
Re: [Qemu-devel] [PATCH v6 00/11] rdma: migration support
Some more followup questions below to help me debug before I start digging in... On 05/09/2013 06:20 PM, Chegu Vinod wrote: Setting aside the mlock() freezes for the moment, let's first fix your crashing problem on the destination-side. Let's make that a priority before we fix the mlock problem. When the migration completes, can you provide me with more detailed information about the state of QEMU on the destination? Is it responding? What's on the VNC console? Is QEMU responding? Is the network responding? Was the VM idle? Or running an application? Can you attach GDB to QEMU after the migration? /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ -cpu host \ -name vm1 \ -m 131072 -smp 10,sockets=1,cores=10,threads=1 \ -mem-path /dev/hugepages \ Can you disable hugepages and re-test? I'll get back to the other mlock() issues later after we at least first make sure the migration itself is working.
Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convergence of live migration
On 5/9/2013 1:24 PM, Igor Mammedov wrote: On Thu, 9 May 2013 12:43:20 -0700 Chegu Vinod chegu_vi...@hp.com wrote: If a user chooses to turn on the auto-converge migration capability these changes detect the lack of convergence and throttle down the guest. i.e. force the VCPUs out of the guest for some duration and let the migration thread catch up and help converge. [...] + +static void mig_delay_vcpu(void) +{ +qemu_mutex_unlock_iothread(); +g_usleep(50*1000); +qemu_mutex_lock_iothread(); +} + +/* Stub used for getting the vcpu out of VM and into qemu via + run_on_cpu() */ +static void mig_kick_cpu(void *opq) +{ +mig_delay_vcpu(); +return; +} + +/* To reduce the dirty rate explicitly disallow the VCPUs from spending + much time in the VM. The migration thread will try to catch up. + Workload will experience a performance drop. +*/ +void migration_throttle_down(void) +{ +if (throttling_needed()) { +CPUArchState *penv = first_cpu; +while (penv) { +qemu_mutex_lock_iothread(); Locking it here and then unlocking it inside of the queued work doesn't look nice. Yes... but see below. What exactly are you protecting with this lock? It was my understanding that the BQL is supposed to be held when the vcpu threads start entering and executing in the qemu context (as qemu is not MP safe). Still true? In this specific use case I was concerned about the fraction of the time when a given vcpu thread is in the qemu context but not executing the callback routine... and was hence holding the BQL. Holding the BQL and g_usleep'ing is not only bad but would slow down the migration thread... hence the "doesn't look nice" stuff :( For this specific use case, if it's not really required to even bother with the BQL then pl. do let me know. Also pl. refer to version 3 of my patch... I was doing a g_usleep() in kvm_cpu_exec() and was not messing much with the BQL, but that was deemed not a good thing either.
Thanks, Vinod +async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL); +qemu_mutex_unlock_iothread(); +penv = penv->next_cpu; +} +} +}
Re: [Qemu-devel] [RFC PATCH 0/8] MemoryRegion and FlatView refcounting, replace hostmem with memory_region_find
On Thu, May 9, 2013 at 10:50 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 09/05/2013 02:53, liu ping fan wrote: On Wed, May 8, 2013 at 11:44 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 08/05/2013 08:20, liu ping fan wrote: On Mon, May 6, 2013 at 10:25 PM, Paolo Bonzini pbonz...@redhat.com wrote: Hi, this is an alternative approach to refactoring of dataplane's HostMem code. Here, I take Ping Fan's idea of RCU-style updating of the region list and apply it to the AddressSpace's FlatView. With this In fact, I am worrying about the priority of MemoryListener; if it is true, then we should drop the RCU-style idea. You mean in hostmem, or in general as in this patch? Note that this patch releases the old FlatView at the end of all MemoryListener operations. Both in hostmem and this patch, they all broke the original design of the MemoryListener, see notes for priority in code. I think both hostmem and this patch are fine. The hypervisor is never involved, all accesses go through the old FlatView and regions cannot disappear thanks to ref/unref. Here, we worry about add, not del. In fact, we need _more_ RCU-style updates, not less. For BQL-less dispatch, address space mapping/translation can race against the MemoryListeners in exec.c. To fix this, phys_sections and AddressSpaceDispatch need to be reference counted and RCU-ified as well. Agree, I like RCU too. Paolo I have sent out 2 patches to highlight this issue, and have CC'd you and Stefanha. Regards, Pingfan Paolo Also if it is true, there is already a bug with the hostmem listener. It should use region_del, not region_nop, to reconstruct the local view. But just let me give it some deeper thought. Regards, Pingfan change, dataplane can simply use memory_region_find instead of hostmem. This is a somewhat larger change, but I prefer it for two reasons. 1) it splits the task of adding BQL-less memory dispatch in two parts, tackling memory_region_find first (which is simpler because locking is left to the caller).
2) HostMem duplicates a lot of the FlatView logic, and adding the RCU-style update in FlatView benefits everyone. The missing ingredients here are: 1) remember and unreference the MemoryRegions that are used in a vring entry. In order to implement this, it is probably simpler to change vring.c to use virtio.c's VirtQueueElement data structure. We want something like that anyway in order to support migration. 2) add an owner field to MemoryRegion, and set it for all MemoryRegions for hot-unpluggable devices. In this series, ref/unref are stubs. For simplicity I based the patches on my IOMMU rebase. I placed the tree at git://github.com/bonzini/qemu.git, branch iommu. Paolo Paolo Bonzini (8): memory: add ref/unref calls exec: check MRU in qemu_ram_addr_from_host memory: return MemoryRegion from qemu_ram_addr_from_host memory: ref/unref memory across address_space_map/unmap memory: access FlatView from a local variable memory: use a new FlatView pointer on every topology update memory: add reference counting to FlatView dataplane: replace hostmem with memory_region_find exec.c| 63 +--- hw/core/loader.c |1 + hw/display/exynos4210_fimd.c |6 + hw/display/framebuffer.c | 10 +- hw/i386/kvm/ioapic.c |2 + hw/i386/kvmvapic.c|1 + hw/misc/vfio.c|2 + hw/virtio/dataplane/Makefile.objs |2 +- hw/virtio/dataplane/hostmem.c | 176 - hw/virtio/dataplane/vring.c | 56 +-- hw/virtio/vhost.c |2 + hw/virtio/virtio-balloon.c|1 + hw/xen/xen_pt.c |4 + include/exec/cpu-common.h |2 +- include/exec/memory.h |9 ++ include/hw/virtio/dataplane/hostmem.h | 57 --- include/hw/virtio/dataplane/vring.h |3 +- kvm-all.c |2 + memory.c | 142 +- target-arm/kvm.c |2 + target-i386/kvm.c |4 +- target-sparc/mmu_helper.c |1 + xen-all.c |2 + 23 files changed, 253 insertions(+), 297 deletions(-) delete mode 100644 hw/virtio/dataplane/hostmem.c delete mode 100644 include/hw/virtio/dataplane/hostmem.h