date:20130819

[Qemu-devel] [PATCH v3 6/8] xics-kvm: Support for in-kernel XICS interrupt controller

2013-08-19 Thread Alexey Kardashevskiy

From: David Gibson da...@gibson.dropbear.id.au

Recent (host) kernels support emulating the PAPR defined XICS interrupt
controller system within KVM.  This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.

This should give considerable performance improvements.  e.g. on a simple
IPI ping-pong test between hardware threads, using qemu XICS gives us
around 5,000 irqs/second, whereas the in-kernel XICS gives us around
70,000 irqs/s on the same hardware configuration.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
[Mike Qiu qiud...@linux.vnet.ibm.com: fixed mistype which caused 
ics_set_kvm_state() to fail]
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v3:
* ics_kvm_realize() now is a realize callback rather than initfn callback
* asserts replaced with Error**
* KVM_ICS is created now in KVM_XICS's initfn rather than in the nr_irqs
property setter
* added KVM_XICS_GET_PARENT_CLASS() to get the common XICS class - needed
for xics_kvm_cpu_setup() to call parent's cpu_setup()
* fixed some indentations, removed some \n from error_report()
---
 default-configs/ppc64-softmmu.mak |   1 +
 hw/intc/Makefile.objs |   1 +
 hw/intc/xics_kvm.c| 492 ++
 hw/ppc/spapr.c|  23 +-
 include/hw/ppc/xics.h |  13 +
 5 files changed, 528 insertions(+), 2 deletions(-)
 create mode 100644 hw/intc/xics_kvm.c

diff --git a/default-configs/ppc64-softmmu.mak 
b/default-configs/ppc64-softmmu.mak
index 7831c2b..116f4ca 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_E500=y
 CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
 # For pSeries
 CONFIG_XICS=$(CONFIG_PSERIES)
+CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
 # For PReP
 CONFIG_I82378=y
 CONFIG_I8259=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 2851eed..47ac442 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -23,3 +23,4 @@ obj-$(CONFIG_OMAP) += omap_intc.o
 obj-$(CONFIG_OPENPIC_KVM) += openpic_kvm.o
 obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
+obj-$(CONFIG_XICS_KVM) += xics_kvm.o
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
new file mode 100644
index 000..4ed4f33
--- /dev/null
+++ b/hw/intc/xics_kvm.c
@@ -0,0 +1,492 @@
+/*
+ * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
+ *
+ * PAPR Virtualized Interrupt System, aka ICS/ICP aka xics, in-kernel emulation
+ *
+ * Copyright (c) 2013 David Gibson, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+
+#include hw/hw.h
+#include trace.h
+#include hw/ppc/spapr.h
+#include hw/ppc/xics.h
+#include kvm_ppc.h
+#include qemu/config-file.h
+#include qemu/error-report.h
+
+#include sys/ioctl.h
+
+typedef struct KVMXICSState {
+XICSState parent_obj;
+
+uint32_t set_xive_token;
+uint32_t get_xive_token;
+uint32_t int_off_token;
+uint32_t int_on_token;
+int kernel_xics_fd;
+} KVMXICSState;
+
+/*
+ * ICP-KVM
+ */
+static void icp_get_kvm_state(ICPState *ss)
+{
+uint64_t state;
+struct kvm_one_reg reg = {
+.id = KVM_REG_PPC_ICP_STATE,
+.addr = (uintptr_t)state,
+};
+int ret;
+
+/* ICP for this CPU thread is not in use, exiting */
+if (!ss-cs) {
+return;
+}
+
+ret = kvm_vcpu_ioctl(ss-cs, KVM_GET_ONE_REG, reg);
+if (ret != 0) {
+error_report(Unable to retrieve KVM interrupt controller state
+ for CPU %d: %s, ss-cs-cpu_index, strerror(errno));
+exit(1);
+}
+
+ss-xirr = state  KVM_REG_PPC_ICP_XISR_SHIFT;
+ss-mfrr = (state  KVM_REG_PPC_ICP_MFRR_SHIFT)
+ KVM_REG_PPC_ICP_MFRR_MASK;
+ss-pending_priority = (state  KVM_REG_PPC_ICP_PPRI_SHIFT)
+ KVM_REG_PPC_ICP_PPRI_MASK;
+}
+
+static int

[Qemu-devel] [Bug 1213797] [NEW] Guest hang after live migration

2013-08-19 Thread chao zhou

Public bug reported:

Environment:

Host OS (ia32/ia32e/IA64):ia32e
Guest OS (ia32/ia32e/IA64):ia32e
Guest OS Type (Linux/Windows):Linux
kvm.git Commit:205befd9a5c701b56f569434045821f413f08f6d
qemu-kvm uq/master Commit:ca916d3729564d0eb3c2374a96903f7e8aced8a7
Host Kernel Version:3.11.0-rc1
Hardware:Romley_EP, Ivytown_EP

Bug detailed description:
--
after guest live migration, the guest will hang, and ping IP fail

note:
1. after guest save/restore, the guest will hang, and ping IP fail
2. This should be a qemu-kvm bug.
kvm  + qemu-kvm   =  result
205befd9 + ca916d37   =  bad
205befd9 + f03d07d4   =  good


Reproduce steps:

1.Start a TCP daemon for migration
qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -net nic,macaddr=00:12:34:56:13:45
-net tap,script=/etc/kvm/qemu-ifup rhel6u2.qcow -incoming tcp:localhost:
2. create guest
qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -net nic,macaddr=00:12:34:56:13:45
-net tap,script=/etc/kvm/qemu-ifup rhel6u2.qcow 
3. migrate tcp:localhost:

Current result:

guest hang

Expected result:

after live/migration or save/restore, the guest works fine

** Affects: qemu
 Importance: Undecided
 Status: New

** Attachment added: guest serial log
   
https://bugs.launchpad.net/bugs/1213797/+attachment/3777611/+files/guest_serial.log

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1213797

Title:
  Guest hang after live migration

Status in QEMU:
  New

Bug description:
  Environment:
  
  Host OS (ia32/ia32e/IA64):ia32e
  Guest OS (ia32/ia32e/IA64):ia32e
  Guest OS Type (Linux/Windows):Linux
  kvm.git Commit:205befd9a5c701b56f569434045821f413f08f6d
  qemu-kvm uq/master Commit:ca916d3729564d0eb3c2374a96903f7e8aced8a7
  Host Kernel Version:3.11.0-rc1
  Hardware:Romley_EP, Ivytown_EP

  Bug detailed description:
  --
  after guest live migration, the guest will hang, and ping IP fail

  note:
  1. after guest save/restore, the guest will hang, and ping IP fail
  2. This should be a qemu-kvm bug.
  kvm  + qemu-kvm   =  result
  205befd9 + ca916d37   =  bad
  205befd9 + f03d07d4   =  good

  
  Reproduce steps:
  
  1.Start a TCP daemon for migration
  qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -net nic,macaddr=00:12:34:56:13:45
  -net tap,script=/etc/kvm/qemu-ifup rhel6u2.qcow -incoming tcp:localhost:
  2. create guest
  qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -net nic,macaddr=00:12:34:56:13:45
  -net tap,script=/etc/kvm/qemu-ifup rhel6u2.qcow 
  3. migrate tcp:localhost:

  Current result:
  
  guest hang

  Expected result:
  
  after live/migration or save/restore, the guest works fine

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1213797/+subscriptions

[Qemu-devel] [Bug 1213797] Re: Guest hang after live migration

2013-08-19 Thread Lei Li

Hi,

qemu-kvm no longer exists as it has been merged into qemu.
You might want to have a try with the lastest upstream/master qemu tree, or 
pre-release versions of it. :)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1213797

Title:
  Guest hang after live migration

Status in QEMU:
  New

Bug description:
  Environment:
  
  Host OS (ia32/ia32e/IA64):ia32e
  Guest OS (ia32/ia32e/IA64):ia32e
  Guest OS Type (Linux/Windows):Linux
  kvm.git Commit:205befd9a5c701b56f569434045821f413f08f6d
  qemu-kvm uq/master Commit:ca916d3729564d0eb3c2374a96903f7e8aced8a7
  Host Kernel Version:3.11.0-rc1
  Hardware:Romley_EP, Ivytown_EP

  Bug detailed description:
  --
  after guest live migration, the guest will hang, and ping IP fail

  note:
  1. after guest save/restore, the guest will hang, and ping IP fail
  2. This should be a qemu-kvm bug.
  kvm  + qemu-kvm   =  result
  205befd9 + ca916d37   =  bad
  205befd9 + f03d07d4   =  good

  
  Reproduce steps:
  
  1.Start a TCP daemon for migration
  qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -net nic,macaddr=00:12:34:56:13:45
  -net tap,script=/etc/kvm/qemu-ifup rhel6u2.qcow -incoming tcp:localhost:
  2. create guest
  qemu-system-x86_64 -enable-kvm -m 2G -smp 2 -net nic,macaddr=00:12:34:56:13:45
  -net tap,script=/etc/kvm/qemu-ifup rhel6u2.qcow 
  3. migrate tcp:localhost:

  Current result:
  
  guest hang

  Expected result:
  
  after live/migration or save/restore, the guest works fine

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1213797/+subscriptions

Re: [Qemu-devel] [PATCH] block: Produce zeros when protocols reading beyond end of file

2013-08-19 Thread Asias He

On Fri, Aug 16, 2013 at 10:41:36AM +0200, Stefan Hajnoczi wrote:
 On Mon, Aug 5, 2013 at 10:11 AM, Asias He as...@redhat.com wrote:
  From: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
 
  While Asias is debugging an issue creating qcow2 images on top of
  non-file protocols.  It boils down to this example using NBD:
 
  $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
 
  Notice the open -g option to set bs-growable.  This means you can
  read/write beyond end of file.  Reading beyond end of file is supposed
  to produce zeroes.
 
  We rely on this behavior in qcow2_create2() during qcow2 image
  creation.  We create a new file and then write the qcow2 header
  structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
  sector size, block.c first uses bdrv_read() on the empty file to fetch
  the first sector (should be all zeroes).
 
  Here is the output from the qemu-io NBD example above:
 
  $ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
  :  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
  0010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
  0020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  
  ...
 
  We are not zeroing the buffer!  As a result qcow2 image creation on top
  of protocols is not guaranteed to work even when file creation is
  supported by the protocol.
 
  Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp
  Signed-off-by: Asias He as...@redhat.com
  ---
   block.c | 30 +-
   1 file changed, 29 insertions(+), 1 deletion(-)
 
  diff --git a/block.c b/block.c
  index 01b66d8..deaf0a0 100644
  --- a/block.c
  +++ b/block.c
  @@ -2544,7 +2544,35 @@ static int coroutine_fn 
  bdrv_co_do_readv(BlockDriverState *bs,
   }
   }
 
  -ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
  +if (!bs-drv-protocol_name) {
  +ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
  +} else {
  +/* NBD doesn't support reading beyond end of file. */
  +int64_t len, total_sectors, max_nb_sectors;
  +
  +len = bdrv_getlength(bs);
  +if (len  0) {
  +ret = len;
  +goto out;
  +}
  +
  +total_sectors = len  BDRV_SECTOR_BITS;
  +max_nb_sectors = MAX(0, total_sectors - sector_num);
  +if (max_nb_sectors  0) {
  +ret = drv-bdrv_co_readv(bs, sector_num,
  + MIN(nb_sectors, max_nb_sectors), 
  qiov);
  +} else {
  +ret = 0;
  +}
  +
  +/* Reading beyond end of file is supposed to produce zeroes */
  +if (ret == 0  total_sectors  sector_num + nb_sectors) {
  +size_t offset = MAX(0, total_sectors - sector_num);
  +size_t bytes = (sector_num + nb_sectors - offset) *
  +BDRV_SECTOR_SIZE;
  +qemu_iovec_memset(qiov, offset * BDRV_SECTOR_SIZE, 0, bytes);
  +}
  +}
 
 This patch breaks qemu-iotests ./check -qcow2 022.  This happens
 because qcow2 temporarily sets -growable = 1 for vmstate accesses
 (which are stored beyond the end of regular image data).

I am a bit confused. This is from the other mail:


  I think it would break qcow2_load_vmstate(), which is basically a   
  
  bdrv_pread() after the end of the image.
  
   
 
 I see, then only protocols have to zeroing the buffer?  In case of
 
 protocols, I think bdrv_getlength() returns the underlying file   
 
 length, so qcow2_load_vmstate() would be a bdrv_pread() within the
 
 result of bdrv_getlength().   
 

Limiting it to protocols solves the problem, I think.


And in v1 of this patch, Kevin wanted bs-growable check instad of the
protocol_name one.


 -ret = drv-bdrv_co_readv(bs, sector_num, nb_sectors, qiov);  
 
 +if (!bs-drv-protocol_name) {   
 

I think !bs-growable is the right check.

Checking for the protocol name is always a hack and most times wrong.


Switching back to the protocol_name check, ./check -qcow2 022 test passes.

 static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf,

Re: [Qemu-devel] [PATCH v2 4/4] timer: make qemu_clock_enable sync between disable and timer's cb

2013-08-19 Thread liu ping fan

On Sun, Aug 18, 2013 at 10:54 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 14/08/2013 02:34, liu ping fan ha scritto:
 On Tue, Aug 13, 2013 at 10:53 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 13/08/2013 07:43, Liu Ping Fan ha scritto:
 After disabling the QemuClock, we should make sure that no QemuTimers
 are still in flight. To implement that with light overhead, we resort
 to QemuEvent. The caller of disabling will wait on QemuEvent of each
 timerlist.

 Note, qemu_clock_enable(foo,false) can _not_ be called from timer's cb.
 And the callers of qemu_clock_enable() should be sync by themselves,
 not protected by this patch.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  include/qemu/timer.h |  4 
  qemu-timer.c | 53 
 +++-
  2 files changed, 56 insertions(+), 1 deletion(-)

 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index 829c005..2b755c9 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -184,6 +184,10 @@ void qemu_clock_notify(QEMUClockType type);
   * @enabled: true to enable, false to disable
   *
   * Enable or disable a clock
 + * Disabling the clock will wait for related timerlists to stop
 + * executing qemu_run_timers.  Thus, this functions should not
 + * be used from the callback of a timer that is based on @clock.
 + * Doing so would cause a deadlock.
   */
  void qemu_clock_enable(QEMUClockType type, bool enabled);

 diff --git a/qemu-timer.c b/qemu-timer.c
 index 5b9a722..8b32e92 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -48,6 +48,12 @@ typedef struct QEMUClock {
  QLIST_HEAD(, QEMUTimerList) timerlists;

  NotifierList reset_notifiers;
 +/* While the reader holds this lock, it may block on events_list.
 + * So the modifier should be carefuly not to reset the event before
 + * holding this lock. Otherwise, deadlock.
 + */
 +QemuMutex events_list_lock;
 +GList *events_list;

 No need for a separate list.  Just use the timerlists list; if
 events_list needs a lock, timerlists needs one too.

 Here is a ugly pattern issue, we hold events_list_lock and wait for
 QemuEvent set. If the modifier reset the QemuEvent and then try to
 hold the events_list_lock, then _deadlock_.  To eliminate the
 possibility, using  @events_list_lock, and you can see the modifier
 can not reset QemuEvent while trying to own the lock. On the other
 handle, if using lock on timerlists, since many entrance to access the
 lock, we are not sure of avoiding deadlock

 But does timerlists need a lock, or does the BQL suffice?  If it
 doesn't, there is no need for events_list_lock either.  Is
 qemu_clock_enable called outside the BQL?

Currently, no such guarantee based on BQL. But if we enforce that an
backend thread holds BQL before it cleans up resources, we can resort
to BQL. (so document this, and save the events_list_lock?)

Regards,
Pingfan

Re: [Qemu-devel] [PATCH v2] spapr-pci: fix config space access to support bridges

2013-08-19 Thread Michael S. Tsirkin

On Fri, Aug 16, 2013 at 09:09:38PM +1000, Alexey Kardashevskiy wrote:
 spapr-pci config space accessors use find_dev() to find a PCI device.
 However find_dev() only searched on a primary bus and did not do
 recursive search through secondary buses so config space access was not
 possible for devices other that on a primary bus.
 
 This fixed find_dev() by using the PCI API pci_find_device() function.
 This effectively enabled pci bridges on spapr.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---

Acked-by: Michael S. Tsirkin m...@redhat.com

 
 Does not it make sense to move spapr_pci.c from hw/ppc to hw/pci?

It's a pci host bridge, isn't it?
If yes you can put it in hw/pci-host.
hw/pci is core code.

 We already
 do move interrupt controllers to hw/intc.
 
 
 ---
 Changes:
 v2:
 * fixed coding style
 * config space access traces moved out
 ---
  hw/ppc/spapr_pci.c | 11 +--
  1 file changed, 1 insertion(+), 10 deletions(-)
 
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index 1ca35a0..91d78a6 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -65,22 +65,13 @@ static PCIDevice *find_dev(sPAPREnvironment *spapr, 
 uint64_t buid,
  {
  sPAPRPHBState *sphb = find_phb(spapr, buid);
  PCIHostState *phb = PCI_HOST_BRIDGE(sphb);
 -BusState *bus = BUS(phb-bus);
 -BusChild *kid;
  int devfn = (config_addr  8)  0xFF;
  
  if (!phb) {
  return NULL;
  }
  
 -QTAILQ_FOREACH(kid, bus-children, sibling) {
 -PCIDevice *dev = (PCIDevice *)kid-child;
 -if (dev-devfn == devfn) {
 -return dev;
 -}
 -}
 -
 -return NULL;
 +return pci_find_device(phb-bus, (config_addr  16)  0xff, devfn);
  }
  
  static uint32_t rtas_pci_cfgaddr(uint32_t arg)
 -- 
 1.8.3.2

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-19 Thread Alexey Kardashevskiy

On 08/19/2013 01:22 AM, Paolo Bonzini wrote:
 Il 16/08/2013 11:49, Alexey Kardashevskiy ha scritto:
 With KVM, we could fall back to the qemu implementation
 + * when KVM doesn't support them, but that would be much slower
 + * than just using the KVM implementations of the single TCE
 + * hypercalls. */
 +if (kvmppc_spapr_use_multitce()) {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_propm,
 +   sizeof(hypertas_propm;
 +} else {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_prop,
 +   sizeof(hypertas_prop;
 +}
 
 This prevents migration from newer kernel to older kernel.  Can you
 ensure that the fallback to the QEMU implementation works, even though
 it is not used in practice?

How would it break? By having a device tree with multi-tce in it and not
having KVM PPC capability for that?

If this is the case, it will not prevent from migration as the multi-tce
feature is supported anyway by this patch. The only reason for not
advertising it to the guest is that the host kernel already has
acceleration for H_PUT_TCE (single page map/unmap) and advertising
multi-tce without having it in the host kernel (but only in QEMU) would
slow things down (but it still will work).


-- 
Alexey

Re: [Qemu-devel] [PATCH 1/2] kvm irqfd: support msimessage to irq translation in PHB

2013-08-19 Thread Michael S. Tsirkin

On Wed, Aug 07, 2013 at 06:51:32PM +1000, Alexey Kardashevskiy wrote:
 On PPC64 systems MSI Messages are translated to system IRQ in a PCI
 host bridge. This is already supported for emulated MSI/MSIX but
 not for irqfd where the current QEMU allocates IRQ numbers from
 irqchip and maps MSIMessages to those IRQ in the host kernel.
 
 The patch extends irqfd support in order to avoid unnecessary
 mapping and reuse the one which already exists in a PCI host bridge.
 
 Specifically, a map_msi callback is added to PCIBus and pci_bus_map_msi()
 to PCI API. The latter tries a bus specific map_msi and falls back to
 the default kvm_irqchip_add_msi_route() if none set.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

The mapping (irq # within data) is really KVM on PPC64 specific,
isn't it?

So why not implement
kvm_irqchip_add_msi_route(kvm_state, msg);
to simply return msg.data on PPC64?

Then you won't have to change all the rest of the code.

 ---
 Changes:
 2013/08/07 v5:
 * pci_bus_map_msi now has default behaviour which is to call
 kvm_irqchip_add_msi_route
 * kvm_irqchip_release_virq fixed not crash when there is no routes
 ---
  hw/i386/kvm/pci-assign.c | 4 ++--
  hw/misc/vfio.c   | 4 ++--
  hw/pci/pci.c | 9 +
  hw/virtio/virtio-pci.c   | 2 +-
  include/hw/pci/pci.h | 2 ++
  include/hw/pci/pci_bus.h | 1 +
  kvm-all.c| 3 +++
  7 files changed, 20 insertions(+), 5 deletions(-)
 
 diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
 index 5618173..fb59982 100644
 --- a/hw/i386/kvm/pci-assign.c
 +++ b/hw/i386/kvm/pci-assign.c
 @@ -1008,9 +1008,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
  MSIMessage msg = msi_get_message(pci_dev, 0);
  int virq;
  
 -virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +virq = pci_bus_map_msi(kvm_state, pci_dev-bus, msg);
  if (virq  0) {
 -perror(assigned_dev_update_msi: kvm_irqchip_add_msi_route);
 +perror(assigned_dev_update_msi: pci_bus_map_msi);
  return;
  }


This really confuses me. Why are you changing i386 code?

  
 diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
 index 017e693..adcd23d 100644
 --- a/hw/misc/vfio.c
 +++ b/hw/misc/vfio.c
 @@ -643,7 +643,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
 unsigned int nr,
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
 +vector-virq = msg ? pci_bus_map_msi(kvm_state, vdev-pdev.bus, *msg) : 
 -1;
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 @@ -811,7 +811,7 @@ retry:
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +vector-virq = pci_bus_map_msi(kvm_state, vdev-pdev.bus, msg);
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..4bce3e7 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -1249,6 +1249,15 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
 *dev,
  dev-intx_routing_notifier = notifier;
  }
  
 +int pci_bus_map_msi(KVMState *s, PCIBus *bus, MSIMessage msg)
 +{
 +if (bus-map_msi) {
 +return bus-map_msi(s, bus, msg);
 +}
 +
 +return kvm_irqchip_add_msi_route(s, msg);
 +}
 +
  /*
   * PCI-to-PCI bridge specification
   * 9.1: Interrupt routing. Table 9-1
 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index d37037e..21180d2 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -481,7 +481,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy 
 *proxy,
  int ret;
  
  if (irqfd-users == 0) {
 -ret = kvm_irqchip_add_msi_route(kvm_state, msg);
 +ret = pci_bus_map_msi(kvm_state, proxy-pci_dev.bus, msg);
  if (ret  0) {
  return ret;
  }
 diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
 index ccec2ba..b6ad9e4 100644
 --- a/include/hw/pci/pci.h
 +++ b/include/hw/pci/pci.h
 @@ -332,6 +332,7 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev);
  typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
  typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
  typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
 +typedef int (*pci_map_msi_fn)(KVMState *s, PCIBus *bus, MSIMessage msg);
  
  typedef enum {
  PCI_HOTPLUG_DISABLED,
 @@ -375,6 +376,7 @@ bool pci_intx_route_changed(PCIINTxRoute *old, 
 PCIINTxRoute *new);
  void pci_bus_fire_intx_routing_notifier(PCIBus *bus);
  void

Re: [Qemu-devel] [PATCH 1/2] kvm irqfd: support msimessage to irq translation in PHB

2013-08-19 Thread Alexey Kardashevskiy

On 08/19/2013 05:35 PM, Michael S. Tsirkin wrote:
 On Wed, Aug 07, 2013 at 06:51:32PM +1000, Alexey Kardashevskiy wrote:
 On PPC64 systems MSI Messages are translated to system IRQ in a PCI
 host bridge. This is already supported for emulated MSI/MSIX but
 not for irqfd where the current QEMU allocates IRQ numbers from
 irqchip and maps MSIMessages to those IRQ in the host kernel.

 The patch extends irqfd support in order to avoid unnecessary
 mapping and reuse the one which already exists in a PCI host bridge.

 Specifically, a map_msi callback is added to PCIBus and pci_bus_map_msi()
 to PCI API. The latter tries a bus specific map_msi and falls back to
 the default kvm_irqchip_add_msi_route() if none set.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 The mapping (irq # within data) is really KVM on PPC64 specific,
 isn't it?
 
 So why not implement
 kvm_irqchip_add_msi_route(kvm_state, msg);
 to simply return msg.data on PPC64?

How exactly? Please give some details. kvm_irqchip_add_msi_route() is
implemented in kvm-all.c once for all platforms so hack it right there?

I thought we discussed all of this already and decided that this thing
should go to PCI host bridge and by default PHB's map_msi() callback should
just call kvm_irqchip_add_msi_route(). This is why I touched i386.

Things have changed since then?



 Then you won't have to change all the rest of the code.
 
 ---
 Changes:
 2013/08/07 v5:
 * pci_bus_map_msi now has default behaviour which is to call
 kvm_irqchip_add_msi_route
 * kvm_irqchip_release_virq fixed not crash when there is no routes
 ---
  hw/i386/kvm/pci-assign.c | 4 ++--
  hw/misc/vfio.c   | 4 ++--
  hw/pci/pci.c | 9 +
  hw/virtio/virtio-pci.c   | 2 +-
  include/hw/pci/pci.h | 2 ++
  include/hw/pci/pci_bus.h | 1 +
  kvm-all.c| 3 +++
  7 files changed, 20 insertions(+), 5 deletions(-)

 diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
 index 5618173..fb59982 100644
 --- a/hw/i386/kvm/pci-assign.c
 +++ b/hw/i386/kvm/pci-assign.c
 @@ -1008,9 +1008,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
  MSIMessage msg = msi_get_message(pci_dev, 0);
  int virq;
  
 -virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +virq = pci_bus_map_msi(kvm_state, pci_dev-bus, msg);
  if (virq  0) {
 -perror(assigned_dev_update_msi: kvm_irqchip_add_msi_route);
 +perror(assigned_dev_update_msi: pci_bus_map_msi);
  return;
  }

 
 This really confuses me. Why are you changing i386 code?
 
   
 diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
 index 017e693..adcd23d 100644
 --- a/hw/misc/vfio.c
 +++ b/hw/misc/vfio.c
 @@ -643,7 +643,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
 unsigned int nr,
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
 +vector-virq = msg ? pci_bus_map_msi(kvm_state, vdev-pdev.bus, *msg) : 
 -1;
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 @@ -811,7 +811,7 @@ retry:
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +vector-virq = pci_bus_map_msi(kvm_state, vdev-pdev.bus, msg);
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 4c004f5..4bce3e7 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -1249,6 +1249,15 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
 *dev,
  dev-intx_routing_notifier = notifier;
  }
  
 +int pci_bus_map_msi(KVMState *s, PCIBus *bus, MSIMessage msg)
 +{
 +if (bus-map_msi) {
 +return bus-map_msi(s, bus, msg);
 +}
 +
 +return kvm_irqchip_add_msi_route(s, msg);
 +}
 +
  /*
   * PCI-to-PCI bridge specification
   * 9.1: Interrupt routing. Table 9-1
 diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
 index d37037e..21180d2 100644
 --- a/hw/virtio/virtio-pci.c
 +++ b/hw/virtio/virtio-pci.c
 @@ -481,7 +481,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy 
 *proxy,
  int ret;
  
  if (irqfd-users == 0) {
 -ret = kvm_irqchip_add_msi_route(kvm_state, msg);
 +ret = pci_bus_map_msi(kvm_state, proxy-pci_dev.bus, msg);
  if (ret  0) {
  return ret;
  }
 diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
 index ccec2ba..b6ad9e4 100644
 --- a/include/hw/pci/pci.h
 +++ b/include/hw/pci/pci.h
 @@ -332,6 +332,7 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev);
  typedef void (*pci_set_irq_fn)(void *opaque,

Re: [Qemu-devel] [PATCH 1/2] kvm irqfd: support msimessage to irq translation in PHB

2013-08-19 Thread Michael S. Tsirkin

On Mon, Aug 19, 2013 at 05:44:04PM +1000, Alexey Kardashevskiy wrote:
 On 08/19/2013 05:35 PM, Michael S. Tsirkin wrote:
  On Wed, Aug 07, 2013 at 06:51:32PM +1000, Alexey Kardashevskiy wrote:
  On PPC64 systems MSI Messages are translated to system IRQ in a PCI
  host bridge. This is already supported for emulated MSI/MSIX but
  not for irqfd where the current QEMU allocates IRQ numbers from
  irqchip and maps MSIMessages to those IRQ in the host kernel.
 
  The patch extends irqfd support in order to avoid unnecessary
  mapping and reuse the one which already exists in a PCI host bridge.
 
  Specifically, a map_msi callback is added to PCIBus and pci_bus_map_msi()
  to PCI API. The latter tries a bus specific map_msi and falls back to
  the default kvm_irqchip_add_msi_route() if none set.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  
  The mapping (irq # within data) is really KVM on PPC64 specific,
  isn't it?
  
  So why not implement
  kvm_irqchip_add_msi_route(kvm_state, msg);
  to simply return msg.data on PPC64?
 
 How exactly? Please give some details. kvm_irqchip_add_msi_route() is
 implemented in kvm-all.c once for all platforms so hack it right there?

You can add the map_msi callback in kvm state,
then just call it if it's set, right?

 I thought we discussed all of this already and decided that this thing
 should go to PCI host bridge and by default PHB's map_msi() callback should
 just call kvm_irqchip_add_msi_route(). This is why I touched i386.
 
 Things have changed since then?


I think pci_bus_map_msi was there since day 1, it's not like
we are going back and forth on it, right?

I always felt it's best to have irqfd isolated to kvm somehow
rather than have hooks in pci code, I just don't know enough
about kvm on ppc to figure out the right way to do this.
With your latest patches I think this is becoming clearer ...

 
  Then you won't have to change all the rest of the code.
  
  ---
  Changes:
  2013/08/07 v5:
  * pci_bus_map_msi now has default behaviour which is to call
  kvm_irqchip_add_msi_route
  * kvm_irqchip_release_virq fixed not crash when there is no routes
  ---
   hw/i386/kvm/pci-assign.c | 4 ++--
   hw/misc/vfio.c   | 4 ++--
   hw/pci/pci.c | 9 +
   hw/virtio/virtio-pci.c   | 2 +-
   include/hw/pci/pci.h | 2 ++
   include/hw/pci/pci_bus.h | 1 +
   kvm-all.c| 3 +++
   7 files changed, 20 insertions(+), 5 deletions(-)
 
  diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
  index 5618173..fb59982 100644
  --- a/hw/i386/kvm/pci-assign.c
  +++ b/hw/i386/kvm/pci-assign.c
  @@ -1008,9 +1008,9 @@ static void assigned_dev_update_msi(PCIDevice 
  *pci_dev)
   MSIMessage msg = msi_get_message(pci_dev, 0);
   int virq;
   
  -virq = kvm_irqchip_add_msi_route(kvm_state, msg);
  +virq = pci_bus_map_msi(kvm_state, pci_dev-bus, msg);
   if (virq  0) {
  -perror(assigned_dev_update_msi: kvm_irqchip_add_msi_route);
  +perror(assigned_dev_update_msi: pci_bus_map_msi);
   return;
   }
 
  
  This really confuses me. Why are you changing i386 code?
  

  diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
  index 017e693..adcd23d 100644
  --- a/hw/misc/vfio.c
  +++ b/hw/misc/vfio.c
  @@ -643,7 +643,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
  unsigned int nr,
* Attempt to enable route through KVM irqchip,
* default to userspace handling if unavailable.
*/
  -vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
  +vector-virq = msg ? pci_bus_map_msi(kvm_state, vdev-pdev.bus, *msg) 
  : -1;
   if (vector-virq  0 ||
   kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
  vector-virq)  0) {
  @@ -811,7 +811,7 @@ retry:
* Attempt to enable route through KVM irqchip,
* default to userspace handling if unavailable.
*/
  -vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
  +vector-virq = pci_bus_map_msi(kvm_state, vdev-pdev.bus, msg);
   if (vector-virq  0 ||
   kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
  vector-virq)  0) {
  diff --git a/hw/pci/pci.c b/hw/pci/pci.c
  index 4c004f5..4bce3e7 100644
  --- a/hw/pci/pci.c
  +++ b/hw/pci/pci.c
  @@ -1249,6 +1249,15 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
  *dev,
   dev-intx_routing_notifier = notifier;
   }
   
  +int pci_bus_map_msi(KVMState *s, PCIBus *bus, MSIMessage msg)
  +{
  +if (bus-map_msi) {
  +return bus-map_msi(s, bus, msg);
  +}
  +
  +return kvm_irqchip_add_msi_route(s, msg);
  +}
  +
   /*
* PCI-to-PCI bridge specification
* 9.1: Interrupt routing. Table 9-1
  diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
  index d37037e..21180d2 100644
  ---

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-19 Thread Alexander Graf



Am 19.08.2013 um 09:30 schrieb Alexey Kardashevskiy a...@ozlabs.ru:

 On 08/19/2013 01:22 AM, Paolo Bonzini wrote:
 Il 16/08/2013 11:49, Alexey Kardashevskiy ha scritto:
 With KVM, we could fall back to the qemu implementation
 + * when KVM doesn't support them, but that would be much slower
 + * than just using the KVM implementations of the single TCE
 + * hypercalls. */
 +if (kvmppc_spapr_use_multitce()) {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_propm,
 +   sizeof(hypertas_propm;
 +} else {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_prop,
 +   sizeof(hypertas_prop;
 +}
 
 This prevents migration from newer kernel to older kernel.  Can you
 ensure that the fallback to the QEMU implementation works, even though
 it is not used in practice?
 
 How would it break? By having a device tree with multi-tce in it and not
 having KVM PPC capability for that?
 
 If this is the case, it will not prevent from migration as the multi-tce
 feature is supported anyway by this patch. The only reason for not
 advertising it to the guest is that the host kernel already has
 acceleration for H_PUT_TCE (single page map/unmap) and advertising
 multi-tce without having it in the host kernel (but only in QEMU) would
 slow things down (but it still will work).

It means that if you use the same QEMU version with the same command line on a 
different kernel version, your guest looks different because we generate the 
dtb differently.

The usual way to avoid this is to have a command line option to at least make 
it possible for a management tool to nail down feature flags regardless of the 
host configuration.

Considering that IIRC we haven't actually flagged -M pseries as backwards 
compatible (avoid breaking migration, etc) we can probably get away with 
enabling multi-tce always and live with the performance penalty on older host 
kernels.


Alex

Re: [Qemu-devel] [PATCH] RFCv3 kvm irqfd: support msimessage to irq translation in PHB

2013-08-19 Thread Alexey Kardashevskiy

Bring it up to support discussion in
[PATCH 0/2 v4] kvm irqfd: support msimessage to irq translation in PHB


On 06/30/2013 12:28 AM, Anthony Liguori wrote:
 On Sat, Jun 29, 2013 at 8:45 AM, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 On PPC64 systems MSI Messages are translated to system IRQ in a PCI
 host bridge. This is already supported for emulated MSI/MSIX but
 not for irqfd where the current QEMU allocates IRQ numbers from
 irqchip and maps MSIMessages to those IRQ in the host kernel.

 The patch extends irqfd support in order to avoid unnecessary
 mapping and reuse the one which already exists in a PCI host bridge.

 Specifically, a map_msi callback is added to PCIBus and pci_bus_map_msi()
 to PCI API. The latter returns -1 if a specific PHB does not provide
 with any trsnslation so the existing code will work.
 
 I think there's a bit of confusion here.  The kernel needs a virq
 number to create an eventfd.  virq is just a KVM concept, it doesn't
 correspond to anything useful in hardware.
 
 On pseries, there is a 1-1 mapping between XICS IRQs and VIRQs and MSI
 can be trivially mapped to a virq.
 
 On x86, we need to call a special kernel function which essentially
 creates an apic message-virq mapping such that we can deliver the
 irqfd.
 
 So what this should look like is:
 
 1) A PCI bus function to do the MSI - virq mapping
 2) On x86 (and e500), this is implemented by calling 
 kvm_irqchip_add_msi_route()
 3) On pseries, this just returns msi-data
 
 Perhaps (2) can just be the default PCI bus implementation to simplify things.
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---

 Looks like we agreed that in general PHB is the right place for this,
 not KVM, so I am trying again.

 Probably something should be done to kvm_irqchip_update_msi_route()
 as well but I do not really understand what exactly. Any suggestions?


 ---
  hw/misc/vfio.c   |7 +--
  hw/pci/pci.c |   13 +
  hw/ppc/spapr_pci.c   |6 ++
  hw/virtio/virtio-pci.c   |2 +-
  include/hw/pci/pci.h |4 
  include/hw/pci/pci_bus.h |1 +
  include/sysemu/kvm.h |2 +-
  kvm-all.c|7 ++-
  8 files changed, 37 insertions(+), 5 deletions(-)

 diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
 index 52fb036..59911bb 100644
 --- a/hw/misc/vfio.c
 +++ b/hw/misc/vfio.c
 @@ -624,7 +624,9 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
 unsigned int nr,
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
 +
 +vector-virq = msg ?
 +kvm_irqchip_add_msi_route(kvm_state, vdev-pdev.bus, *msg) : -1;
 
 This is wrong.  You could call the bus function to map an MSI message
 to a virq here.
 
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 @@ -792,7 +794,8 @@ retry:
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +vector-virq = kvm_irqchip_add_msi_route(kvm_state, vdev-pdev.bus,
 + msg);
 
 And here.
 
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 61b681a..543f172 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -1240,6 +1240,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
 *dev,
  dev-intx_routing_notifier = notifier;
  }

 +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
 +{
 +bus-map_msi = map_msi_fn;
 +}
 
 You don't need this function.  You can do this overloading as part of
 the PCI bus initialization in spapr_pci.c
 
 Regards,
 
 Anthony Liguori
 
 +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
 +{
 +if (bus-map_msi) {
 +return bus-map_msi(bus, msg);
 +}
 +return -1;
 +}
 +
  /*
   * PCI-to-PCI bridge specification
   * 9.1: Interrupt routing. Table 9-1
 diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
 index 23dbc0e..bae4faf 100644
 --- a/hw/ppc/spapr_pci.c
 +++ b/hw/ppc/spapr_pci.c
 @@ -486,6 +486,11 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
  qemu_irq_pulse(xics_get_qirq(spapr-icp, irq));
  }

 +static int spapr_msi_get_irq(PCIBus *bus, MSIMessage msg)
 +{
 +return msg.data;
 +}
 +
  static const MemoryRegionOps spapr_msi_ops = {
  /* There is no .read as the read result is undefined by PCI spec */
  .read = NULL,
 @@ -657,6 +662,7 @@ static int spapr_phb_init(SysBusDevice *s)

  sphb-lsi_table[i].irq = irq;
  }
 +pci_bus_set_map_msi_fn(bus, spapr_msi_get_irq);

  return 0;
  }

Re: [Qemu-devel] [PATCH 1/2] kvm irqfd: support msimessage to irq translation in PHB

2013-08-19 Thread Alexey Kardashevskiy

On 08/19/2013 05:54 PM, Michael S. Tsirkin wrote:
 On Mon, Aug 19, 2013 at 05:44:04PM +1000, Alexey Kardashevskiy wrote:
 On 08/19/2013 05:35 PM, Michael S. Tsirkin wrote:
 On Wed, Aug 07, 2013 at 06:51:32PM +1000, Alexey Kardashevskiy wrote:
 On PPC64 systems MSI Messages are translated to system IRQ in a PCI
 host bridge. This is already supported for emulated MSI/MSIX but
 not for irqfd where the current QEMU allocates IRQ numbers from
 irqchip and maps MSIMessages to those IRQ in the host kernel.

 The patch extends irqfd support in order to avoid unnecessary
 mapping and reuse the one which already exists in a PCI host bridge.

 Specifically, a map_msi callback is added to PCIBus and pci_bus_map_msi()
 to PCI API. The latter tries a bus specific map_msi and falls back to
 the default kvm_irqchip_add_msi_route() if none set.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 The mapping (irq # within data) is really KVM on PPC64 specific,
 isn't it?

 So why not implement
 kvm_irqchip_add_msi_route(kvm_state, msg);
 to simply return msg.data on PPC64?

 How exactly? Please give some details. kvm_irqchip_add_msi_route() is
 implemented in kvm-all.c once for all platforms so hack it right there?
 
 You can add the map_msi callback in kvm state,
 then just call it if it's set, right?
 
 I thought we discussed all of this already and decided that this thing
 should go to PCI host bridge and by default PHB's map_msi() callback should
 just call kvm_irqchip_add_msi_route(). This is why I touched i386.

 Things have changed since then?
 
 
 I think pci_bus_map_msi was there since day 1, it's not like
 we are going back and forth on it, right?


Sorry, I am not following you. pci_bus_map_msi() is added by this patch.
Where was it from day 1?


 I always felt it's best to have irqfd isolated to kvm somehow
 rather than have hooks in pci code, I just don't know enough
 about kvm on ppc to figure out the right way to do this.
 With your latest patches I think this is becoming clearer ...


Well... This encoding (irq# as msg.data) is used in spapr_pci.c in any
case, whether it is KVM or TCG.

I am confused.
1.5 month ago Anthony suggested (I'll answer to that mail to bring that
discussion up) to do this thing as:

 So what this should look like is:

 1) A PCI bus function to do the MSI - virq mapping
 2) On x86 (and e500), this is implemented by calling
kvm_irqchip_add_msi_route()
 3) On pseries, this just returns msi-data

 Perhaps (2) can just be the default PCI bus implementation to simplify
things.


Now it is not right. Anthony, please help.



 Then you won't have to change all the rest of the code.

 ---
 Changes:
 2013/08/07 v5:
 * pci_bus_map_msi now has default behaviour which is to call
 kvm_irqchip_add_msi_route
 * kvm_irqchip_release_virq fixed not crash when there is no routes
 ---
  hw/i386/kvm/pci-assign.c | 4 ++--
  hw/misc/vfio.c   | 4 ++--
  hw/pci/pci.c | 9 +
  hw/virtio/virtio-pci.c   | 2 +-
  include/hw/pci/pci.h | 2 ++
  include/hw/pci/pci_bus.h | 1 +
  kvm-all.c| 3 +++
  7 files changed, 20 insertions(+), 5 deletions(-)

 diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
 index 5618173..fb59982 100644
 --- a/hw/i386/kvm/pci-assign.c
 +++ b/hw/i386/kvm/pci-assign.c
 @@ -1008,9 +1008,9 @@ static void assigned_dev_update_msi(PCIDevice 
 *pci_dev)
  MSIMessage msg = msi_get_message(pci_dev, 0);
  int virq;
  
 -virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +virq = pci_bus_map_msi(kvm_state, pci_dev-bus, msg);
  if (virq  0) {
 -perror(assigned_dev_update_msi: kvm_irqchip_add_msi_route);
 +perror(assigned_dev_update_msi: pci_bus_map_msi);
  return;
  }


 This really confuses me. Why are you changing i386 code?

   
 diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
 index 017e693..adcd23d 100644
 --- a/hw/misc/vfio.c
 +++ b/hw/misc/vfio.c
 @@ -643,7 +643,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
 unsigned int nr,
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
 +vector-virq = msg ? pci_bus_map_msi(kvm_state, vdev-pdev.bus, *msg) 
 : -1;
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 @@ -811,7 +811,7 @@ retry:
   * Attempt to enable route through KVM irqchip,
   * default to userspace handling if unavailable.
   */
 -vector-virq = kvm_irqchip_add_msi_route(kvm_state, msg);
 +vector-virq = pci_bus_map_msi(kvm_state, vdev-pdev.bus, msg);
  if (vector-virq  0 ||
  kvm_irqchip_add_irqfd_notifier(kvm_state, vector-interrupt,
 vector-virq)  0) {
 diff --git a/hw/pci/pci.c

Re: [Qemu-devel] [PATCH 2/2] Add ARM registers definitions in Monitor commands

2013-08-19 Thread Fabien Chouteau

Any comments?

Regards,

On 08/02/2013 02:48 PM, Fabien Chouteau wrote:
 Signed-off-by: Fabien Chouteau chout...@adacore.com
 ---
  monitor.c |   17 +
  1 file changed, 17 insertions(+)
 
 diff --git a/monitor.c b/monitor.c
 index 5dc0aa9..78e93af 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -3167,6 +3167,23 @@ static const MonitorDef monitor_defs[] = {
  { cleanwin, offsetof(CPUSPARCState, cleanwin) },
  { fprs, offsetof(CPUSPARCState, fprs) },
  #endif
 +#elif defined(TARGET_ARM)
 +{ r0, offsetof(CPUARMState, regs[0])  },
 +{ r1, offsetof(CPUARMState, regs[1])  },
 +{ r2, offsetof(CPUARMState, regs[2])  },
 +{ r3, offsetof(CPUARMState, regs[3])  },
 +{ r4, offsetof(CPUARMState, regs[4])  },
 +{ r5, offsetof(CPUARMState, regs[5])  },
 +{ r6, offsetof(CPUARMState, regs[6])  },
 +{ r7, offsetof(CPUARMState, regs[7])  },
 +{ r8, offsetof(CPUARMState, regs[8])  },
 +{ r9, offsetof(CPUARMState, regs[9])  },
 +{ r10,offsetof(CPUARMState, regs[10]) },
 +{ r11,offsetof(CPUARMState, regs[11]) },
 +{ r12,offsetof(CPUARMState, regs[12]) },
 +{ r13|sp, offsetof(CPUARMState, regs[13]) },
 +{ r14|lr, offsetof(CPUARMState, regs[14]) },
 +{ r15|pc, offsetof(CPUARMState, regs[15]) },
  #endif
  { NULL },
  };
 


-- 
Fabien Chouteau

Re: [Qemu-devel] [PATCH 1/2] Improve Monitor disas with symbol lookup

2013-08-19 Thread Fabien Chouteau

Any comments?

Regards,

On 08/02/2013 02:48 PM, Fabien Chouteau wrote:
 Part of M731-018.
 
 Signed-off-by: Fabien Chouteau chout...@adacore.com
 ---
  disas.c |   16 
  1 file changed, 12 insertions(+), 4 deletions(-)
 
 diff --git a/disas.c b/disas.c
 index 71007fb..3ffb3ae 100644
 --- a/disas.c
 +++ b/disas.c
 @@ -480,11 +480,19 @@ void monitor_disas(Monitor *mon, CPUArchState *env,
  #endif
  
  for(i = 0; i  nb_insn; i++) {
 - monitor_printf(mon, 0x TARGET_FMT_lx :  , pc);
 +const char *sym = lookup_symbol(pc);
 +
 +monitor_printf(mon, 0x TARGET_FMT_lx  , pc);
 +if (sym[0] != '\0') {
 +monitor_printf(mon, %s:  , sym);
 +} else {
 +monitor_printf(mon, :  );
 +}
  count = print_insn(pc, s.info);
 - monitor_printf(mon, \n);
 - if (count  0)
 - break;
 +monitor_printf(mon, \n);
 +if (count  0) {
 +break;
 +}
  pc += count;
  }
  }
 


-- 
Fabien Chouteau

Re: [Qemu-devel] [libvirt] [PATCH] qemu: Drop qemuDomainMemoryLimit

2013-08-19 Thread Michal Privoznik

On 09.08.2013 18:29, Daniel P. Berrange wrote:
 On Fri, Aug 09, 2013 at 10:58:55AM -0500, Anthony Liguori wrote:
 Michal Privoznik mpriv...@redhat.com writes:

 [CC'ing qemu-devel list]
 On 09.08.2013 15:17, Daniel P. Berrange wrote:
 On Fri, Aug 09, 2013 at 07:13:58AM -0600, Eric Blake wrote:
 On 08/09/2013 06:56 AM, Michal Privoznik wrote:


So what's the conclusion? Should I push the patch until qemu gets
stabilized in mem consumption (which can take a while - no offense to
qemu devels, I can imagine it's nontrivial).

Michal

Re: [Qemu-devel] [PATCH 2/2] Add ARM registers definitions in Monitor commands

2013-08-19 Thread Peter Maydell

On 19 August 2013 09:26, Fabien Chouteau chout...@adacore.com wrote:
 Any comments?

 Regards,

 On 08/02/2013 02:48 PM, Fabien Chouteau wrote:
 Signed-off-by: Fabien Chouteau chout...@adacore.com
 ---
  monitor.c |   17 +
  1 file changed, 17 insertions(+)

 diff --git a/monitor.c b/monitor.c
 index 5dc0aa9..78e93af 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -3167,6 +3167,23 @@ static const MonitorDef monitor_defs[] = {
  { cleanwin, offsetof(CPUSPARCState, cleanwin) },
  { fprs, offsetof(CPUSPARCState, fprs) },
  #endif
 +#elif defined(TARGET_ARM)
 +{ r0, offsetof(CPUARMState, regs[0])  },
 +{ r1, offsetof(CPUARMState, regs[1])  },

Rather than adding yet another entry to this target-ifdef
ladder in common code, maybe we can abstract this out to be
a method/field of the CPU object?

Andreas can probably suggest the best approach.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 1/2] Improve Monitor disas with symbol lookup

2013-08-19 Thread Peter Maydell

On 2 August 2013 13:48, Fabien Chouteau chout...@adacore.com wrote:
 Part of M731-018.

What is this a reference to?

 Signed-off-by: Fabien Chouteau chout...@adacore.com
 ---
  disas.c |   16 
  1 file changed, 12 insertions(+), 4 deletions(-)

 diff --git a/disas.c b/disas.c
 index 71007fb..3ffb3ae 100644
 --- a/disas.c
 +++ b/disas.c
 @@ -480,11 +480,19 @@ void monitor_disas(Monitor *mon, CPUArchState *env,
  #endif

  for(i = 0; i  nb_insn; i++) {
 -   monitor_printf(mon, 0x TARGET_FMT_lx :  , pc);
 +const char *sym = lookup_symbol(pc);
 +
 +monitor_printf(mon, 0x TARGET_FMT_lx  , pc);
 +if (sym[0] != '\0') {
 +monitor_printf(mon, %s:  , sym);
 +} else {
 +monitor_printf(mon, :  );
 +}

It feels to me like this is at the wrong level:
shouldn't it be in the disassembly layer so that you
can get symbols in both monitor disassembly and
debug-log disassembly?

thanks
-- PMM

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-19 Thread Alexey Kardashevskiy

On 08/19/2013 06:01 PM, Alexander Graf wrote:
 
 
 Am 19.08.2013 um 09:30 schrieb Alexey Kardashevskiy a...@ozlabs.ru:
 
 On 08/19/2013 01:22 AM, Paolo Bonzini wrote:
 Il 16/08/2013 11:49, Alexey Kardashevskiy ha scritto:
 With KVM, we could fall back to the qemu implementation
 + * when KVM doesn't support them, but that would be much slower
 + * than just using the KVM implementations of the single TCE
 + * hypercalls. */
 +if (kvmppc_spapr_use_multitce()) {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_propm,
 +   sizeof(hypertas_propm;
 +} else {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_prop,
 +   sizeof(hypertas_prop;
 +}

 This prevents migration from newer kernel to older kernel.  Can you
 ensure that the fallback to the QEMU implementation works, even though
 it is not used in practice?

 How would it break? By having a device tree with multi-tce in it and not
 having KVM PPC capability for that?

 If this is the case, it will not prevent from migration as the multi-tce
 feature is supported anyway by this patch. The only reason for not
 advertising it to the guest is that the host kernel already has
 acceleration for H_PUT_TCE (single page map/unmap) and advertising
 multi-tce without having it in the host kernel (but only in QEMU) would
 slow things down (but it still will work).
 

 It means that if you use the same QEMU version with the same command
 line on a different kernel version, your guest looks different because
 we generate the dtb differently.

Oh. Sorry for my ignorance again, I am not playing dump or anything like
that - I do not understand how the device tree (which we cook in QEMU) on
the destination can possibly survive migration and not to be overwritten by
the one from the source. What was in the destination RAM before migration
does not matter at all (including dt), QEMU device tree is what matters but
this does not change. As it is the same QEMU version, hypercalls are
supported anyway, the only difference where they will be handled - in the
host kernel or QEMU. What do I miss?


 The usual way to avoid this is to have a command line option to at least
 make it possible for a management tool to nail down feature flags
 regardless of the host configuration.


 Considering that IIRC we haven't actually flagged -M pseries as
 backwards compatible (avoid breaking migration, etc) we can probably get
 away with enabling multi-tce always and live with the performance
 penalty on older host kernels.

We have H_PUT_TCE accelerated in older kernel for quite a while and we do
not want guests running on older hosts become slower for no good reason,
this is why we added this capability at the first place.



-- 
Alexey

Re: [Qemu-devel] [PATCH 1/2] kvm irqfd: support msimessage to irq translation in PHB

2013-08-19 Thread Michael S. Tsirkin

On Mon, Aug 19, 2013 at 06:10:01PM +1000, Alexey Kardashevskiy wrote:
 On 08/19/2013 05:54 PM, Michael S. Tsirkin wrote:
  On Mon, Aug 19, 2013 at 05:44:04PM +1000, Alexey Kardashevskiy wrote:
  On 08/19/2013 05:35 PM, Michael S. Tsirkin wrote:
  On Wed, Aug 07, 2013 at 06:51:32PM +1000, Alexey Kardashevskiy wrote:
  On PPC64 systems MSI Messages are translated to system IRQ in a PCI
  host bridge. This is already supported for emulated MSI/MSIX but
  not for irqfd where the current QEMU allocates IRQ numbers from
  irqchip and maps MSIMessages to those IRQ in the host kernel.
 
  The patch extends irqfd support in order to avoid unnecessary
  mapping and reuse the one which already exists in a PCI host bridge.
 
  Specifically, a map_msi callback is added to PCIBus and pci_bus_map_msi()
  to PCI API. The latter tries a bus specific map_msi and falls back to
  the default kvm_irqchip_add_msi_route() if none set.
 
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
  The mapping (irq # within data) is really KVM on PPC64 specific,
  isn't it?
 
  So why not implement
  kvm_irqchip_add_msi_route(kvm_state, msg);
  to simply return msg.data on PPC64?
 
  How exactly? Please give some details. kvm_irqchip_add_msi_route() is
  implemented in kvm-all.c once for all platforms so hack it right there?
  
  You can add the map_msi callback in kvm state,
  then just call it if it's set, right?
  
  I thought we discussed all of this already and decided that this thing
  should go to PCI host bridge and by default PHB's map_msi() callback should
  just call kvm_irqchip_add_msi_route(). This is why I touched i386.
 
  Things have changed since then?
  
  
  I think pci_bus_map_msi was there since day 1, it's not like
  we are going back and forth on it, right?
 
 
 Sorry, I am not following you. pci_bus_map_msi() is added by this patch.
 Where was it from day 1?

I'm sorry. I merely meant that it's there in v1 of this patch.

 
  I always felt it's best to have irqfd isolated to kvm somehow
  rather than have hooks in pci code, I just don't know enough
  about kvm on ppc to figure out the right way to do this.
  With your latest patches I think this is becoming clearer ...
 
 
 Well... This encoding (irq# as msg.data) is used in spapr_pci.c in any
 case, whether it is KVM or TCG.

Not only on TCG. All emulated devices merely do a write to send an MSI,
this is exactly what happens with real hardware.  If this happens to
land in the MSI region, you get an interrupt.  The concept of mapping
msi to irq normally doesn't belong in devices/pci core, it's something
done by APIC or pci host.

For KVM, we have this special hook where devices allocate a route and
then Linux can send an interrupt to guest quickly bypassing QEMU.  It
was originally designed for paravirtualization, so it doesn't support
strange cases such as guest programming MSI to write into guest memory,
which is possible with real PCI devices. However, no one seems to do
these hacks in practice, so in practice this works for
some other cases, such as device assignment.

That's why we have kvm_irqchip_add_msi_route in some places
in code - it's so specific devices implemented by
Linux can take this shortcut (it does not make sense for
devices implemented by qemu).
So the way I see it, it's not a PCI thing at all, it's
a KVM specific implementation, so it seems cleaner if
it's isolated there.

Now, we have this allocate/free API for reasons that
have to do with the API of kvm on x86. spapr doesn't need to
allocate/free resources, so just shortcut free and
do map when we allocate.

Doesn't this sound reasonable?

 
 I am confused.
 1.5 month ago Anthony suggested (I'll answer to that mail to bring that
 discussion up) to do this thing as:
 
  So what this should look like is:
 
  1) A PCI bus function to do the MSI - virq mapping
  2) On x86 (and e500), this is implemented by calling
 kvm_irqchip_add_msi_route()
  3) On pseries, this just returns msi-data
 
  Perhaps (2) can just be the default PCI bus implementation to simplify
 things.
 
 
 Now it is not right. Anthony, please help.

That's right, you implemented exactly what Anthony suggested.  Now that
you did, I think I see an even better, more contained way to do this.
And it's not much of a change as compared to what you have, is it?

I'm sorry if this looks like you are given a run-around,
not everyone understands how spapr works, sometimes
it takes a full implementation to make everyone understand
the issues.

But I agree, let's see what Anthony thinks, maybe I
misunderstood how spapr works.

 
  Then you won't have to change all the rest of the code.
 
  ---
  Changes:
  2013/08/07 v5:
  * pci_bus_map_msi now has default behaviour which is to call
  kvm_irqchip_add_msi_route
  * kvm_irqchip_release_virq fixed not crash when there is no routes
  ---
   hw/i386/kvm/pci-assign.c | 4 ++--
   hw/misc/vfio.c   | 4 ++--
   hw/pci/pci.c | 9 +
   hw/virtio/virtio-pci.c   | 2 +-

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-19 Thread Alexey Kardashevskiy

On 08/16/2013 11:15 PM, Alexander Graf wrote:
 
 On 07.08.2013, at 10:08, Alexey Kardashevskiy wrote:
 
 Currently only single TCE entry per requiest is supported (H_PUT_TCE).
 However PAPR+ specification allows multiple entry requests such as
 H_PUT_TCE_INDIRECT and H_STUFFF_TCE. Having less transitions to the host
 kernel via ioctls, support of these calls can accelerate IOMMU operations.

 This also removes some leftovers in debug output of the H_PUT_TCE handler.

 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 ---

 This patch requires a KVM_CAP_SPAPR_MULTITCE capability from kernel headers
 which is not there yet.
 However it still would be nice to have Reviewed-by from someone when
 the capability will make it to the upstream. Thanks.

 ---
 hw/ppc/spapr.c   | 16 ++--
 hw/ppc/spapr_iommu.c | 74 
 +---
 target-ppc/kvm.c |  7 +
 target-ppc/kvm_ppc.h |  7 +
 4 files changed, 99 insertions(+), 5 deletions(-)

 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 9494915..a6b1f54 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -301,6 +301,8 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 CPUState *cs;
 uint32_t start_prop = cpu_to_be32(initrd_base);
 uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
 +char hypertas_propm[] = 
 hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt
 +\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk\0hcall-multi-tce;
 char hypertas_prop[] = 
 hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt
 \0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk;
 char qemu_hypertas_prop[] = hcall-memop1;
 @@ -480,8 +482,18 @@ static void *spapr_create_fdt_skel(const char 
 *cpu_model,
 /* RTAS */
 _FDT((fdt_begin_node(fdt, rtas)));

 -_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_prop,
 -   sizeof(hypertas_prop;
 +/* In TCG mode, the multitce functions, which we implement are a
 + * win.  With KVM, we could fall back to the qemu implementation
 + * when KVM doesn't support them, but that would be much slower
 + * than just using the KVM implementations of the single TCE
 + * hypercalls. */
 +if (kvmppc_spapr_use_multitce()) {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_propm,
 +   sizeof(hypertas_propm;
 +} else {
 +_FDT((fdt_property(fdt, ibm,hypertas-functions, hypertas_prop,
 +   sizeof(hypertas_prop;
 +}
 _FDT((fdt_property(fdt, qemu,hypertas-functions, qemu_hypertas_prop,
sizeof(qemu_hypertas_prop;

 diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
 index 3d4a1fc..22b09be 100644
 --- a/hw/ppc/spapr_iommu.c
 +++ b/hw/ppc/spapr_iommu.c
 @@ -244,6 +244,71 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, 
 target_ulong ioba,
 return H_SUCCESS;
 }

 +static target_ulong h_put_tce_indirect(PowerPCCPU *cpu,
 +   sPAPREnvironment *spapr,
 +   target_ulong opcode, target_ulong 
 *args)
 +{
 +int i;
 +target_ulong liobn = args[0];
 +target_ulong ioba = args[1];
 +target_ulong tce_list = args[2];
 +target_ulong npages = args[3];
 +target_ulong ret = 0;
 +sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
 +
 +if (tcet) {
 +for (i = 0; i  npages; ++i, ioba += SPAPR_TCE_PAGE_SIZE) {
 
 i++
 
 +target_ulong tce = ldq_phys((tce_list  ~SPAPR_TCE_PAGE_MASK) +
 
 I think it makes sense to do the masking when you assign the variable - makes 
 it easier to read.
 
 +i * sizeof(target_ulong));
 +ret = put_tce_emu(tcet, ioba, tce);
 +if (ret) {
 +break;
 +}
 +}
 +return ret;
 +}
 +#ifdef DEBUG_TCE
 +fprintf(stderr, %s on liobn= TARGET_FMT_lx
 
 Could you please convert this into something that doesn't bitrot? Either a 
 DPRINTF style macro that gets format checking done even when unused or a 
 trace point.


This file does not have DPRINTF and every time David or I tried to add one,
we were loudly shouted not to do this. So - tracepoints. But the file uses
#ifdef DEBUG_TCE heavily, and there is already a bitrot trace in
H_PUT_TCE (the only hcall in the group which is already in the file).

So what do I do now with traces? 2 patches - first converts everything to
tracepoints and second patch which actually adds multi-tce? Or remove old
bitrot traces from this patch and repost (I can survive without any debug
code in upstream)? I am fine with both ways.



 +  ioba 0x TARGET_FMT_lx   TCE 0x TARGET_FMT_lx
 + ret =  TARGET_FMT_ld \n,
 +__func__, liobn, ioba, tce_list, ret);
 +#endif
 +
 +return H_PARAMETER;
 +}
 +
 +static

Re: [Qemu-devel] [libvirt] [PATCH] qemu: Drop qemuDomainMemoryLimit

2013-08-19 Thread Daniel P. Berrange

On Mon, Aug 19, 2013 at 10:29:10AM +0200, Michal Privoznik wrote:
 On 09.08.2013 18:29, Daniel P. Berrange wrote:
  On Fri, Aug 09, 2013 at 10:58:55AM -0500, Anthony Liguori wrote:
  Michal Privoznik mpriv...@redhat.com writes:
 
  [CC'ing qemu-devel list]
  On 09.08.2013 15:17, Daniel P. Berrange wrote:
  On Fri, Aug 09, 2013 at 07:13:58AM -0600, Eric Blake wrote:
  On 08/09/2013 06:56 AM, Michal Privoznik wrote:
 
 
 So what's the conclusion? Should I push the patch until qemu gets
 stabilized in mem consumption (which can take a while - no offense to
 qemu devels, I can imagine it's nontrivial).

Given the lack of any useful info to make memory limits work reliably,
I guess we don't have any choice but to remove this default memory limit,
and also recommend against people setting explicit memory limits too.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH v4 1/6] pc: Don't prematurely explode QEMUMachineInitArgs

2013-08-19 Thread Markus Armbruster

Laszlo Ersek ler...@redhat.com writes:

 On 08/16/13 13:13, arm...@redhat.com wrote:

  static void pc_init_isa(QEMUMachineInitArgs *args)
  {
 -ram_addr_t ram_size = args-ram_size;
 -const char *cpu_model = args-cpu_model;
 -const char *kernel_filename = args-kernel_filename;
 -const char *kernel_cmdline = args-kernel_cmdline;
 -const char *initrd_filename = args-initrd_filename;
 -const char *boot_device = args-boot_device;
  has_pci_info = false;
 -if (cpu_model == NULL)
 -cpu_model = 486;
 +if (!args-cpu_model) {
 +args-cpu_model = 486;
 +}

 This modifies *args.

Precedence:

db4ff6f hw/realview.c: Don't prematurely explode QEMUMachineInitArgs
1b523b5 hw/versatilepb: Don't prematurely explode QEMUMachineInitArgs

 IIUC, args here points to the args auto variable in main().
 args.cpu_model is initialized from the standalone cpu_model
 variable. That one in turn is either NULL, or points to a command line
 argument. Ie. args.cpu_model is never expected to be freed, and the
 underlying char array is not expected to be modified. OK.

 Reviewed-by: Laszlo Ersek ler...@redhat.com

Thanks!

Re: [Qemu-devel] [PATCH v4 3/6] sun4: Don't prematurely explode QEMUMachineInitArgs

2013-08-19 Thread Markus Armbruster

Laszlo Ersek ler...@redhat.com writes:

 comments below

 On 08/16/13 13:13, arm...@redhat.com wrote:
 From: Markus Armbruster arm...@redhat.com
 
 Don't explode QEMUMachineInitArgs before passing it to
 sun4m_hw_init(), sun4uv_init().
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
  hw/sparc/sun4m.c | 113
 -
  hw/sparc64/sun4u.c |  52 +++-
  2 files changed, 40 insertions(+), 125 deletions(-)
 
 diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
 index 942ca37..36ef36f 100644
 --- a/hw/sparc/sun4m.c
 +++ b/hw/sparc/sun4m.c
 @@ -836,12 +836,10 @@ static void dummy_fdc_tc(void *opaque, int
 irq, int level)
  {
  }
  
 -static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 ram_addr_t RAM_size,
 -  const char *boot_device,
 -  const char *kernel_filename,
 -  const char *kernel_cmdline,
 -  const char *initrd_filename, const char 
 *cpu_model)
 +static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 +  QEMUMachineInitArgs *args)
  {
 +const char *cpu_model = args-cpu_model;

 This may not be strictly necessary; in the first patch you modify
 args-cpu_model too.

Yes.

 In any case the above is not wrong.

 @@ -878,13 +875,15 @@ static void sun4uv_init(MemoryRegion 
 *address_space_mem,
  
  initrd_size = 0;
  initrd_addr = 0;
 -kernel_size = sun4u_load_kernel(kernel_filename, initrd_filename,
 +kernel_size = sun4u_load_kernel(args-kernel_filename,
 +args-initrd_filename,
  ram_size, initrd_size, initrd_addr,
  kernel_addr, kernel_entry);

 The patch is correct / faithful here, but I smell some fubar in the code
 that the patch keeps intact.

 ram_size is apparently the global variable from vl.c. So this
 function gets the RAM size twice, once via RAM_size / args-ram_size,
 then via the ram_size global. (They should have identical values,
 args.ram_size in main() is initialized with ram_size the global.)

 The rest of the code below this hunk (in the full source file, not just
 in the patch) alternates between RAM_size / args-ram_size and
 ram_size quite schizophrenically too; see eg. FW_CFG_RAM_SIZE.

Correct.  Could be cleaned up on top.

 Anyway the patch only improves things.

 Reviewed-by: Laszlo Ersek ler...@redhat.com

Thanks!

Re: [Qemu-devel] [PATCH v4 5/6] ppc: Don't duplicate QEMUMachineInitArgs in PPCE500Params

2013-08-19 Thread Markus Armbruster

Laszlo Ersek ler...@redhat.com writes:

 comments below

 On 08/16/13 13:13, arm...@redhat.com wrote:
 From: Markus Armbruster arm...@redhat.com
 
 Pass on the generic arguments unadulterated, and the machine-specific
 ones as separate argument.
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 Acked-by: Alexander Graf ag...@suse.de
 ---
  hw/ppc/e500.c  | 35 ++-
  hw/ppc/e500.h  | 13 +++--
  hw/ppc/e500plat.c  |  8 +---
  hw/ppc/mpc8544ds.c |  8 +---
  4 files changed, 23 insertions(+), 41 deletions(-)

 Please always use

   -O/path/to/order_file

 when invoking git-format-patch.

 The contents of order_file should be minimally

   configure
   Makefile*
   *.json
   *.h
   *.c

 It's much easier to review a patch when declarative changes are shown
 first (ie. in approximate logical dependency order).

Is there a way to put this in .git/config?

Should http://wiki.qemu.org/Contribute/SubmitAPatch ask for this?

 Then,

 -void ppce500_init(PPCE500Params *params)
 +void ppce500_init(QEMUMachineInitArgs *args, PPCE500Params *params)
  {
  MemoryRegion *address_space_mem = get_system_memory();
  MemoryRegion *ram = g_new(MemoryRegion, 1);
 @@ -584,8 +585,8 @@ void ppce500_init(PPCE500Params *params)
  PPCE500CCSRState *ccsr;
  
  /* Setup CPUs */
 -if (params-cpu_model == NULL) {
 -params-cpu_model = e500v2_v30;
 +if (args-cpu_model == NULL) {
 +args-cpu_model = e500v2_v30;
  }

 As discussed before, this change will modify the args.cpu_model member
 in main(), but that's OK.


 @@ -634,7 +635,7 @@ void ppce500_init(PPCE500Params *params)
  
  /* Fixup Memory size on a alignment boundary */
  ram_size = ~(RAM_SIZES_ALIGN - 1);
 -params-ram_size = ram_size;
 +args-ram_size = ram_size;

 This hackery (commendably left intact by the patch) is convincing me
 that QEMUMachineInitArgs should not have a ram_size member at all. If
 ram_size is a well-founded global, then let's treat it as such. Whatever.

Global variables are often bad style (there are exceptions).  Even worse
is passing what is essentially global state down call chains while
keeping the global variables around for random poking.  And that's what
we tend to do %-/

 Reviewed-by: Laszlo Ersek ler...@redhat.com

Thanks!

Re: [Qemu-devel] [PATCH] qemu-kvm bugfix for IA32_FEATURE_CONTROL

2013-08-19 Thread Paolo Bonzini

Il 18/08/2013 20:23, Liu, Jinsong ha scritto:
 From 1273f8b2e5464ec987facf9942fd3ccc0b69087e Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Mon, 19 Aug 2013 09:33:30 +0800
 Subject: [PATCH] qemu-kvm bugfix for IA32_FEATURE_CONTROL
 
 This patch is to fix the bug https://bugs.launchpad.net/qemu-kvm/+bug/1207623
 
 IA32_FEATURE_CONTROL is pointless if not expose VMX or SMX bits to
 cpuid.1.ecx of vcpu. Current qemu-kvm will error return when kvm_put_msrs
 or kvm_get_msrs.
 
 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  target-i386/kvm.c |   16 ++--
  1 files changed, 14 insertions(+), 2 deletions(-)
 
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 84ac00a..7facbfe 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -65,6 +65,7 @@ static bool has_msr_star;
  static bool has_msr_hsave_pa;
  static bool has_msr_tsc_adjust;
  static bool has_msr_tsc_deadline;
 +static bool has_msr_feature_control;
  static bool has_msr_async_pf_en;
  static bool has_msr_pv_eoi_en;
  static bool has_msr_misc_enable;
 @@ -644,6 +645,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
  
  qemu_add_vm_change_state_handler(cpu_update_state, env);
  
 +c = cpuid_find_entry(cpuid_data.cpuid, 1, 0);
 +if (c)
 +has_msr_feature_control = !!(c-ecx  CPUID_EXT_VMX) |
 +  !!(c-ecx  CPUID_EXT_SMX);
 +
  cpuid_data.cpuid.padding = 0;
  r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, cpuid_data);
  if (r) {
 @@ -1121,7 +1127,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
  if (hyperv_vapic_recommended()) {
  kvm_msr_entry_set(msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
  }
 -kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL, 
 env-msr_ia32_feature_control);
 +if (has_msr_feature_control) {
 +kvm_msr_entry_set(msrs[n++], MSR_IA32_FEATURE_CONTROL,
 +  env-msr_ia32_feature_control);
 +}
  }
  if (env-mcg_cap) {
  int i;
 @@ -1346,7 +1355,9 @@ static int kvm_get_msrs(X86CPU *cpu)
  if (has_msr_misc_enable) {
  msrs[n++].index = MSR_IA32_MISC_ENABLE;
  }
 -msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
 +if (has_msr_feature_control) {
 +msrs[n++].index = MSR_IA32_FEATURE_CONTROL;
 +}
  
  if (!env-tsc_valid) {
  msrs[n++].index = MSR_IA32_TSC;
 @@ -1447,6 +1458,7 @@ static int kvm_get_msrs(X86CPU *cpu)
  break;
  case MSR_IA32_FEATURE_CONTROL:
  env-msr_ia32_feature_control = msrs[i].data;
 +break;
  default:
  if (msrs[i].index = MSR_MC0_CTL 
  msrs[i].index  MSR_MC0_CTL + (env-mcg_cap  0xff) * 4) {
 

The patch looks good.  Please repost it with checkpatch.pl failures fixed.

Paolo

[Qemu-devel] [PATCH] qcow2: Change default for new images to compat=1.1

2013-08-19 Thread Kevin Wolf

By the time that qemu 1.7 will be released, enough time will has passed
since qemu 1.1, which is the first version to understand version 3
images, that changing the default shouldn't hurt many people any more
and the benefits of using the new format outweigh the pain.

qemu-iotests already runs with compat=1.1 by default.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3376901..42ea7ec 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1402,7 +1402,7 @@ static int qcow2_create(const char *filename, 
QEMUOptionParameter *options)
 int flags = 0;
 size_t cluster_size = DEFAULT_CLUSTER_SIZE;
 int prealloc = 0;
-int version = 2;
+int version = 3;
 
 /* Read out options */
 while (options  options-name) {
-- 
1.8.1.4

[Qemu-devel] [qemu-devel]question on virtqueue_get_avail_bytes

2013-08-19 Thread yinyin

Hi,all:
in func virtqueue_get_avail_bytes， when found a indirect desc, we need 
loop over it.
/* loop over the indirect descriptor table */
indirect = 1;
max = vring_desc_len(desc_pa, i) / sizeof(VRingDesc);
num_bufs = i = 0;
desc_pa = vring_desc_addr(desc_pa, i);
But, It init i to 0, then use i to update desc_pa. so we will always 
get  :
desc_pa = vring_desc_addr(desc_pa, 0);
is it right？or should we update desc_pa first, then init i to 0?

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 09f62c6..554ae6f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -377,8 +377,8 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int 
*in_bytes,
 /* loop over the indirect descriptor table */
 indirect = 1;
 max = vring_desc_len(desc_pa, i) / sizeof(VRingDesc);
-num_bufs = i = 0;
 desc_pa = vring_desc_addr(desc_pa, i);
+   num_bufs = i = 0;
 }
 
 do {

Re: [Qemu-devel] [PATCH v2 1/4] vmdk: fix L1 and L2 table size in vmdk3 open

2013-08-19 Thread Paolo Bonzini

Il 19/08/2013 04:18, Fam Zheng ha scritto:
 On Sun, 08/18 17:19, Paolo Bonzini wrote:
 Il 13/08/2013 03:21, Fam Zheng ha scritto:
 VMDK3 header has the field l1dir_size, but vmdk_open_vmdk3 hardcoded the
 value. This patch honors the header field.

 And the L2 table size is 4096 according to VMDK spec[1], instead of
 1  9 (512).

 I'm not sure from the VMDK spec that _only_ 4096 is supported for VMDK3
 files.  The way I read it, VMDK3 files in hosted products are supposed
 to have 2K grain tables (as specified by vmdk_open_vmdk3).
 
 I presume COWD is only specified in ESXi Host Sparse Extents
 section, which is also in practice the only known use case to me. There
 it says Grain tables have 4096 entries.
 
 I think you meant 2KB grain table specified in section Hosted Sparse
 Extent Metadata, with 512 entries. If so, it should be for VMDK4 with
 KDMV magic bytes, so doesn't affect COWD.

Ok, thanks for explaining.

Paolo

Re: [Qemu-devel] [PATCH v2 4/4] timer: make qemu_clock_enable sync between disable and timer's cb

2013-08-19 Thread Paolo Bonzini

Il 19/08/2013 09:14, liu ping fan ha scritto:
  But does timerlists need a lock, or does the BQL suffice?  If it
  doesn't, there is no need for events_list_lock either.  Is
  qemu_clock_enable called outside the BQL?
 
 Currently, no such guarantee based on BQL. But if we enforce that an
 backend thread holds BQL before it cleans up resources, we can resort
 to BQL. (so document this, and save the events_list_lock?)

Yes, please.

Paolo

Re: [Qemu-devel] [PATCH v4 5/6] ppc: Don't duplicate QEMUMachineInitArgs in PPCE500Params

2013-08-19 Thread Laszlo Ersek

On 08/19/13 11:24, Markus Armbruster wrote:
 Laszlo Ersek ler...@redhat.com writes:

 Please always use

   -O/path/to/order_file

 when invoking git-format-patch.

 The contents of order_file should be minimally

   configure
   Makefile*
   *.json
   *.h
   *.c

 It's much easier to review a patch when declarative changes are shown
 first (ie. in approximate logical dependency order).
 
 Is there a way to put this in .git/config?

The only way we've found thus far is to introduce a new alias for
git-format-patch that hardwires it.

 Should http://wiki.qemu.org/Contribute/SubmitAPatch ask for this?

Not a bad idea, but I'm afraid new contributors don't read it because
they don't know about it, and veteran contributors don't read it because
they don't need it.

The wiki would need a usable table of contents anyway, I have great
trouble every time I want to find anything. Wikipedia probably should
not have a flat sitemap for its millions of articles. The qemu wiki
should, for its handful.

Laszlo

 
 Then,

 -void ppce500_init(PPCE500Params *params)
 +void ppce500_init(QEMUMachineInitArgs *args, PPCE500Params *params)
  {
  MemoryRegion *address_space_mem = get_system_memory();
  MemoryRegion *ram = g_new(MemoryRegion, 1);
 @@ -584,8 +585,8 @@ void ppce500_init(PPCE500Params *params)
  PPCE500CCSRState *ccsr;
  
  /* Setup CPUs */
 -if (params-cpu_model == NULL) {
 -params-cpu_model = e500v2_v30;
 +if (args-cpu_model == NULL) {
 +args-cpu_model = e500v2_v30;
  }

 As discussed before, this change will modify the args.cpu_model member
 in main(), but that's OK.


 @@ -634,7 +635,7 @@ void ppce500_init(PPCE500Params *params)
  
  /* Fixup Memory size on a alignment boundary */
  ram_size = ~(RAM_SIZES_ALIGN - 1);
 -params-ram_size = ram_size;
 +args-ram_size = ram_size;

 This hackery (commendably left intact by the patch) is convincing me
 that QEMUMachineInitArgs should not have a ram_size member at all. If
 ram_size is a well-founded global, then let's treat it as such. Whatever.
 
 Global variables are often bad style (there are exceptions).  Even worse
 is passing what is essentially global state down call chains while
 keeping the global variables around for random poking.  And that's what
 we tend to do %-/
 
 Reviewed-by: Laszlo Ersek ler...@redhat.com
 
 Thanks!

Re: [Qemu-devel] [PATCH v2 6/7] vl: Set current_machine early

2013-08-19 Thread Markus Armbruster

Andreas Färber afaer...@suse.de writes:

 Am 16.08.2013 15:18, schrieb arm...@redhat.com:
 From: Markus Armbruster arm...@redhat.com
 
 I'd like to access QEMUMachine from a QEMUMachine init() method, which
 is currently not possible.  Instead of passing it as an argument, I
 simply set current_machine earlier.

 We had such a patch for CPU hot-add and decided against doing this.
 Currently current_machine != signals that it has been initialized. And

Does any code actually depend on this undocumented condition?  I found
none.

 generally we have been trying to get away from accessing globals from
 random parts of code.

Global state need to be managed with care.  Global variables have their
place in that.

In my experience, the most common kind of carelessness involving global
variables is indisciplined *updating* of global state via global
variables.

Unchanging global state is relatively harmless, and referring to it via
global variables is often the easiest and most obvious way to manage
such state.

 Can't you pass either QEMUMachine or the specific fields needed from PC
 code to those SMBIOS functions? You did add a bool argument.

Can't see how to do that without passing the machine to QEMUMachine
method init(), which involves touching all boards.  I doubt that's a
good idea, but if you insist, I can do it.

Re: [Qemu-devel] [PATCH] powerpc iommu: enable multiple TCE requests

2013-08-19 Thread Paolo Bonzini

Il 19/08/2013 10:44, Alexey Kardashevskiy ha scritto:
  It means that if you use the same QEMU version with the same command
  line on a different kernel version, your guest looks different because
  we generate the dtb differently.
 Oh. Sorry for my ignorance again, I am not playing dump or anything like
 that - I do not understand how the device tree (which we cook in QEMU) on
 the destination can possibly survive migration and not to be overwritten by
 the one from the source. What was in the destination RAM before migration
 does not matter at all (including dt), QEMU device tree is what matters but
 this does not change. As it is the same QEMU version, hypercalls are
 supported anyway, the only difference where they will be handled - in the
 host kernel or QEMU. What do I miss?

Nothing, I just asked to test that handling the hypercall in QEMU works.

On x86 we have a similar problem, though with cpuid bits instead of the
device tree.  An older kernel might not support some cpuid bits, thus
-cpu SandyBridge might have different cpuid bits depending on the host
processor and kernel version.  This is handled by having an enforce
mode where -cpu SandyBridge,enforce will fail to start if the host
processor or the kernel is not new enough.

But in this case, you do not need this because the hypercall works if
emulated by QEMU.  I like Alex's solution of making it universally
available in the dtb.

Paolo

Re: [Qemu-devel] [PATCH v2 1/4] vmdk: fix L1 and L2 table size in vmdk3 open

2013-08-19 Thread Kevin Wolf

Am 13.08.2013 um 03:21 hat Fam Zheng geschrieben:
 VMDK3 header has the field l1dir_size, but vmdk_open_vmdk3 hardcoded the
 value. This patch honors the header field.
 
 And the L2 table size is 4096 according to VMDK spec[1], instead of
 1  9 (512).
 
 [1]:
 http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block/vmdk.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)
 
 diff --git a/block/vmdk.c b/block/vmdk.c
 index 346bb5c..1392542 100644
 --- a/block/vmdk.c
 +++ b/block/vmdk.c
 @@ -486,14 +486,14 @@ static int vmdk_open_vmdk3(BlockDriverState *bs,
  if (ret  0) {
  return ret;
  }
 -
 -ret = vmdk_add_extent(bs,
 - bs-file, false,
 - le32_to_cpu(header.disk_sectors),
 - le32_to_cpu(header.l1dir_offset)  9,
 - 0, 1  6, 1  9,
 - le32_to_cpu(header.granularity),
 - extent);
 +ret = vmdk_add_extent(bs, file, false,
 +  le32_to_cpu(header.disk_sectors),
 +  le32_to_cpu(header.l1dir_offset)  9,
 +  0,
 +  le32_to_cpu(header.l1dir_size),
 +  4096,
 +  le32_to_cpu(header.granularity),
 +  extent);

You'll want to add a sanity check for header.l1dir_dir, or move the
existing check from vmdk_open_vmdk4() to vmdk_add_extent().

Kevin

Re: [Qemu-devel] [PATCH v3 1/2] memory: export migration page size

2013-08-19 Thread Laszlo Ersek

On 08/13/13 00:43, Michael S. Tsirkin wrote:
 Migration code assumes that each RAM block is a multiple of target page
 size.

Isn't that a valid assumption, considering the TARGET_PAGE_ALIGN() macro
call in qemu_ram_alloc_from_ptr() [exec.c]?

 We can fix this in a variety of ways, the simplest way is
 exporting the required page size so callers can make regions
 large enough.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  arch_init.c   | 6 ++
  include/exec/memory.h | 1 +
  2 files changed, 7 insertions(+)
 
 diff --git a/arch_init.c b/arch_init.c
 index 68a7ab7..c62778f 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -150,6 +150,12 @@ int qemu_read_default_config_files(bool userconfig)
  return 0;
  }
  
 +/* Smallest page size for migrated RAM. */
 +uint64_t qemu_migration_page_size(void)
 +{
 +return TARGET_PAGE_SIZE;
 +}
 +
  static inline bool is_zero_page(uint8_t *p)
  {
  return buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE) ==
 diff --git a/include/exec/memory.h b/include/exec/memory.h
 index ebe0d24..6a2 100644
 --- a/include/exec/memory.h
 +++ b/include/exec/memory.h
 @@ -1055,6 +1055,7 @@ void *address_space_map(AddressSpace *as, hwaddr addr,
  void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
   int is_write, hwaddr access_len);
  
 +extern uint64_t qemu_migration_page_size(void);
  
  #endif

External linkage functions that are defined in arch_init.c, and relate
to migration -- for example, skipped_mig_bytes_transferred() -- are
declared in include/migration/migration.h. They seem to use
TARGET_PAGE_SIZE too.

What justifies declaring this new function in include/exec/memory.h?

Thanks,
Laszlo

Re: [Qemu-devel] [PATCH v2 4/4] vmdk: Move l1_size check into vmdk_add_extent()

2013-08-19 Thread Kevin Wolf

Am 13.08.2013 um 03:21 hat Fam Zheng geschrieben:
 This header check is common to VMDK3 and VMDK4, so move it into
 vmdk_add_extent().
 
 Signed-off-by: Fam Zheng f...@redhat.com

Aha, so this is the fix for patch 1. If you reorder the patches so that
this one comes first, you don't have broken intermediate patches.

Kevin

Re: [Qemu-devel] [libvirt] [PATCH] qemu: Drop qemuDomainMemoryLimit

2013-08-19 Thread Michal Privoznik

On 19.08.2013 11:06, Daniel P. Berrange wrote:
 On Mon, Aug 19, 2013 at 10:29:10AM +0200, Michal Privoznik wrote:
 On 09.08.2013 18:29, Daniel P. Berrange wrote:
 On Fri, Aug 09, 2013 at 10:58:55AM -0500, Anthony Liguori wrote:
 Michal Privoznik mpriv...@redhat.com writes:

 [CC'ing qemu-devel list]
 On 09.08.2013 15:17, Daniel P. Berrange wrote:
 On Fri, Aug 09, 2013 at 07:13:58AM -0600, Eric Blake wrote:
 On 08/09/2013 06:56 AM, Michal Privoznik wrote:


 So what's the conclusion? Should I push the patch until qemu gets
 stabilized in mem consumption (which can take a while - no offense to
 qemu devels, I can imagine it's nontrivial).
 
 Given the lack of any useful info to make memory limits work reliably,
 I guess we don't have any choice but to remove this default memory limit,
 and also recommend against people setting explicit memory limits too.
 
 Regards,
 Daniel
 

Okay, I've pushed the patch. And I've just proposed patch to discourage
users setting this limit:

https://www.redhat.com/archives/libvir-list/2013-August/msg00820.html

Michal

Re: [Qemu-devel] [RFC] [PATCHv2] aio / timers pt2: Replace main_loop_tlg with qemu_dummy_timer_ctx

2013-08-19 Thread Paolo Bonzini

Il 10/08/2013 13:21, Alex Bligh ha scritto:
 Currently we use a separate timer list group (main_loop_tlg)
 for the main loop. This patch replaces this with a dummy AioContext
 used just for timers in the main loop.
 
 Signed-off-by: Alex Bligh a...@alex.org.uk

I guess for now this complicates things more than it simplifies them.
You were right. :)

Paolo

 ---
  include/block/aio.h  |3 +++
  include/qemu/timer.h |9 ++---
  main-loop.c  |2 +-
  qemu-timer.c |   39 +--
  4 files changed, 39 insertions(+), 14 deletions(-)
 
 diff --git a/include/block/aio.h b/include/block/aio.h
 index b08de19..50a064d 100644
 --- a/include/block/aio.h
 +++ b/include/block/aio.h
 @@ -81,6 +81,9 @@ typedef struct AioContext {
  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
  typedef int (AioFlushEventNotifierHandler)(EventNotifier *e);
  
 +/* Dummy AioContext for main loop timers */
 +extern AioContext *qemu_dummy_timer_ctx;
 +
  /**
   * aio_context_new: Allocate a new AioContext.
   *
 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index 716f50b..aac3141 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -58,8 +58,6 @@ typedef struct QEMUTimer {
  int scale;
  } QEMUTimer;
  
 -extern QEMUTimerListGroup main_loop_tlg;
 -
  /*
   * QEMUClockType
   */
 @@ -437,11 +435,8 @@ static inline QEMUTimer *timer_new_tl(QEMUTimerList 
 *timer_list,
   *
   * Returns: a pointer to the timer
   */
 -static inline QEMUTimer *timer_new(QEMUClockType type, int scale,
 -   QEMUTimerCB *cb, void *opaque)
 -{
 -return timer_new_tl(main_loop_tlg[type], scale, cb, opaque);
 -}
 +QEMUTimer *timer_new(QEMUClockType type, int scale,
 + QEMUTimerCB *cb, void *opaque);
  
  /**
   * timer_new_ns:
 diff --git a/main-loop.c b/main-loop.c
 index 1f24ac1..f1ee0e5 100644
 --- a/main-loop.c
 +++ b/main-loop.c
 @@ -479,7 +479,7 @@ int main_loop_wait(int nonblocking)
  
  timeout_ns = qemu_soonest_timeout(timeout_ns,
timerlistgroup_deadline_ns(
 -  main_loop_tlg));
 +  qemu_dummy_timer_ctx-tlg));
  
  ret = os_host_main_loop_wait(timeout_ns);
  qemu_iohandler_poll(gpollfds, ret);
 diff --git a/qemu-timer.c b/qemu-timer.c
 index 0d36a21..dd0b00a 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -54,7 +54,8 @@ typedef struct QEMUClock {
  bool enabled;
  } QEMUClock;
  
 -QEMUTimerListGroup main_loop_tlg;
 +AioContext *qemu_dummy_timer_ctx;
 +
  QEMUClock qemu_clocks[QEMU_CLOCK_MAX];
  
  /* A QEMUTimerList is a list of timers attached to a clock. More
 @@ -123,7 +124,6 @@ static void qemu_clock_init(QEMUClockType type)
  clock-last = INT64_MIN;
  QLIST_INIT(clock-timerlists);
  notifier_list_init(clock-reset_notifiers);
 -main_loop_tlg[type] = timerlist_new(type, NULL, NULL);
  }
  
  bool qemu_clock_use_for_deadline(QEMUClockType type)
 @@ -158,7 +158,7 @@ bool timerlist_has_timers(QEMUTimerList *timer_list)
  bool qemu_clock_has_timers(QEMUClockType type)
  {
  return timerlist_has_timers(
 -main_loop_tlg[type]);
 +qemu_dummy_timer_ctx-tlg[type]);
  }
  
  bool timerlist_expired(QEMUTimerList *timer_list)
 @@ -171,7 +171,7 @@ bool timerlist_expired(QEMUTimerList *timer_list)
  bool qemu_clock_expired(QEMUClockType type)
  {
  return timerlist_expired(
 -main_loop_tlg[type]);
 +qemu_dummy_timer_ctx-tlg[type]);
  }
  
  /*
 @@ -221,7 +221,7 @@ QEMUClockType timerlist_get_clock(QEMUTimerList 
 *timer_list)
  
  QEMUTimerList *qemu_clock_get_main_loop_timerlist(QEMUClockType type)
  {
 -return main_loop_tlg[type];
 +return qemu_dummy_timer_ctx-tlg[type];
  }
  
  void timerlist_notify(QEMUTimerList *timer_list)
 @@ -291,6 +291,19 @@ void timer_init(QEMUTimer *ts,
  ts-scale = scale;
  }
  
 +QEMUTimer *timer_new(QEMUClockType type, int scale,
 + QEMUTimerCB *cb, void *opaque)
 +{
 +/* If this assert triggers, this is likely to mean
 + * that a timer is allocated prior to the dummy
 + * timer context being set up
 + */
 +assert(qemu_dummy_timer_ctx);
 +
 +return timer_new_tl(qemu_dummy_timer_ctx-tlg[type],
 +scale, cb, opaque);
 +}
 +
  void timer_free(QEMUTimer *ts)
  {
  g_free(ts);
 @@ -397,7 +410,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
  
  bool qemu_clock_run_timers(QEMUClockType type)
  {
 -return timerlist_run_timers(main_loop_tlg[type]);
 +return timerlist_run_timers(qemu_dummy_timer_ctx-tlg[type]);
  }
  
  void timerlistgroup_init(QEMUTimerListGroup tlg,
 @@ -486,6 +499,20 @@ void init_clocks(void)
  qemu_clock_init(type);
  }
  
 +/* Make a dummy context for timers in the main loop.
 + * 
 + * This context should not call the AioContext's
 + *

Re: [Qemu-devel] [PATCH v3 1/2] memory: export migration page size

2013-08-19 Thread Peter Maydell

On 19 August 2013 10:59, Laszlo Ersek ler...@redhat.com wrote:
 On 08/13/13 00:43, Michael S. Tsirkin wrote:
 Migration code assumes that each RAM block is a multiple of target page
 size.

 Isn't that a valid assumption, considering the TARGET_PAGE_ALIGN() macro
 call in qemu_ram_alloc_from_ptr() [exec.c]?

That macro only makes the size we store in the ramblock data
structure be a multiple of the page size -- it does nothing to ensure
that the actual memory that was passed in by the caller is the
right size. (It will have the right effect where qemu_ram_alloc_from_ptr
is allocating the memory itself, obviously.)

-- PMM

[Qemu-devel] [PATCH] Make usb-bt-dongle configurable

2013-08-19 Thread Miroslav Rezanina

usb-bt-dongle device can't be disabled as there's dependency in vl.c file. This 
patch add preprocesor condition to be able to disable it.

Signed-off-by: Miroslav Rezanina mreza...@redhat.com
---
 hw/usb/Makefile.objs |  1 -
 vl.c | 18 ++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
index f9695e7..8892ffd 100644
--- a/hw/usb/Makefile.objs
+++ b/hw/usb/Makefile.objs
@@ -20,7 +20,6 @@ common-obj-$(CONFIG_USB_SERIAL)   += dev-serial.o
 common-obj-$(CONFIG_USB_NETWORK)  += dev-network.o
 
 # FIXME: make configurable too
-CONFIG_USB_BLUETOOTH := y
 common-obj-$(CONFIG_USB_BLUETOOTH)+= dev-bluetooth.o
 
 ifeq ($(CONFIG_USB_SMARTCARD),y)
diff --git a/vl.c b/vl.c
index f422a1c..4330b6d 100644
--- a/vl.c
+++ b/vl.c
@@ -1526,8 +1526,10 @@ static void configure_msg(QemuOpts *opts)
 
 static int usb_device_add(const char *devname)
 {
-const char *p;
 USBDevice *dev = NULL;
+#if  defined(CONFIG_USB_BLUETOOTH) || !defined(CONFIG_LINUX)
+const char *p;
+#endif
 
 if (!usb_enabled(false)) {
 return -1;
@@ -1543,15 +1545,23 @@ static int usb_device_add(const char *devname)
 /* only the linux version is qdev-ified, usb-bsd still needs this */
 if (strstart(devname, host:, p)) {
 dev = usb_host_device_open(usb_bus_find(-1), p);
-} else
+goto devtest;
+} 
 #endif
+#ifdef CONFIG_USB_BLUETOOTH
 if (!strcmp(devname, bt) || strstart(devname, bt:, p)) {
 dev = usb_bt_init(usb_bus_find(-1),
   devname[2] ? hci_init(p)
  : bt_new_hci(qemu_find_bt_vlan(0)));
-} else {
-return -1;
+goto devtest;
 }
+#endif
+
+return -1;
+
+#if  defined(CONFIG_USB_BLUETOOTH) || !defined(CONFIG_LINUX)
+devtest:
+#endif
 if (!dev)
 return -1;
 
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 0/4] vmdk: Support ESX files

2013-08-19 Thread Fam Zheng

This series add support for VMFS and VMFSSPARSE extents, these types are found
in description file from ESX hosts.

 - VMFS is in monolithiFlat format (raw), but hosted in ESX.

 - VMFSSPARSE is the format we call vmdk3 with magic bytes COWD. This patch
   fix the opening of vmdk3 and rename it to vmfs_sparse which is better in
   representing its main usage nowadays.

v3:
Reorder patches to first move header check to
vmdk_add_extent().


Fam Zheng (3):
  vmdk: Move l1_size check into vmdk_add_extent()
  vmdk: fix L1 and L2 table size in vmdk3 open
  vmdk: support vmfsSparse files

Paolo Bonzini (1):
  vmdk: support vmfs files

 block/vmdk.c | 52 +++-
 1 file changed, 27 insertions(+), 25 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v3 2/4] vmdk: fix L1 and L2 table size in vmdk3 open

2013-08-19 Thread Fam Zheng

VMDK3 header has the field l1dir_size, but vmdk_open_vmdk3 hardcoded the
value. This patch honors the header field.

And the L2 table size is 4096 according to VMDK spec[1], instead of
1  9 (512).

[1]:
http://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf?src=vmdk

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/vmdk.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index f8c0a4e..4d282a6 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -494,14 +494,14 @@ static int vmdk_open_vmdk3(BlockDriverState *bs,
 if (ret  0) {
 return ret;
 }
-
-ret = vmdk_add_extent(bs,
- bs-file, false,
- le32_to_cpu(header.disk_sectors),
- le32_to_cpu(header.l1dir_offset)  9,
- 0, 1  6, 1  9,
- le32_to_cpu(header.granularity),
- extent);
+ret = vmdk_add_extent(bs, file, false,
+  le32_to_cpu(header.disk_sectors),
+  le32_to_cpu(header.l1dir_offset)  9,
+  0,
+  le32_to_cpu(header.l1dir_size),
+  4096,
+  le32_to_cpu(header.granularity),
+  extent);
 if (ret  0) {
 return ret;
 }
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 3/4] vmdk: support vmfsSparse files

2013-08-19 Thread Fam Zheng

VMware ESX hosts use a variant of the VMDK3 format, identified by the
vmfsSparse create type ad the VMFSSPARSE extent type.

It has 16 KB grain tables (L2) and a variable-size grain directory (L1).
In addition, the grain size is always 512, but that is not a problem
because it is included in the header.

The format of the extents is documented in the VMDK spec.  The format
of the descriptor file is not documented precisely, but it can be
found at http://kb.vmware.com/kb/10026353 (Recreating a missing virtual
machine disk (VMDK) descriptor file for delta disks).

With these patches, vmfsSparse files only work if opened through the
descriptor file.  Data files without descriptor files, as far as I
could understand, are not supported by ESX.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Fam Zheng f...@redhat.com

--
v2: Rebase to patch 01.
Change le64_to_cpu to le32_to_cpu.
Rename vmdk_open_vmdk3 to vmdk_open_vmfs_sparse, which represents the
current usage of this format.

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/vmdk.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 4d282a6..3986d1d 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -481,9 +481,9 @@ static int vmdk_init_tables(BlockDriverState *bs, 
VmdkExtent *extent)
 return ret;
 }
 
-static int vmdk_open_vmdk3(BlockDriverState *bs,
-   BlockDriverState *file,
-   int flags)
+static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
+ BlockDriverState *file,
+ int flags)
 {
 int ret;
 uint32_t magic;
@@ -674,7 +674,7 @@ static int vmdk_open_sparse(BlockDriverState *bs,
 magic = be32_to_cpu(magic);
 switch (magic) {
 case VMDK3_MAGIC:
-return vmdk_open_vmdk3(bs, file, flags);
+return vmdk_open_vmfs_sparse(bs, file, flags);
 break;
 case VMDK4_MAGIC:
 return vmdk_open_vmdk4(bs, file, flags);
@@ -718,7 +718,8 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 }
 
 if (sectors = 0 ||
-(strcmp(type, FLAT)  strcmp(type, SPARSE)) ||
+(strcmp(type, FLAT)  strcmp(type, SPARSE) 
+ strcmp(type, VMFSSPARSE)) ||
 (strcmp(access, RW))) {
 goto next_line;
 }
@@ -741,8 +742,8 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 return ret;
 }
 extent-flat_start_offset = flat_offset  9;
-} else if (!strcmp(type, SPARSE)) {
-/* SPARSE extent */
+} else if (!strcmp(type, SPARSE) || !strcmp(type, VMFSSPARSE)) {
+/* SPARSE extent and VMFSSPARSE extent are both COWD sparse 
file*/
 ret = vmdk_open_sparse(bs, extent_file, bs-open_flags);
 if (ret) {
 bdrv_delete(extent_file);
@@ -789,6 +790,7 @@ static int vmdk_open_desc_file(BlockDriverState *bs, int 
flags,
 goto exit;
 }
 if (strcmp(ct, monolithicFlat) 
+strcmp(ct, vmfsSparse) 
 strcmp(ct, twoGbMaxExtentSparse) 
 strcmp(ct, twoGbMaxExtentFlat)) {
 fprintf(stderr,
@@ -1381,7 +1383,6 @@ static int coroutine_fn 
vmdk_co_write_zeroes(BlockDriverState *bs,
 return ret;
 }
 
-
 static int vmdk_create_extent(const char *filename, int64_t filesize,
   bool flat, bool compress, bool zeroed_grain)
 {
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 1/4] vmdk: Move l1_size check into vmdk_add_extent()

2013-08-19 Thread Fam Zheng

This header check is common to VMDK3 and VMDK4, so move it into
vmdk_add_extent().

Signed-off-by: Fam Zheng f...@redhat.com
---
 block/vmdk.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 346bb5c..f8c0a4e 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -401,6 +401,14 @@ static int vmdk_add_extent(BlockDriverState *bs,
 error_report(invalid granularity, image may be corrupt);
 return -EINVAL;
 }
+if (l1_size  512 * 1024 * 1024) {
+/* Although with big capacity and small l1_entry_sectors, we can get a
+ * big l1_size, we don't want unbounded value to allocate the table.
+ * Limit it to 512M, which is 16PB for default cluster and L2 table
+ * size */
+error_report(L1 size too big);
+return -EFBIG;
+}
 
 s-extents = g_realloc(s-extents,
   (s-num_extents + 1) * sizeof(VmdkExtent));
@@ -598,14 +606,6 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
 }
 l1_size = (le64_to_cpu(header.capacity) + l1_entry_sectors - 1)
 / l1_entry_sectors;
-if (l1_size  512 * 1024 * 1024) {
-/* although with big capacity and small l1_entry_sectors, we can get a
- * big l1_size, we don't want unbounded value to allocate the table.
- * Limit it to 512M, which is 16PB for default cluster and L2 table
- * size */
-error_report(L1 size too big);
-return -EFBIG;
-}
 if (le32_to_cpu(header.flags)  VMDK4_FLAG_RGD) {
 l1_backup_offset = le64_to_cpu(header.rgd_offset)  9;
 }
-- 
1.8.3.1

[Qemu-devel] [PATCH v3 4/4] vmdk: support vmfs files

2013-08-19 Thread Fam Zheng

From: Paolo Bonzini pbonz...@redhat.com

VMware ESX hosts also use different create and extent types for flat
files, respectively vmfs and VMFS.  This is not documented, but it
can be found at http://kb.vmware.com/kb/10002511 (Recreating a missing
virtual machine disk (VMDK) descriptor file).

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Fam Zheng f...@redhat.com
---
 block/vmdk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 3986d1d..63b489d 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -719,7 +719,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 
 if (sectors = 0 ||
 (strcmp(type, FLAT)  strcmp(type, SPARSE) 
- strcmp(type, VMFSSPARSE)) ||
+ strcmp(type, VMFS)  strcmp(type, VMFSSPARSE)) ||
 (strcmp(access, RW))) {
 goto next_line;
 }
@@ -732,7 +732,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 }
 
 /* save to extents array */
-if (!strcmp(type, FLAT)) {
+if (!strcmp(type, FLAT) || !strcmp(type, VMFS)) {
 /* FLAT extent */
 VmdkExtent *extent;
 
@@ -790,6 +790,7 @@ static int vmdk_open_desc_file(BlockDriverState *bs, int 
flags,
 goto exit;
 }
 if (strcmp(ct, monolithicFlat) 
+strcmp(ct, vmfs) 
 strcmp(ct, vmfsSparse) 
 strcmp(ct, twoGbMaxExtentSparse) 
 strcmp(ct, twoGbMaxExtentFlat)) {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v3 1/2] memory: export migration page size

2013-08-19 Thread Michael S. Tsirkin

On Mon, Aug 19, 2013 at 11:59:25AM +0200, Laszlo Ersek wrote:
 On 08/13/13 00:43, Michael S. Tsirkin wrote:
  Migration code assumes that each RAM block is a multiple of target page
  size.
 
 Isn't that a valid assumption, considering the TARGET_PAGE_ALIGN() macro
 call in qemu_ram_alloc_from_ptr() [exec.c]?
 
  We can fix this in a variety of ways, the simplest way is
  exporting the required page size so callers can make regions
  large enough.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   arch_init.c   | 6 ++
   include/exec/memory.h | 1 +
   2 files changed, 7 insertions(+)
  
  diff --git a/arch_init.c b/arch_init.c
  index 68a7ab7..c62778f 100644
  --- a/arch_init.c
  +++ b/arch_init.c
  @@ -150,6 +150,12 @@ int qemu_read_default_config_files(bool userconfig)
   return 0;
   }
   
  +/* Smallest page size for migrated RAM. */
  +uint64_t qemu_migration_page_size(void)
  +{
  +return TARGET_PAGE_SIZE;
  +}
  +
   static inline bool is_zero_page(uint8_t *p)
   {
   return buffer_find_nonzero_offset(p, TARGET_PAGE_SIZE) ==
  diff --git a/include/exec/memory.h b/include/exec/memory.h
  index ebe0d24..6a2 100644
  --- a/include/exec/memory.h
  +++ b/include/exec/memory.h
  @@ -1055,6 +1055,7 @@ void *address_space_map(AddressSpace *as, hwaddr addr,
   void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
int is_write, hwaddr access_len);
   
  +extern uint64_t qemu_migration_page_size(void);
   
   #endif
 
 External linkage functions that are defined in arch_init.c, and relate
 to migration -- for example, skipped_mig_bytes_transferred() -- are
 declared in include/migration/migration.h. They seem to use
 TARGET_PAGE_SIZE too.
 
 What justifies declaring this new function in include/exec/memory.h?
 
 Thanks,
 Laszlo


Well, this one is a bit different as this just helps device
allocate the right amount of RAM.
It doesn't deal with migration directly.


-- 
MST

Re: [Qemu-devel] [PATCH v3 1/2] memory: export migration page size

2013-08-19 Thread Laszlo Ersek

On 08/19/13 12:21, Peter Maydell wrote:
 On 19 August 2013 10:59, Laszlo Ersek ler...@redhat.com wrote:
 On 08/13/13 00:43, Michael S. Tsirkin wrote:
 Migration code assumes that each RAM block is a multiple of target page
 size.

 Isn't that a valid assumption, considering the TARGET_PAGE_ALIGN() macro
 call in qemu_ram_alloc_from_ptr() [exec.c]?
 
 That macro only makes the size we store in the ramblock data
 structure be a multiple of the page size -- it does nothing to ensure
 that the actual memory that was passed in by the caller is the
 right size. (It will have the right effect where qemu_ram_alloc_from_ptr
 is allocating the memory itself, obviously.)

Which is the case for 2/2, see my comments there:

memory_region_init_ram()
  qemu_ram_alloc()
qemu_ram_alloc_from_ptr()   host==NULL
  TARGET_PAGE_ALIGN()

Laszlo

Re: [Qemu-devel] [PATCH v3 2/2] loader: put FW CFG ROM files into RAM

2013-08-19 Thread Laszlo Ersek

On 08/13/13 00:43, Michael S. Tsirkin wrote:
 ROM files that are put in FW CFG are copied to guest ram, by BIOS, but
 they are not backed by RAM so they don't get migrated.

Can you please elaborate on this? Do you mean the 384 KB range between
640KB and 1MB that is covered by RAMBlock, but no MemoryRegion?

 
 Each time we change two bytes in such a ROM this breaks cross-version
 migration: since we can migrate after BIOS has read the first byte but
 before it has read the second one, getting an inconsistent state.
 
 Future-proof this by creating, for each such ROM,
 an MR serving as the backing store.
 This MR is never mapped into guest memory, but it's registered
 as RAM so it's migrated with the guest.

savevm seems to work off RAMBlocks (ram_list.blocks), not memory regions.

... Ah. You probably don't mean the *target* memory where the guest
copies the fw_cfg contents from the ioport. You probably mean the
*source* (internal representation) of the fw_cfg contents that qemu
passes down via the ioport.

Currently those bytes appear out of thin air.

Is my understanding correct that this patch stuffs them into RAM (that
the guest has no knowledge about), so that migration automatically
copies over the *source* of fw_cfg contents too?

fw_cfg state (current entry selector etc) is tracked according to
vmstate_fw_cfg. It only includes some metadata, not the actual contents.

I wonder if the right fix would be to save the fw_cfg files in
vmstate_fw_cfg too.

 
 Naturally, this only helps for -M 1.7 and up, older machine types
 will still have the cross-version migration bug.
 Luckily the race window for the problem to trigger is very small,
 which is also likely why we didn't notice the cross-version
 migration bug in testing yet.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/core/loader.c| 54 
 ++---
  hw/i386/pc_piix.c   |  2 ++
  hw/i386/pc_q35.c|  2 ++
  include/hw/loader.h |  1 +
  4 files changed, 56 insertions(+), 3 deletions(-)
 
 diff --git a/hw/core/loader.c b/hw/core/loader.c
 index 6875b7e..32d807a 100644
 --- a/hw/core/loader.c
 +++ b/hw/core/loader.c
 @@ -54,6 +54,8 @@
  
  #include zlib.h
  
 +bool rom_file_in_ram = true;
 +
  static int roms_loaded;
  
  /* return the size or -1 if error */
 @@ -576,6 +578,7 @@ struct Rom {
  size_t datasize;
  
  uint8_t *data;
 +MemoryRegion *mr;
  int isrom;
  char *fw_dir;
  char *fw_file;
 @@ -605,6 +608,26 @@ static void rom_insert(Rom *rom)
  QTAILQ_INSERT_TAIL(roms, rom, next);
  }
  
 +static void *rom_set_mr(Rom *rom, Object *owner, const char *name)
 +{
 +/*
 + * Migration code expects that all RAM blocks are full pages.
 + * Round MR size up to satisfy this condition.
 + */
 +unsigned size = ROUND_UP(rom-datasize, qemu_migration_page_size());

I'm not sure this is needed at all (see my comments for 1/2 -- they can
be wrong of course). In any case, wouldn't TARGET_PAGE_ALIGN() be more
appropriate?

 +void *data;
 +
 +rom-mr = g_malloc(sizeof(*rom-mr));
 +memory_region_init_ram(rom-mr, owner, name, size);

memory_region_init_ram()
  qemu_ram_alloc()
qemu_ram_alloc_from_ptr()
  TARGET_PAGE_ALIGN()

So we don't have to round up the size ourselves, and patch #1 might be
unnecessary after all.

I also understand now that we're allocating a brand new RAMBlock, which
we won't map into guest-phys address space: we'll call *none* of the
memory API functions that would do that, eg:
- memory_region_init_alias(),
- memory_region_add_subregion[_overlap](),
- memory_region_set_address().

 +memory_region_set_readonly(rom-mr, true);
 +vmstate_register_ram_global(rom-mr);

OK, this call adds the new RAMBlock to ram_list.blocks, via
qemu_ram_set_idstr().

 +
 +data = memory_region_get_ram_ptr(rom-mr);
 +memcpy(data, rom-data, rom-datasize);
 +
 +return data;
 +}
 +

So, this function allocates a new RAMBlock, doesn't map it into
guest-phys address-space, and copies the fw_cfg contents into it.


  int rom_add_file(const char *file, const char *fw_dir,
   hwaddr addr, int32_t bootindex)
  {
 @@ -646,6 +669,7 @@ int rom_add_file(const char *file, const char *fw_dir,
  if (rom-fw_file  fw_cfg) {
  const char *basename;
  char fw_file_name[56];
 +void *data;
  
  basename = strrchr(rom-fw_file, '/');
  if (basename) {
 @@ -655,8 +679,15 @@ int rom_add_file(const char *file, const char *fw_dir,
  }
  snprintf(fw_file_name, sizeof(fw_file_name), %s/%s, rom-fw_dir,
   basename);
 -fw_cfg_add_file(fw_cfg, fw_file_name, rom-data, rom-romsize);
  snprintf(devpath, sizeof(devpath), /rom@%s, fw_file_name);
 +
 +if (rom_file_in_ram) {
 +data = rom_set_mr(rom, OBJECT(fw_cfg), devpath);
 +} else {
 +data = rom-data;
 +}
 +
 +fw_cfg_add_file(fw_cfg,

Re: [Qemu-devel] [PATCH v3 2/2] loader: put FW CFG ROM files into RAM

2013-08-19 Thread Michael S. Tsirkin

On Mon, Aug 19, 2013 at 01:06:07PM +0200, Laszlo Ersek wrote:
 On 08/13/13 00:43, Michael S. Tsirkin wrote:
  ROM files that are put in FW CFG are copied to guest ram, by BIOS, but
  they are not backed by RAM so they don't get migrated.
 
 Can you please elaborate on this? Do you mean the 384 KB range between
 640KB and 1MB that is covered by RAMBlock, but no MemoryRegion?

No. Answered below.

  
  Each time we change two bytes in such a ROM this breaks cross-version
  migration: since we can migrate after BIOS has read the first byte but
  before it has read the second one, getting an inconsistent state.
  
  Future-proof this by creating, for each such ROM,
  an MR serving as the backing store.
  This MR is never mapped into guest memory, but it's registered
  as RAM so it's migrated with the guest.
 
 savevm seems to work off RAMBlocks (ram_list.blocks), not memory regions.
 
 ... Ah. You probably don't mean the *target* memory where the guest
 copies the fw_cfg contents from the ioport. You probably mean the
 *source* (internal representation) of the fw_cfg contents that qemu
 passes down via the ioport.
 
 Currently those bytes appear out of thin air.
 
 Is my understanding correct that this patch stuffs them into RAM (that
 the guest has no knowledge about), so that migration automatically
 copies over the *source* of fw_cfg contents too?

Exactly.

 fw_cfg state (current entry selector etc) is tracked according to
 vmstate_fw_cfg. It only includes some metadata, not the actual contents.
 
 I wonder if the right fix would be to save the fw_cfg files in
 vmstate_fw_cfg too.

This was proposed, and NACKed by Anthony.
Generally vmstate is for small bits of data,
ROM files are too large ...

You can look at this as some RAM internal to device,
we are migrating it as we would any RAM
even though guest can't access it directly.


  
  Naturally, this only helps for -M 1.7 and up, older machine types
  will still have the cross-version migration bug.
  Luckily the race window for the problem to trigger is very small,
  which is also likely why we didn't notice the cross-version
  migration bug in testing yet.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/core/loader.c| 54 
  ++---
   hw/i386/pc_piix.c   |  2 ++
   hw/i386/pc_q35.c|  2 ++
   include/hw/loader.h |  1 +
   4 files changed, 56 insertions(+), 3 deletions(-)
  
  diff --git a/hw/core/loader.c b/hw/core/loader.c
  index 6875b7e..32d807a 100644
  --- a/hw/core/loader.c
  +++ b/hw/core/loader.c
  @@ -54,6 +54,8 @@
   
   #include zlib.h
   
  +bool rom_file_in_ram = true;
  +
   static int roms_loaded;
   
   /* return the size or -1 if error */
  @@ -576,6 +578,7 @@ struct Rom {
   size_t datasize;
   
   uint8_t *data;
  +MemoryRegion *mr;
   int isrom;
   char *fw_dir;
   char *fw_file;
  @@ -605,6 +608,26 @@ static void rom_insert(Rom *rom)
   QTAILQ_INSERT_TAIL(roms, rom, next);
   }
   
  +static void *rom_set_mr(Rom *rom, Object *owner, const char *name)
  +{
  +/*
  + * Migration code expects that all RAM blocks are full pages.
  + * Round MR size up to satisfy this condition.
  + */
  +unsigned size = ROUND_UP(rom-datasize, qemu_migration_page_size());
 
 I'm not sure this is needed at all (see my comments for 1/2 -- they can
 be wrong of course). In any case, wouldn't TARGET_PAGE_ALIGN() be more
 appropriate?
 
  +void *data;
  +
  +rom-mr = g_malloc(sizeof(*rom-mr));
  +memory_region_init_ram(rom-mr, owner, name, size);
 
 memory_region_init_ram()
   qemu_ram_alloc()
 qemu_ram_alloc_from_ptr()
   TARGET_PAGE_ALIGN()
 
 So we don't have to round up the size ourselves, and patch #1 might be
 unnecessary after all.
 
 I also understand now that we're allocating a brand new RAMBlock, which
 we won't map into guest-phys address space: we'll call *none* of the
 memory API functions that would do that, eg:
 - memory_region_init_alias(),
 - memory_region_add_subregion[_overlap](),
 - memory_region_set_address().
 
  +memory_region_set_readonly(rom-mr, true);
  +vmstate_register_ram_global(rom-mr);
 
 OK, this call adds the new RAMBlock to ram_list.blocks, via
 qemu_ram_set_idstr().
 
  +
  +data = memory_region_get_ram_ptr(rom-mr);
  +memcpy(data, rom-data, rom-datasize);
  +
  +return data;
  +}
  +
 
 So, this function allocates a new RAMBlock, doesn't map it into
 guest-phys address-space, and copies the fw_cfg contents into it.
 
 
   int rom_add_file(const char *file, const char *fw_dir,
hwaddr addr, int32_t bootindex)
   {
  @@ -646,6 +669,7 @@ int rom_add_file(const char *file, const char *fw_dir,
   if (rom-fw_file  fw_cfg) {
   const char *basename;
   char fw_file_name[56];
  +void *data;
   
   basename = strrchr(rom-fw_file, '/');
   if (basename) {
  @@ -655,8 +679,15 @@ int rom_add_file(const char *file,

[Qemu-devel] vmdk stream-optimised format

2013-08-19 Thread Alex Bligh


I note some recent work on vmdk file support. Has anyone looked at
supporting vmdk streaming format as an /output/ file format (I
think we currently support it as an input format). This is what
you need to use to upload a VM to vCenter. Currently the route
is convert to raw then use:

 https://github.com/imcleod/VMDK-stream-converter

which is some Python from Ian McLeod imcl...@redhat.com
Life would be easier if qemu-img supported this directly.
It does not look fantastically difficult.

--
Alex Bligh

[Qemu-devel] [PATCH] pseries: Add H_SET_MODE hcall to change guest exception endianness

2013-08-19 Thread Anton Blanchard


Hi Anthony,

  +if (resource == 4) {
 
 This ought to be a #define.  There's no else here, is that expected?
 Should you return failure for a different resource?

Good point, I made it a define. We were returning H_P2 for a different
resource, but it was a bit of a twisted maze of return statements. I
tried to clear it up in this version.

 Without knowing this interface better, a few things come to mind.
 
 Is mflags a boolean?  If so, you can reduce this to a single loop and
 drop the switch() statement.  If mflags is truly a set of flags, it
 would be nice to use #define to give the flags a proper symbolic name.

Unfortunately it isn't a boolean, but yes it should have be made
clearer with a #define.

Anton
--

pseries: Add H_SET_MODE hcall to change guest exception endianness

H_SET_MODE is used for controlling various partition settings. One
of these settings is the endianness a guest takes its exceptions in.

Signed-off-by: Anton Blanchard an...@samba.org
---

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 16bfab9..de639f6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -262,7 +262,7 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
 uint32_t start_prop = cpu_to_be32(initrd_base);
 uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
 char hypertas_prop[] = hcall-pft\0hcall-term\0hcall-dabr\0hcall-interrupt
-\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk;
+\0hcall-tce\0hcall-vio\0hcall-splpar\0hcall-bulk\0hcall-set-mode;
 char qemu_hypertas_prop[] = hcall-memop1;
 uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
 uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 67d6cd9..89e6a00 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -657,6 +657,54 @@ static target_ulong h_logical_dcbf(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 return H_SUCCESS;
 }
 
+static target_ulong h_set_mode(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+   target_ulong opcode, target_ulong *args)
+{
+CPUState *cs;
+target_ulong mflags = args[0];
+target_ulong resource = args[1];
+target_ulong value1 = args[2];
+target_ulong value2 = args[3];
+target_ulong ret = H_P2;
+
+if (resource == H_SET_MODE_ENDIAN) {
+if (value1) {
+ret = H_P3;
+goto out;
+}
+if (value2) {
+ret = H_P4;
+goto out;
+}
+
+switch (mflags) {
+case H_SET_MODE_ENDIAN_BIG:
+for (cs = first_cpu; cs != NULL; cs = cs-next_cpu) {
+PowerPCCPU *cp = POWERPC_CPU(cs);
+CPUPPCState *env = cp-env;
+env-spr[SPR_LPCR] = ~LPCR_ILE;
+}
+ret = H_SUCCESS;
+break;
+
+case H_SET_MODE_ENDIAN_LITTLE:
+for (cs = first_cpu; cs != NULL; cs = cs-next_cpu) {
+PowerPCCPU *cp = POWERPC_CPU(cs);
+CPUPPCState *env = cp-env;
+env-spr[SPR_LPCR] |= LPCR_ILE;
+}
+ret = H_SUCCESS;
+break;
+
+default:
+ret = H_UNSUPPORTED_FLAG;
+}
+}
+
+out:
+return ret;
+}
+
 static spapr_hcall_fn papr_hypercall_table[(MAX_HCALL_OPCODE / 4) + 1];
 static spapr_hcall_fn kvmppc_hypercall_table[KVMPPC_HCALL_MAX - 
KVMPPC_HCALL_BASE + 1];
 
@@ -734,6 +782,8 @@ static void hypercall_register_types(void)
 
 /* qemu/KVM-PPC specific hcalls */
 spapr_register_hypercall(KVMPPC_H_RTAS, h_rtas);
+
+spapr_register_hypercall(H_SET_MODE, h_set_mode);
 }
 
 type_init(hypercall_register_types)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 9fc1972..ab42813 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -109,6 +109,15 @@ typedef struct sPAPREnvironment {
 #define H_NOT_ENOUGH_RESOURCES -44
 #define H_R_STATE -45
 #define H_RESCINDEND  -46
+#define H_P2  -55
+#define H_P3  -56
+#define H_P4  -57
+#define H_P5  -58
+#define H_P6  -59
+#define H_P7  -60
+#define H_P8  -61
+#define H_P9  -62
+#define H_UNSUPPORTED_FLAG -256
 #define H_MULTI_THREADS_ACTIVE -9005
 
 
@@ -143,6 +152,11 @@ typedef struct sPAPREnvironment {
 #define H_PP1 (1ULL(63-62))
 #define H_PP2 (1ULL(63-63))
 
+/* H_SET_MODE flags */
+#define H_SET_MODE_ENDIAN  4
+#define  H_SET_MODE_ENDIAN_BIG 0
+#define  H_SET_MODE_ENDIAN_LITTLE  1
+
 /* VASI States */
 #define H_VASI_INVALID0
 #define H_VASI_ENABLED1
@@ -267,7 +281,8 @@ typedef struct sPAPREnvironment {
 #define H_GET_EM_PARMS  0x2B8
 #define H_SET_MPP   0x2D0
 #define H_GET_MPP   0x2D4
-#define MAX_HCALL_OPCODEH_GET_MPP
+#define H_SET_MODE  0x31C
+#define MAX_HCALL_OPCODEH_SET_MODE

Re: [Qemu-devel] [PATCH v3 2/2] loader: put FW CFG ROM files into RAM

2013-08-19 Thread Laszlo Ersek

On 08/19/13 13:06, Laszlo Ersek wrote:
 On 08/13/13 00:43, Michael S. Tsirkin wrote:

 @@ -646,6 +669,7 @@ int rom_add_file(const char *file, const char *fw_dir,
  if (rom-fw_file  fw_cfg) {
  const char *basename;
  char fw_file_name[56];
 +void *data;
  
  basename = strrchr(rom-fw_file, '/');
  if (basename) {
 @@ -655,8 +679,15 @@ int rom_add_file(const char *file, const char *fw_dir,
  }
  snprintf(fw_file_name, sizeof(fw_file_name), %s/%s, rom-fw_dir,
   basename);
 -fw_cfg_add_file(fw_cfg, fw_file_name, rom-data, rom-romsize);
  snprintf(devpath, sizeof(devpath), /rom@%s, fw_file_name);
 +
 +if (rom_file_in_ram) {
 +data = rom_set_mr(rom, OBJECT(fw_cfg), devpath);
 +} else {
 +data = rom-data;
 +}
 +
 +fw_cfg_add_file(fw_cfg, fw_file_name, data, rom-romsize);
 
 This seems OK, but if rom_file_in_ram is nonzero, then we'll store the
 ROM contents in the qemu process twice -- once in rom-data (allocated
 just a bit higher up, not shown in context), and in the new RAMBlock.
 
 This is no bug of course, I'm just wondering if we could drop/repoint
 rom-data in this case.
 
  } else {
  snprintf(devpath, sizeof(devpath), /rom@ TARGET_FMT_plx, addr);
  }
 @@ -731,7 +762,12 @@ static void rom_reset(void *unused)
  if (rom-data == NULL) {
  continue;
  }
 -cpu_physical_memory_write_rom(rom-addr, rom-data, rom-datasize);
 +if (rom-mr) {
 +void *host = memory_region_get_ram_ptr(rom-mr);
 +memcpy(host, rom-data, rom-datasize);
 +} else {
 +cpu_physical_memory_write_rom(rom-addr, rom-data, 
 rom-datasize);
 +}
 
 Hmmm. Why is this (ie. the pre-patch resetting) necessary at all?
 
 Is this due to the writeability of fw_cfg files via the ioport
 (fw_cfg_write())? I think that modifies rom-data unconditionally
 (which is currently kept separate from the RAMBlock, see above).
 
 So, regarding the patched version:
 - not sure if the RAMBlock can change at all -- it is neither mapped
 into guest-phys address space, nor does fw_cfg_write() touch it,
 - *if* the guest modifies the contents under rom-addr, via
 fw_cfg_write(), then the hva-space memcpy() is insufficient.

Sorry, I'm wrong here. The patched rom_add_file() ensures that
fw_cfg_write() modifies the correct backing store. Also, we need to keep
rom-data around even if rom_file_in_ram is set, because that's
where we restore the RAMBlock contents from, in case of a reset.

Laszlo

Re: [Qemu-devel] [PATCH v3 1/2] memory: export migration page size

2013-08-19 Thread Michael S. Tsirkin

On Mon, Aug 19, 2013 at 01:09:36PM +0200, Laszlo Ersek wrote:
 On 08/19/13 12:21, Peter Maydell wrote:
  On 19 August 2013 10:59, Laszlo Ersek ler...@redhat.com wrote:
  On 08/13/13 00:43, Michael S. Tsirkin wrote:
  Migration code assumes that each RAM block is a multiple of target page
  size.
 
  Isn't that a valid assumption, considering the TARGET_PAGE_ALIGN() macro
  call in qemu_ram_alloc_from_ptr() [exec.c]?
  
  That macro only makes the size we store in the ramblock data
  structure be a multiple of the page size -- it does nothing to ensure
  that the actual memory that was passed in by the caller is the
  right size. (It will have the right effect where qemu_ram_alloc_from_ptr
  is allocating the memory itself, obviously.)
 
 Which is the case for 2/2, see my comments there:
 
 memory_region_init_ram()
   qemu_ram_alloc()
 qemu_ram_alloc_from_ptr()   host==NULL
   TARGET_PAGE_ALIGN()
 
 Laszlo

The issue this addresses is not the size of RAM allocated.
The issue is the size of the MR.
Migration code assumes the size of the MR
is a multiple of TARGET_PAGE_SIZE.


-- 
MST

Re: [Qemu-devel] [PATCH v3 2/2] loader: put FW CFG ROM files into RAM

2013-08-19 Thread Michael S. Tsirkin

On Mon, Aug 19, 2013 at 01:15:36PM +0200, Laszlo Ersek wrote:
 On 08/19/13 13:06, Laszlo Ersek wrote:
  On 08/13/13 00:43, Michael S. Tsirkin wrote:
 
  @@ -646,6 +669,7 @@ int rom_add_file(const char *file, const char *fw_dir,
   if (rom-fw_file  fw_cfg) {
   const char *basename;
   char fw_file_name[56];
  +void *data;
   
   basename = strrchr(rom-fw_file, '/');
   if (basename) {
  @@ -655,8 +679,15 @@ int rom_add_file(const char *file, const char *fw_dir,
   }
   snprintf(fw_file_name, sizeof(fw_file_name), %s/%s, rom-fw_dir,
basename);
  -fw_cfg_add_file(fw_cfg, fw_file_name, rom-data, rom-romsize);
   snprintf(devpath, sizeof(devpath), /rom@%s, fw_file_name);
  +
  +if (rom_file_in_ram) {
  +data = rom_set_mr(rom, OBJECT(fw_cfg), devpath);
  +} else {
  +data = rom-data;
  +}
  +
  +fw_cfg_add_file(fw_cfg, fw_file_name, data, rom-romsize);
  
  This seems OK, but if rom_file_in_ram is nonzero, then we'll store the
  ROM contents in the qemu process twice -- once in rom-data (allocated
  just a bit higher up, not shown in context), and in the new RAMBlock.
  
  This is no bug of course, I'm just wondering if we could drop/repoint
  rom-data in this case.
  
   } else {
   snprintf(devpath, sizeof(devpath), /rom@ TARGET_FMT_plx, addr);
   }
  @@ -731,7 +762,12 @@ static void rom_reset(void *unused)
   if (rom-data == NULL) {
   continue;
   }
  -cpu_physical_memory_write_rom(rom-addr, rom-data, 
  rom-datasize);
  +if (rom-mr) {
  +void *host = memory_region_get_ram_ptr(rom-mr);
  +memcpy(host, rom-data, rom-datasize);
  +} else {
  +cpu_physical_memory_write_rom(rom-addr, rom-data, 
  rom-datasize);
  +}
  
  Hmmm. Why is this (ie. the pre-patch resetting) necessary at all?
  
  Is this due to the writeability of fw_cfg files via the ioport
  (fw_cfg_write())? I think that modifies rom-data unconditionally
  (which is currently kept separate from the RAMBlock, see above).
  
  So, regarding the patched version:
  - not sure if the RAMBlock can change at all -- it is neither mapped
  into guest-phys address space, nor does fw_cfg_write() touch it,
  - *if* the guest modifies the contents under rom-addr, via
  fw_cfg_write(), then the hva-space memcpy() is insufficient.
 
 Sorry, I'm wrong here. The patched rom_add_file() ensures that
 fw_cfg_write() modifies the correct backing store. Also, we need to keep
 rom-data around even if rom_file_in_ram is set, because that's
 where we restore the RAMBlock contents from, in case of a reset.
 
 Laszlo

Exactly.

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-19 Thread Fam Zheng

On Mon, 08/19 12:09, Alex Bligh wrote:
 I note some recent work on vmdk file support. Has anyone looked at
 supporting vmdk streaming format as an /output/ file format (I
 think we currently support it as an input format). This is what
 you need to use to upload a VM to vCenter. Currently the route
 is convert to raw then use:
 
  https://github.com/imcleod/VMDK-stream-converter
 
 which is some Python from Ian McLeod imcl...@redhat.com
 Life would be easier if qemu-img supported this directly.
 It does not look fantastically difficult.
 
 -- 
 Alex Bligh
 

Yes, that would be something possible to do. Is it the only format to
upload a VM to vCenter now, shouldn't normal VMDK be supported as well?

Thanks,

Fam

Re: [Qemu-devel] [PATCH V5 1/5] throttle: Add a new throttling API implementing continuous leaky bucket.

2013-08-19 Thread Paolo Bonzini

Il 16/08/2013 13:45, Stefan Hajnoczi ha scritto:
  +#define BUCKETS_COUNT 6
  +
  +typedef enum {
  +THROTTLE_BPS_TOTAL = 0,
  +THROTTLE_BPS_READ  = 1,
  +THROTTLE_BPS_WRITE = 2,
  +THROTTLE_OPS_TOTAL = 3,
  +THROTTLE_OPS_READ  = 4,
  +THROTTLE_OPS_WRITE = 5,
  +} BucketType;

Please remove the = N from the enums, and add BUCKETS_COUNT here.

  +typedef struct LeakyBucket {
  +double  ups;/* units per second */
  +double  max;/* leaky bucket max in units */
  +double  bucket; /* bucket in units */
 These comments aren't very clear to me :).  So I guess bps or iops would
 be in ups.  Max would be the total budget or maximum burst.  Bucket
 might be the current level.

I also suggest replacing ups with avg, since it's the average
throughput that the leaky bucket allows after the initial burst has
emptied the bucket.

 +uint64_t unit_size; /* size of an unit in bytes */
 +uint64_t op_size;   /* size of an operation in units */
 
 It's not clear yet why we need both unit_size *and* op_size.  I thought
 you would have a single granularity field for accounting big requests as
 multiple iops.

IIUC the ops buckets account operations in op_size / unit_size units,
while the bps buckets account operations in 1 / unit_size units, or
something like that.  But it needs clarification indeed.

Paolo

Re: [Qemu-devel] [PATCH v3 1/2] memory: export migration page size

2013-08-19 Thread Laszlo Ersek

On 08/19/13 13:18, Michael S. Tsirkin wrote:
 On Mon, Aug 19, 2013 at 01:09:36PM +0200, Laszlo Ersek wrote:
 On 08/19/13 12:21, Peter Maydell wrote:
 On 19 August 2013 10:59, Laszlo Ersek ler...@redhat.com wrote:
 On 08/13/13 00:43, Michael S. Tsirkin wrote:
 Migration code assumes that each RAM block is a multiple of target page
 size.

 Isn't that a valid assumption, considering the TARGET_PAGE_ALIGN() macro
 call in qemu_ram_alloc_from_ptr() [exec.c]?

 That macro only makes the size we store in the ramblock data
 structure be a multiple of the page size -- it does nothing to ensure
 that the actual memory that was passed in by the caller is the
 right size. (It will have the right effect where qemu_ram_alloc_from_ptr
 is allocating the memory itself, obviously.)

 Which is the case for 2/2, see my comments there:

 memory_region_init_ram()
   qemu_ram_alloc()
 qemu_ram_alloc_from_ptr()   host==NULL
   TARGET_PAGE_ALIGN()

 Laszlo
 
 The issue this addresses is not the size of RAM allocated.
 The issue is the size of the MR.
 Migration code assumes the size of the MR
 is a multiple of TARGET_PAGE_SIZE.

You're right. Thanks.
Laszlo

Re: [Qemu-devel] [PATCH v3 0/2] future proof rom loading for cross versiom migration

2013-08-19 Thread Laszlo Ersek

On 08/13/13 00:43, Michael S. Tsirkin wrote:
 Changes from v2: address comments on v2 by Peter Maydell
 - switch from global constant to function
 - use memory_region_init_ram instead of _ram_ptr
 - disable for 1.6
 
 Changes from v1: address comments by Peter Maydell
 - drop useless data=data line
 - rename target_page_size to migration_page_size to make use clear
 Peter, you also suggested somehow hiding this within memory core.
 I don't see a clean way to do this without lots of code
 changes, I think what I propose here is acceptable for now
 and we can always rework APIs without wire format changes.
 
 Please review, and consider for merging.
 
 Original cover letter below.
 
 
 ROM files that are put in FW CFG are copied to guest ram, by BIOS, but
 they are not backed by RAM so they don't get migrated.
 
 Each time we'll change at least two bytes in such a ROM this will break
 cross-version migration: since we can migrate after BIOS has read the first
 byte but before it has read the second one, getting an inconsistent state.
 
 This patchset makes QEMU future-proof against such changes.
 
 Naturally, this only helps for -M 1.6 and up, older machine types
 will still have the cross-version migration bug.
 
 I think this should be applied for 1.6, this way we won't
 have this problem from 1.7 and on.
 
 Michael S. Tsirkin (2):
   memory: export migration page size
   loader: put FW CFG ROM files into RAM
 
  arch_init.c   |  6 ++
  hw/core/loader.c  | 54 
 ---
  hw/i386/pc_piix.c |  2 ++
  hw/i386/pc_q35.c  |  2 ++
  include/exec/memory.h |  1 +
  include/hw/loader.h   |  1 +
  6 files changed, 63 insertions(+), 3 deletions(-)
 

Reviewed-by: Laszlo Ersek ler...@redhat.com

Re: [Qemu-devel] vmdk stream-optimised format

2013-08-19 Thread Alex Bligh


On 19 Aug 2013, at 12:23, Fam Zheng wrote:

 Yes, that would be something possible to do. Is it the only format to
 upload a VM to vCenter now, shouldn't normal VMDK be supported as well?

Ian puts it better than I can:

 This format is only useful if you are attempting to import virtual machines
 into ESX using the vSphere SOAP API and HTTP POST uploads of image files.
 (In which case, it is required.)

IE it is the ONLY format you can use using the vSphere SOAP API, which is
the only way to automate image uploading. vCenter converts other formats
to this format.

Extensive testing with qemu in Q1 this year indicates that it is not
possible to use qemu-img convert to produce anything that will upload.
I went as far as hexdump at the time. I believe I tried compat6
and the various subformat options.

I believe we /read/ this format fine.

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH] pc: cleanup 1.4 compat support

2013-08-19 Thread Eduardo Habkost

On Sun, Aug 18, 2013 at 04:50:02PM +0300, Michael S. Tsirkin wrote:
 Make 1.4 compat code call the 1.6 one, reducing
 code duplication. Add comment explaining why we can't
 make 1.4 call 1.5 as usual.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/i386/pc_piix.c | 4 ++--
  hw/i386/pc_q35.c  | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
 index 15c932e..86cfadc 100644
 --- a/hw/i386/pc_piix.c
 +++ b/hw/i386/pc_piix.c
 @@ -272,8 +272,8 @@ static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
  {
  x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
  x86_cpu_compat_set_features(Westmere, FEAT_1_ECX, 0, 
 CPUID_EXT_PCLMULQDQ);
 -has_pci_info = false;
 -pc_init_pci(args);
 +/* 1.5 was special - it enabled pvpanic in builtin machine */
 +pc_init_pci_1_6(args);
  }
  

I find this confusing (and probably it will be even more confusing when
people read this code a few years from now). I would prefer the
following pattern:

static void pc_compat_1_5()
{
/* do 1.5 stuff (possibly undoing 1.6 stuff) */
has_pvpanic = true;
}

static void pc_init_pci_1_5()
{
pc_compat_1_5();
pc_init_pci();
}

static void pc_compat_1_4()
{
pc_compat_1_5();
/* do 1.4 stuff (possibly undoing 1.5 stuff) */
has_pvpanic = false;
}

static void pc_init_pci_1_4()
{
pc_compat_1_4();
pc_init_pci();
}

static void pc_compat_1_3()
{
pc_compat_1_3();
/* do 1.3 stuff (possibly undoing 1.4 stuff) */
}

static void pc_init_pci_1_3()
{
pc_compat_1_3();
pc_init_pci();
}

Otherwise the same problem may happen again in the future.

With this, we could even reuse the same pc_compat_*() functions on
pc_piix.c and pc_q35.c.


  static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
 diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
 index 09d8183..8d62700 100644
 --- a/hw/i386/pc_q35.c
 +++ b/hw/i386/pc_q35.c
 @@ -239,8 +239,8 @@ static void pc_q35_init_1_4(QEMUMachineInitArgs *args)
  {
  x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
  x86_cpu_compat_set_features(Westmere, FEAT_1_ECX, 0, 
 CPUID_EXT_PCLMULQDQ);
 -has_pci_info = false;
 -pc_q35_init(args);
 +/* 1.5 was special - it enabled pvpanic in builtin machine */
 +pc_q35_init_1_6(args);
  }
  
  static QEMUMachine pc_q35_machine_v1_6 = {
 -- 
 MST

-- 
Eduardo

Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Paolo Bonzini

Il 14/08/2013 14:32, Jan Kiszka ha scritto:
 I still need to check more corner cases as timer dequeuing can now race
 with the handler execution, ie. a dequeued timer can still see one more
 handler run after timer_del returned. That's a property one can easily
 take into account when writing device models, but it has to be kept in
 mind that it's different from current behavior.
 
 Updated queue is at git://git.kiszka.org/qemu.git queues/rt.new3 again.

I took a look here, there are several patches that I guess are basically
ready... two random things I noticed:

(1) rcu_init() must be called in rcutorture too

(2) there are several load/store functions in exec.c that do not go
through address_space_rw.  These would not call
qemu_flush_coalesced_mmio_buffer() after your patch provide
address_space_rw_unlocked.  I think that in most cases the solution
should be to make these functions go through address_space_rw.

Paolo

Re: [Qemu-devel] [PATCH] block: Produce zeros when protocols reading beyond end of file

2013-08-19 Thread Stefan Hajnoczi

On Mon, Aug 19, 2013 at 02:36:14PM +0800, Asias He wrote:
 On Fri, Aug 16, 2013 at 10:41:36AM +0200, Stefan Hajnoczi wrote:
  On Mon, Aug 5, 2013 at 10:11 AM, Asias He as...@redhat.com wrote:
  A simple but ugly way to fix this is for block.c to also have a
  -zero_beyond_eof flag which enables the behavior you are adding.
  qcow2_load_vmstate() would disable -zero_beyond_eof temporarily in
  addition to enabling -growable.
 
 I am wondering why the -growable logic is introduced in the first
 place.  Adding yet another this kind of flag looks realy ugly ;(

bs-growable serves two functions:

1. It means you can read/write beyond the end of the file, for example,
   when creating a new image file.  Normally BlockDriverState rejects
   requests beyond the EOF.

2. qcow2 uses it as a hack to implement the vmstate area after the end
   of regular guest data.  This is the ugly part but it always worked up
   until now.

#1 is a legitimate use case.  You don't always know how big the file
will end up so it's much more convenient to grow on demand, instead of
having to bdrv_truncate() all the time.

#2 is hacky but hard to solve without duplicating the bounce buffer and
coroutine wrapping logic in bdrv_pread() (Kevin has suggested calling
bdrv_co_readv() internally instead of bdrv_pread()).

Stefan

Re: [Qemu-devel] [PATCH 2/2] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-08-19 Thread Paolo Bonzini

Il 12/08/2013 14:49, Stefan Hajnoczi ha scritto:
 Introduce QEMUTimerList-active_timers_lock to protect the linked list
 of active timers.  This allows qemu_timer_mod_ns() to be called from any
 thread.
 
 Note that vm_clock is not thread-safe and its use of
 qemu_clock_has_timers() works fine today but is also not thread-safe.
 
 The purpose of this patch is to eventually let device models set or
 cancel timers from a vcpu thread without holding the global mutex.
 
 Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 ---
  include/qemu/timer.h | 17 ++
  qemu-timer.c | 66 
 +---
  2 files changed, 69 insertions(+), 14 deletions(-)
 
 diff --git a/include/qemu/timer.h b/include/qemu/timer.h
 index 829c005..3543b0e 100644
 --- a/include/qemu/timer.h
 +++ b/include/qemu/timer.h
 @@ -114,6 +114,10 @@ static inline int64_t qemu_clock_get_us(QEMUClockType 
 type)
   * Determines whether a clock's default timer list
   * has timers attached
   *
 + * Note that this function should not be used when other threads also access
 + * the timer list.  The return value may be outdated by the time it is acted
 + * upon.
 + *
   * Returns: true if the clock's default timer list
   * has timers attached
   */
 @@ -270,6 +274,10 @@ void timerlist_free(QEMUTimerList *timer_list);
   *
   * Determine whether a timer list has active timers
   *
 + * Note that this function should not be used when other threads also access
 + * the timer list.  The return value may be outdated by the time it is acted
 + * upon.
 + *
   * Returns: true if the timer list has timers.
   */
  bool timerlist_has_timers(QEMUTimerList *timer_list);
 @@ -511,6 +519,9 @@ void timer_free(QEMUTimer *ts);
   * @ts: the timer
   *
   * Delete a timer from the active list.
 + *
 + * This function is thread-safe but the timer and its timer list must not be
 + * freed while this function is running.
   */
  void timer_del(QEMUTimer *ts);
  
 @@ -520,6 +531,9 @@ void timer_del(QEMUTimer *ts);
   * @expire_time: the expiry time in nanoseconds
   *
   * Modify a timer to expire at @expire_time
 + *
 + * This function is thread-safe but the timer and its timer list must not be
 + * freed while this function is running.
   */
  void timer_mod_ns(QEMUTimer *ts, int64_t expire_time);
  
 @@ -530,6 +544,9 @@ void timer_mod_ns(QEMUTimer *ts, int64_t expire_time);
   *
   * Modify a timer to expiry at @expire_time, taking into
   * account the scale associated with the timer.
 + *
 + * This function is thread-safe but the timer and its timer list must not be
 + * freed while this function is running.
   */
  void timer_mod(QEMUTimer *ts, int64_t expire_timer);
  
 diff --git a/qemu-timer.c b/qemu-timer.c
 index 818d235..3a5b46d 100644
 --- a/qemu-timer.c
 +++ b/qemu-timer.c
 @@ -66,6 +66,7 @@ QEMUClock qemu_clocks[QEMU_CLOCK_MAX];
  
  struct QEMUTimerList {
  QEMUClock *clock;
 +QemuMutex active_timers_lock;
  QEMUTimer *active_timers;
  QLIST_ENTRY(QEMUTimerList) list;
  QEMUTimerListNotifyCB *notify_cb;
 @@ -101,6 +102,7 @@ QEMUTimerList *timerlist_new(QEMUClockType type,
  timer_list-clock = clock;
  timer_list-notify_cb = cb;
  timer_list-notify_opaque = opaque;
 +qemu_mutex_init(timer_list-active_timers_lock);
  QLIST_INSERT_HEAD(clock-timerlists, timer_list, list);
  return timer_list;
  }
 @@ -111,6 +113,7 @@ void timerlist_free(QEMUTimerList *timer_list)
  if (timer_list-clock) {
  QLIST_REMOVE(timer_list, list);
  }
 +qemu_mutex_destroy(timer_list-active_timers_lock);
  g_free(timer_list);
  }
  
 @@ -163,9 +166,17 @@ bool qemu_clock_has_timers(QEMUClockType type)
  
  bool timerlist_expired(QEMUTimerList *timer_list)
  {
 -return (timer_list-active_timers 
 -timer_list-active_timers-expire_time 
 -qemu_clock_get_ns(timer_list-clock-type));
 +int64_t expire_time;
 +
 +qemu_mutex_lock(timer_list-active_timers_lock);
 +if (!timer_list-active_timers) {
 +qemu_mutex_unlock(timer_list-active_timers_lock);
 +return false;
 +}
 +expire_time = timer_list-active_timers-expire_time;
 +qemu_mutex_unlock(timer_list-active_timers_lock);

Perhaps you can put here a comment similar to the one in
timerlist_deadline_ns?

 +return expire_time  qemu_clock_get_ns(timer_list-clock-type);
  }
  
  bool qemu_clock_expired(QEMUClockType type)
 @@ -182,13 +193,25 @@ bool qemu_clock_expired(QEMUClockType type)
  int64_t timerlist_deadline_ns(QEMUTimerList *timer_list)
  {
  int64_t delta;
 +int64_t expire_time;
  
 -if (!timer_list-clock-enabled || !timer_list-active_timers) {
 +if (!timer_list-clock-enabled) {
  return -1;
  }
  
 -delta = timer_list-active_timers-expire_time -
 -qemu_clock_get_ns(timer_list-clock-type);
 +/* The active timers list may be modified before the caller uses our 
 return
 + * value but -notify_cb() is

Re: [Qemu-devel] [PATCH 2/2] qemu-timer: make qemu_timer_mod_ns() and qemu_timer_del() thread-safe

2013-08-19 Thread Paolo Bonzini

Il 15/08/2013 14:00, Stefan Hajnoczi ha scritto:
 The common case is that we only check the first timer in
 -active_timers.  Usually the first timer has not expired yet and we
 return; the lock was taken once only.
 
 I'm not sure it's worth complicating the case where we iterate multiple
 times.

I agree.  Later we may try to get creative and optimize the lookup of
the first timer (take the lock zero times instead of once if the first
timer has not expired yet, take the lock once instead of twice if one
timer runs, etc.).  Anything beyond that would be premature.

Paolo

[Qemu-devel] [PATCH 0/3] target-ppc: Add support for dumping guest memory using qemu gdb server

2013-08-19 Thread Aneesh Kumar K.V

Hi,

This patch series implement support for dumping guest memory using qemu gdb 
server.

Without this patch we would get.


(gdb) x/10 do_fork
 0xc0098660 do_fork:   Cannot access memory at address 
0xc0098660
(gdb)

With this patch series we can now do

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
.plpar_hcall_norets () at arch/powerpc/platforms/pseries/hvCall.S:119
119 HCALL_INST_POSTCALL_NORETS
.
.
(gdb) x/10i htab_call_hpte_insert1
   0xc00470d8 .htab_call_hpte_insert1:bl  
0xc005f8f0 pSeries_lpar_hpte_insert
   0xc00470dc .htab_call_hpte_insert1+4:  cmpdi   r3,0
   0xc00470e0 .htab_call_hpte_insert1+8:  bge 
0xc0047190 htab_pte_insert_ok
   0xc00470e4 .htab_call_hpte_insert1+12: cmpdi   r3,-2


NOTE: We still don't support inserting breakpoints.

-aneesh

[Qemu-devel] [PATCH 1/3] target-ppc: Update slb array with correct index values.

2013-08-19 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Without this, a value of rb=0 and rs=0, results in replacing the 0th
index. This can be observed when using gdb remote debugging support.

(gdb) x/10i do_fork
   0xc0085330 do_fork:Cannot access memory at address 
0xc0085330
(gdb)

This is because when we do the slb sync via kvm_cpu_synchronize_state,
we overwrite the slb entry (0th entry) for 0xc008533

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 target-ppc/kvm.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 30a870e..5d4e613 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
 /* Sync SLB */
 #ifdef TARGET_PPC64
 for (i = 0; i  64; i++) {
-ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
-   sregs.u.s.ppc64.slb[i].slbv);
+target_ulong rb  = sregs.u.s.ppc64.slb[i].slbe;
+/*
+ * KVM_GET_SREGS doesn't retun slb entry with slot information
+ * same as index. So don't depend on the slot information in
+ * the returned value.
+ */
+rb = ~0xfff;
+/*
+ * use the array index as the slot
+ */
+rb |= i;
+ppc_store_slb(env, rb, sregs.u.s.ppc64.slb[i].slbv);
 }
 #endif
 
-- 
1.8.1.2

[Qemu-devel] [PATCH 3/3] target-ppc: Fix page table lookup with kvm enabled

2013-08-19 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

With kvm enabled, we store the hash page table information in the hypervisor.
Use ioctl to read the htab contents. Without this we get the below error when
trying to read the guest address

 (gdb) x/10 do_fork
 0xc0098660 do_fork:   Cannot access memory at address 
0xc0098660
 (gdb)

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 target-ppc/kvm.c| 45 +
 target-ppc/kvm_ppc.h|  3 ++-
 target-ppc/mmu-hash64.c | 25 -
 3 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 5d4e613..63a9c0e 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1885,3 +1885,48 @@ int kvm_arch_on_sigbus(int code, void *addr)
 void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
+
+int kvmppc_hash64_load_hpte(CPUPPCState *env, __u64 index,
+target_ulong *hpte0, target_ulong *hpte1)
+{
+int htab_fd;
+struct kvm_get_htab_fd ghf;
+struct kvm_get_htab_buf {
+struct kvm_get_htab_header header;
+/*
+ * Older kernel required one extra byte.
+ */
+unsigned long hpte[3];
+} hpte_buf;
+
+*hpte0 = 0;
+*hpte1 = 0;
+if (!cap_htab_fd) {
+return 0;
+}
+/*
+ * At this point we are only interested in reading only bolted entries
+ */
+ghf.flags = KVM_GET_HTAB_BOLTED_ONLY;
+ghf.start_index = index;
+htab_fd = kvm_vm_ioctl(kvm_state, KVM_PPC_GET_HTAB_FD, ghf);
+if (htab_fd  0) {
+return htab_fd;
+}
+
+if (read(htab_fd, hpte_buf, sizeof(hpte_buf))  0) {
+goto out;
+}
+/*
+ * We only requested for one entry, So we should get only 1
+ * valid entry at the same index
+ */
+if (hpte_buf.header.n_valid != 1 || hpte_buf.header.index != index) {
+goto out;
+}
+*hpte0 = hpte_buf.hpte[0];
+*hpte1 = hpte_buf.hpte[1];
+out:
+close(htab_fd);
+return  0;
+}
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 4ae7bf2..51c8952 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -42,7 +42,8 @@ int kvmppc_get_htab_fd(bool write);
 int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns);
 int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
uint16_t n_valid, uint16_t n_invalid);
-
+int kvmppc_hash64_load_hpte(CPUPPCState *env, __u64 index,
+target_ulong *hpte0, target_ulong *hpte1);
 #else
 
 static inline uint32_t kvmppc_get_tbfreq(void)
diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
index 67fc1b5..239f268 100644
--- a/target-ppc/mmu-hash64.c
+++ b/target-ppc/mmu-hash64.c
@@ -302,17 +302,26 @@ static int ppc_hash64_amr_prot(CPUPPCState *env, 
ppc_hash_pte64_t pte)
 return prot;
 }
 
-static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr pteg_off,
+static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr hash,
  bool secondary, target_ulong ptem,
  ppc_hash_pte64_t *pte)
 {
-hwaddr pte_offset = pteg_off;
+__u64 index;
+hwaddr pte_offset;
 target_ulong pte0, pte1;
 int i;
 
+pte_offset = (hash * HASH_PTEG_SIZE_64)  env-htab_mask;;
+index = (hash * HPTES_PER_GROUP)  env-htab_mask;
+
 for (i = 0; i  HPTES_PER_GROUP; i++) {
-pte0 = ppc_hash64_load_hpte0(env, pte_offset);
-pte1 = ppc_hash64_load_hpte1(env, pte_offset);
+if (kvm_enabled()) {
+index += i;
+kvmppc_hash64_load_hpte(env, index, pte0, pte1);
+} else {
+pte0 = ppc_hash64_load_hpte0(env, pte_offset);
+pte1 = ppc_hash64_load_hpte1(env, pte_offset);
+}
 
 if ((pte0  HPTE64_V_VALID)
  (secondary == !!(pte0  HPTE64_V_SECONDARY))
@@ -332,7 +341,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env,
  ppc_slb_t *slb, target_ulong eaddr,
  ppc_hash_pte64_t *pte)
 {
-hwaddr pteg_off, pte_offset;
+hwaddr pte_offset;
 hwaddr hash;
 uint64_t vsid, epnshift, epnmask, epn, ptem;
 
@@ -367,8 +376,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env,
  vsid= TARGET_FMT_lx  ptem= TARGET_FMT_lx
  hash= TARGET_FMT_plx \n,
 env-htab_base, env-htab_mask, vsid, ptem,  hash);
-pteg_off = (hash * HASH_PTEG_SIZE_64)  env-htab_mask;
-pte_offset = ppc_hash64_pteg_search(env, pteg_off, 0, ptem, pte);
+pte_offset = ppc_hash64_pteg_search(env, hash, 0, ptem, pte);
 
 if (pte_offset == -1) {
 /* Secondary PTEG lookup */
@@ -377,8 +385,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env,
  hash= TARGET_FMT_plx \n, env-htab_base,
 env-htab_mask, vsid, ptem, ~hash);
 
-

[Qemu-devel] [PATCH 2/3] target-ppc: Use #define instead of opencoding SLB valid bit

2013-08-19 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Use SLB_ESID_V instead of (1  27) in the code

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 target-ppc/mmu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
index 5dd4e05..9c9132e 100644
--- a/target-ppc/mmu_helper.c
+++ b/target-ppc/mmu_helper.c
@@ -2061,7 +2061,7 @@ void helper_store_sr(CPUPPCState *env, target_ulong 
srnum, target_ulong value)
 /* ESID = srnum */
 rb |= ((uint32_t)srnum  0xf)  28;
 /* Set the valid bit */
-rb |= 1  27;
+rb |= SLB_ESID_V;
 /* Index = ESID */
 rb |= (uint32_t)srnum;
 
-- 
1.8.1.2

Re: [Qemu-devel] [PATCH] Make usb-bt-dongle configurable

2013-08-19 Thread Andreas Färber

Am 19.08.2013 12:48, schrieb Miroslav Rezanina:
 usb-bt-dongle device can't be disabled as there's dependency in vl.c file. 
 This patch add preprocesor condition to be able to disable it.

Please limit to 76 chars per line (check `git log` output).

 
 Signed-off-by: Miroslav Rezanina mreza...@redhat.com
 ---
  hw/usb/Makefile.objs |  1 -
  vl.c | 18 ++
  2 files changed, 14 insertions(+), 5 deletions(-)
 
 diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
 index f9695e7..8892ffd 100644
 --- a/hw/usb/Makefile.objs
 +++ b/hw/usb/Makefile.objs
 @@ -20,7 +20,6 @@ common-obj-$(CONFIG_USB_SERIAL)   += dev-serial.o
  common-obj-$(CONFIG_USB_NETWORK)  += dev-network.o
  
  # FIXME: make configurable too
 -CONFIG_USB_BLUETOOTH := y

You probably should delete the FIXME alongside?

  common-obj-$(CONFIG_USB_BLUETOOTH)+= dev-bluetooth.o
  
  ifeq ($(CONFIG_USB_SMARTCARD),y)
 diff --git a/vl.c b/vl.c
 index f422a1c..4330b6d 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -1526,8 +1526,10 @@ static void configure_msg(QemuOpts *opts)
  
  static int usb_device_add(const char *devname)
  {
 -const char *p;
  USBDevice *dev = NULL;
 +#if  defined(CONFIG_USB_BLUETOOTH) || !defined(CONFIG_LINUX)

Double space.

 +const char *p;
 +#endif
  
  if (!usb_enabled(false)) {
  return -1;
 @@ -1543,15 +1545,23 @@ static int usb_device_add(const char *devname)
  /* only the linux version is qdev-ified, usb-bsd still needs this */
  if (strstart(devname, host:, p)) {
  dev = usb_host_device_open(usb_bus_find(-1), p);
 -} else
 +goto devtest;
 +} 
  #endif
 +#ifdef CONFIG_USB_BLUETOOTH
  if (!strcmp(devname, bt) || strstart(devname, bt:, p)) {
  dev = usb_bt_init(usb_bus_find(-1),
devname[2] ? hci_init(p)
   : bt_new_hci(qemu_find_bt_vlan(0)));
 -} else {
 -return -1;
 +goto devtest;
  }
 +#endif
 +
 +return -1;
 +
 +#if  defined(CONFIG_USB_BLUETOOTH) || !defined(CONFIG_LINUX)
 +devtest:
 +#endif

Why put only the label in an #if block? The below is dead code if not
reached via devtest.

  if (!dev)
  return -1;
  

Let's also not forget to CC Gerd as USB maintainer.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 1/3] target-ppc: Update slb array with correct index values.

2013-08-19 Thread Andreas Färber

Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Without this, a value of rb=0 and rs=0, results in replacing the 0th

... rs=0 results in ...

 index. This can be observed when using gdb remote debugging support.
 
 (gdb) x/10i do_fork
0xc0085330 do_fork:Cannot access memory at address 
 0xc0085330
 (gdb)
 
 This is because when we do the slb sync via kvm_cpu_synchronize_state,
 we overwrite the slb entry (0th entry) for 0xc008533

Is there a trailing 0 missing here?

 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  target-ppc/kvm.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)
 
 diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
 index 30a870e..5d4e613 100644
 --- a/target-ppc/kvm.c
 +++ b/target-ppc/kvm.c
 @@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
  /* Sync SLB */
  #ifdef TARGET_PPC64
  for (i = 0; i  64; i++) {
 -ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
 -   sregs.u.s.ppc64.slb[i].slbv);
 +target_ulong rb  = sregs.u.s.ppc64.slb[i].slbe;

Double space.

 +/*
 + * KVM_GET_SREGS doesn't retun slb entry with slot information
 + * same as index. So don't depend on the slot information in
 + * the returned value.
 + */
 +rb = ~0xfff;
 +/*
 + * use the array index as the slot
 + */
 +rb |= i;
 +ppc_store_slb(env, rb, sregs.u.s.ppc64.slb[i].slbv);
  }
  #endif
  

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 2/3] target-ppc: Use #define instead of opencoding SLB valid bit

2013-08-19 Thread Andreas Färber

Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Use SLB_ESID_V instead of (1  27) in the code
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Reviewed-by: Andreas Färber afaer...@suse.de

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 1/3] target-ppc: Update slb array with correct index values.

2013-08-19 Thread Andreas Färber

Am 19.08.2013 14:42, schrieb Andreas Färber:
 Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 Without this, a value of rb=0 and rs=0, results in replacing the 0th
 
 ... rs=0 results in ...
 
 index. This can be observed when using gdb remote debugging support.

 (gdb) x/10i do_fork
0xc0085330 do_fork:Cannot access memory at address 
 0xc0085330
 (gdb)

 This is because when we do the slb sync via kvm_cpu_synchronize_state,
 we overwrite the slb entry (0th entry) for 0xc008533
 
 Is there a trailing 0 missing here?
 

 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  target-ppc/kvm.c | 14 --
  1 file changed, 12 insertions(+), 2 deletions(-)

 diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
 index 30a870e..5d4e613 100644
 --- a/target-ppc/kvm.c
 +++ b/target-ppc/kvm.c
 @@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
  /* Sync SLB */
  #ifdef TARGET_PPC64
  for (i = 0; i  64; i++) {
 -ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
 -   sregs.u.s.ppc64.slb[i].slbv);
 +target_ulong rb  = sregs.u.s.ppc64.slb[i].slbe;
 
 Double space.
 
 +/*
 + * KVM_GET_SREGS doesn't retun slb entry with slot information

return

 + * same as index. So don't depend on the slot information in
 + * the returned value.
 + */
 +rb = ~0xfff;
 +/*
 + * use the array index as the slot
 + */
 +rb |= i;
 +ppc_store_slb(env, rb, sregs.u.s.ppc64.slb[i].slbv);
  }
  #endif
  
 
 Regards,
 Andreas
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH] Make usb-bt-dongle configurable

2013-08-19 Thread Paolo Bonzini

Il 19/08/2013 12:48, Miroslav Rezanina ha scritto:
 usb-bt-dongle device can't be disabled as there's dependency in vl.c file. 
 This patch add preprocesor condition to be able to disable it.
 
 Signed-off-by: Miroslav Rezanina mreza...@redhat.com
 ---
  hw/usb/Makefile.objs |  1 -
  vl.c | 18 ++
  2 files changed, 14 insertions(+), 5 deletions(-)
 
 diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
 index f9695e7..8892ffd 100644
 --- a/hw/usb/Makefile.objs
 +++ b/hw/usb/Makefile.objs
 @@ -20,7 +20,6 @@ common-obj-$(CONFIG_USB_SERIAL)   += dev-serial.o
  common-obj-$(CONFIG_USB_NETWORK)  += dev-network.o
  
  # FIXME: make configurable too
 -CONFIG_USB_BLUETOOTH := y
  common-obj-$(CONFIG_USB_BLUETOOTH)+= dev-bluetooth.o
  
  ifeq ($(CONFIG_USB_SMARTCARD),y)
 diff --git a/vl.c b/vl.c
 index f422a1c..4330b6d 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -1526,8 +1526,10 @@ static void configure_msg(QemuOpts *opts)
  
  static int usb_device_add(const char *devname)
  {
 -const char *p;
  USBDevice *dev = NULL;
 +#if  defined(CONFIG_USB_BLUETOOTH) || !defined(CONFIG_LINUX)
 +const char *p;
 +#endif
  
  if (!usb_enabled(false)) {
  return -1;
 @@ -1543,15 +1545,23 @@ static int usb_device_add(const char *devname)
  /* only the linux version is qdev-ified, usb-bsd still needs this */
  if (strstart(devname, host:, p)) {
  dev = usb_host_device_open(usb_bus_find(-1), p);
 -} else
 +goto devtest;
 +} 

No need for the goto, since the following ifs will fail...

  #endif
 +#ifdef CONFIG_USB_BLUETOOTH
  if (!strcmp(devname, bt) || strstart(devname, bt:, p)) {
  dev = usb_bt_init(usb_bus_find(-1),
devname[2] ? hci_init(p)
   : bt_new_hci(qemu_find_bt_vlan(0)));
 -} else {
 -return -1;
 +goto devtest;

... same here...

  }
 +#endif
 +
 +return -1;

... and no need for this return either: if you get here, dev will be
NULL and the if just below will fail.

Paolo

 +#if  defined(CONFIG_USB_BLUETOOTH) || !defined(CONFIG_LINUX)
 +devtest:
 +#endif
  if (!dev)
  return -1;

Re: [Qemu-devel] [PATCH 3/3] target-ppc: Fix page table lookup with kvm enabled

2013-08-19 Thread Andreas Färber

Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 With kvm enabled, we store the hash page table information in the hypervisor.
 Use ioctl to read the htab contents. Without this we get the below error when
 trying to read the guest address
 
  (gdb) x/10 do_fork
  0xc0098660 do_fork:   Cannot access memory at address 
 0xc0098660
  (gdb)
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
  target-ppc/kvm.c| 45 +
  target-ppc/kvm_ppc.h|  3 ++-
  target-ppc/mmu-hash64.c | 25 -
  3 files changed, 63 insertions(+), 10 deletions(-)
 
 diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
 index 5d4e613..63a9c0e 100644
 --- a/target-ppc/kvm.c
 +++ b/target-ppc/kvm.c
 @@ -1885,3 +1885,48 @@ int kvm_arch_on_sigbus(int code, void *addr)
  void kvm_arch_init_irq_routing(KVMState *s)
  {
  }
 +
 +int kvmppc_hash64_load_hpte(CPUPPCState *env, __u64 index,

uint64_t in QEMU please, same in the header.

Could you replace CPUPPCState *env with the new PowerPCCPU *cpu as
argument, please? I don't see it actually being used though...

 +target_ulong *hpte0, target_ulong *hpte1)
 +{
 +int htab_fd;
 +struct kvm_get_htab_fd ghf;
 +struct kvm_get_htab_buf {
 +struct kvm_get_htab_header header;
 +/*
 + * Older kernel required one extra byte.
 + */
 +unsigned long hpte[3];
 +} hpte_buf;
 +
 +*hpte0 = 0;
 +*hpte1 = 0;
 +if (!cap_htab_fd) {
 +return 0;
 +}
 +/*
 + * At this point we are only interested in reading only bolted entries
 + */
 +ghf.flags = KVM_GET_HTAB_BOLTED_ONLY;
 +ghf.start_index = index;
 +htab_fd = kvm_vm_ioctl(kvm_state, KVM_PPC_GET_HTAB_FD, ghf);
 +if (htab_fd  0) {
 +return htab_fd;
 +}
 +
 +if (read(htab_fd, hpte_buf, sizeof(hpte_buf))  0) {
 +goto out;
 +}
 +/*
 + * We only requested for one entry, So we should get only 1
 + * valid entry at the same index
 + */
 +if (hpte_buf.header.n_valid != 1 || hpte_buf.header.index != index) {
 +goto out;
 +}
 +*hpte0 = hpte_buf.hpte[0];
 +*hpte1 = hpte_buf.hpte[1];
 +out:
 +close(htab_fd);
 +return  0;

Double space :)

 +}
 diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
 index 4ae7bf2..51c8952 100644
 --- a/target-ppc/kvm_ppc.h
 +++ b/target-ppc/kvm_ppc.h
 @@ -42,7 +42,8 @@ int kvmppc_get_htab_fd(bool write);
  int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns);
  int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
 uint16_t n_valid, uint16_t n_invalid);
 -
 +int kvmppc_hash64_load_hpte(CPUPPCState *env, __u64 index,
 +target_ulong *hpte0, target_ulong *hpte1);
  #else
  
  static inline uint32_t kvmppc_get_tbfreq(void)
 diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
 index 67fc1b5..239f268 100644
 --- a/target-ppc/mmu-hash64.c
 +++ b/target-ppc/mmu-hash64.c
 @@ -302,17 +302,26 @@ static int ppc_hash64_amr_prot(CPUPPCState *env, 
 ppc_hash_pte64_t pte)
  return prot;
  }
  
 -static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr pteg_off,
 +static hwaddr ppc_hash64_pteg_search(CPUPPCState *env, hwaddr hash,
   bool secondary, target_ulong ptem,
   ppc_hash_pte64_t *pte)
  {
 -hwaddr pte_offset = pteg_off;
 +__u64 index;

This is not KVM code, so definitely uint64_t for cross-platform support.

 +hwaddr pte_offset;
  target_ulong pte0, pte1;
  int i;
  
 +pte_offset = (hash * HASH_PTEG_SIZE_64)  env-htab_mask;;
 +index = (hash * HPTES_PER_GROUP)  env-htab_mask;
 +
  for (i = 0; i  HPTES_PER_GROUP; i++) {
 -pte0 = ppc_hash64_load_hpte0(env, pte_offset);
 -pte1 = ppc_hash64_load_hpte1(env, pte_offset);
 +if (kvm_enabled()) {
 +index += i;
 +kvmppc_hash64_load_hpte(env, index, pte0, pte1);

ppc_env_get_cpu(env) would get you the PowerPCCPU* here, if needed.

Regards,
Andreas

 +} else {
 +pte0 = ppc_hash64_load_hpte0(env, pte_offset);
 +pte1 = ppc_hash64_load_hpte1(env, pte_offset);
 +}
  
  if ((pte0  HPTE64_V_VALID)
   (secondary == !!(pte0  HPTE64_V_SECONDARY))
 @@ -332,7 +341,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env,
   ppc_slb_t *slb, target_ulong eaddr,
   ppc_hash_pte64_t *pte)
  {
 -hwaddr pteg_off, pte_offset;
 +hwaddr pte_offset;
  hwaddr hash;
  uint64_t vsid, epnshift, epnmask, epn, ptem;
  
 @@ -367,8 +376,7 @@ static hwaddr ppc_hash64_htab_lookup(CPUPPCState *env,
   vsid= TARGET_FMT_lx  ptem= TARGET_FMT_lx
   hash=

Re: [Qemu-devel] [PATCH] qcow2: Change default for new images to compat=1.1

2013-08-19 Thread Stefan Hajnoczi

On Mon, Aug 19, 2013 at 11:25:33AM +0200, Kevin Wolf wrote:
 By the time that qemu 1.7 will be released, enough time will has passed

s/has/have/

 since qemu 1.1, which is the first version to understand version 3
 images, that changing the default shouldn't hurt many people any more
 and the benefits of using the new format outweigh the pain.

In particular, compat=1.1 images include:
 * Feature bit fields which make the format more easily extensible
 * Zero clusters for data cluster preallocation, discard, efficient zero
   block migration, or block-backup
 * Lazy refcounts

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com

Stefan

Re: [Qemu-devel] [PATCHv11 00/31] aio / timers: Add AioContext timers and use ppoll

2013-08-19 Thread Stefan Hajnoczi

On Fri, Aug 16, 2013 at 04:37:28PM +0100, Alex Bligh wrote:
 Stefan,
 
 On 16 Aug 2013, at 16:24, Stefan Hajnoczi wrote:
 
  Please run qemu-iotests on this series and fix any failures.  It is
  failing with I/O thread has spun for 1000 iterations.
 
 It does that without my patch. Below is the iterative build log for
 each revision being added (I've snipped all of stdout/stderr save
 this error), and you will see the main-loop warning
 appears even before the first of my patches is applied.

Alex investigated more on IRC and found the bug.  Fixed in v12.

Stefan

Re: [Qemu-devel] [PATCH 0/3] target-ppc: Add support for dumping guest memory using qemu gdb server

2013-08-19 Thread Andreas Färber

Hi Aneesh,

Am 19.08.2013 14:29, schrieb Aneesh Kumar K.V:
 This patch series implement support for dumping guest memory using qemu gdb 
 server.

I had a quick look through but will leave in-depth review to Alex or
Anthony.

Do you plan to implement dumping guest memory via QMP, too?

Cheers,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 5/8] vfio: Add guest side IOMMU support

2013-08-19 Thread Paolo Bonzini

Il 15/08/2013 08:02, Alexey Kardashevskiy ha scritto:
 On 08/13/2013 08:07 AM, Alex Williamson wrote:
 +static void vfio_listener_region_add(MemoryListener *listener,
 + MemoryRegionSection *section)
 +{
 +VFIOContainer *container = container_of(listener, VFIOContainer,
 +iommu_data.listener);
 +hwaddr iova, end;
 +int ret;
  
  if (vfio_listener_skipped_section(section)) {
  DPRINTF(SKIPPING region_add %HWADDR_PRIx - %PRIx64\n,
 @@ -1952,19 +2011,51 @@ static void vfio_listener_region_add(MemoryListener 
 *listener,
  return;
  }
  
 -vaddr = memory_region_get_ram_ptr(section-mr) +
 -section-offset_within_region +
 -(iova - section-offset_within_address_space);
 -
 -DPRINTF(region_add %HWADDR_PRIx - %HWADDR_PRIx [%p]\n,
 -iova, end - 1, vaddr);
 -
 -memory_region_ref(section-mr);
 -ret = vfio_dma_map(container, iova, end - iova, vaddr, 
 section-readonly);
 -if (ret) {
 -error_report(vfio_dma_map(%p, 0x%HWADDR_PRIx, 
 - 0x%HWADDR_PRIx, %p) = %d (%m),
 - container, iova, end - iova, vaddr, ret);
 +if (memory_region_is_iommu(section-mr)) {
 +VFIOGuestIOMMU *giommu;
 +
 +DPRINTF(region_add [iommu] %HWADDR_PRIx - %HWADDR_PRIx\n,
 +iova, end - 1);
 +
 +memory_region_ref(section-mr);
 +/*
 + * FIXME: We should do some checking to see if the
 + * capabilities of the host VFIO IOMMU are adequate to model
 + * the guest IOMMU
 + *
 + * FIXME: This assumes that the guest IOMMU is empty of
 + * mappings at this point - we should either enforce this, or
 + * loop through existing mappings to map them into VFIO.
 + *
 + * FIXME: For VFIO iommu types which have KVM acceleration to
 + * avoid bouncing all map/unmaps through qemu this way, this
 + * would be the right place to wire that up (tell the KVM
 + * device emulation the VFIO iommu handles to use).
 + */
 +giommu = g_malloc0(sizeof(*giommu));
 +giommu-iommu = section-mr;
 +giommu-container = container;
 +giommu-n.notify = vfio_iommu_map_notify;
 +QLIST_INSERT_HEAD(container-guest_iommus, giommu, list);
 +memory_region_register_iommu_notifier(giommu-iommu, giommu-n);
 +
 +} else if (memory_region_is_ram(section-mr)) {

Please change this to an else, and leave the refs outside the if, just
after checking for vfio_listener_skipped_section.  This way it's clearer
that vfio_listener_region_add matches vfio_listener_region_del.

Paolo

Re: [Qemu-devel] [PATCH 3/8] vfio: Introduce VFIO address spaces

2013-08-19 Thread Paolo Bonzini

Il 13/08/2013 00:07, Alex Williamson ha scritto:
  +if (pci_iommu_as(pdev) != address_space_memory) {
  +error_report(vfio: DMA address space must be system memory);
  +return -ENXIO;
 -EFAULT?  It's a bad address of sorts.
 

Accessing it would SIGSEGV, so it is not really EFAULT.  I would just
use EINVAL, the numeric error code will go away as soon as initfn is
changed to use Error * (which is needed to propagate sensible error
messages to the QMP client).

Paolo

Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Paolo Bonzini

Il 13/08/2013 16:54, Jan Kiszka ha scritto:
  Using an AioContext lock for timers is somewhat complicated for lock
  ordering, because context A could try to modify a timer from context B,
  at the same time when context B is modifying a timer from context A.
  This would cause a deadlock.
 That's like MMIO access on device A triggers MMIO access on B and vice
 versa - why should we need this, so why should we support this? I think
 the typical case is that timers (and their lists) and data structures
 they access have a fairly close relation, thus can reuse the same lock.

Yes, that's true.  Still it would have to be documented, and using
too-coarse locks risks having many BQLs, which multiplies the complexity
(fine-grained locking at least keeps critical sections small and limits
the amount of nested locking).

I like Stefan's patches to make the timer list thread-safe, especially
if we can optimize it (with RCU?) to make the read side lockless.

Paolo

Re: [Qemu-devel] [Bug?] qemu-1.6.0 python traceback in GEN qmp-commands.h

2013-08-19 Thread Luiz Capitulino

On Fri, 16 Aug 2013 14:21:50 +0100
Peter Maydell peter.mayd...@linaro.org wrote:

 On 16 August 2013 08:59, Erik Rull erik.r...@rdsoftware.de wrote:
  Hi all,
 
  when using the released qemu-1.6.0.tar.bz2, I get the following error 
  message:
File /home/erik/qemu-1.6.0/scripts/qapi.py, line 164
  except QAPISchemaError as e:
  ^
  SyntaxError: invalid syntax
  make: *** [qmp-commands.h] Error 1
 
 My guess is that your python is older than 2.6; I think
 the except Foo as e syntax is new in 2.6. We probably
 missed this because most people use a newer Python than
 2.6, but configure's check only requires 2.4 or better.
 
 We should probably update the scripts to not use overly
 new features (or alternatively decide that 2.6 is our new
 minimum -- what do RHEL5 and our other oldest-supported
 distros ship?)
 
 For this specific case I think it needs to change to
  except QAPISchemaError, e:

Erik, can you try that and post a patch? Would be interesting
to know if this is the only problem with older python.

Re: [Qemu-devel] [PATCH] Make usb-bt-dongle configurable

2013-08-19 Thread Laszlo Ersek

On 08/19/13 14:31, Andreas Färber wrote:
 Am 19.08.2013 12:48, schrieb Miroslav Rezanina:
 usb-bt-dongle device can't be disabled as there's dependency in vl.c file. 
 This patch add preprocesor condition to be able to disable it.
 
 Please limit to 76 chars per line (check `git log` output).
 

 Signed-off-by: Miroslav Rezanina mreza...@redhat.com
 ---
  hw/usb/Makefile.objs |  1 -
  vl.c | 18 ++
  2 files changed, 14 insertions(+), 5 deletions(-)

 diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
 index f9695e7..8892ffd 100644
 --- a/hw/usb/Makefile.objs
 +++ b/hw/usb/Makefile.objs
 @@ -20,7 +20,6 @@ common-obj-$(CONFIG_USB_SERIAL)   += dev-serial.o
  common-obj-$(CONFIG_USB_NETWORK)  += dev-network.o
  
  # FIXME: make configurable too
 -CONFIG_USB_BLUETOOTH := y
 
 You probably should delete the FIXME alongside?

What's everyone's opinion about CONFIG_USB_BLUETOOTH=y disappearing from
the default build?

Thanks
Laszlo

Re: [Qemu-devel] [PULL 08/18] block/gluster: drop qemu_gluster_aio_flush_cb()

2013-08-19 Thread Stefan Hajnoczi

On Sat, Aug 17, 2013 at 10:52:34AM +0530, Bharata B Rao wrote:
  From: Stefan Hajnoczi stefa...@redhat.com
  Since .io_flush() is no longer called we do not need
  qemu_gluster_aio_flush_cb() anymore.  It turns out that qemu_aio_count
  is unused now and can be dropped.
  
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 
 Tested-by: Bharata B Rao bhar...@linux.vnet.ibm.com
 
 Tested GlusterFS backend, but it needed the below fix:
 --

Thanks, I have fixed my development machine so block/gluster.c is always
built.

Since this pull request hasn't been merged yet I am resending with your
fix squashed in.

Stefan

Re: [Qemu-devel] [PULL 00/18] Block patches

2013-08-19 Thread Stefan Hajnoczi

On Fri, Aug 16, 2013 at 05:47:06PM +0200, Stefan Hajnoczi wrote:
 fatal: Needed a single revision
 The following changes since commit f202039811d8746b0586d2fd5f61de6c8cf68056:
 
   Open up 1.7 development branch (2013-08-15 15:41:13 -0500)
 
 are available in the git repository at:
 
   git://github.com/stefanha/qemu.git block-next
 
 for you to fetch changes up to ab97be151813b6823fa41e46f62c500f5b03f771:
 
   aio: drop io_flush argument (2013-08-16 14:16:21 +0200)
 
 
 Stefan Hajnoczi (18):
   block: ensure bdrv_drain_all() works during bdrv_delete()
   block: stop relying on io_flush() in bdrv_drain_all()
   dataplane/virtio-blk: check exit conditions before aio_poll()
   tests: adjust test-aio to new aio_poll() semantics
   tests: adjust test-thread-pool to new aio_poll() semantics
   aio: stop using .io_flush()
   block/curl: drop curl_aio_flush()
   block/gluster: drop qemu_gluster_aio_flush_cb()
   block/iscsi: drop iscsi_process_flush()
   block/linux-aio: drop qemu_laio_completion_cb()
   block/nbd: drop nbd_have_request()
   block/rbd: drop qemu_rbd_aio_flush_cb()
   block/sheepdog: drop have_co_req() and aio_flush_request()
   block/ssh: drop return_true()
   dataplane/virtio-blk: drop flush_true() and flush_io()
   thread-pool: drop thread_pool_active()
   tests: drop event_active_cb()
   aio: drop io_flush argument
 
  aio-posix.c | 36 ++
  aio-win32.c | 37 ---
  async.c |  4 +-
  block.c | 49 ++--
  block/curl.c| 25 ++---
  block/gluster.c | 21 ++-
  block/iscsi.c   | 10 +
  block/linux-aio.c   | 18 +
  block/nbd.c | 18 ++---
  block/rbd.c | 16 +---
  block/sheepdog.c| 33 -
  block/ssh.c | 12 +-
  block/stream.c  |  6 ++-
  hw/block/dataplane/virtio-blk.c | 25 +++--
  include/block/aio.h | 14 +--
  main-loop.c |  9 ++---
  tests/test-aio.c| 82 
 +
  tests/test-thread-pool.c| 24 ++--
  thread-pool.c   | 11 +-
  19 files changed, 163 insertions(+), 287 deletions(-)

Bharata found an issue with the GlusterFS patch, I will resend with the
fix squashed in.

Stefan

Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Jan Kiszka

On 2013-08-19 15:21, Paolo Bonzini wrote:
 Il 13/08/2013 16:54, Jan Kiszka ha scritto:
 Using an AioContext lock for timers is somewhat complicated for lock
 ordering, because context A could try to modify a timer from context B,
 at the same time when context B is modifying a timer from context A.
 This would cause a deadlock.
 That's like MMIO access on device A triggers MMIO access on B and vice
 versa - why should we need this, so why should we support this? I think
 the typical case is that timers (and their lists) and data structures
 they access have a fairly close relation, thus can reuse the same lock.
 
 Yes, that's true.  Still it would have to be documented, and using
 too-coarse locks risks having many BQLs, which multiplies the complexity
 (fine-grained locking at least keeps critical sections small and limits
 the amount of nested locking).

As this lock does not require taking other locks while holding it, it
should actually be fine.

 
 I like Stefan's patches to make the timer list thread-safe, especially
 if we can optimize it (with RCU?) to make the read side lockless.

What is a pure read-side in that context? Checks if some timer is
expired? Given that RCU write sides are heavier than plain mutexes and
many typical accesses (mod, del, expire) involve writing, such an
optimization may also be counterproductive.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH] Make usb-bt-dongle configurable

2013-08-19 Thread Andreas Färber

Am 19.08.2013 15:30, schrieb Laszlo Ersek:
 On 08/19/13 14:31, Andreas Färber wrote:
 Am 19.08.2013 12:48, schrieb Miroslav Rezanina:
 usb-bt-dongle device can't be disabled as there's dependency in vl.c file. 
 This patch add preprocesor condition to be able to disable it.

 Please limit to 76 chars per line (check `git log` output).


 Signed-off-by: Miroslav Rezanina mreza...@redhat.com
 ---
  hw/usb/Makefile.objs |  1 -
  vl.c | 18 ++
  2 files changed, 14 insertions(+), 5 deletions(-)

 diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
 index f9695e7..8892ffd 100644
 --- a/hw/usb/Makefile.objs
 +++ b/hw/usb/Makefile.objs
 @@ -20,7 +20,6 @@ common-obj-$(CONFIG_USB_SERIAL)   += dev-serial.o
  common-obj-$(CONFIG_USB_NETWORK)  += dev-network.o
  
  # FIXME: make configurable too
 -CONFIG_USB_BLUETOOTH := y

 You probably should delete the FIXME alongside?
 
 What's everyone's opinion about CONFIG_USB_BLUETOOTH=y disappearing from
 the default build?

By my reading of `git grep CONFIG_USB_BLUETOOTH` it isn't disappearing,
check default-configs/usb.mak. All targets that include usb.mak will
have CONFIG_USB_BLUETOOTH.

It's only used in the build system and with this patch in vl.c, so
assuming that Miroslav has checked that the build succeeds for all
targets, this should be fine, I guess.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [Bug?] qemu-1.6.0 python traceback in GEN qmp-commands.h

2013-08-19 Thread Andreas Färber

Am 16.08.2013 15:21, schrieb Peter Maydell:
 On 16 August 2013 08:59, Erik Rull erik.r...@rdsoftware.de wrote:
 Hi all,

 when using the released qemu-1.6.0.tar.bz2, I get the following error 
 message:
   File /home/erik/qemu-1.6.0/scripts/qapi.py, line 164
 except QAPISchemaError as e:
 ^
 SyntaxError: invalid syntax
 make: *** [qmp-commands.h] Error 1
 
 My guess is that your python is older than 2.6; I think
 the except Foo as e syntax is new in 2.6. We probably
 missed this because most people use a newer Python than
 2.6, but configure's check only requires 2.4 or better.
 
 We should probably update the scripts to not use overly
 new features (or alternatively decide that 2.6 is our new
 minimum -- what do RHEL5 and our other oldest-supported
 distros ship?)

I vaguely remember running into such problems before... possibly on
Solaris. We compiled a list of Python versions and I think settled for
2.4 based on some old RHEL, too. CC'ing Stefan.

git-blame points to:
http://repo.or.cz/w/qemu.git/commit/e120d449e1b39ec508c297b963ce452628dd37c3?f=configure

Andreas

 
 For this specific case I think it needs to change to
  except QAPISchemaError, e:
 
 thanks
 -- PMM
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [PATCH] Change email address

2013-08-19 Thread Anthony Liguori

My IBM email address will be unaccessible after August 23rd, 2013.

Signed-off-by: Anthony Liguori anth...@codemonkey.ws
---
 .mailmap|  2 +-
 MAINTAINERS | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/.mailmap b/.mailmap
index 9797802..7b91a95 100644
--- a/.mailmap
+++ b/.mailmap
@@ -2,7 +2,7 @@
 # into proper addresses so that they are counted properly in git shortlog 
output.
 #
 Andrzej Zaborowski balr...@gmail.com balrog 
balrog@c046a42c-6fe2-441c-8c8c-71466251a162
-Anthony Liguori aligu...@us.ibm.com aliguori 
aliguori@c046a42c-6fe2-441c-8c8c-71466251a162
+Anthony Liguori anth...@codemonkey.ws aliguori 
aliguori@c046a42c-6fe2-441c-8c8c-71466251a162
 Aurelien Jarno aurel...@aurel32.net aurel32 
aurel32@c046a42c-6fe2-441c-8c8c-71466251a162
 Blue Swirl blauwir...@gmail.com blueswir1 
blueswir1@c046a42c-6fe2-441c-8c8c-71466251a162
 Edgar E. Iglesias edgar.igles...@gmail.com edgar_igl 
edgar_igl@c046a42c-6fe2-441c-8c8c-71466251a162
diff --git a/MAINTAINERS b/MAINTAINERS
index 654e2cb..70a3370 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -50,7 +50,7 @@ Descriptions of section entries:
 
 General Project Administration
 --
-M: Anthony Liguori aligu...@us.ibm.com
+M: Anthony Liguori anth...@codemonkey.ws
 M: Paul Brook p...@codesourcery.com
 
 Guest CPU cores (TCG):
@@ -509,7 +509,7 @@ F: hw/unicore32/
 X86 Machines
 
 PC
-M: Anthony Liguori aligu...@us.ibm.com
+M: Anthony Liguori anth...@codemonkey.ws
 S: Supported
 F: hw/i386/pc.[ch]
 F: hw/i386/pc_piix.c
@@ -593,7 +593,7 @@ S: Supported
 F: hw/*/*vhost*
 
 virtio
-M: Anthony Liguori aligu...@us.ibm.com
+M: Anthony Liguori anth...@codemonkey.ws
 S: Supported
 F: hw/*/virtio*
 
@@ -651,7 +651,7 @@ F: block/
 F: hw/block/
 
 Character Devices
-M: Anthony Liguori aligu...@us.ibm.com
+M: Anthony Liguori anth...@codemonkey.ws
 S: Maintained
 F: qemu-char.c
 
@@ -689,7 +689,7 @@ F: audio/spiceaudio.c
 F: hw/display/qxl*
 
 Graphics
-M: Anthony Liguori aligu...@us.ibm.com
+M: Anthony Liguori anth...@codemonkey.ws
 S: Maintained
 F: ui/
 
@@ -699,7 +699,7 @@ S: Odd Fixes
 F: ui/cocoa.m
 
 Main loop
-M: Anthony Liguori aligu...@us.ibm.com
+M: Anthony Liguori anth...@codemonkey.ws
 S: Supported
 F: vl.c
 
@@ -711,7 +711,7 @@ F: hmp.c
 F: hmp-commands.hx
 
 Network device layer
-M: Anthony Liguori aligu...@us.ibm.com
+M: Anthony Liguori anth...@codemonkey.ws
 M: Stefan Hajnoczi stefa...@redhat.com
 S: Maintained
 F: net/
-- 
1.8.0

Re: [Qemu-devel] [PATCH v3 2/8] xics: add pre_save/post_load/cpu_setup dispatchers

2013-08-19 Thread Andreas Färber

Am 19.08.2013 07:55, schrieb Alexey Kardashevskiy:
 The upcoming support of in-kernel XICS will redefine migration callbacks
 for both ICS and ICP so classes and callback pointers are added.
 
 This adds a cpu_setup callback to the XICS device class (as XICS-KVM
 will do it different) and xics_dispatch_cpu_setup(). This also moves
 the place where xics_dispatch_cpu_setup() is called a bit further to
 have VCPU initialized (required for XICS-KVM).
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v3:
 * fixed local variables names
 ---
  hw/intc/xics.c| 61 
 +++
  hw/ppc/spapr.c|  4 ++--
  include/hw/ppc/xics.h | 46 +-
  3 files changed, 104 insertions(+), 7 deletions(-)
 
 diff --git a/hw/intc/xics.c b/hw/intc/xics.c
 index 6b3c071..e3a957d 100644
 --- a/hw/intc/xics.c
 +++ b/hw/intc/xics.c
[...]
 @@ -674,10 +724,12 @@ static Property xics_properties[] = {
  static void xics_class_init(ObjectClass *oc, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(oc);
 +XICSStateClass *k = XICS_CLASS(oc);
  
  dc-realize = xics_realize;
  dc-props = xics_properties;
  dc-reset = xics_reset;
 +k-cpu_setup = xics_cpu_setup;
  
  spapr_rtas_register(ibm,set-xive, rtas_set_xive);
  spapr_rtas_register(ibm,get-xive, rtas_get_xive);

This hunk is fixed up in 4/8, please squash that bit here.

Otherwise looks good.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Alex Bligh


On 19 Aug 2013, at 14:21, Paolo Bonzini wrote:

 I like Stefan's patches to make the timer list thread-safe, especially
 if we can optimize it (with RCU?) to make the read side lockless.

We will have to be careful on the write side. I'm trying to
work out just how slow this would with Paolo's RCU patches.
If call_rcu is used (or synchronize_rcu after the entire list
has been traversed, all timers updated etc), how bad is it?

There is a particular concern for  for repeating timers where two writes
are done (firstly to remove the timer from the list, then next
inside the timer routine to add it back to the list). As timers
are only called by a single thread, it might be worth having
some form of timer API which specifies a timer should be
called every X nano/micro/milli/seconds, or just once in X
nano/micro/milli/seconds time, which would save the inevitable
call to read the clock value first, and would speed things up
if write locking was slow.

-- 
Alex Bligh

Re: [Qemu-devel] [PATCH] Make usb-bt-dongle configurable

2013-08-19 Thread Paolo Bonzini

Il 19/08/2013 15:41, Andreas Färber ha scritto:
 By my reading of `git grep CONFIG_USB_BLUETOOTH` it isn't disappearing,
 check default-configs/usb.mak. All targets that include usb.mak will
 have CONFIG_USB_BLUETOOTH.
 
 It's only used in the build system and with this patch in vl.c, so
 assuming that Miroslav has checked that the build succeeds for all
 targets, this should be fine, I guess.

Actually, CONFIG_* symbols for devices are _not_ exposed to vl.c.  Blue
Swirl specifically wanted that to happen.  So I'm afraid Mirek's patch
breaks -usb bt even if CONFIG_USB_BLUETOOTH is set to y.

Paolo

[Qemu-devel] [FYI] Changing email addresses/employers

2013-08-19 Thread Anthony Liguori

Hi,

I've decided to take an opportunity outside of IBM which means that my
IBM email address will no longer work after this friday (August 23rd).
Please start CC'ing my anth...@codemonkey.ws address on all patches and
pull requests.

I'm excited about my new opportunity but I am going to wait until I get
closer to my start date to share information about it.  I will continue
to be involved in QEMU development and other than some down time during
relocation, upstream development should continue uninterrupted.

I wanted to also take this opportunity to thank IBM for all of its
support for QEMU over the past 7 years.  There is a whole slew of people
behind the scenes that don't end up in commit statistics but that have
nonetheless been crucial to the project's success.  I really do
appreciate all of the support we ahve received from IBM, Red Hat, SUSE,
and all of the other large backers of QEMU.

IBM remains just as committed as ever to QEMU and KVM if not more.  I'm
excited to continue to get to work with the IBM team in a different
capacity.

If you have any questions or concerns, please feel free to reach out to
me privately and I'll do my best to answer them.

Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH 1/2] Improve Monitor disas with symbol lookup

2013-08-19 Thread Fabien Chouteau

On 08/19/2013 10:33 AM, Peter Maydell wrote:
 On 2 August 2013 13:48, Fabien Chouteau chout...@adacore.com wrote:
 Part of M731-018.
 
 What is this a reference to?
 

I'm sorry this is not supposed to be here. Do you have comments before I
resend the patch?

Regards,

-- 
Fabien Chouteau

Re: [Qemu-devel] Using aio_poll for timer carrier threads

2013-08-19 Thread Paolo Bonzini

Il 19/08/2013 15:40, Jan Kiszka ha scritto:
 On 2013-08-19 15:21, Paolo Bonzini wrote:
 Il 13/08/2013 16:54, Jan Kiszka ha scritto:
 Using an AioContext lock for timers is somewhat complicated for lock
 ordering, because context A could try to modify a timer from context B,
 at the same time when context B is modifying a timer from context A.
 This would cause a deadlock.
 That's like MMIO access on device A triggers MMIO access on B and vice
 versa - why should we need this, so why should we support this? I think
 the typical case is that timers (and their lists) and data structures
 they access have a fairly close relation, thus can reuse the same lock.

 Yes, that's true.  Still it would have to be documented, and using
 too-coarse locks risks having many BQLs, which multiplies the complexity
 (fine-grained locking at least keeps critical sections small and limits
 the amount of nested locking).
 
 As this lock does not require taking other locks while holding it, it
 should actually be fine.

Using an AioContext lock would be more dangerous, though.  The risk is
mostly that someone is holding _another_ AioContext lock while calling a
timer function.

For MMIO, the rule would be that all address_space_rw/map/unmap must be
done with no lock taken.

 I like Stefan's patches to make the timer list thread-safe, especially
 if we can optimize it (with RCU?) to make the read side lockless.
 
 What is a pure read-side in that context? Checks if some timer is
 expired?

Yes.  It is actually quite common, as Stefan pointed out elsewhere to
Ping Fan, as it happens at least once in each iteration of the loop.

The read side doesn't really need to look beyond the head of the list.

 Given that RCU write sides are heavier than plain mutexes and
 many typical accesses (mod, del, expire) involve writing, such an
 optimization may also be counterproductive.

This would have to be a creative use of RCU, with no copy in it...
basically you can use RCU only as a substitute for reference counting.
call_rcu/synchronize_rcu is not done on every write operation, all we
need to do is ensuring that devices free timers only after an RCU
critical section.  We should do the same for memory regions, see my
patches to split exitfn and instance_finalize; I'll go more in depth in
my KVM Forum 2013.

Lockless algorithms (including RCU) are somewhat appealing because you
do not have to worry about locking, but of course all warnings about
increased complexity and premature optimization apply.

Paolo

Re: [Qemu-devel] [PATCH 1/2] Improve Monitor disas with symbol lookup

2013-08-19 Thread Andreas Färber

Am 02.08.2013 14:48, schrieb Fabien Chouteau:
 Part of M731-018.
 
 Signed-off-by: Fabien Chouteau chout...@adacore.com
 ---
  disas.c |   16 
  1 file changed, 12 insertions(+), 4 deletions(-)
 
 diff --git a/disas.c b/disas.c
 index 71007fb..3ffb3ae 100644
 --- a/disas.c
 +++ b/disas.c
 @@ -480,11 +480,19 @@ void monitor_disas(Monitor *mon, CPUArchState *env,
  #endif
  
  for(i = 0; i  nb_insn; i++) {
 - monitor_printf(mon, 0x TARGET_FMT_lx :  , pc);
 +const char *sym = lookup_symbol(pc);
 +
 +monitor_printf(mon, 0x TARGET_FMT_lx  , pc);
 +if (sym[0] != '\0') {
 +monitor_printf(mon, %s:  , sym);
 +} else {
 +monitor_printf(mon, :  );
 +}

Independent of PMM's comment, I think you meant

+monitor_printf(mon, 0x TARGET_FMT_lx, pc);
+if (sym[0] != '\0') {
+monitor_printf(mon,  %s:  , sym);
+} else {
+monitor_printf(mon, :  );
+}

to keep the output unchanged.

Could you please put the tab fixing into a preceding patch for
readability and prepend a cover letter?

Regards,
Andreas

  count = print_insn(pc, s.info);
 - monitor_printf(mon, \n);
 - if (count  0)
 - break;
 +monitor_printf(mon, \n);
 +if (count  0) {
 +break;
 +}
  pc += count;
  }
  }
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [PULL 08/18] block/gluster: drop qemu_gluster_aio_flush_cb()

2013-08-19 Thread Stefan Hajnoczi

Since .io_flush() is no longer called we do not need
qemu_gluster_aio_flush_cb() anymore.  It turns out that qemu_aio_count
is unused now and can be dropped.

Thanks to Bharata B Rao bhar...@linux.vnet.ibm.com for catching a
build failure with CONFIG_GLUSTERFS_DISCARD, which has been fixed.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/gluster.c | 18 +-
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 645b7f1..3cb7500 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -32,7 +32,6 @@ typedef struct BDRVGlusterState {
 struct glfs *glfs;
 int fds[2];
 struct glfs_fd *fd;
-int qemu_aio_count;
 int event_reader_pos;
 GlusterAIOCB *event_acb;
 } BDRVGlusterState;
@@ -247,7 +246,6 @@ static void qemu_gluster_complete_aio(GlusterAIOCB *acb, 
BDRVGlusterState *s)
 ret = -EIO; /* Partial read/write - fail it */
 }
 
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 cb(opaque, ret);
 if (finished) {
@@ -275,13 +273,6 @@ static void qemu_gluster_aio_event_reader(void *opaque)
 } while (ret  0  errno == EINTR);
 }
 
-static int qemu_gluster_aio_flush_cb(void *opaque)
-{
-BDRVGlusterState *s = opaque;
-
-return (s-qemu_aio_count  0);
-}
-
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = gluster,
@@ -348,7 +339,7 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
 }
 fcntl(s-fds[GLUSTER_FD_READ], F_SETFL, O_NONBLOCK);
 qemu_aio_set_fd_handler(s-fds[GLUSTER_FD_READ],
-qemu_gluster_aio_event_reader, NULL, qemu_gluster_aio_flush_cb, s);
+qemu_gluster_aio_event_reader, NULL, NULL, s);
 
 out:
 qemu_opts_del(opts);
@@ -445,7 +436,6 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, 
ssize_t ret, void *arg)
 qemu_mutex_lock_iothread(); /* We are in gluster thread context */
 acb-common.cb(acb-common.opaque, -EIO);
 qemu_aio_release(acb);
-s-qemu_aio_count--;
 close(s-fds[GLUSTER_FD_READ]);
 close(s-fds[GLUSTER_FD_WRITE]);
 qemu_aio_set_fd_handler(s-fds[GLUSTER_FD_READ], NULL, NULL, NULL,
@@ -467,7 +457,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_rw(BlockDriverState *bs,
 
 offset = sector_num * BDRV_SECTOR_SIZE;
 size = nb_sectors * BDRV_SECTOR_SIZE;
-s-qemu_aio_count++;
 
 acb = qemu_aio_get(gluster_aiocb_info, bs, cb, opaque);
 acb-size = size;
@@ -488,7 +477,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_rw(BlockDriverState *bs,
 return acb-common;
 
 out:
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 return NULL;
 }
@@ -531,7 +519,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_flush(BlockDriverState *bs,
 acb-size = 0;
 acb-ret = 0;
 acb-finished = NULL;
-s-qemu_aio_count++;
 
 ret = glfs_fsync_async(s-fd, gluster_finish_aiocb, acb);
 if (ret  0) {
@@ -540,7 +527,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_flush(BlockDriverState *bs,
 return acb-common;
 
 out:
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 return NULL;
 }
@@ -563,7 +549,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_discard(BlockDriverState *bs,
 acb-size = 0;
 acb-ret = 0;
 acb-finished = NULL;
-s-qemu_aio_count++;
 
 ret = glfs_discard_async(s-fd, offset, size, gluster_finish_aiocb, acb);
 if (ret  0) {
@@ -572,7 +557,6 @@ static BlockDriverAIOCB 
*qemu_gluster_aio_discard(BlockDriverState *bs,
 return acb-common;
 
 out:
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 return NULL;
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 07/18] block/curl: drop curl_aio_flush()

2013-08-19 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop curl_aio_flush().  The acb[]
array that the function checks is still used in other parts of
block/curl.c.  Therefore we cannot remove acb[], it is needed.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/curl.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index 82d39ff..524 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -86,7 +86,6 @@ typedef struct BDRVCURLState {
 
 static void curl_clean_state(CURLState *s);
 static void curl_multi_do(void *arg);
-static int curl_aio_flush(void *opaque);
 
 static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
 void *s, void *sp)
@@ -94,14 +93,14 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int 
action,
 DPRINTF(CURL (AIO): Sock action %d on fd %d\n, action, fd);
 switch (action) {
 case CURL_POLL_IN:
-qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, curl_aio_flush, 
s);
+qemu_aio_set_fd_handler(fd, curl_multi_do, NULL, NULL, s);
 break;
 case CURL_POLL_OUT:
-qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, curl_aio_flush, 
s);
+qemu_aio_set_fd_handler(fd, NULL, curl_multi_do, NULL, s);
 break;
 case CURL_POLL_INOUT:
 qemu_aio_set_fd_handler(fd, curl_multi_do, curl_multi_do,
-curl_aio_flush, s);
+NULL, s);
 break;
 case CURL_POLL_REMOVE:
 qemu_aio_set_fd_handler(fd, NULL, NULL, NULL, NULL);
@@ -495,21 +494,6 @@ out_noclean:
 return -EINVAL;
 }
 
-static int curl_aio_flush(void *opaque)
-{
-BDRVCURLState *s = opaque;
-int i, j;
-
-for (i=0; i  CURL_NUM_STATES; i++) {
-for(j=0; j  CURL_NUM_ACB; j++) {
-if (s-states[i].acb[j]) {
-return 1;
-}
-}
-}
-return 0;
-}
-
 static void curl_aio_cancel(BlockDriverAIOCB *blockacb)
 {
 // Do we have to implement canceling? Seems to work without...
-- 
1.8.3.1

[Qemu-devel] [PULL 12/18] block/rbd: drop qemu_rbd_aio_flush_cb()

2013-08-19 Thread Stefan Hajnoczi

.io_flush() is no longer called so drop qemu_rbd_aio_flush_cb().
qemu_aio_count is unused now so drop it too.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 block/rbd.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index cb71751..71b4a0c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -100,7 +100,6 @@ typedef struct BDRVRBDState {
 rados_ioctx_t io_ctx;
 rbd_image_t image;
 char name[RBD_MAX_IMAGE_NAME_SIZE];
-int qemu_aio_count;
 char *snap;
 int event_reader_pos;
 RADOSCB *event_rcb;
@@ -428,19 +427,11 @@ static void qemu_rbd_aio_event_reader(void *opaque)
 if (s-event_reader_pos == sizeof(s-event_rcb)) {
 s-event_reader_pos = 0;
 qemu_rbd_complete_aio(s-event_rcb);
-s-qemu_aio_count--;
 }
 }
 } while (ret  0  errno == EINTR);
 }
 
-static int qemu_rbd_aio_flush_cb(void *opaque)
-{
-BDRVRBDState *s = opaque;
-
-return (s-qemu_aio_count  0);
-}
-
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
 .name = rbd,
@@ -554,7 +545,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags)
 fcntl(s-fds[0], F_SETFL, O_NONBLOCK);
 fcntl(s-fds[1], F_SETFL, O_NONBLOCK);
 qemu_aio_set_fd_handler(s-fds[RBD_FD_READ], qemu_rbd_aio_event_reader,
-NULL, qemu_rbd_aio_flush_cb, s);
+NULL, NULL, s);
 
 
 qemu_opts_del(opts);
@@ -741,8 +732,6 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 off = sector_num * BDRV_SECTOR_SIZE;
 size = nb_sectors * BDRV_SECTOR_SIZE;
 
-s-qemu_aio_count++; /* All the RADOSCB */
-
 rcb = g_malloc(sizeof(RADOSCB));
 rcb-done = 0;
 rcb-acb = acb;
@@ -779,7 +768,6 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs,
 
 failed:
 g_free(rcb);
-s-qemu_aio_count--;
 qemu_aio_release(acb);
 return NULL;
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 03/18] dataplane/virtio-blk: check exit conditions before aio_poll()

2013-08-19 Thread Stefan Hajnoczi

Check exit conditions before entering blocking aio_poll().  This is
mainly for consistency since it's unlikely that we are stopping in the
first event loop iteration.

Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 hw/block/dataplane/virtio-blk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 411becc..5bd5eed 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -376,9 +376,9 @@ static void *data_plane_thread(void *opaque)
 {
 VirtIOBlockDataPlane *s = opaque;
 
-do {
+while (!s-stopping || s-num_reqs  0) {
 aio_poll(s-ctx, true);
-} while (!s-stopping || s-num_reqs  0);
+}
 return NULL;
 }
 
-- 
1.8.3.1

[Qemu-devel] [PULL 05/18] tests: adjust test-thread-pool to new aio_poll() semantics

2013-08-19 Thread Stefan Hajnoczi

aio_poll(ctx, true) will soon block when fd handlers have been set.
Previously aio_poll() would return early if all .io_flush() returned
false.  This means we need to check the equivalent of the .io_flush()
condition *before* calling aio_poll(ctx, true) to avoid deadlock.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 tests/test-thread-pool.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/test-thread-pool.c b/tests/test-thread-pool.c
index b62338f..8188d1a 100644
--- a/tests/test-thread-pool.c
+++ b/tests/test-thread-pool.c
@@ -40,19 +40,13 @@ static void done_cb(void *opaque, int ret)
 active--;
 }
 
-/* Wait until all aio and bh activity has finished */
-static void qemu_aio_wait_all(void)
-{
-while (aio_poll(ctx, true)) {
-/* Do nothing */
-}
-}
-
 static void test_submit(void)
 {
 WorkerTestData data = { .n = 0 };
 thread_pool_submit(pool, worker_cb, data);
-qemu_aio_wait_all();
+while (data.n == 0) {
+aio_poll(ctx, true);
+}
 g_assert_cmpint(data.n, ==, 1);
 }
 
@@ -65,7 +59,9 @@ static void test_submit_aio(void)
 /* The callbacks are not called until after the first wait.  */
 active = 1;
 g_assert_cmpint(data.ret, ==, -EINPROGRESS);
-qemu_aio_wait_all();
+while (data.ret == -EINPROGRESS) {
+aio_poll(ctx, true);
+}
 g_assert_cmpint(active, ==, 0);
 g_assert_cmpint(data.n, ==, 1);
 g_assert_cmpint(data.ret, ==, 0);
@@ -103,7 +99,9 @@ static void test_submit_co(void)
 
 /* qemu_aio_wait_all will execute the rest of the coroutine.  */
 
-qemu_aio_wait_all();
+while (data.ret == -EINPROGRESS) {
+aio_poll(ctx, true);
+}
 
 /* Back here after the coroutine has finished.  */
 
@@ -187,7 +185,9 @@ static void test_cancel(void)
 }
 
 /* Finish execution and execute any remaining callbacks.  */
-qemu_aio_wait_all();
+while (active  0) {
+aio_poll(ctx, true);
+}
 g_assert_cmpint(active, ==, 0);
 for (i = 0; i  100; i++) {
 if (data[i].n == 3) {
-- 
1.8.3.1

1 2 >

1 - 100 of 190 matches

Mail list logo