date:20160926

Re: [Qemu-devel] [PATCH v3 07/10] ppc/pnv: add XSCOM infrastructure

2016-09-26 Thread Cédric Le Goater

On 09/27/2016 04:35 AM, David Gibson wrote:
> On Mon, Sep 26, 2016 at 06:11:36PM +0200, Cédric Le Goater wrote:
>> On 09/23/2016 04:46 AM, David Gibson wrote:
>>> On Thu, Sep 22, 2016 at 10:25:59AM +0200, Cédric Le Goater wrote:
>> @@ -493,6 +525,8 @@ static void pnv_chip_power9_class_init(ObjectClass 
>> *klass, void *data)
>>  k->chip_cfam_id = 0x100d10498000ull; /* P9 Nimbus DD1.0 */
>>  k->cores_mask = POWER9_CORE_MASK;
>>  k->core_pir = pnv_chip_core_pir_p9;
>> +k->xscom_addr = pnv_chip_xscom_addr_p9;
>> +k->xscom_pcba = pnv_chip_xscom_pcba_p9;
>
> So if you do as BenH (and I) suggested and have the "scom address
> space" actually be addressed by (pcba << 3), I think you can probably
> avoid these.  

 I will look at that option again. 

 I was trying to untangle a few things at the same time. I have better
 view of the problem to solve now. The bus is gone, that's was one 
 thing. How we map these xscom regions is the next. 

 Ben suggested to add some P7/P8 mangling before the dispatch in 
 the _space_xscom. This should make things cleaner. I had 
 not thought of doing that and this is why I introduced these helpers :

 +uint32_t pnv_xscom_pcba(PnvXScomInterface *dev, uint64_t addr)
 +uint64_t pnv_xscom_addr(PnvXScomInterface *dev, uint32_t pcba)

 which I don't really like ...

 but we must make sure that we can do the mapping of the xscom 
 subregions in the _space_xscom using (pcba << 3)


> Instead you can handle it in the chip or ADU realize function by either:
>
> P8: * map one big subregion for the ADU into _space_memory
> * have the handler for that subregion do the address mangling,
>   then redispatch into the xscom address space
>
> P9: * Map the appropriate chunk of the xscom address space
>   directly into address_space_memory

 Yes that was my feeling for a better solution but Ben chimed in with the 
 HMER topic. I need to look at that.
>>>
>>> Right.  Doesn't change the basic concept though - it just means you
>>> need (slightly different) redispatchers for both P8 and P9.
>>
>> In fact they are the same, you only need an "addr to pcba" handler at the
>> chip class level : 
> 
> Ok.  I'd been thinking of using different dispatchers as an
> alternative to using the chip class translator hook, 

ah. yes, why not. We could have per-chip dispatchers but they 
would have a lot in common. However, I think we can get rid of 
the xscom_pcba' handlers, they should not be needed any where 
else than in the XSCOM dispatchers. 

> but I guess if you have the decoding of those "core" registers 
> here as well, then that doesn't make so much sense.

yes and there is also the handling of the XSCOM failures.

I can add some prologue handler to cover those "core" registers
but adding a MemoryRegion, ops, init and mapping would be a lot 
of churn just to return 0.

Thanks,

C. 


>> static uint64_t xscom_read(void *opaque, hwaddr addr, unsigned width)
>> {
>>  PnvChip *chip = opaque;
>>  uint32_t pcba = PNV_CHIP_GET_CLASS(chip)->xscom_pcba(addr);
>>  uint64_t val = 0;
>>  MemTxResult result;
>>
>>  ...
>>
>> val = address_space_ldq(>xscom_as, pcba << 3,
>> MEMTXATTRS_UNSPECIFIED, );
>> if (result != MEMTX_OK) {
>>
>>   
>>
>> And so, the result is pretty clean. I killed the proxy object and merged 
>> the regions in the chip but I have kept the pnv_xscom.c file because the 
>> code related to xscom is rather large : ~250 lines. 
> 
> Sure, makes sense.
> 
>> The objects declaring a xscom region need to do some register shifting but 
>> this is usual in mmio regions.
>>
>> You will see in v4.
> 
> Ok.
> 
>> +static bool xscom_dispatch_read(PnvXScom *xscom, hwaddr addr, uint64_t 
>> *val)
>> +{
>> +uint32_t success;
>> +uint8_t data[8];
>> +
>> +success = !address_space_rw(>xscom_as, addr, 
>> MEMTXATTRS_UNSPECIFIED,
>> +data, 8, false);
>> +*val = (((uint64_t) data[0]) << 56 |
>> +((uint64_t) data[1]) << 48 |
>> +((uint64_t) data[2]) << 40 |
>> +((uint64_t) data[3]) << 32 |
>> +((uint64_t) data[4]) << 24 |
>> +((uint64_t) data[5]) << 16 |
>> +((uint64_t) data[6]) << 8  |
>> +((uint64_t) data[7]));
>
> AFAICT this is basically assuming data is always encoded BE.  With the
> right choice of endian flags on the individual SCOM device
> registrations with the scom address space, I think you should be able
> to avoid this mangling.

 yes. I should but curiously I had to do this, and this works the same on
 an intel host or a ppc64 host.
>>>
>>> Hmm.. I suspect what you actually need is

[Qemu-devel] [PATCH] 9pfs: fix information leak in xattr read

2016-09-26 Thread Li Qiang

From: Li Qiang 

9pfs uses g_malloc() to allocate the xattr memory space, if the guest
reads this memory before writing to it, this will leak host heap
memory to the guest. This patch avoid this.

Signed-off-by: Li Qiang 
---
 hw/9pfs/9p.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index b1ff8e7..4db1bd8 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -3291,7 +3291,7 @@ static void v9fs_xattrcreate(void *opaque)
 xattr_fidp->fs.xattr.flags = flags;
 v9fs_string_init(_fidp->fs.xattr.name);
 v9fs_string_copy(_fidp->fs.xattr.name, );
-xattr_fidp->fs.xattr.value = g_malloc(size);
+xattr_fidp->fs.xattr.value = g_malloc0(size);
 err = offset;
 put_fid(pdu, file_fidp);
 out_nofid:
-- 
1.8.3.1

[Qemu-devel] [PATCH] 9pfs: fix potential host memory leak in v9fs_read

2016-09-26 Thread Li Qiang

From: Li Qiang 

In 9pfs read dispatch function, it doesn't free two QEMUIOVector
object thus causing potential memory leak. This patch avoid this.

Signed-off-by: Li Qiang 
---
 hw/9pfs/9p.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index d960a2e..b1ff8e7 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1830,12 +1830,16 @@ static void v9fs_read(void *opaque)
 } while (len == -EINTR && !pdu->cancelled);
 if (len < 0) {
 /* IO error return the error */
+qemu_iovec_destroy();
+qemu_iovec_destroy(_full);
 err = len;
 goto out;
 }
 } while (count < max_count && len > 0);
 err = pdu_marshal(pdu, offset, "d", count);
 if (err < 0) {
+qemu_iovec_destroy();
+qemu_iovec_destroy(_full);
 goto out;
 }
 err += offset + count;
-- 
1.8.3.1

[Qemu-devel] [PATCH] 9pfs: make illegal path name detection more robust

2016-09-26 Thread Li Qiang

From: Li Qiang 

The parameter of name_is_illegal can be NULL, adding detection of
this to avoid NULL pointer dereference issue.

Signed-off-by: Li Qiang 
---
 hw/9pfs/9p.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index dd3145c..d960a2e 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1277,7 +1277,7 @@ static int v9fs_walk_marshal(V9fsPDU *pdu, uint16_t 
nwnames, V9fsQID *qids)
 
 static bool name_is_illegal(const char *name)
 {
-return !*name || strchr(name, '/') != NULL;
+return !name || !*name || strchr(name, '/') != NULL;
 }
 
 static bool not_same_qid(const V9fsQID *qid1, const V9fsQID *qid2)
-- 
1.8.3.1

[Qemu-devel] [PATCH] 9pfs: fix NULL pointer dereference in v9fs_version

2016-09-26 Thread Li Qiang

From: Li Qiang 

In 9pfs get version dispatch function, a guest can provide a NULL
version string thus causing an NULL pointer dereference issue.
This patch fix this issue.

Signed-off-by: Li Qiang 
---
 hw/9pfs/9p.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 119ee58..dd3145c 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -955,6 +955,11 @@ static void v9fs_version(void *opaque)
 offset = err;
 goto out;
 }
+
+if (!version.data) {
+offset = -EINVAL;
+goto out;
+}
 trace_v9fs_version(pdu->tag, pdu->id, s->msize, version.data);
 
 virtfs_reset(pdu);
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] tests: Test IPv6 and ppc64 in the PXE tester

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 10:17:46PM +0200, Thomas Huth wrote:
> The firmware of the pseries machine, SLOF, is able to load files via
> IPv6 networking, too. So to test both, network bootloading on ppc64
> and IPv6 (via Slirp) , let's add some PXE tests for this environment,
> too. Since we can not use the normal x86 boot sector for network boot
> loading, we use a simple Forth script on ppc64 instead.
> 
> Signed-off-by: Thomas Huth 

I certainly approve of testing IPv6 more, a couple of queries about
the details though:

> ---
>  tests/Makefile.include |  1 +
>  tests/boot-sector.c|  9 +
>  tests/pxe-test.c   | 22 +++---
>  3 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index d8101b3..18bc698 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -270,6 +270,7 @@ check-qtest-ppc64-y += tests/drive_del-test$(EXESUF)
>  check-qtest-ppc64-y += tests/postcopy-test$(EXESUF)
>  check-qtest-ppc64-y += tests/boot-serial-test$(EXESUF)
>  check-qtest-ppc64-y += tests/rtas-test$(EXESUF)
> +check-qtest-ppc64-y += tests/pxe-test$(EXESUF)
>  
>  check-qtest-sh4-y = tests/endianness-test$(EXESUF)
>  
> diff --git a/tests/boot-sector.c b/tests/boot-sector.c
> index 3ffe298..e3193c0 100644
> --- a/tests/boot-sector.c
> +++ b/tests/boot-sector.c
> @@ -77,6 +77,15 @@ int boot_sector_init(const char *fname)
>  fprintf(stderr, "Couldn't open \"%s\": %s", fname, strerror(errno));
>  return 1;
>  }
> +
> +/* For Open Firmware based system, we can use a Forth script instead */
> +if (strcmp(qtest_get_arch(), "ppc64") == 0) {

As always, I'm uneasy about using arch based tests for what's really a
machine type property.  Still, as a test case, I guess we can fix that
when and if someone actually tries to run it for a ppc machine that's
not spapr (or an x86 machine that's not pc, theoretically speaking).

> +memset(boot_sector, ' ', sizeof boot_sector);
> +sprintf((char *)boot_sector, "\\ Bootscript\n%x %x c! %x %x c!\n",
> +LOW(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET,
> +HIGH(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET + 1);
> +}
> +
>  fwrite(boot_sector, 1, sizeof boot_sector, f);
>  fclose(f);
>  return 0;
> diff --git a/tests/pxe-test.c b/tests/pxe-test.c
> index b2cc355..0bdb7a1 100644
> --- a/tests/pxe-test.c
> +++ b/tests/pxe-test.c
> @@ -21,14 +21,14 @@
>  
>  static const char *disk = "tests/pxe-test-disk.raw";
>  
> -static void test_pxe_one(const char *params)
> +static void test_pxe_one(const char *params, bool ipv6)

Is it wise to keep the "PXE" name.  OF style netbooting isn't really
PXE in the sense of the Intel PXE spec, although it overlaps in the
underlying protocols used.


>  {
>  char *args;
>  
> -args = g_strdup_printf("-machine accel=tcg "
> -   "-netdev user,id=" NETNAME ",tftp=./,bootfile=%s "
> -   "%s ",
> -   disk, params);
> +args = g_strdup_printf("-machine accel=tcg -boot order=n "
> +   "-netdev user,id=" NETNAME ",tftp=./,bootfile=%s,"
> +   "ipv4=%s,ipv6=%s %s", disk, ipv6 ? "off" : "on",
> +   ipv6 ? "on" : "off", params);
>  
>  qtest_start(args);
>  boot_sector_test();
> @@ -38,12 +38,17 @@ static void test_pxe_one(const char *params)
>  
>  static void test_pxe_e1000(void)
>  {
> -test_pxe_one("-device e1000,netdev=" NETNAME);
> +test_pxe_one("-device e1000,netdev=" NETNAME, false);
>  }
>  
>  static void test_pxe_virtio_pci(void)
>  {
> -test_pxe_one("-device virtio-net-pci,netdev=" NETNAME);
> +test_pxe_one("-device virtio-net-pci,netdev=" NETNAME, false);
> +}
> +
> +static void test_pxe_spapr_vlan(void)
> +{
> +test_pxe_one("-vga none -device spapr-vlan,netdev=" NETNAME, true);

Shouldn't we test both IPv4 *and* IPv6 for spapr - AFAICT this is only
testing v6.

>  }
>  
>  int main(int argc, char *argv[])
> @@ -60,6 +65,9 @@ int main(int argc, char *argv[])
>  if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>  qtest_add_func("pxe/e1000", test_pxe_e1000);
>  qtest_add_func("pxe/virtio", test_pxe_virtio_pci);
> +} else if (strcmp(arch, "ppc64") == 0) {
> +qtest_add_func("pxe/virtio", test_pxe_virtio_pci);
> +qtest_add_func("pxe/spapr-vlan", test_pxe_spapr_vlan);
>  }
>  ret = g_test_run();
>  boot_sector_cleanup(disk);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 2/6] intc/i8259: implement InterruptStatsProvider interface

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 10:23:24PM +0200, Hervé Poussineau wrote:
> Signed-off-by: Hervé Poussineau 
> ---
>  hw/intc/i8259.c | 37 +
>  1 file changed, 37 insertions(+)
> 
> diff --git a/hw/intc/i8259.c b/hw/intc/i8259.c
> index c2607a5..75c8d22 100644
> --- a/hw/intc/i8259.c
> +++ b/hw/intc/i8259.c
> @@ -29,6 +29,7 @@
>  #include "qemu/timer.h"
>  #include "qemu/log.h"
>  #include "hw/isa/i8259_internal.h"
> +#include "hw/intc/intc.h"
>  
>  /* debug PIC */
>  //#define DEBUG_PIC
> @@ -251,6 +252,35 @@ static void pic_reset(DeviceState *dev)
>  pic_init_reset(s);
>  }
>  
> +static bool pic_get_statistics(InterruptStatsProvider *obj,
> +   uint64_t **irq_counts, unsigned int *nb_irqs)
> +{
> +PICCommonState *s = PIC_COMMON(obj);
> +
> +if (s->master) {
> +#ifdef DEBUG_IRQ_COUNT
> +*irq_counts = irq_count;

So, the irq_counts return parameter is set to point at an internal
structure of the intc, in this and the other implementations.

Is that safe, without some contract about how long the array pointer
is valid and/or correct?  Could it be a problem if in future we tried
to implement this for an intc that doesn't keep irq stats as a simple
array (e.g. kept the count in a structure also containing other
information for each irq)?

I'm wondering if a safer interface might be to actually copy out a
snapshot of the counts, which the caller is responsible for freeing.

> +*nb_irqs = ARRAY_SIZE(irq_count);
> +#else
> +return false;
> +#endif
> +} else {
> +*irq_counts = NULL;
> +*nb_irqs = 0;
> +}
> +return true;
> +}
> +
> +static void pic_print_info(InterruptStatsProvider *obj, Monitor *mon)
> +{
> +PICCommonState *s = PIC_COMMON(obj);
> +monitor_printf(mon, "pic%d: irr=%02x imr=%02x isr=%02x hprio=%d "
> +   "irq_base=%02x rr_sel=%d elcr=%02x fnm=%d\n",
> +   s->master ? 0 : 1, s->irr, s->imr, s->isr, 
> s->priority_add,
> +   s->irq_base, s->read_reg_select, s->elcr,
> +   s->special_fully_nested_mode);
> +}
> +
>  static void pic_ioport_write(void *opaque, hwaddr addr64,
>   uint64_t val64, unsigned size)
>  {
> @@ -503,10 +533,13 @@ static void i8259_class_init(ObjectClass *klass, void 
> *data)
>  {
>  PICClass *k = PIC_CLASS(klass);
>  DeviceClass *dc = DEVICE_CLASS(klass);
> +InterruptStatsProviderClass *ic = INTERRUPT_STATS_PROVIDER_CLASS(klass);
>  
>  k->parent_realize = dc->realize;
>  dc->realize = pic_realize;
>  dc->reset = pic_reset;
> +ic->get_statistics = pic_get_statistics;
> +ic->print_info = pic_print_info;
>  }
>  
>  static const TypeInfo i8259_info = {
> @@ -515,6 +548,10 @@ static const TypeInfo i8259_info = {
>  .parent = TYPE_PIC_COMMON,
>  .class_init = i8259_class_init,
>  .class_size = sizeof(PICClass),
> +.interfaces = (InterfaceInfo[]) {
> +{ TYPE_INTERRUPT_STATS_PROVIDER },
> +{ }
> +},
>  };
>  
>  static void pic_register_types(void)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 0/6] intc: change 'info irq' and 'info pic' to be target-agnostic

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 10:23:22PM +0200, Hervé Poussineau wrote:
> Hi,
> 
> This patchset aims at genericizing the 'info irq' and 'info pic' HMP 
> commands, so
> that it is available on all machines and can display details about more than 
> one
> interrupt controller per machine.
> 
> Patch 1 adds a new interface InterruptStatsProvider, which is used to:
> - gather statistics for the 'info irq' command
> - print some text when 'info pic' is called
> 
> Patches 2 to 4 implement InterruptStatsProvider interface on interrupt 
> controllers
> which have ad-hock code to handle 'info irq'/'info pic' commands.
> 
> Patch 5 removes ad-hock code, and replaces it by a generic version. You can 
> get
> details about multiple interrupt controllers per machine starting here.
> 
> Patch 6 makes 'info irq'/'info pic' commands available on all architectures.
> For example, Alpha clipper machine is now able to display details about the
> i8259 interrupt controller.

Thanks for doing this.  I didn't spot your first version of this, but
it's a rather more thorough approach to some cleanups I attempted a
while back, but never got around to completing.  Looks like a nice
approach.

> 
> Changes since v1:
> - renamed interface from IntCtrl to InterruptStatsProvider
> 
> Hervé
> 
> Hervé Poussineau (6):
>   intc: add an interface to gather statistics/informations on interrupt
> controllers
>   intc/i8259: implement InterruptStatsProvider interface
>   intc/slavio_intctl: implement InterruptStatsProvider interface
>   intc/lm32_pic: implement InterruptStatsProvider interface
>   intc: make HMP 'info irq' and 'info pic' commands use
> InterruptStatsProvider interface
>   intc: make HMP 'info irq' and 'info pic' commands available on all
> targets
> 
>  hmp-commands-info.hx   | 17 +--
>  hmp.c  | 65 +
>  hmp.h  |  2 ++
>  hw/intc/Makefile.objs  |  1 +
>  hw/intc/i8259.c| 73 
> +++---
>  hw/intc/intc.c | 41 ++
>  hw/intc/lm32_pic.c | 63 ++-
>  hw/intc/slavio_intctl.c| 67 ++
>  hw/sparc/sun4m.c   | 15 +-
>  include/hw/i386/pc.h   |  2 --
>  include/hw/intc/intc.h | 30 +++
>  include/hw/lm32/lm32_pic.h |  3 --
>  include/hw/sparc/sun4m.h   |  8 -
>  monitor.c  |  6 
>  14 files changed, 241 insertions(+), 152 deletions(-)
>  create mode 100644 hw/intc/intc.c
>  create mode 100644 include/hw/intc/intc.h
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] libqos: add PPC64 PCI support

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 04:10:46PM +0200, Laurent Vivier wrote:
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 

> ---
>  tests/Makefile.include   |   1 +
>  tests/libqos/pci-pc.c|  22 
>  tests/libqos/pci-spapr.c | 280 
> +++
>  tests/libqos/pci-spapr.h |  17 +++
>  tests/libqos/pci.c   |  22 +++-
>  tests/libqos/rtas.c  |  45 
>  tests/libqos/rtas.h  |   4 +
>  7 files changed, 368 insertions(+), 23 deletions(-)
>  create mode 100644 tests/libqos/pci-spapr.c
>  create mode 100644 tests/libqos/pci-spapr.h
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index d8101b3..967426a 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -589,6 +589,7 @@ libqos-obj-y += tests/libqos/i2c.o tests/libqos/libqos.o
>  libqos-spapr-obj-y = $(libqos-obj-y) tests/libqos/malloc-spapr.o
>  libqos-spapr-obj-y += tests/libqos/libqos-spapr.o
>  libqos-spapr-obj-y += tests/libqos/rtas.o
> +libqos-spapr-obj-y += tests/libqos/pci-spapr.o
>  libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o
>  libqos-pc-obj-y += tests/libqos/malloc-pc.o tests/libqos/libqos-pc.o
>  libqos-pc-obj-y += tests/libqos/ahci.o
> diff --git a/tests/libqos/pci-pc.c b/tests/libqos/pci-pc.c
> index 1ae2d37..82066b8 100644
> --- a/tests/libqos/pci-pc.c
> +++ b/tests/libqos/pci-pc.c
> @@ -255,28 +255,6 @@ void qpci_free_pc(QPCIBus *bus)
>  g_free(s);
>  }
>  
> -void qpci_plug_device_test(const char *driver, const char *id,
> -   uint8_t slot, const char *opts)
> -{
> -QDict *response;
> -char *cmd;
> -
> -cmd = g_strdup_printf("{'execute': 'device_add',"
> -  " 'arguments': {"
> -  "   'driver': '%s',"
> -  "   'addr': '%d',"
> -  "   %s%s"
> -  "   'id': '%s'"
> -  "}}", driver, slot,
> -  opts ? opts : "", opts ? "," : "",
> -  id);
> -response = qmp(cmd);
> -g_free(cmd);
> -g_assert(response);
> -g_assert(!qdict_haskey(response, "error"));
> -QDECREF(response);
> -}
> -
>  void qpci_unplug_acpi_device_test(const char *id, uint8_t slot)
>  {
>  QDict *response;
> diff --git a/tests/libqos/pci-spapr.c b/tests/libqos/pci-spapr.c
> new file mode 100644
> index 000..30fbcbd
> --- /dev/null
> +++ b/tests/libqos/pci-spapr.c
> @@ -0,0 +1,280 @@
> +/*
> + * libqos PCI bindings for SPAPR
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqtest.h"
> +#include "libqos/pci-spapr.h"
> +#include "libqos/rtas.h"
> +
> +#include "hw/pci/pci_regs.h"
> +
> +#include "qemu-common.h"
> +#include "qemu/host-utils.h"
> +
> +
> +/* From include/hw/pci-host/spapr.h */
> +
> +#define SPAPR_PCI_BASE_BUID  0x8002000ULL
> +
> +#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x8000ULL
> +
> +#define SPAPR_PCI_WINDOW_BASE0x100ULL
> +#define SPAPR_PCI_WINDOW_SPACING 0x10ULL
> +#define SPAPR_PCI_MMIO_WIN_OFF   0xA000
> +#define SPAPR_PCI_MMIO_WIN_SIZE  (SPAPR_PCI_WINDOW_SPACING - \
> + SPAPR_PCI_MEM_WIN_BUS_OFFSET)
> +#define SPAPR_PCI_IO_WIN_OFF 0x8000
> +#define SPAPR_PCI_IO_WIN_SIZE0x1
> +
> +/* index is the phb index */
> +
> +#define BUIDBASE(index)  (SPAPR_PCI_BASE_BUID + (index))
> +#define PCIBASE(index)   (SPAPR_PCI_WINDOW_BASE + \
> +  (index) * SPAPR_PCI_WINDOW_SPACING)
> +#define IOBASE(index)(PCIBASE(index) + SPAPR_PCI_IO_WIN_OFF)
> +#define MMIOBASE(index)  (PCIBASE(index) + 
> SPAPR_PCI_MMIO_WIN_OFF)
> +
> +typedef struct QPCIBusSPAPR {
> +QPCIBus bus;
> +QGuestAllocator *alloc;
> +
> +uint64_t pci_hole_start;
> +uint64_t pci_hole_size;
> +uint64_t pci_hole_alloc;
> +
> +uint32_t pci_iohole_start;
> +uint32_t pci_iohole_size;
> +uint32_t pci_iohole_alloc;
> +} QPCIBusSPAPR;
> +
> +static uint8_t qpci_spapr_io_readb(QPCIBus *bus, void *addr)
> +{
> +uint64_t port = (uint64_t)addr;
> +uint8_t v;
> +if (port < SPAPR_PCI_IO_WIN_SIZE) {
> +v = readb(IOBASE(0) + port);
> +} else {
> +v = readb(MMIOBASE(0) + port);
> +}
> +return v;
> +}
> +
> +static uint16_t qpci_spapr_io_readw(QPCIBus *bus, void *addr)
> +{
> +uint64_t port = (uint64_t)addr;
> +uint16_t v;
> +if (port < SPAPR_PCI_IO_WIN_SIZE) {
> +v = readw(IOBASE(0) + port);
> +} else {
> +v = readw(MMIOBASE(0) + port);
> +}
> +return be16_to_cpu(v);
> +}
> +
> +static uint32_t qpci_spapr_io_readl(QPCIBus *bus, void *addr)
> +{
> +uint64_t port =

Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] libqos: add PCI management in qtest_vboot()/qtest_shutdown()

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 04:10:47PM +0200, Laurent Vivier wrote:
> Signed-off-by: Laurent Vivier 
> ---
>  tests/e1000e-test.c |  2 +-
>  tests/i440fx-test.c |  2 +-
>  tests/ide-test.c|  2 +-
>  tests/ivshmem-test.c|  2 +-
>  tests/libqos/ahci.c |  2 +-
>  tests/libqos/libqos-pc.c|  5 -
>  tests/libqos/libqos-spapr.c |  5 -
>  tests/libqos/libqos.c   | 21 -
>  tests/libqos/libqos.h   |  3 +++
>  tests/libqos/pci-pc.c   |  2 +-
>  tests/libqos/pci-pc.h   |  3 ++-
>  tests/q35-test.c|  2 +-
>  tests/rtl8139-test.c|  2 +-
>  tests/tco-test.c|  2 +-
>  tests/usb-hcd-ehci-test.c   |  2 +-
>  tests/usb-hcd-uhci-test.c   |  2 +-
>  tests/vhost-user-test.c |  2 +-
>  tests/virtio-9p-test.c  |  2 +-
>  tests/virtio-blk-test.c |  2 +-
>  tests/virtio-net-test.c |  2 +-
>  tests/virtio-scsi-test.c|  2 +-
>  21 files changed, 45 insertions(+), 24 deletions(-)

Couple of queries below.

> 
> diff --git a/tests/e1000e-test.c b/tests/e1000e-test.c
> index d497b08..3979b20 100644
> --- a/tests/e1000e-test.c
> +++ b/tests/e1000e-test.c
> @@ -390,7 +390,7 @@ static void data_test_init(e1000e_device *d)
>  qtest_start(cmdline);
>  g_free(cmdline);
>  
> -test_bus = qpci_init_pc();
> +test_bus = qpci_init_pc(NULL);
>  g_assert_nonnull(test_bus);
>  
>  test_alloc = pc_alloc_init();
> diff --git a/tests/i440fx-test.c b/tests/i440fx-test.c
> index 3542ad1..da2d5a5 100644
> --- a/tests/i440fx-test.c
> +++ b/tests/i440fx-test.c
> @@ -38,7 +38,7 @@ static QPCIBus *test_start_get_bus(const TestData *s)
>  cmdline = g_strdup_printf("-smp %d", s->num_cpus);
>  qtest_start(cmdline);
>  g_free(cmdline);
> -return qpci_init_pc();
> +return qpci_init_pc(NULL);
>  }
>  
>  static void test_i440fx_defaults(gconstpointer opaque)
> diff --git a/tests/ide-test.c b/tests/ide-test.c
> index 1e51af2..a8a4081 100644
> --- a/tests/ide-test.c
> +++ b/tests/ide-test.c
> @@ -143,7 +143,7 @@ static QPCIDevice *get_pci_device(uint16_t *bmdma_base)
>  uint16_t vendor_id, device_id;
>  
>  if (!pcibus) {
> -pcibus = qpci_init_pc();
> +pcibus = qpci_init_pc(NULL);
>  }
>  
>  /* Find PCI device and verify it's the right one */
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index 0957ee7..f36bfe7 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -105,7 +105,7 @@ static void setup_vm_cmd(IVState *s, const char *cmd, 
> bool msix)
>  uint64_t barsize;
>  
>  s->qtest = qtest_start(cmd);
> -s->pcibus = qpci_init_pc();
> +s->pcibus = qpci_init_pc(NULL);
>  s->dev = get_device(s->pcibus);
>  
>  s->reg_base = qpci_iomap(s->dev, 0, );
> diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c
> index f3be550..716ab79 100644
> --- a/tests/libqos/ahci.c
> +++ b/tests/libqos/ahci.c
> @@ -128,7 +128,7 @@ QPCIDevice *get_ahci_device(uint32_t *fingerprint)
>  uint32_t ahci_fingerprint;
>  QPCIBus *pcibus;
>  
> -pcibus = qpci_init_pc();
> +pcibus = qpci_init_pc(NULL);
>  
>  /* Find the AHCI PCI device and verify it's the right one. */
>  ahci = qpci_device_find(pcibus, QPCI_DEVFN(0x1F, 0x02));
> diff --git a/tests/libqos/libqos-pc.c b/tests/libqos/libqos-pc.c
> index df34092..aa17c98 100644
> --- a/tests/libqos/libqos-pc.c
> +++ b/tests/libqos/libqos-pc.c
> @@ -1,10 +1,13 @@
>  #include "qemu/osdep.h"
>  #include "libqos/libqos-pc.h"
>  #include "libqos/malloc-pc.h"
> +#include "libqos/pci-pc.h"
>  
>  static QOSOps qos_ops = {
>  .init_allocator = pc_alloc_init_flags,
> -.uninit_allocator = pc_alloc_uninit
> +.uninit_allocator = pc_alloc_uninit,
> +.qpci_init = qpci_init_pc,
> +.qpci_free = qpci_free_pc,
>  };
>  
>  QOSState *qtest_pc_vboot(const char *cmdline_fmt, va_list ap)
> diff --git a/tests/libqos/libqos-spapr.c b/tests/libqos/libqos-spapr.c
> index f19408b..125c6b3 100644
> --- a/tests/libqos/libqos-spapr.c
> +++ b/tests/libqos/libqos-spapr.c
> @@ -1,10 +1,13 @@
>  #include "qemu/osdep.h"
>  #include "libqos/libqos-spapr.h"
>  #include "libqos/malloc-spapr.h"
> +#include "libqos/pci-spapr.h"
>  
>  static QOSOps qos_ops = {
>  .init_allocator = spapr_alloc_init_flags,
> -.uninit_allocator = spapr_alloc_uninit
> +.uninit_allocator = spapr_alloc_uninit,
> +.qpci_init = qpci_init_spapr,
> +.qpci_free = qpci_free_spapr
>  };
>  
>  QOSState *qtest_spapr_vboot(const char *cmdline_fmt, va_list ap)
> diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c
> index a852dc5..332d60e 100644
> --- a/tests/libqos/libqos.c
> +++ b/tests/libqos/libqos.c
> @@ -20,8 +20,13 @@ QOSState *qtest_vboot(QOSOps *ops, const char 
> *cmdline_fmt, va_list ap)
>  cmdline = g_strdup_vprintf(cmdline_fmt, ap);
>  qs->qts = qtest_start(cmdline);
>  qs->ops = ops;
> -if (ops && ops->init_allocator) {
> -

Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] libqos: use generic qtest_shutdown()

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 04:10:48PM +0200, Laurent Vivier wrote:
> Machine specific shutdown function can be registered by
> the machine specific qtest_XXX_boot() if needed.
> 
> So we will not have to test twice the archicture (on boot and on
> shutdown) if the test can be run on several architectures.
> 
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 

> ---
>  tests/libqos/libqos-pc.c|  3 ++-
>  tests/libqos/libqos-spapr.c |  5 +++--
>  tests/libqos/libqos.c   | 11 ++-
>  tests/libqos/libqos.h   |  8 ++--
>  tests/rtas-test.c   |  2 +-
>  5 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/tests/libqos/libqos-pc.c b/tests/libqos/libqos-pc.c
> index aa17c98..219824d 100644
> --- a/tests/libqos/libqos-pc.c
> +++ b/tests/libqos/libqos-pc.c
> @@ -8,6 +8,7 @@ static QOSOps qos_ops = {
>  .uninit_allocator = pc_alloc_uninit,
>  .qpci_init = qpci_init_pc,
>  .qpci_free = qpci_free_pc,
> +.shutdown = qtest_pc_shutdown
>  };
>  
>  QOSState *qtest_pc_vboot(const char *cmdline_fmt, va_list ap)
> @@ -31,5 +32,5 @@ QOSState *qtest_pc_boot(const char *cmdline_fmt, ...)
>  
>  void qtest_pc_shutdown(QOSState *qs)
>  {
> -return qtest_shutdown(qs);
> +return qtest_common_shutdown(qs);
>  }
> diff --git a/tests/libqos/libqos-spapr.c b/tests/libqos/libqos-spapr.c
> index 125c6b3..8f7fc14 100644
> --- a/tests/libqos/libqos-spapr.c
> +++ b/tests/libqos/libqos-spapr.c
> @@ -7,7 +7,8 @@ static QOSOps qos_ops = {
>  .init_allocator = spapr_alloc_init_flags,
>  .uninit_allocator = spapr_alloc_uninit,
>  .qpci_init = qpci_init_spapr,
> -.qpci_free = qpci_free_spapr
> +.qpci_free = qpci_free_spapr,
> +.shutdown = qtest_spapr_shutdown
>  };
>  
>  QOSState *qtest_spapr_vboot(const char *cmdline_fmt, va_list ap)
> @@ -29,5 +30,5 @@ QOSState *qtest_spapr_boot(const char *cmdline_fmt, ...)
>  
>  void qtest_spapr_shutdown(QOSState *qs)
>  {
> -return qtest_shutdown(qs);
> +return qtest_common_shutdown(qs);
>  }
> diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c
> index 332d60e..2f8a1ad 100644
> --- a/tests/libqos/libqos.c
> +++ b/tests/libqos/libqos.c
> @@ -52,7 +52,7 @@ QOSState *qtest_boot(QOSOps *ops, const char *cmdline_fmt, 
> ...)
>  /**
>   * Tear down the QEMU instance.
>   */
> -void qtest_shutdown(QOSState *qs)
> +void qtest_common_shutdown(QOSState *qs)
>  {
>  if (qs->ops) {
>  if (qs->alloc && qs->ops->uninit_allocator) {
> @@ -68,6 +68,15 @@ void qtest_shutdown(QOSState *qs)
>  g_free(qs);
>  }
>  
> +void qtest_shutdown(QOSState *qs)
> +{
> +if (qs->ops && qs->ops->shutdown) {
> +qs->ops->shutdown(qs);
> +} else {
> +qtest_common_shutdown(qs);
> +}
> +}
> +
>  void set_context(QOSState *s)
>  {
>  global_qtest = s->qts;
> diff --git a/tests/libqos/libqos.h b/tests/libqos/libqos.h
> index a9f6990..2319697 100644
> --- a/tests/libqos/libqos.h
> +++ b/tests/libqos/libqos.h
> @@ -5,22 +5,26 @@
>  #include "libqos/pci.h"
>  #include "libqos/malloc-pc.h"
>  
> +typedef struct QOSState QOSState;
> +
>  typedef struct QOSOps {
>  QGuestAllocator *(*init_allocator)(QAllocOpts);
>  void (*uninit_allocator)(QGuestAllocator *);
>  QPCIBus *(*qpci_init)(QGuestAllocator *alloc);
>  void (*qpci_free)(QPCIBus *bus);
> +void (*shutdown)(QOSState *);
>  } QOSOps;
>  
> -typedef struct QOSState {
> +struct QOSState {
>  QTestState *qts;
>  QGuestAllocator *alloc;
>  QPCIBus *pcibus;
>  QOSOps *ops;
> -} QOSState;
> +};
>  
>  QOSState *qtest_vboot(QOSOps *ops, const char *cmdline_fmt, va_list ap);
>  QOSState *qtest_boot(QOSOps *ops, const char *cmdline_fmt, ...);
> +void qtest_common_shutdown(QOSState *qs);
>  void qtest_shutdown(QOSState *qs);
>  bool have_qemu_img(void);
>  void mkimg(const char *file, const char *fmt, unsigned size_mb);
> diff --git a/tests/rtas-test.c b/tests/rtas-test.c
> index 73c7803..ba0867a 100644
> --- a/tests/rtas-test.c
> +++ b/tests/rtas-test.c
> @@ -22,7 +22,7 @@ static void test_rtas_get_time_of_day(void)
>  t2 = mktimegm();
>  g_assert(t2 - t1 < 5); /* 5 sec max to run the test */
>  
> -qtest_spapr_shutdown(qs);
> +qtest_shutdown(qs);
>  }
>  
>  int main(int argc, char *argv[])

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [Qemu-ppc] [PATCH 4/4] tests: enable ohci/uhci/xhci tests on PPC64

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 04:10:49PM +0200, Laurent Vivier wrote:
> Signed-off-by: Laurent Vivier 
> ---
>  tests/Makefile.include|  8 +++-
>  tests/libqos/usb.c| 14 ++
>  tests/usb-hcd-uhci-test.c | 24 
>  3 files changed, 37 insertions(+), 9 deletions(-)
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 967426a..aaf264f 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -270,6 +270,12 @@ check-qtest-ppc64-y += tests/drive_del-test$(EXESUF)
>  check-qtest-ppc64-y += tests/postcopy-test$(EXESUF)
>  check-qtest-ppc64-y += tests/boot-serial-test$(EXESUF)
>  check-qtest-ppc64-y += tests/rtas-test$(EXESUF)
> +check-qtest-ppc64-y += tests/usb-hcd-ohci-test$(EXESUF)
> +gcov-files-ppc64-y += hw/usb/hcd-ohci.c
> +check-qtest-ppc64-y += tests/usb-hcd-uhci-test$(EXESUF)
> +gcov-files-ppc64-y += hw/usb/hcd-uhci.c
> +check-qtest-ppc64-y += tests/usb-hcd-xhci-test$(EXESUF)
> +gcov-files-ppc64-y += hw/usb/hcd-xhci.c
>  
>  check-qtest-sh4-y = tests/endianness-test$(EXESUF)
>  
> @@ -595,7 +601,7 @@ libqos-pc-obj-y += tests/libqos/malloc-pc.o 
> tests/libqos/libqos-pc.o
>  libqos-pc-obj-y += tests/libqos/ahci.o
>  libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o
>  libqos-imx-obj-y = $(libqos-obj-y) tests/libqos/i2c-imx.o
> -libqos-usb-obj-y = $(libqos-pc-obj-y) tests/libqos/usb.o
> +libqos-usb-obj-y = $(libqos-spapr-obj-y) $(libqos-pc-obj-y) 
> tests/libqos/usb.o
>  libqos-virtio-obj-y = $(libqos-pc-obj-y) tests/libqos/virtio.o 
> tests/libqos/virtio-pci.o tests/libqos/virtio-mmio.o 
> tests/libqos/malloc-generic.o
>  
>  tests/device-introspect-test$(EXESUF): tests/device-introspect-test.o
> diff --git a/tests/libqos/usb.c b/tests/libqos/usb.c
> index f794d92..6168a2f 100644
> --- a/tests/libqos/usb.c
> +++ b/tests/libqos/usb.c
> @@ -12,10 +12,17 @@
>   * See the COPYING file in the top-level directory.
>   */
>  #include "qemu/osdep.h"
> +#include "qemu/bswap.h"
>  #include "libqtest.h"
>  #include "hw/usb/uhci-regs.h"
>  #include "libqos/usb.h"
>  
> +#ifdef HOST_WORDS_BIGENDIAN
> +static const bool host_big_endian = true;
> +#else
> +static const bool host_big_endian = false;
> +#endif
> +
>  void qusb_pci_init_one(QPCIBus *pcibus, struct qhc *hc, uint32_t devfn, int 
> bar)
>  {
>  hc->dev = qpci_device_find(pcibus, devfn);
> @@ -31,6 +38,13 @@ void uhci_port_test(struct qhc *hc, int port, uint16_t 
> expect)
>  uint16_t value = qpci_io_readw(hc->dev, addr);
>  uint16_t mask = ~(UHCI_PORT_WRITE_CLEAR | UHCI_PORT_RSVD1);
>  
> +if (qtest_big_endian() && host_big_endian) {
> +/* little endian device on big endian guest
> + * must be swapped on big endian host
> + */
> +value = bswap16(value);
> +}
> +

Hm.. should the qpci_io_*() helpers handle the endian conversion?

>  g_assert((value & mask) == (expect & mask));
>  }
>  
> diff --git a/tests/usb-hcd-uhci-test.c b/tests/usb-hcd-uhci-test.c
> index c24063e..4b951ce 100644
> --- a/tests/usb-hcd-uhci-test.c
> +++ b/tests/usb-hcd-uhci-test.c
> @@ -9,9 +9,13 @@
>  
>  #include "qemu/osdep.h"
>  #include "libqtest.h"
> +#include "libqos/libqos.h"
>  #include "libqos/usb.h"
> +#include "libqos/libqos-pc.h"
> +#include "libqos/libqos-spapr.h"
>  #include "hw/usb/uhci-regs.h"
>  
> +static QOSState *qs;
>  
>  static void test_uhci_init(void)
>  {
> @@ -19,13 +23,10 @@ static void test_uhci_init(void)
>  
>  static void test_port(int port)
>  {
> -QPCIBus *pcibus;
>  struct qhc uhci;
>  
>  g_assert(port > 0);
> -pcibus = qpci_init_pc(NULL);
> -g_assert(pcibus != NULL);
> -qusb_pci_init_one(pcibus, , QPCI_DEVFN(0x1d, 0), 4);
> +qusb_pci_init_one(qs->pcibus, , QPCI_DEVFN(0x1d, 0), 4);
>  uhci_port_test(, port - 1, UHCI_PORT_CCS);
>  }
>  
> @@ -75,6 +76,7 @@ static void test_usb_storage_hotplug(void)
>  
>  int main(int argc, char **argv)
>  {
> +const char *arch = qtest_get_arch();
>  int ret;
>  
>  g_test_init(, , NULL);
> @@ -84,11 +86,17 @@ int main(int argc, char **argv)
>  qtest_add_func("/uhci/pci/hotplug", test_uhci_hotplug);
>  qtest_add_func("/uhci/pci/hotplug/usb-storage", 
> test_usb_storage_hotplug);
>  
> -qtest_start("-device piix3-usb-uhci,id=uhci,addr=1d.0"
> -" -drive id=drive0,if=none,file=/dev/null,format=raw"
> -" -device usb-tablet,bus=uhci.0,port=1");
> +if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> +qs = qtest_pc_boot("-device piix3-usb-uhci,id=uhci,addr=1d.0"
> +   " -drive 
> id=drive0,if=none,file=/dev/null,format=raw"
> +   " -device usb-tablet,bus=uhci.0,port=1");
> +} else if (strcmp(arch, "ppc64") == 0) {
> +qs = qtest_spapr_boot("-device piix3-usb-uhci,id=uhci,addr=1d.0"
> +   " -drive 
> id=drive0,if=none,file=/dev/null,format=raw"
> +

Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] tests: enable ohci/uhci/xhci tests on PPC64

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 04:10:45PM +0200, Laurent Vivier wrote:
> This series enables USB tests on PPC64, and for
> that implements libqos SPAPR PCI support.

I have a few comments on various patches in the series, but overall it
looks like a good idea.

Anyone have any objection to me taking this through my tree once the
nits are removed?

> 
> Laurent Vivier (4):
>   libqos: add PPC64 PCI support
>   libqos: add PCI management in qtest_vboot()/qtest_shutdown()
>   libqos: use generic qtest_shutdown()
>   tests: enable ohci/uhci/xhci tests on PPC64
> 
>  tests/Makefile.include  |   9 +-
>  tests/e1000e-test.c |   2 +-
>  tests/i440fx-test.c |   2 +-
>  tests/ide-test.c|   2 +-
>  tests/ivshmem-test.c|   2 +-
>  tests/libqos/ahci.c |   2 +-
>  tests/libqos/libqos-pc.c|   8 +-
>  tests/libqos/libqos-spapr.c |   8 +-
>  tests/libqos/libqos.c   |  32 -
>  tests/libqos/libqos.h   |  11 +-
>  tests/libqos/pci-pc.c   |  24 +---
>  tests/libqos/pci-pc.h   |   3 +-
>  tests/libqos/pci-spapr.c| 280 
> 
>  tests/libqos/pci-spapr.h|  17 +++
>  tests/libqos/pci.c  |  22 +++-
>  tests/libqos/rtas.c |  45 +++
>  tests/libqos/rtas.h |   4 +
>  tests/libqos/usb.c  |  14 +++
>  tests/q35-test.c|   2 +-
>  tests/rtas-test.c   |   2 +-
>  tests/rtl8139-test.c|   2 +-
>  tests/tco-test.c|   2 +-
>  tests/usb-hcd-ehci-test.c   |   2 +-
>  tests/usb-hcd-uhci-test.c   |  24 ++--
>  tests/vhost-user-test.c |   2 +-
>  tests/virtio-9p-test.c  |   2 +-
>  tests/virtio-blk-test.c |   2 +-
>  tests/virtio-net-test.c |   2 +-
>  tests/virtio-scsi-test.c|   2 +-
>  29 files changed, 470 insertions(+), 61 deletions(-)
>  create mode 100644 tests/libqos/pci-spapr.c
>  create mode 100644 tests/libqos/pci-spapr.h
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 09:05:22PM -0400, G 3 wrote:
> I made my own experimental implementation of the fmadds instruction that I
> would like to add to QEMU. How would I do this?
> 
> My implementation would probably look like this:
> 
> void fmadds(float *frD, float frA, float frC, float frB)
> {
>   *frD = frA * frC + frB;
> }

So.. using a helper essentially?

You'd need to submit a patch adding the new implementation, with a
commit message which made the case for replacing the existing
implementation with yours.  So, you'd need data to suggest both that
your version generates correct results, and that it is faster or
otherwise better than the existing one.

> 
> 
> I then want to see if this implementation will make things faster. This code
> will test my implementation:
> 
> #include 
> #include 
> 
> /*
> fmadds basically does this frD = frA * frC + frB
> */
> 
> int main (int argc, const char * argv[]) {
> const int iteration_count = 1;
> double iter, frD, frA, frB, frC;
> clock_t start_time, end_time;
> 
> frA = 10;
> frB = 5;
> frC = 2;
> 
> start_time = clock();
> for(iter = 0; iter < iteration_count; iter++)
> {
> asm volatile("fmadds %0, %1, %2, %3" : "=f" (frD) : "f" (frA), "f"
> (frC), "f" (frB));
> }
> end_time = clock();
> printf("frD:%f frA:%f frB:%f frC:%f\n", frD, frA, frB, frC);
> printf("Time elapsed: %0.2f seconds\n", (float)(end_time - start_time) /
> CLOCKS_PER_SEC);
> 
> return 0;
> }
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers

2016-09-26 Thread Rafael David Tinoco

I'll follow to see if patch was accepted upstream:

https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06191.html
https://www.mail-archive.com/qemu-devel@nongnu.org/msg400892.html

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1626972

Title:
  QEMU memfd_create fallback mechanism change for security drivers

Status in QEMU:
  In Progress

Bug description:
  And, when libvirt starts using apparmor, and creating apparmor
  profiles for every virtual machine created in the compute nodes,
  mitaka qemu (2.5 - and upstream also) uses a fallback mechanism for
  creating shared memory for live-migrations. This fall back mechanism,
  on kernels 3.13 - that don't have memfd_create() system-call, try to
  create files on /tmp/ directory and fails.. causing live-migration not
  to work.

  Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability =
  can't live migrate.

  From qemu 2.5, logic is on :

  void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int 
*fd)
  {
  if (memfd_create)... ### only works with HWE kernels

  else ### 3.13 kernels, gets blocked by apparmor
 tmpdir = g_get_tmp_dir
 ...
 mfd = mkstemp(fname)
  }

  And you can see the errors:

  From the host trying to send the virtual machine:

  2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver 
[req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 
133ebc3585c041aebaead8c062cd6511 - - -] [instance: 
2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
  2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver 
[req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 
133ebc3585c041aebaead8c062cd6511 - - -] [instance: 
2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: 
unable to execute QEMU command 'migrate': Migration disabled: failed to 
allocate shared memory

  From the host trying to receive the virtual machine:

  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 
audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" 
profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" 
pid=12565 comm="apparmor_parser"
  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 
audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" 
profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 
audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" 
pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 
audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" 
profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 
audit(1471289780.407:76): apparmor="DENIED" operation="mknod" 
profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" 
pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 
ouid=107
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 
audit(1471289780.411:77): apparmor="DENIED" operation="open" 
profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 
comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 
audit(1471289780.411:78): apparmor="DENIED" operation="open" 
profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" 
pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 
ouid=0

  When leaving libvirt without apparmor capabilities (thus not confining
  virtual machines on compute nodes, the live migration works as
  expected, so, clearly, apparmor is stepping into the live migration).
  I'm sure that virtual machines have to be confined and that this isn't
  the desired behaviour...

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1626972/+subscriptions

[Qemu-devel] [PATCH] util: secure memfd_create fallback mechanism

2016-09-26 Thread Rafael David Tinoco

Commit: 35f9b6ef3acc9d0546c395a566b04e63ca84e302 added a fallback
mechanism for systems not supporting memfd_create syscall (started
being supported since 3.17).

Backporting memfd_create might not be accepted for distros relying
on older kernels. Nowadays there is no way for security driver
to discover memfd filename to be created: /memfd-XX.

It is more appropriate to include UUID and/or VM names in the
temporary filename, allowing security driver rules to be applied
while maintaining the required unpredictability with mkstemp.

This change will allow libvirt to know exact memfd file to be created
for vhost log AND to create appropriate security rules to allow access
per instance (instead of a opened rule like /memfd-*).

Example of apparmor deny messages with this change:

Per VM UUID (preferred, generated automatically by libvirt):

kernel: [26632.154856] type=1400 audit(1474945148.633:78): apparmor=
"DENIED" operation="mknod" profile="libvirt-0b96011f-0dc0-44a3-92c3-
196de2efab6d" name="/tmp/memfd-0b96011f-0dc0-44a3-92c3-196de2efab6d-
qeHrBV" pid=75161 comm="qemu-system-x86" requested_mask="c" denied_
mask="c" fsuid=107 ouid=107

Per VM name (if no UUID is specified):

kernel: [26447.505653] type=1400 audit(1474944963.985:72): apparmor=
"DENIED" operation="mknod" profile="libvirt-----
" name="/tmp/memfd-instance-teste-osYpHh" pid=74648
comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107
ouid=107

Signed-off-by: Rafael David Tinoco 
---
 util/memfd.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/util/memfd.c b/util/memfd.c
index 4571d1a..4b715ac 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -30,6 +30,9 @@
 #include 
 
 #include "qemu/memfd.h"
+#include "qmp-commands.h"
+#include "qemu-common.h"
+#include "sysemu/sysemu.h"
 
 #ifdef CONFIG_MEMFD
 #include 
@@ -94,11 +97,32 @@ void *qemu_memfd_alloc(const char *name, size_t size, 
unsigned int seals,
 return NULL;
 }
 } else {
+int ret = 0;
 const char *tmpdir = g_get_tmp_dir();
+UuidInfo *uinfo;
+NameInfo *ninfo;
 gchar *fname;
 
-fname = g_strdup_printf("%s/memfd-XX", tmpdir);
+uinfo = qmp_query_uuid(NULL);
+
+ret = strcmp(uinfo->UUID, UUID_NONE);
+if (ret == 0) {
+ninfo = qmp_query_name(NULL);
+if (ninfo->has_name) {
+fname = g_strdup_printf("%s/memfd-%s-XX", tmpdir,
+ninfo->name);
+} else {
+fname = g_strdup_printf("%s/memfd-XX", tmpdir);
+}
+qapi_free_NameInfo(ninfo);
+} else {
+fname = g_strdup_printf("%s/memfd-%s-XX", tmpdir,
+uinfo->UUID);
+}
+
 mfd = mkstemp(fname);
+
+qapi_free_UuidInfo(uinfo);
 unlink(fname);
 g_free(fname);
 
-- 
2.9.3

Re: [Qemu-devel] [PATCH]MC146818 RTC: Get correct guest time when irq coalesced

2016-09-26 Thread zhong...@sangfor.com.cn


>No, it's not seconds.  It depends on the interval of the periodic interrupt.
> 
>In any case this approach is not acceptable, unfortunately; it causes
>the time to go backwards for the guest as soon as an interrupt is coalesced.
>
>What is the guest that you're seeing issues with?

Yes, it's length depend on settings in register A. Sorry for my mistake,  but 
the issue
ought to be still exist, which make guest clock runs ahead of time.
We are testing winxp. Mechanism of windows, at least winxp, is reading cmos 
once for 
initial time while boot time, then set up period interrupt trigger. 
After initialization, windows will run clock in response of period interrupt 
received, 
besides that it also read cmos seldomly for synchronization or something else.
So if period interrupt is queued in qemu, clock in windows will not run. When 
those
interrupts injected hurrily, windows clock catch up with guest virtual time 
quickly.

t1---t2--t3-t4-
   ^ ^   ^  
  ^
|  |inject finish 
final time driven by intrs
 timer stall   read cmos

For example,  when guest virtual time goes to t1, qemu is busy in some heavy job
and cause qemu timer delayed.  when t2, qemu finish jobs and is free to run 
timers.
At this time, all period interrupts between t1 and t2 will be generated and 
queued 
in irq_coalesced. Those intrs will be injected  a little faster but one by one 
into guest. 
Normally, they will be all injected completely at t3. After all timers are 
proceeded, 
guest also have chance to run. if it read cmos time at that moment, it will 
read t2, 
but actuallly it's clock is now at t1, any way, it trust rtc hardware, so it 
set its clock to
t2, then it continue to drive clock by period interrupts, because there are 
still lots of 
queued interrupts injecting, so guest's clock runs quite quickly. When time 
goes to 
t3, all interrupt injected, but guest's clock already goes to t4, where t4-t3 
equal t2-t1.

According to figure above, because: t2-t1=t4-t2t3-t2 <<< 
t4-t2so:t3 <<< t4
That is to say,  guest goes ahead of time.

Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

2016-09-26 Thread Yuanhan Liu

On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > I assume that if using Version 1 that the bit will be ignored

Yes, but I will just quote what you just said: what if the guest
virtio device is a legacy device? I also gave my reasons in another
email why I consistently set this flag:

  - we have to return all features we support to the guest.

We don't know the guest is a modern or legacy device. That means
we should claim we support both: VERSION_1 and ANY_LAYOUT.

Assume guest is a legacy device and we just set VERSION_1 (the current
case), ANY_LAYOUT will never be negotiated.

  - I'm following the way Linux kernel takes: it also set both features.

  Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?

The unset after negotiation I proposed turned out it won't work: the
feature is already negotiated; unsetting it only in vhost side doesn't
change anything. Besides, it may break the migration as Michael stated
below.

> Therein lies a problem. If dpdk tweaks flags, updating it
> will break guest migration.
> 
> One way is to require that users specify all flags fully when
> creating the virtio net device. 

Like how? By a new command line option? And user has to type
all those features?

> QEMU could verify that all required
> flags are set, and fail init if not.
> 
> This has other advantages, e.g. it adds ability to
> init device without waiting for dpdk to connect.
> 
> However, enabling each new feature would now require
> management work. How about dpdk ships the list
> of supported features instead?
> Management tools could read them on source and destination
> and select features supported on both sides.

That means the management tool would somehow has a dependency on
DPDK project, which I have no objection at all. But, is that
a good idea?

BTW, I'm not quite sure I followed your idea. I mean, how it supposed
to fix the ANY_LAYOUT issue here? How this flag will be set for
legacy device?

--yliu

[Qemu-devel] [PATCH] util: secure memfd_create fallback mechanism

2016-09-26 Thread Rafael David Tinoco

Commit: 35f9b6ef3acc9d0546c395a566b04e63ca84e302 added a fallback
mechanism for systems not supporting memfd_create syscall (started
being supported since 3.17).

Backporting memfd_create might not be accepted for distros relying
on older kernels. Nowadays there is no way for security driver
to discover memfd filename to be created: /memfd-XX.

It is more appropriate to include UUID and/or VM names in the
temporary filename, allowing security driver rules to be applied
while maintaining the required unpredictability with mkstemp.

This change will allow libvirt to know exact memfd file to be created
for vhost log AND to create appropriate security rules to allow access
per instance (instead of a opened rule like /memfd-*).

Example of apparmor deny messages with this change:

Per VM UUID (preferred, generated automatically by libvirt):

kernel: [26632.154856] type=1400 audit(1474945148.633:78): apparmor=
"DENIED" operation="mknod" profile="libvirt-0b96011f-0dc0-44a3-92c3-
196de2efab6d" name="/tmp/memfd-0b96011f-0dc0-44a3-92c3-196de2efab6d-
qeHrBV" pid=75161 comm="qemu-system-x86" requested_mask="c" denied_
mask="c" fsuid=107 ouid=107

Per VM name (if no UUID is specified):

kernel: [26447.505653] type=1400 audit(1474944963.985:72): apparmor=
"DENIED" operation="mknod" profile="libvirt-----
" name="/tmp/memfd-instance-teste-osYpHh" pid=74648
comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107
ouid=107

Signed-off-by: Rafael David Tinoco 
---
 util/memfd.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/util/memfd.c b/util/memfd.c
index 4571d1a..4b715ac 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -30,6 +30,9 @@
 #include 
 
 #include "qemu/memfd.h"
+#include "qmp-commands.h"
+#include "qemu-common.h"
+#include "sysemu/sysemu.h"
 
 #ifdef CONFIG_MEMFD
 #include 
@@ -94,11 +97,32 @@ void *qemu_memfd_alloc(const char *name, size_t size, 
unsigned int seals,
 return NULL;
 }
 } else {
+int ret = 0;
 const char *tmpdir = g_get_tmp_dir();
+UuidInfo *uinfo;
+NameInfo *ninfo;
 gchar *fname;
 
-fname = g_strdup_printf("%s/memfd-XX", tmpdir);
+uinfo = qmp_query_uuid(NULL);
+
+ret = strcmp(uinfo->UUID, UUID_NONE);
+if (ret == 0) {
+ninfo = qmp_query_name(NULL);
+if (ninfo->has_name) {
+fname = g_strdup_printf("%s/memfd-%s-XX", tmpdir,
+ninfo->name);
+} else {
+fname = g_strdup_printf("%s/memfd-XX", tmpdir);
+}
+qapi_free_NameInfo(ninfo);
+} else {
+fname = g_strdup_printf("%s/memfd-%s-XX", tmpdir,
+uinfo->UUID);
+}
+
 mfd = mkstemp(fname);
+
+qapi_free_UuidInfo(uinfo);
 unlink(fname);
 g_free(fname);
 
-- 
2.9.3

Re: [Qemu-devel] [PATCH] spapr_vscsi: fix build error introduced by f19661c8

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 03:17:44PM +0100, Felipe Franciosi wrote:
> A typo introduced in f19661c8 prevents qemu from building when configured
> with --enable-trace-backend=dtrace.
> 
> Signed-off-by: Felipe Franciosi 

Applied to ppc-for-2.8, thanks.


> ---
>  hw/scsi/spapr_vscsi.c | 2 +-
>  hw/scsi/trace-events  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
> index d8a2296..6090a20 100644
> --- a/hw/scsi/spapr_vscsi.c
> +++ b/hw/scsi/spapr_vscsi.c
> @@ -658,7 +658,7 @@ static void vscsi_process_login(VSCSIState *s, vscsi_req 
> *req)
>  struct srp_login_rsp *rsp = >srp.login_rsp;
>  uint64_t tag = iu->srp.rsp.tag;
>  
> -trace_spapr_vscsi__process_login();
> +trace_spapr_vscsi_process_login();
>  
>  /* TODO handle case that requested size is wrong and
>   * buffer format is wrong
> diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events
> index d1995b8..4a2e5d6 100644
> --- a/hw/scsi/trace-events
> +++ b/hw/scsi/trace-events
> @@ -225,7 +225,7 @@ spapr_vscsi_command_complete_sense_data2(unsigned s8, 
> unsigned s9, unsigned s10,
>  spapr_vscsi_command_complete_status(uint32_t status) "Command complete 
> err=%"PRIu32
>  spapr_vscsi_save_request(uint32_t qtag, unsigned desc, unsigned offset) 
> "saving tag=%"PRIu32", current desc#%u, offset=0x%x"
>  spapr_vscsi_load_request(uint32_t qtag, unsigned desc, unsigned offset) 
> "restoring tag=%"PRIu32", current desc#%u, offset=0x%x"
> -spapr_vscsi__process_login(void) "Got login, sending response !"
> +spapr_vscsi_process_login(void) "Got login, sending response !"
>  spapr_vscsi_queue_cmd_no_drive(uint64_t lun) "Command for lun %08" PRIx64 " 
> with no drive"
>  spapr_vscsi_queue_cmd(uint32_t qtag, unsigned cdb, const char *cmd, int lun, 
> int ret) "Queued command tag 0x%"PRIx32" CMD 0x%x=%s LUN %d ret: %d"
>  spapr_vscsi_do_crq(unsigned c0, unsigned c1) "crq: %02x %02x ..."

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 07/10] ppc/pnv: add XSCOM infrastructure

2016-09-26 Thread David Gibson

On Mon, Sep 26, 2016 at 06:11:36PM +0200, Cédric Le Goater wrote:
> On 09/23/2016 04:46 AM, David Gibson wrote:
> > On Thu, Sep 22, 2016 at 10:25:59AM +0200, Cédric Le Goater wrote:
>  @@ -493,6 +525,8 @@ static void pnv_chip_power9_class_init(ObjectClass 
>  *klass, void *data)
>   k->chip_cfam_id = 0x100d10498000ull; /* P9 Nimbus DD1.0 */
>   k->cores_mask = POWER9_CORE_MASK;
>   k->core_pir = pnv_chip_core_pir_p9;
>  +k->xscom_addr = pnv_chip_xscom_addr_p9;
>  +k->xscom_pcba = pnv_chip_xscom_pcba_p9;
> >>>
> >>> So if you do as BenH (and I) suggested and have the "scom address
> >>> space" actually be addressed by (pcba << 3), I think you can probably
> >>> avoid these.  
> >>
> >> I will look at that option again. 
> >>
> >> I was trying to untangle a few things at the same time. I have better
> >> view of the problem to solve now. The bus is gone, that's was one 
> >> thing. How we map these xscom regions is the next. 
> >>
> >> Ben suggested to add some P7/P8 mangling before the dispatch in 
> >> the _space_xscom. This should make things cleaner. I had 
> >> not thought of doing that and this is why I introduced these helpers :
> >>
> >> +uint32_t pnv_xscom_pcba(PnvXScomInterface *dev, uint64_t addr)
> >> +uint64_t pnv_xscom_addr(PnvXScomInterface *dev, uint32_t pcba)
> >>
> >> which I don't really like ...
> >>
> >> but we must make sure that we can do the mapping of the xscom 
> >> subregions in the _space_xscom using (pcba << 3)
> >>
> >>
> >>> Instead you can handle it in the chip or ADU realize function by either:
> >>>
> >>> P8: * map one big subregion for the ADU into _space_memory
> >>> * have the handler for that subregion do the address mangling,
> >>>   then redispatch into the xscom address space
> >>>
> >>> P9: * Map the appropriate chunk of the xscom address space
> >>>   directly into address_space_memory
> >>
> >> Yes that was my feeling for a better solution but Ben chimed in with the 
> >> HMER topic. I need to look at that.
> > 
> > Right.  Doesn't change the basic concept though - it just means you
> > need (slightly different) redispatchers for both P8 and P9.
> 
> In fact they are the same, you only need an "addr to pcba" handler at the
> chip class level : 

Ok.  I'd been thinking of using different dispatchers as an
alternative to using the chip class translator hook, but I guess if
you have the decoding of those "core" registers here as well, then
that doesn't make so much sense.

> static uint64_t xscom_read(void *opaque, hwaddr addr, unsigned width)
> {
>   PnvChip *chip = opaque;
>   uint32_t pcba = PNV_CHIP_GET_CLASS(chip)->xscom_pcba(addr);
>   uint64_t val = 0;
>   MemTxResult result;
> 
>   ...
> 
> val = address_space_ldq(>xscom_as, pcba << 3,
> MEMTXATTRS_UNSPECIFIED, );
> if (result != MEMTX_OK) {
> 
>   
> 
> And so, the result is pretty clean. I killed the proxy object and merged 
> the regions in the chip but I have kept the pnv_xscom.c file because the 
> code related to xscom is rather large : ~250 lines. 

Sure, makes sense.

> The objects declaring a xscom region need to do some register shifting but 
> this is usual in mmio regions.
> 
> You will see in v4.

Ok.

>  +static bool xscom_dispatch_read(PnvXScom *xscom, hwaddr addr, uint64_t 
>  *val)
>  +{
>  +uint32_t success;
>  +uint8_t data[8];
>  +
>  +success = !address_space_rw(>xscom_as, addr, 
>  MEMTXATTRS_UNSPECIFIED,
>  +data, 8, false);
>  +*val = (((uint64_t) data[0]) << 56 |
>  +((uint64_t) data[1]) << 48 |
>  +((uint64_t) data[2]) << 40 |
>  +((uint64_t) data[3]) << 32 |
>  +((uint64_t) data[4]) << 24 |
>  +((uint64_t) data[5]) << 16 |
>  +((uint64_t) data[6]) << 8  |
>  +((uint64_t) data[7]));
> >>>
> >>> AFAICT this is basically assuming data is always encoded BE.  With the
> >>> right choice of endian flags on the individual SCOM device
> >>> registrations with the scom address space, I think you should be able
> >>> to avoid this mangling.
> >>
> >> yes. I should but curiously I had to do this, and this works the same on
> >> an intel host or a ppc64 host.
> > 
> > Hmm.. I suspect what you actually need is NATIVE_ENDIAN on the
> > individual SCOM devices, with BIG_ENDIAN on the redispatcher region.
> 
> we should be using address_space_ldq and address_space_stq.

Ok.

>  +
>  +success = !address_space_rw(>xscom_as, addr, 
>  MEMTXATTRS_UNSPECIFIED,
>  +   data, 8, true);
>  +return success;
>  +}
>  +
>  +static uint64_t xscom_read(void *opaque, hwaddr addr, unsigned width)
>  +{
>  +PnvXScom *s = opaque;
>  +uint32_t pcba = s->chip_class->xscom_pcba(addr);
>

[Qemu-devel] [PATCH V15 10/12] filter-rewriter: rewrite tcp packet to keep secondary connection

2016-09-26 Thread Zhang Chen

We will rewrite tcp packet secondary received and sent.
When colo guest is a tcp server.

Firstly, client start a tcp handshake. the packet's seq=client_seq,
ack=0,flag=SYN. COLO primary guest get this pkt and mirror(filter-mirror)
to secondary guest, secondary get it use filter-redirector.
Then,primary guest response pkt
(seq=primary_seq,ack=client_seq+1,flag=ACK|SYN).
secondary guest response pkt
(seq=secondary_seq,ack=client_seq+1,flag=ACK|SYN).
In here,we use filter-rewriter save the secondary_seq to it's tcp connection.
Finally handshake,client send pkt
(seq=client_seq+1,ack=primary_seq+1,flag=ACK).
Here,filter-rewriter can get primary_seq, and rewrite ack from primary_seq+1
to secondary_seq+1, recalculate checksum. So the secondary tcp connection
kept good.

When we send/recv packet.
client send pkt(seq=client_seq+1+data_len,ack=primary_seq+1,flag=ACK|PSH).
filter-rewriter rewrite ack and send to secondary guest.

primary guest response pkt
(seq=primary_seq+1,ack=client_seq+1+data_len,flag=ACK)
secondary guest response pkt
(seq=secondary_seq+1,ack=client_seq+1+data_len,flag=ACK)
we rewrite secondary guest seq from secondary_seq+1 to primary_seq+1.
So tcp connection kept good.

In code We use offset( = secondary_seq - primary_seq )
to rewrite seq or ack.
handle_primary_tcp_pkt: tcp_pkt->th_ack += offset;
handle_secondary_tcp_pkt: tcp_pkt->th_seq -= offset;

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/colo.c|   2 +
 net/colo.h|   7 
 net/filter-rewriter.c | 112 +-
 trace-events  |   5 +++
 4 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index dc4e4a4..9f469e6 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -135,6 +135,8 @@ Connection *connection_new(ConnectionKey *key)
 
 conn->ip_proto = key->ip_proto;
 conn->processing = false;
+conn->offset = 0;
+conn->syn_flag = 0;
 g_queue_init(>primary_list);
 g_queue_init(>secondary_list);
 
diff --git a/net/colo.h b/net/colo.h
index 6720a3a..7c524f3 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -62,6 +62,13 @@ typedef struct Connection {
 /* flag to enqueue unprocessed_connections */
 bool processing;
 uint8_t ip_proto;
+/* offset = secondary_seq - primary_seq */
+tcp_seq  offset;
+/*
+ * we use this flag update offset func
+ * run once in independent tcp connection
+ */
+int syn_flag;
 } Connection;
 
 uint32_t connection_key_hash(const void *opaque);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index dca5dcf..e5b8658 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "net/colo.h"
 #include "net/filter.h"
 #include "net/net.h"
@@ -58,6 +59,93 @@ static int is_tcp_packet(Packet *pkt)
 }
 }
 
+/* handle tcp packet from primary guest */
+static int handle_primary_tcp_pkt(NetFilterState *nf,
+  Connection *conn,
+  Packet *pkt)
+{
+struct tcphdr *tcp_pkt;
+
+tcp_pkt = (struct tcphdr *)pkt->transport_header;
+if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
+char *sdebug, *ddebug;
+sdebug = strdup(inet_ntoa(pkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(pkt->ip->ip_dst));
+trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug,
+ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
+tcp_pkt->th_flags);
+trace_colo_filter_rewriter_conn_offset(conn->offset);
+g_free(sdebug);
+g_free(ddebug);
+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
+/*
+ * we use this flag update offset func
+ * run once in independent tcp connection
+ */
+conn->syn_flag = 1;
+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
+if (conn->syn_flag) {
+/*
+ * offset = secondary_seq - primary seq
+ * ack packet sent by guest from primary node,
+ * so we use th_ack - 1 get primary_seq
+ */
+conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
+conn->syn_flag = 0;
+}
+/* handle packets to the secondary from the primary */
+tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset);
+
+net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+}
+
+return 0;
+}
+
+/* handle tcp packet from secondary guest */
+static int handle_secondary_tcp_pkt(NetFilterState *nf,
+Connection *conn,
+Packet *pkt)
+{
+struct tcphdr *tcp_pkt;
+
+tcp_pkt = (struct tcphdr *)pkt->transport_header;
+
+if

[Qemu-devel] [PATCH V15 12/12] docs: Add documentation for COLO-proxy

2016-09-26 Thread Zhang Chen

Introduce the design of COLO-proxy, and how to use it.

Signed-off-by: Zhang Chen 
---
 docs/colo-proxy.txt | 188 
 1 file changed, 188 insertions(+)
 create mode 100644 docs/colo-proxy.txt

diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
new file mode 100644
index 000..76767cb
--- /dev/null
+++ b/docs/colo-proxy.txt
@@ -0,0 +1,188 @@
+COLO-proxy
+--
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+Copyright (c) 2016 Fujitsu, Corp.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+This document gives an overview of COLO proxy's design.
+
+== Background ==
+COLO-proxy is a part of COLO project. It is used
+to compare the network package to help COLO decide
+whether to do checkpoint. With COLO-proxy's help,
+COLO greatly improves the performance.
+
+The filter-redirector, filter-mirror, colo-compare
+and filter-rewriter compose the COLO-proxy.
+
+== Architecture ==
+
+COLO-Proxy is based on qemu netfilter and it's a plugin for qemu netfilter
+(except colo-compare). It keep Secondary VM connect normally to
+client and compare packets sent by PVM with sent by SVM.
+If the packet difference, notify COLO-frame to do checkpoint and send
+all primary packet has queued. Otherwise just send the queued primary
+packet and drop the queued secondary packet.
+
+Below is a COLO proxy ascii figure:
+
+ Primary qemu   
Secondary qemu
++--+   
++
+| +--+ |   |  
+---+ |
+| |  | |   |  |
   | |
+| |guest | |   |  |
guest  | |
+| |  | |   |  |
   | |
+| +---^--+---+ |   |  
+-+++ |
+| |  | |   |   
 ^|  |
+| |  | |   |   
 ||  |
+| |  +--+  |   
 ||  |
+|netfilter|  |   | ||  |   
netfilter||  |
+| +--+ ++  ||  |  
+---+ |
+| |   |  |   |  |out   ||  |  |
 ||  filter excute order   | |
+| |   |  |  +-+||  |  |
 || +--->  | |
+| |   |  |  ||  | |||  |  |
 ||   TCP  | |
+| | +-+--+-+  +-v+ +-v+ |pri +++sec||  |  | 
++  +---++---v+rewriter++  ++ | |
+| | |  |  |  | |  | |in  | |in ||  |  | |  
  |  ||  |  || | |
+| | |  filter  |  |  filter  | |  filter  +-->  colo   <--+ +> 
 filter   +--> adjust |   adjust +-->   filter   | | |
+| | |  mirror  |  |redirector| |redirector| || compare |   |  ||  | | 
redirector |  | ack|   seq|  | redirector | | |
+| | |  |  |  | |  | || |   |  ||  | |  
  |  ||  |  || | |
+| | +^-+  ++-+ +--+ |+-+   |  ||  | 
++  ++--+  +---++ | |
+| |  |   tx|   rx   rx  |  |  ||  |
txall   |  rx  | |
+| |  | ||  |  ||  
+---+ |
+| |  | +--+ |  |  ||   
||
+| |  |   filter excute order  | |  |  ||   
||
+| |  |

[Qemu-devel] [PATCH V15 00/12] Introduce COLO-Proxy

2016-09-26 Thread Zhang Chen

COLO-proxy is a part of COLO project. COLO project is
composed of COLO-frame, COLO-proxy and block-replication.
It is used to compare the network package to help COLO
decide whether to do checkpoint. With COLO-proxy's help,
COLO greatly improves the performance.

The filter-redirector, filter-mirror, colo-compare
and filter-rewriter compose the COLO-proxy.

COLO-compare
It is used to compare the network package to help COLO decide
whether to do checkpoint. 

Filter-rewriter
It will rewrite some of secondary packet to make
secondary guest's connection established successfully.
In this module we will rewrite tcp packet's ack to the secondary
from primary,and rewrite tcp packet's seq to the primary from
secondary.

The full version in this github:
https://github.com/zhangckid/qemu/tree/colo-v2.7-proxy-mode-compare-and-rewriter-sep27

v15:
  - change "ConnectionKey key = { 0 };" to
"ConnectionKey key = {{0},};", fix typo.

v14:
  - fix some coding style problems
  - move patch "add docs/colo-proxy.txt" to the end of this series
  - fix the mingw build issue

v13:
  - add docs/colo-proxy.txt
  - add MAINTAINERS
  - remove unnecessary .h
  - remove QTAILQ_ENTRY(CompareState)
  - fix some comments
  - add find_and_check_chardev() to avoid code duplication
  - remove the "is_unix" in patch3
  - change error_report() to trace in patch4
  - use l2hdr_len here instead of ETH_HLEP
  - fix code style
  - remove colo_rm_connection()
  - remove hashtable_size
  - change g_queue_foreach() to g_queue_find_custom() in patch7
  - change trace_colo_compare_tcp_miscompare() to fprintf() in patch8
  - add codes not queue vlan packets

v12:
  - add qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext
to this series as the first patch.
  - update COLO net ascii figure.
  - add chardev socket check.
  - fix some typo.
  - add some comments.
  - rename net/colo-base.c to net/colo.c
  - rename network/transport_layer to network/transport_header.
  - move the job that clear coon_list when hashtable_size oversize
to connection_get.
  - reuse connection_destroy() do colo_rm_connection().
  - fix pkt mem leak in colo_compare_connection().
(result be released in g_queue_remove(), so it were not leak)
  - rename thread_name "compare" to "colo-compare".
  - change icmp compare to memcmp().

v11:
  - Make patch 5 to a independent patch series.
[PATCH V3] qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext
  - For Jason's comments, merge filter-rewriter to this series.
(patch 7,8,9)
  - Add reverse_connection_key()
  - remove conn_list in filter-rewriter
  - remove unprocessed_connections
  - add some comments

v10:
  - fix typo
  - Should we make patch 5 independent with this series?
This patch just add a API for qemu-char.

v9:
 p5:
  - use chr_update_read_handler_full() replace
the chr_update_read_handler()
  - use io_watch_poll_prepare_full() replace
the io_watch_poll_prepare()
  - use io_watch_poll_funcs_full replace
the io_watch_poll_funcs
  - avoid code duplication

v8:
 p5:
  - add new patch:
qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext

v7:
 p5:
   - add [PATCH]qemu-char: Fix context for g_source_attach()
 in this patch series.

v6: 
 p6:
   - add more commit log.
   - fix icmp comparison to compare all packet.

 p5:
   - add more cpmments in commit log.
   - change REGULAR_CHECK_MS to REGULAR_PACKET_CHECK_MS
   - make check old packet independent to compare thread
   - remove thread_status

 p4:
   - change this patch only about
 Connection and ConnectionKey.
   - add some comments in commit log.
   - remove mode in fill_connection_key().
   - fix some comments and bug.
   - move colo_conn_state to patch of
 "work with colo-frame"
   - remove conn_list_lock.
   - add MAX_QUEUE_SIZE, if primary_list or
 secondary_list biger than MAX_QUEUE_SIZE
 we will drop packet. 

 p3:
   - add new independent kernel jhash patch.

 p2:
   - add new independent colo-base patch.

 p1:
   - add a ascii figure and some comments to explain it
   - move trace.h to p2
   - move QTAILQ_HEAD(, CompareState) net_compares to
 patch of "work with colo-frame"
   - add some comments in qemu-option.hx


v5:
 p3:
- comments from Jason
  we poll and handle chardev in comapre thread,
  Through this way, there's no need for extra 
  synchronization with main loop
  this depend on another patch:
  qemu-char: Fix context for g_source_attach()
- remove QemuEvent
 p2:
- remove conn->list_lock
 p1:
- move compare_pri/sec_chr_in to p3
- move compare_chr_send to p2

v4:
 p4:
- add some comments
- fix some trace-events
- fix tcp compare error
 p3:
- add rcu_read_lock().
- fix trace name
- fix jason's other comments
- rebase some Dave's branch function
 p2:
- colo_compare_connection() change g_queue_push_head() to
- g_queue_push_tail() match to sorted order.
- remove pkt->s
- move data

[Qemu-devel] [PATCH V15 11/12] MAINTAINERS: add maintainer for COLO-proxy

2016-09-26 Thread Zhang Chen

add Zhang Chen and Li zhijian as co-maintainers of COLO-proxy.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b6fb84e..4781f9f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1330,6 +1330,15 @@ F: include/qemu/throttle.h
 F: util/throttle.c
 L: qemu-bl...@nongnu.org
 
+COLO Proxy
+M: Zhang Chen 
+M: Li Zhijian 
+S: Supported
+F: docs/colo-proxy.txt
+F: net/colo*
+F: net/filter-rewriter.c
+F: net/filter-mirror.c
+
 Usermode Emulation
 --
 Overall
-- 
2.7.4

[Qemu-devel] [PATCH V15 06/12] colo-compare: introduce packet comparison thread

2016-09-26 Thread Zhang Chen

If primary packet is same with secondary packet,
we will send primary packet and drop secondary
packet, otherwise notify COLO frame to do checkpoint.
If primary packet comes but secondary packet does not,
after REGULAR_PACKET_CHECK_MS milliseconds we set
the primary packet as old_packet,then do a checkpoint.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/colo-compare.c | 233 +
 net/colo.c |   1 +
 net/colo.h |   3 +
 trace-events   |   2 +
 4 files changed, 239 insertions(+)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index cafd5e3..b62225a 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -33,8 +33,12 @@
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+#define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
+/* TODO: Should be configurable */
+#define REGULAR_PACKET_CHECK_MS 3000
+
 /*
   + CompareState ++
   |   |
@@ -76,6 +80,11 @@ typedef struct CompareState {
 GQueue conn_list;
 /* hashtable to save connection */
 GHashTable *connection_track_table;
+/* compare thread, a thread for each NIC */
+QemuThread thread;
+/* Timer used on the primary to find packets that are never matched */
+QEMUTimer *timer;
+QemuMutex timer_check_lock;
 } CompareState;
 
 typedef struct CompareClass {
@@ -148,6 +157,118 @@ static int packet_enqueue(CompareState *s, int mode)
 return 0;
 }
 
+/*
+ * The IP packets sent by primary and secondary
+ * will be compared in here
+ * TODO support ip fragment, Out-Of-Order
+ * return:0  means packet same
+ *> 0 || < 0 means packet different
+ */
+static int colo_packet_compare(Packet *ppkt, Packet *spkt)
+{
+trace_colo_compare_ip_info(ppkt->size, inet_ntoa(ppkt->ip->ip_src),
+   inet_ntoa(ppkt->ip->ip_dst), spkt->size,
+   inet_ntoa(spkt->ip->ip_src),
+   inet_ntoa(spkt->ip->ip_dst));
+
+if (ppkt->size == spkt->size) {
+return memcmp(ppkt->data, spkt->data, spkt->size);
+} else {
+return -1;
+}
+}
+
+static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+{
+trace_colo_compare_main("compare all");
+return colo_packet_compare(ppkt, spkt);
+}
+
+static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
+{
+int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+
+if ((now - pkt->creation_ms) > (*check_time)) {
+trace_colo_old_packet_check_found(pkt->creation_ms);
+return 0;
+} else {
+return 1;
+}
+}
+
+static void colo_old_packet_check_one_conn(void *opaque,
+   void *user_data)
+{
+Connection *conn = opaque;
+GList *result = NULL;
+int64_t check_time = REGULAR_PACKET_CHECK_MS;
+
+result = g_queue_find_custom(>primary_list,
+ _time,
+ (GCompareFunc)colo_old_packet_check_one);
+
+if (result) {
+/* do checkpoint will flush old packet */
+/* TODO: colo_notify_checkpoint();*/
+}
+}
+
+/*
+ * Look for old packets that the secondary hasn't matched,
+ * if we have some then we have to checkpoint to wake
+ * the secondary up.
+ */
+static void colo_old_packet_check(void *opaque)
+{
+CompareState *s = opaque;
+
+g_queue_foreach(>conn_list, colo_old_packet_check_one_conn, NULL);
+}
+
+/*
+ * Called from the compare thread on the primary
+ * for compare connection
+ */
+static void colo_compare_connection(void *opaque, void *user_data)
+{
+CompareState *s = user_data;
+Connection *conn = opaque;
+Packet *pkt = NULL;
+GList *result = NULL;
+int ret;
+
+while (!g_queue_is_empty(>primary_list) &&
+   !g_queue_is_empty(>secondary_list)) {
+qemu_mutex_lock(>timer_check_lock);
+pkt = g_queue_pop_tail(>primary_list);
+qemu_mutex_unlock(>timer_check_lock);
+result = g_queue_find_custom(>secondary_list,
+  pkt, (GCompareFunc)colo_packet_compare_all);
+
+if (result) {
+ret = compare_chr_send(s->chr_out, pkt->data, pkt->size);
+if (ret < 0) {
+error_report("colo_send_primary_packet failed");
+}
+trace_colo_compare_main("packet same and release packet");
+g_queue_remove(>secondary_list, result->data);
+packet_destroy(pkt, NULL);
+} else {
+/*
+ * If one packet arrive late, the secondary_list or
+ * primary_list will be empty, so we can't compare it
+ * until next comparison.
+ */
+trace_colo_compare_main("packet different");
+qemu_mutex_lock(>timer_check_lock);
+

[Qemu-devel] [PATCH V15 03/12] net/colo.c: add colo.c to define and handle packet

2016-09-26 Thread Zhang Chen

The net/colo.c is used by colo-compare and filter-rewriter.
this can share common data structure like net packet,
and other functions.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/Makefile.objs  |   1 +
 net/colo-compare.c | 114 +++--
 net/colo.c |  86 
 net/colo.h |  37 +
 trace-events   |   6 +++
 5 files changed, 240 insertions(+), 4 deletions(-)
 create mode 100644 net/colo.c
 create mode 100644 net/colo.h

diff --git a/net/Makefile.objs b/net/Makefile.objs
index ba92f73..beb504b 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -17,3 +17,4 @@ common-obj-y += filter.o
 common-obj-y += filter-buffer.o
 common-obj-y += filter-mirror.o
 common-obj-y += colo-compare.o
+common-obj-y += colo.o
diff --git a/net/colo-compare.c b/net/colo-compare.c
index dc5f70c..cea9b27 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -14,6 +14,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "trace.h"
 #include "qemu-common.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/error.h"
@@ -26,13 +27,34 @@
 #include "sysemu/char.h"
 #include "qemu/sockets.h"
 #include "qapi-visit.h"
+#include "net/colo.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
-#define COMPARE_READ_LEN_MAX NET_BUFSIZE
-
+/*
+  + CompareState ++
+  |   |
+  +---+   +---+ +---+
+  |conn list  +--->conn   +->conn   |
+  +---+   +---+ +---+
+  |   | |   | |  |
+  +---+ +---v+  +---v++---v+ +---v+
+|primary |  |secondary|primary | |secondary
+|packet  |  |packet  +|packet  | |packet  +
+++  ++++ ++
+|   | |  |
++---v+  +---v++---v+ +---v+
+|primary |  |secondary|primary | |secondary
+|packet  |  |packet  +|packet  | |packet  +
+++  ++++ ++
+|   | |  |
++---v+  +---v++---v+ +---v+
+|primary |  |secondary|primary | |secondary
+|packet  |  |packet  +|packet  | |packet  +
+++  ++++ ++
+*/
 typedef struct CompareState {
 Object parent;
 
@@ -44,6 +66,9 @@ typedef struct CompareState {
 CharDriverState *chr_out;
 SocketReadState pri_rs;
 SocketReadState sec_rs;
+
+/* hashtable to save connection */
+GHashTable *connection_track_table;
 } CompareState;
 
 typedef struct CompareClass {
@@ -54,6 +79,76 @@ typedef struct CompareChardevProps {
 bool is_socket;
 } CompareChardevProps;
 
+enum {
+PRIMARY_IN = 0,
+SECONDARY_IN,
+};
+
+static int compare_chr_send(CharDriverState *out,
+const uint8_t *buf,
+uint32_t size);
+
+/*
+ * Return 0 on success, if return -1 means the pkt
+ * is unsupported(arp and ipv6) and will be sent later
+ */
+static int packet_enqueue(CompareState *s, int mode)
+{
+Packet *pkt = NULL;
+
+if (mode == PRIMARY_IN) {
+pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
+} else {
+pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
+}
+
+if (parse_packet_early(pkt)) {
+packet_destroy(pkt, NULL);
+pkt = NULL;
+return -1;
+}
+/* TODO: get connection key from pkt */
+
+/*
+ * TODO: use connection key get conn from
+ * connection_track_table
+ */
+
+/*
+ * TODO: insert pkt to it's conn->primary_list
+ * or conn->secondary_list
+ */
+
+return 0;
+}
+
+static int compare_chr_send(CharDriverState *out,
+const uint8_t *buf,
+uint32_t size)
+{
+int ret = 0;
+uint32_t len = htonl(size);
+
+if (!size) {
+return 0;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *), sizeof(len));
+if (ret != sizeof(len)) {
+goto err;
+}
+
+ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+if (ret != size) {
+goto err;
+}
+
+return 0;
+
+err:
+return ret < 0 ? ret : -EIO;
+}
+
 static char *compare_get_pri_indev(Object *obj, Error **errp)
 {
 CompareState *s = COLO_COMPARE(obj);
@@ -101,12 +196,21 @@ static void compare_set_outdev(Object *obj, const char 
*value, Error **errp)

[Qemu-devel] [PATCH V15 08/12] filter-rewriter: introduce filter-rewriter initialization

2016-09-26 Thread Zhang Chen

Filter-rewriter is a part of COLO project.
It will rewrite some of secondary packet to make
secondary guest's tcp connection established successfully.
In this module we will rewrite tcp packet's ack to the secondary
from primary,and rewrite tcp packet's seq to the primary from
secondary.

usage:

colo secondary:
-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
-object filter-rewriter,id=rew0,netdev=hn0,queue=all

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/Makefile.objs |   1 +
 net/filter-rewriter.c | 105 ++
 qemu-options.hx   |  13 +++
 vl.c  |   3 +-
 4 files changed, 121 insertions(+), 1 deletion(-)
 create mode 100644 net/filter-rewriter.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index beb504b..2a80df5 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -18,3 +18,4 @@ common-obj-y += filter-buffer.o
 common-obj-y += filter-mirror.o
 common-obj-y += colo-compare.o
 common-obj-y += colo.o
+common-obj-y += filter-rewriter.o
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
new file mode 100644
index 000..de29f07
--- /dev/null
+++ b/net/filter-rewriter.c
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "net/colo.h"
+#include "net/filter.h"
+#include "net/net.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi-visit.h"
+#include "qom/object.h"
+#include "qemu/main-loop.h"
+#include "qemu/iov.h"
+#include "net/checksum.h"
+
+#define FILTER_COLO_REWRITER(obj) \
+OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
+
+#define TYPE_FILTER_REWRITER "filter-rewriter"
+
+typedef struct RewriterState {
+NetFilterState parent_obj;
+NetQueue *incoming_queue;
+/* hashtable to save connection */
+GHashTable *connection_track_table;
+} RewriterState;
+
+static void filter_rewriter_flush(NetFilterState *nf)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+if (!qemu_net_queue_flush(s->incoming_queue)) {
+/* Unable to empty the queue, purge remaining packets */
+qemu_net_queue_purge(s->incoming_queue, nf->netdev);
+}
+}
+
+static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
+ NetClientState *sender,
+ unsigned flags,
+ const struct iovec *iov,
+ int iovcnt,
+ NetPacketSent *sent_cb)
+{
+/*
+ * if we get tcp packet
+ * we will rewrite it to make secondary guest's
+ * connection established successfully
+ */
+return 0;
+}
+
+static void colo_rewriter_cleanup(NetFilterState *nf)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+/* flush packets */
+if (s->incoming_queue) {
+filter_rewriter_flush(nf);
+g_free(s->incoming_queue);
+}
+}
+
+static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+  connection_key_equal,
+  g_free,
+  connection_destroy);
+s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+}
+
+static void colo_rewriter_class_init(ObjectClass *oc, void *data)
+{
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
+
+nfc->setup = colo_rewriter_setup;
+nfc->cleanup = colo_rewriter_cleanup;
+nfc->receive_iov = colo_rewriter_receive_iov;
+}
+
+static const TypeInfo colo_rewriter_info = {
+.name = TYPE_FILTER_REWRITER,
+.parent = TYPE_NETFILTER,
+.class_init = colo_rewriter_class_init,
+.instance_size = sizeof(RewriterState),
+};
+
+static void register_types(void)
+{
+type_register_static(_rewriter_info);
+}
+
+type_init(register_types);
diff --git a/qemu-options.hx b/qemu-options.hx
index 87fbba3..5cd50a8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3880,6 +3880,19 @@ Create a filter-redirector we need to differ outdev id 
from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or 
outdev
 need to be specified.
 
+@item -object

[Qemu-devel] [PATCH V15 02/12] colo-compare: introduce colo compare initialization

2016-09-26 Thread Zhang Chen

This a COLO net ascii figure:

 Primary qemu   
Secondary qemu
+--+   
++
| +--+ |   |  
+---+ |
| |  | |   |  | 
  | |
| |guest | |   |  | 
   guest  | |
| |  | |   |  | 
  | |
| +---^--+---+ |   |  
+-+++ |
| |  | |   |
^|  |
| |  | |   |
||  |
| |  +--+  |
||  |
|netfilter|  |   | ||  |   
netfilter||  |
| +--+ ++  ||  |  
+---+ |
| |   |  |   |  |out   ||  |  | 
||  filter excute order   | |
| |   |  |  +-+||  |  | 
|| +--->  | |
| |   |  |  ||  | |||  |  | 
||   TCP  | |
| | +-+--+-+  +-v+ +-v+ |pri +++sec||  |  | 
++  +---++---v+rewriter++  ++ | |
| | |  |  |  | |  | |in  | |in ||  |  | |   
 |  ||  |  || | |
| | |  filter  |  |  filter  | |  filter  +-->  colo   <--+ +>  
filter   +--> adjust |   adjust +-->   filter   | | |
| | |  mirror  |  |redirector| |redirector| || compare |   |  ||  | | 
redirector |  | ack|   seq|  | redirector | | |
| | |  |  |  | |  | || |   |  ||  | |   
 |  ||  |  || | |
| | +^-+  ++-+ +--+ |+-+   |  ||  | 
++  ++--+  +---++ | |
| |  |   tx|   rx   rx  |  |  ||  | 
   txall   |  rx  | |
| |  | ||  |  ||  
+---+ |
| |  | +--+ |  |  ||
   ||
| |  |   filter excute order  | |  |  ||
   ||
| |  |  +>| |  |  
++|
| +-+  |   |
|
||||   |
|
+--+   
++
 |guest receive   | guest send
 ||
++v+
|  |
  NOTE: filter direction is rx/tx/all
| tap  |
  rx:receive packets sent to the netdev
|  |
  tx:receive packets sent by the netdev
+--+

In COLO-compare, we do packet comparing job.
Packets coming from the primary char indev will be sent to outdev.
Packets coming from the secondary char dev will be dropped after comparing.
colo-comapre need two input chardev and one output chardev:
primary_in=chardev1-id (source: primary send packet)
secondary_in=chardev2-id (source: secondary send packet)
outdev=chardev3-id

[Qemu-devel] [PATCH V15 01/12] qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext

2016-09-26 Thread Zhang Chen

Add qemu_chr_add_handlers_full() API, we can use
this API pass in a GMainContext,make handler run
in the context rather than main_loop.
This comments from Daniel P . Berrange.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Reviewed-by: Daniel P. Berrange 
---
 include/sysemu/char.h | 11 +++-
 qemu-char.c   | 77 +++
 2 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index ee7e554..0d0465a 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -65,7 +65,8 @@ struct CharDriverState {
 int (*chr_sync_read)(struct CharDriverState *s,
  const uint8_t *buf, int len);
 GSource *(*chr_add_watch)(struct CharDriverState *s, GIOCondition cond);
-void (*chr_update_read_handler)(struct CharDriverState *s);
+void (*chr_update_read_handler)(struct CharDriverState *s,
+GMainContext *context);
 int (*chr_ioctl)(struct CharDriverState *s, int cmd, void *arg);
 int (*get_msgfds)(struct CharDriverState *s, int* fds, int num);
 int (*set_msgfds)(struct CharDriverState *s, int *fds, int num);
@@ -422,6 +423,14 @@ void qemu_chr_add_handlers(CharDriverState *s,
IOEventHandler *fd_event,
void *opaque);
 
+/* This API can make handler run in the context what you pass to. */
+void qemu_chr_add_handlers_full(CharDriverState *s,
+IOCanReadHandler *fd_can_read,
+IOReadHandler *fd_read,
+IOEventHandler *fd_event,
+void *opaque,
+GMainContext *context);
+
 void qemu_chr_be_generic_open(CharDriverState *s);
 void qemu_chr_accept_input(CharDriverState *s);
 int qemu_chr_add_client(CharDriverState *s, int fd);
diff --git a/qemu-char.c b/qemu-char.c
index 5f82ebb..100aada 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -446,11 +446,12 @@ void qemu_chr_fe_printf(CharDriverState *s, const char 
*fmt, ...)
 
 static void remove_fd_in_watch(CharDriverState *chr);
 
-void qemu_chr_add_handlers(CharDriverState *s,
-   IOCanReadHandler *fd_can_read,
-   IOReadHandler *fd_read,
-   IOEventHandler *fd_event,
-   void *opaque)
+void qemu_chr_add_handlers_full(CharDriverState *s,
+IOCanReadHandler *fd_can_read,
+IOReadHandler *fd_read,
+IOEventHandler *fd_event,
+void *opaque,
+GMainContext *context)
 {
 int fe_open;
 
@@ -464,8 +465,9 @@ void qemu_chr_add_handlers(CharDriverState *s,
 s->chr_read = fd_read;
 s->chr_event = fd_event;
 s->handler_opaque = opaque;
-if (fe_open && s->chr_update_read_handler)
-s->chr_update_read_handler(s);
+if (fe_open && s->chr_update_read_handler) {
+s->chr_update_read_handler(s, context);
+}
 
 if (!s->explicit_fe_open) {
 qemu_chr_fe_set_open(s, fe_open);
@@ -478,6 +480,16 @@ void qemu_chr_add_handlers(CharDriverState *s,
 }
 }
 
+void qemu_chr_add_handlers(CharDriverState *s,
+   IOCanReadHandler *fd_can_read,
+   IOReadHandler *fd_read,
+   IOEventHandler *fd_event,
+   void *opaque)
+{
+qemu_chr_add_handlers_full(s, fd_can_read, fd_read,
+   fd_event, opaque, NULL);
+}
+
 static int null_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
 return len;
@@ -715,7 +727,8 @@ static void mux_chr_event(void *opaque, int event)
 mux_chr_send_event(d, i, event);
 }
 
-static void mux_chr_update_read_handler(CharDriverState *chr)
+static void mux_chr_update_read_handler(CharDriverState *chr,
+GMainContext *context)
 {
 MuxDriver *d = chr->opaque;
 
@@ -729,8 +742,10 @@ static void mux_chr_update_read_handler(CharDriverState 
*chr)
 d->chr_event[d->mux_cnt] = chr->chr_event;
 /* Fix up the real driver with mux routines */
 if (d->mux_cnt == 0) {
-qemu_chr_add_handlers(d->drv, mux_chr_can_read, mux_chr_read,
-  mux_chr_event, chr);
+qemu_chr_add_handlers_full(d->drv, mux_chr_can_read,
+   mux_chr_read,
+   mux_chr_event,
+   chr, context);
 }
 if (d->focus != -1) {
 mux_chr_send_event(d, d->focus, CHR_EVENT_MUX_OUT);
@@ -846,6 +861,7 @@ typedef struct IOWatchPoll
 IOCanReadHandler *fd_can_read;

[Qemu-devel] [PATCH V15 05/12] colo-compare: track connection and enqueue packet

2016-09-26 Thread Zhang Chen

In this patch we use kernel jhash table to track
connection, and then enqueue net packet like this:

+ CompareState ++
|   |
+---+   +---+ +---+
|conn list  +--->conn   +->conn   |
+---+   +---+ +---+
|   | |   | |  |
+---+ +---v+  +---v++---v+ +---v+
  |primary |  |secondary|primary | |secondary
  |packet  |  |packet  +|packet  | |packet  +
  ++  ++++ ++
  |   | |  |
  +---v+  +---v++---v+ +---v+
  |primary |  |secondary|primary | |secondary
  |packet  |  |packet  +|packet  | |packet  +
  ++  ++++ ++
  |   | |  |
  +---v+  +---v++---v+ +---v+
  |primary |  |secondary|primary | |secondary
  |packet  |  |packet  +|packet  | |packet  +
  ++  ++++ ++

We use conn_list to record connection info.
When we want to enqueue a packet, firstly get the
connection from connection_track_table. then push
the packet to g_queue(pri/sec) in it's own conn.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/colo-compare.c |  53 +-
 net/colo.c | 109 +
 net/colo.h |  39 +++
 3 files changed, 191 insertions(+), 10 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index cea9b27..cafd5e3 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -33,6 +33,8 @@
 #define COLO_COMPARE(obj) \
 OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+#define MAX_QUEUE_SIZE 1024
+
 /*
   + CompareState ++
   |   |
@@ -67,6 +69,11 @@ typedef struct CompareState {
 SocketReadState pri_rs;
 SocketReadState sec_rs;
 
+/* connection list: the connections belonged to this NIC could be found
+ * in this list.
+ * element type: Connection
+ */
+GQueue conn_list;
 /* hashtable to save connection */
 GHashTable *connection_track_table;
 } CompareState;
@@ -94,7 +101,9 @@ static int compare_chr_send(CharDriverState *out,
  */
 static int packet_enqueue(CompareState *s, int mode)
 {
+ConnectionKey key = {{0},};
 Packet *pkt = NULL;
+Connection *conn;
 
 if (mode == PRIMARY_IN) {
 pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
@@ -107,17 +116,34 @@ static int packet_enqueue(CompareState *s, int mode)
 pkt = NULL;
 return -1;
 }
-/* TODO: get connection key from pkt */
+fill_connection_key(pkt, );
 
-/*
- * TODO: use connection key get conn from
- * connection_track_table
- */
+conn = connection_get(s->connection_track_table,
+  ,
+  >conn_list);
 
-/*
- * TODO: insert pkt to it's conn->primary_list
- * or conn->secondary_list
- */
+if (!conn->processing) {
+g_queue_push_tail(>conn_list, conn);
+conn->processing = true;
+}
+
+if (mode == PRIMARY_IN) {
+if (g_queue_get_length(>primary_list) <=
+   MAX_QUEUE_SIZE) {
+g_queue_push_tail(>primary_list, pkt);
+} else {
+error_report("colo compare primary queue size too big,"
+ "drop packet");
+}
+} else {
+if (g_queue_get_length(>secondary_list) <=
+   MAX_QUEUE_SIZE) {
+g_queue_push_tail(>secondary_list, pkt);
+} else {
+error_report("colo compare secondary queue size too big,"
+ "drop packet");
+}
+}
 
 return 0;
 }
@@ -308,7 +334,12 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 net_socket_rs_init(>pri_rs, compare_pri_rs_finalize);
 net_socket_rs_init(>sec_rs, compare_sec_rs_finalize);
 
-/* use g_hash_table_new_full() to new a hashtable */
+g_queue_init(>conn_list);
+
+s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+  connection_key_equal,
+  g_free,
+  connection_destroy);
 
 return;
 }
@@ -349,6 +380,8 @@ static void colo_compare_finalize(Object *obj)
 qemu_chr_fe_release(s->chr_out);
 }
 
+g_queue_free(>conn_list);
+
 g_free(s->pri_indev);

[Qemu-devel] [PATCH V15 07/12] colo-compare: add TCP, UDP, ICMP packet comparison

2016-09-26 Thread Zhang Chen

We add TCP,UDP,ICMP packet comparison to replace
IP packet comparison. This can increase the
accuracy of the package comparison.
Less checkpoint more efficiency.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/colo-compare.c | 147 +++--
 trace-events   |   3 ++
 2 files changed, 146 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index b62225a..8bbd08b 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -19,6 +19,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qapi/error.h"
 #include "net/net.h"
+#include "net/eth.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
 #include "qom/object.h"
@@ -178,9 +179,131 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt)
 }
 }
 
-static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+/*
+ * Called from the compare thread on the primary
+ * for compare tcp packet
+ * compare_tcp copied from Dr. David Alan Gilbert's branch
+ */
+static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
+{
+struct tcphdr *ptcp, *stcp;
+int res;
+char *sdebug, *ddebug;
+
+trace_colo_compare_main("compare tcp");
+if (ppkt->size != spkt->size) {
+if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
+trace_colo_compare_main("pkt size not same");
+}
+return -1;
+}
+
+ptcp = (struct tcphdr *)ppkt->transport_header;
+stcp = (struct tcphdr *)spkt->transport_header;
+
+/*
+ * The 'identification' field in the IP header is *very* random
+ * it almost never matches.  Fudge this by ignoring differences in
+ * unfragmented packets; they'll normally sort themselves out if different
+ * anyway, and it should recover at the TCP level.
+ * An alternative would be to get both the primary and secondary to rewrite
+ * somehow; but that would need some sync traffic to sync the state
+ */
+if (ntohs(ppkt->ip->ip_off) & IP_DF) {
+spkt->ip->ip_id = ppkt->ip->ip_id;
+/* and the sum will be different if the IDs were different */
+spkt->ip->ip_sum = ppkt->ip->ip_sum;
+}
+
+res = memcmp(ppkt->data + ETH_HLEN, spkt->data + ETH_HLEN,
+(spkt->size - ETH_HLEN));
+
+if (res != 0 && trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
+sdebug = strdup(inet_ntoa(ppkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(ppkt->ip->ip_dst));
+fprintf(stderr, "%s: src/dst: %s/%s p: seq/ack=%u/%u"
+" s: seq/ack=%u/%u res=%d flags=%x/%x\n",
+__func__, sdebug, ddebug,
+(unsigned int)ntohl(ptcp->th_seq),
+(unsigned int)ntohl(ptcp->th_ack),
+(unsigned int)ntohl(stcp->th_seq),
+(unsigned int)ntohl(stcp->th_ack),
+res, ptcp->th_flags, stcp->th_flags);
+
+fprintf(stderr, "Primary len = %d\n", ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", ppkt->size);
+fprintf(stderr, "Secondary len = %d\n", spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare", spkt->size);
+
+g_free(sdebug);
+g_free(ddebug);
+}
+
+return res;
+}
+
+/*
+ * Called from the compare thread on the primary
+ * for compare udp packet
+ */
+static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
+{
+int ret;
+
+trace_colo_compare_main("compare udp");
+ret = colo_packet_compare(ppkt, spkt);
+
+if (ret) {
+trace_colo_compare_udp_miscompare("primary pkt size", ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare", ppkt->size);
+trace_colo_compare_udp_miscompare("Secondary pkt size", spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare", spkt->size);
+}
+
+return ret;
+}
+
+/*
+ * Called from the compare thread on the primary
+ * for compare icmp packet
+ */
+static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt)
 {
-trace_colo_compare_main("compare all");
+int network_length;
+
+trace_colo_compare_main("compare icmp");
+network_length = ppkt->ip->ip_hl * 4;
+if (ppkt->size != spkt->size ||
+ppkt->size < network_length + ETH_HLEN) {
+return -1;
+}
+
+if (colo_packet_compare(ppkt, spkt)) {
+trace_colo_compare_icmp_miscompare("primary pkt size",
+   ppkt->size);
+qemu_hexdump((char *)ppkt->data, stderr, "colo-compare",
+ ppkt->size);
+trace_colo_compare_icmp_miscompare("Secondary pkt size",
+   spkt->size);
+qemu_hexdump((char *)spkt->data, stderr, "colo-compare",
+ spkt->size);
+return -1;
+} else {
+return 0;
+}
+}
+
+/*
+ *

[Qemu-devel] [PATCH V15 09/12] filter-rewriter: track connection and parse packet

2016-09-26 Thread Zhang Chen

We use net/colo.h to track connection and parse packet

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/colo.c| 14 ++
 net/colo.h|  1 +
 net/filter-rewriter.c | 50 ++
 3 files changed, 65 insertions(+)

diff --git a/net/colo.c b/net/colo.c
index e517521..dc4e4a4 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -115,6 +115,20 @@ void fill_connection_key(Packet *pkt, ConnectionKey *key)
 }
 }
 
+void reverse_connection_key(ConnectionKey *key)
+{
+struct in_addr tmp_ip;
+uint16_t tmp_port;
+
+tmp_ip = key->src;
+key->src = key->dst;
+key->dst = tmp_ip;
+
+tmp_port = key->src_port;
+key->src_port = key->dst_port;
+key->dst_port = tmp_port;
+}
+
 Connection *connection_new(ConnectionKey *key)
 {
 Connection *conn = g_slice_new(Connection);
diff --git a/net/colo.h b/net/colo.h
index 9a7d5e0..6720a3a 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -68,6 +68,7 @@ uint32_t connection_key_hash(const void *opaque);
 int connection_key_equal(const void *opaque1, const void *opaque2);
 int parse_packet_early(Packet *pkt);
 void fill_connection_key(Packet *pkt, ConnectionKey *key);
+void reverse_connection_key(ConnectionKey *key);
 Connection *connection_new(ConnectionKey *key);
 void connection_destroy(void *opaque);
 Connection *connection_get(GHashTable *connection_track_table,
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index de29f07..dca5dcf 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -44,6 +44,20 @@ static void filter_rewriter_flush(NetFilterState *nf)
 }
 }
 
+/*
+ * Return 1 on success, if return 0 means the pkt
+ * is not TCP packet
+ */
+static int is_tcp_packet(Packet *pkt)
+{
+if (!parse_packet_early(pkt) &&
+pkt->ip->ip_p == IPPROTO_TCP) {
+return 1;
+} else {
+return 0;
+}
+}
+
 static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
  NetClientState *sender,
  unsigned flags,
@@ -51,11 +65,47 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
  int iovcnt,
  NetPacketSent *sent_cb)
 {
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+Connection *conn;
+ConnectionKey key = {{0},};
+Packet *pkt;
+ssize_t size = iov_size(iov, iovcnt);
+char *buf = g_malloc0(size);
+
+iov_to_buf(iov, iovcnt, 0, buf, size);
+pkt = packet_new(buf, size);
+
 /*
  * if we get tcp packet
  * we will rewrite it to make secondary guest's
  * connection established successfully
  */
+if (pkt && is_tcp_packet(pkt)) {
+
+fill_connection_key(pkt, );
+
+if (sender == nf->netdev) {
+/*
+ * We need make tcp TX and RX packet
+ * into one connection.
+ */
+reverse_connection_key();
+}
+conn = connection_get(s->connection_track_table,
+  ,
+  NULL);
+
+if (sender == nf->netdev) {
+/* NET_FILTER_DIRECTION_TX */
+/* handle_primary_tcp_pkt */
+} else {
+/* NET_FILTER_DIRECTION_RX */
+/* handle_secondary_tcp_pkt */
+}
+}
+
+packet_destroy(pkt, NULL);
+pkt = NULL;
 return 0;
 }
 
-- 
2.7.4

[Qemu-devel] [PATCH V15 04/12] Jhash: add linux kernel jhashtable in qemu

2016-09-26 Thread Zhang Chen

Jhash will be used by colo-compare and filter-rewriter
to save and lookup net connection info

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 include/qemu/jhash.h | 59 
 net/colo.h   |  1 +
 2 files changed, 60 insertions(+)
 create mode 100644 include/qemu/jhash.h

diff --git a/include/qemu/jhash.h b/include/qemu/jhash.h
new file mode 100644
index 000..742
--- /dev/null
+++ b/include/qemu/jhash.h
@@ -0,0 +1,59 @@
+/* jhash.h: Jenkins hash support.
+  *
+  * Copyright (C) 2006. Bob Jenkins (bob_jenk...@burtleburtle.net)
+  *
+  * http://burtleburtle.net/bob/hash/
+  *
+  * These are the credits from Bob's sources:
+  *
+  * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
+  *
+  * These are functions for producing 32-bit hashes for hash table lookup.
+  * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final()
+  * are externally useful functions.  Routines to test the hash are included
+  * if SELF_TEST is defined.  You can use this free for any purpose. It's in
+  * the public domain.  It has no warranty.
+  *
+  * Copyright (C) 2009-2010 Jozsef Kadlecsik (kad...@blackhole.kfki.hu)
+  *
+  * I've modified Bob's hash to be useful in the Linux kernel, and
+  * any bugs present are my fault.
+  * Jozsef
+  */
+
+#ifndef QEMU_JHASH_H__
+#define QEMU_JHASH_H__
+
+#include "qemu/bitops.h"
+
+/*
+ * hashtable relation copy from linux kernel jhash
+ */
+
+/* __jhash_mix -- mix 3 32-bit values reversibly. */
+#define __jhash_mix(a, b, c)\
+{   \
+a -= c;  a ^= rol32(c, 4);  c += b; \
+b -= a;  b ^= rol32(a, 6);  a += c; \
+c -= b;  c ^= rol32(b, 8);  b += a; \
+a -= c;  a ^= rol32(c, 16); c += b; \
+b -= a;  b ^= rol32(a, 19); a += c; \
+c -= b;  c ^= rol32(b, 4);  b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a, b, c)  \
+{   \
+c ^= b; c -= rol32(b, 14);  \
+a ^= c; a -= rol32(c, 11);  \
+b ^= a; b -= rol32(a, 25);  \
+c ^= b; c -= rol32(b, 16);  \
+a ^= c; a -= rol32(c, 4);   \
+b ^= a; b -= rol32(a, 14);  \
+c ^= b; c -= rol32(b, 24);  \
+}
+
+/* An arbitrary initial parameter */
+#define JHASH_INITVAL   0xdeadbeef
+
+#endif /* QEMU_JHASH_H__ */
diff --git a/net/colo.h b/net/colo.h
index e211eda..05dc0b6 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -16,6 +16,7 @@
 #define QEMU_COLO_PROXY_H
 
 #include "slirp/slirp.h"
+#include "qemu/jhash.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
-- 
2.7.4

[Qemu-devel] [PATCH] qemu-img: Allow unmap backing image for zeroed clusters

2016-09-26 Thread Fam Zheng

We already specified BDRV_O_UNMAP when opening images in 'qemu-img
commit', but didn't turn on the "unmap" in the active commit job. This
patch fixes that so that zeroed clusters in top image can be discarded
which is desired in the virt-sparsify use case, where a temporary
overlay is created and fstrim'ed before commiting back, to free space in
the original image.

The block-commit keeps the existing behavior.

Signed-off-by: Fam Zheng 
---
 block/mirror.c| 5 +++--
 block/replication.c   | 2 +-
 blockdev.c| 3 ++-
 include/block/block_int.h | 4 +++-
 qemu-img.c| 2 +-
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index f9d1fec..5892cf6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1002,7 +1002,8 @@ void commit_active_start(const char *job_id, 
BlockDriverState *bs,
  BlockdevOnError on_error,
  BlockCompletionFunc *cb,
  void *opaque, Error **errp,
- bool auto_complete)
+ bool auto_complete,
+ bool unmap)
 {
 int64_t length, base_length;
 int orig_base_flags;
@@ -1042,7 +1043,7 @@ void commit_active_start(const char *job_id, 
BlockDriverState *bs,
 
 mirror_start_job(job_id, bs, base, NULL, speed, 0, 0,
  MIRROR_LEAVE_BACKING_CHAIN,
- on_error, on_error, false, cb, opaque, _err,
+ on_error, on_error, unmap, cb, opaque, _err,
  _active_job_driver, false, base, auto_complete);
 if (local_err) {
 error_propagate(errp, local_err);
diff --git a/block/replication.c b/block/replication.c
index 3bd1cf1..b8921a6 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -624,7 +624,7 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
 commit_active_start("replication-commit", s->active_disk->bs,
 s->secondary_disk->bs, 0, BLOCKDEV_ON_ERROR_REPORT,
 replication_done,
-bs, errp, true);
+bs, errp, true, false);
 break;
 default:
 aio_context_release(aio_context);
diff --git a/blockdev.c b/blockdev.c
index 29c6561..8f9125a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3136,7 +3136,8 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
 goto out;
 }
 commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
-on_error, block_job_cb, bs, _err, false);
+on_error, block_job_cb, bs, _err, false,
+false);
 } else {
 commit_start(has_job_id ? job_id : NULL, bs, base_bs, top_bs, speed,
  on_error, block_job_cb, bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ef3c047..9b1e5d8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -694,13 +694,15 @@ void commit_start(const char *job_id, BlockDriverState 
*bs,
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
  * @auto_complete: Auto complete the job.
+ * @unmap: Unmap zero clusters on base.
  *
  */
 void commit_active_start(const char *job_id, BlockDriverState *bs,
  BlockDriverState *base, int64_t speed,
  BlockdevOnError on_error,
  BlockCompletionFunc *cb,
- void *opaque, Error **errp, bool auto_complete);
+ void *opaque, Error **errp, bool auto_complete,
+ bool unmap);
 /*
  * mirror_start:
  * @job_id: The id of the newly-created job, or %NULL to use the
diff --git a/qemu-img.c b/qemu-img.c
index ceffefe..f2397e6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -929,7 +929,7 @@ static int img_commit(int argc, char **argv)
 };
 
 commit_active_start("commit", bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
-common_block_job_cb, , _err, false);
+common_block_job_cb, , _err, false, true);
 if (local_err) {
 goto done;
 }
-- 
2.7.4

Re: [Qemu-devel] [PULL 00/28] Misc patches for 2016-09-26

2016-09-26 Thread Peter Xu

On Mon, Sep 26, 2016 at 02:19:08PM -0700, Peter Maydell wrote:

[...]

> I also see this compile failure:
> 
>   CCi386-softmmu/hw/i386/amd_iommu.o
> /home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c: In function
> ‘amdvi_init’:
> /home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c:1083:17:
> error: ‘MemoryRegionIOMMUOps {aka struct MemoryRegionIOMMUOps}’ has no
> member named ‘notify_started’
>  s->iommu_ops.notify_started = amdvi_iommu_notify_started;
>  ^
> /home/petmay01/linaro/qemu-for-merges/rules.mak:60: recipe for target
> 'hw/i386/amd_iommu.o' failed

Paolo,

Would you please help squash this into 02/28 of your PULL request to
solve above error?

--8<

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index a91a179..a868539 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -1066,7 +1066,9 @@ static const MemoryRegionOps mmio_mem_ops = {
 }
 };
 
-static void amdvi_iommu_notify_started(MemoryRegion *iommu)
+static void amdvi_iommu_notify_flag_changed(MemoryRegion *iommu,
+IOMMUNotifierFlag old,
+IOMMUNotifierFlag new)
 {
 AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
 
@@ -1080,7 +1082,7 @@ static void amdvi_init(AMDVIState *s)
 amdvi_iotlb_reset(s);
 
 s->iommu_ops.translate = amdvi_translate;
-s->iommu_ops.notify_started = amdvi_iommu_notify_started;
+s->iommu_ops.notify_flag_changed = amdvi_iommu_notify_flag_changed;
 s->devtab_len = 0;
 s->cmdbuf_len = 0;
 s->cmdbuf_head = 0;

-->8

Please ping me if you prefer a re-send. Thanks!

-- peterx

Re: [Qemu-devel] How does a guest OS differentiate between a Reboot/Shutdown ACPI event

2016-09-26 Thread Srinivasan J

Thank you Paolo for replying. I was able to confirm this.

Regards,
Srini

On Mon, Sep 26, 2016 at 12:40 PM, Paolo Bonzini  wrote:
>
>
> On 26/09/2016 04:38, Srinivasan J wrote:
>>
>> I have Ubuntu 14.04.1 (ubuntu-14.04.1-server-amd64.iso) guest running
>> in a KVM host. The host is running Ubuntu 16.04. I'm trying to find
>> out how Ubuntu 14.04.1 differentiates between virsh shutdown and virsh
>> reboot commands issued in the host. I see that in both cases the ACPI
>> event seen at the guest are exactly same. The guest however correctly
>> shuts down on issuing "virsh shutdown" and correctly reboots on
>> issuing "virsh reboot".
>
> In the case of "virsh reboot", libvirt restarts the guest as soon as the
> shutdown is complete.
>
> Paolo

[Qemu-devel] [PATCH] m68k: change default system clock for m5208evb

2016-09-26 Thread Greg Ungerer

The shipping default setting for the Freescale M5208EVB board is to run
the CPU at 166MHz. The current qemu emulation code for this board is
defaulting to 66MHz. This results in time appearing to run way to slowly.
So a "sleep 5" in a standard ColdFire Linux build takes almost 15
seconds in real time to actually complete.

Change the hard coded default to match the default hardware setting.

Signed-off-by: Greg Ungerer 
---
 hw/m68k/mcf5208.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/m68k/mcf5208.c b/hw/m68k/mcf5208.c
index 9240ebf..2d0b464 100644
--- a/hw/m68k/mcf5208.c
+++ b/hw/m68k/mcf5208.c
@@ -21,7 +21,7 @@
 #include "elf.h"
 #include "exec/address-spaces.h"
 
-#define SYS_FREQ 6600
+#define SYS_FREQ 16600
 
 #define PCSR_EN 0x0001
 #define PCSR_RLD0x0002
-- 
1.9.1

[Qemu-devel] How to add my implementation of the fmadds instruction to QEMU

2016-09-26 Thread G 3

I made my own experimental implementation of the fmadds instruction  
that I would like to add to QEMU. How would I do this?


My implementation would probably look like this:

void fmadds(float *frD, float frA, float frC, float frB)
{
*frD = frA * frC + frB;
}


I then want to see if this implementation will make things faster.  
This code will test my implementation:


#include 
#include 

/*
fmadds basically does this frD = frA * frC + frB
*/

int main (int argc, const char * argv[]) {
const int iteration_count = 1;
double iter, frD, frA, frB, frC;
clock_t start_time, end_time;

frA = 10;
frB = 5;
frC = 2;

start_time = clock();
for(iter = 0; iter < iteration_count; iter++)
{
asm volatile("fmadds %0, %1, %2, %3" : "=f" (frD) :  
"f" (frA), "f" (frC), "f" (frB));

}
end_time = clock();
printf("frD:%f frA:%f frB:%f frC:%f\n", frD, frA, frB, frC);
printf("Time elapsed: %0.2f seconds\n", (float)(end_time -  
start_time) / CLOCKS_PER_SEC);


return 0;
}

Re: [Qemu-devel] [PATCH] checkpatch.pl: disable arch-specific test for linux-user

2016-09-26 Thread Peter Maydell

On 26 September 2016 at 16:36, Riku Voipio  wrote:
> On 27 September 2016 at 00:08, Peter Maydell  wrote:
>> Do you have some examples of the false positives you want
>> to suppress here? For new code I would hope that we can
>> handle host-arch-specifics by having new files (or just
>> new #defines etc) in linux-user/host/$ARCH/ rather than
>> inline #ifdeffery in the main files.
>
> One example from your patch:
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg05650.html
>
> And another from Laurent:
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06486.html
>
> Every new syscall will comes with "#ifdef TARGET_NR_foo and
> defined(__NR_foo)", while host/target combos catch up. Now, most
> TARGET_NR_foo's are needed only for unicore32, but the __NR_foo
> defines will be needed for a very long time.

Oh, I see; I don't think of the __NR_foo as being "architecture
specific". I think we'd be better off specifically whitelisting
those in checkpatch rather than turning off the whole check
for linux-user.

thanks
-- PMM

Re: [Qemu-devel] [PATCH] checkpatch.pl: disable arch-specific test for linux-user

2016-09-26 Thread Riku Voipio

On 27 September 2016 at 00:08, Peter Maydell  wrote:
> On 26 September 2016 at 12:58,   wrote:
>> From: Riku Voipio 
>>
>> Linux-user and bsd-user code needs lots of arch-specific ifdefs,
>> so disable the warning.
>>
>> Signed-off-by: Riku Voipio 
>> ---
>>  scripts/checkpatch.pl | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>> index dde3f5f..98a007f 100755
>> --- a/scripts/checkpatch.pl
>> +++ b/scripts/checkpatch.pl
>> @@ -2405,8 +2405,9 @@ sub process {
>> }
>>  # check of hardware specific defines
>>  # we have e.g. CONFIG_LINUX and CONFIG_WIN32 for common cases
>> -# where they might be necessary.
>> -   if ($line =~ m@^.\s*\#\s*if.*\b__@) {
>> +# where they might be necessary. Skip test on linux-user and bsd-user
>> +# where arch defines are needed
>> +   if (!($realfile =~ /^(linux|bsd)-user/) &&  $line =~ 
>> m@^.\s*\#\s*if.*\b__@) {
>> ERROR("architecture specific defines should be 
>> avoided\n" .  $herecurr);
>> }
>
> Do you have some examples of the false positives you want
> to suppress here? For new code I would hope that we can
> handle host-arch-specifics by having new files (or just
> new #defines etc) in linux-user/host/$ARCH/ rather than
> inline #ifdeffery in the main files.

One example from your patch:

https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg05650.html

And another from Laurent:

https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06486.html

Every new syscall will comes with "#ifdef TARGET_NR_foo and
defined(__NR_foo)", while host/target combos catch up. Now, most
TARGET_NR_foo's are needed only for unicore32, but the __NR_foo
defines will be needed for a very long time.

Riku

Re: [Qemu-devel] [Nbd] [PATCH] proto: add 'shift' extension.

2016-09-26 Thread Wouter Verhelst

On Mon, Sep 26, 2016 at 03:21:46PM -0500, Eric Blake wrote:
> I'd much rather support a single flag that says to zero the entire disk
> than to introduce stateful variable-amount shifting.

That's almost exactly the opposite of what I said :)

Now, I don't feel very strong either way, but what matters to me is:

- NBD is a simple, easy to understand protocol; that is a feature, and
  so it should remain that way.
- Every time we add another option, flag, or command, we make the
  protocol slightly more complex, which is counter to that goal.
- Adding a command with a single use case (i.e., a "wipe the whole
  device" command) seems like it would not see much use, except perhaps
  in the use case that Virtuozzo is thinking about. In other words, it
  makes things slightly more complex for little benefit.

I thought a negotiated shift size could be creatively used for other
things beyond just "wipe the whole disk" commands, and that it might be
elegant in that way. On the other hand, I recognize that adding state in
that manner also complicates the protocol in that an observer which sees
only part of the traffic may not understand what's going on anymore.

So let's just say that an NBD_CMD_FLAG_SHIFT would:
- Left-shift the size by 16 bits; no more, no less
  - 2^32-1 is too large a granularity for this to be useful beyond "wipe
whole disk" commands; 2^16-1 (65535) seems like a more useful
granularity.
  - This allows for a maximum number of 2^48-1 bytes (one byte shy of
256 tebibytes) to be affected by a single command, which seems
sufficient for the given purpose.
  - If someone really wants to wipe 2^64-1 bytes (i.e., 16 exbibytes),
they are probably using the wrong tools.
- Be only valid for commands that don't send or expect data to be sent
  out over the wire.
  - currently TRIM and WRITE_ZEROES, but not READ or WRITE.

Thoughts?

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Qemu-devel] [Qemu-ppc] [PULL 00/44] ppc-for-2.8 queue 20160922

2016-09-26 Thread Alex Bennée

Cédric Le Goater  writes:

> On 09/24/2016 04:31 PM, Alex Bennée wrote:
>>
>> David Gibson  writes:

>> The base containers are 12.04 but there is a Trusty VM and for the
>> ThreadSanitizer patches I added the Ubuntu Toolchain PPA which is fairly
>> upto date.
>
> Hello,
>
> This is a little of topic but are all the containers 64bits ?

All the containers that are based of images pulled from Docker's hub are
64 bit (multiarch is currently a WIP). You can use the debootstrap
recipe to build 32 and 64 bit images from scratch of Debian at least.

--
Alex Bennée

[Qemu-devel] external usb flash drive support on qemu virtualization

2016-09-26 Thread Kumar Girish

Dear All,

I am new to virtualization on Linux. Any help and support is appreciated.

Explanation of my current setup
Procesor : intel atom
linux kernel : 3.10
Virtualization at kernel : KVM
Virtualization at application : QEMU 2.7.0

Problem
When flash drive(pen drive ) is inserted and mounted, it is mounted and visible 
at host. But same is not visible at guest VM.

Regards,
Girish

[Qemu-devel] [PATCH 3/5] target-i386: Remove has_msr_hv_tsc global variable

2016-09-26 Thread Eduardo Habkost

The global variable is not necessary because we can check
cpu->hyperv_time directly.

We just need to ensure cpu->hyperv_time will be cleared if the
feature is not really being exposed to the guest due to missing
KVM_CAP_HYPERV_TIME capability.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 031ae90..4046030 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -91,7 +91,6 @@ static bool has_msr_bndcfgs;
 static bool has_msr_kvm_steal_time;
 static int lm_capable_kernel;
 static bool has_msr_hv_hypercall;
-static bool has_msr_hv_tsc;
 static bool has_msr_hv_crash;
 static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
@@ -602,6 +601,11 @@ static int hyperv_handle_properties(CPUState *cs)
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = >env;
 
+if (cpu->hyperv_time &&
+kvm_check_extension(cs->kvm_state, KVM_CAP_HYPERV_TIME) <= 0) {
+cpu->hyperv_time = false;
+}
+
 if (cpu->hyperv_relaxed_timing) {
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE;
 }
@@ -609,12 +613,10 @@ static int hyperv_handle_properties(CPUState *cs)
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE;
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
 }
-if (cpu->hyperv_time &&
-kvm_check_extension(cs->kvm_state, KVM_CAP_HYPERV_TIME) > 0) {
+if (cpu->hyperv_time) {
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE;
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_TIME_REF_COUNT_AVAILABLE;
 env->features[FEAT_HYPERV_EAX] |= 0x200;
-has_msr_hv_tsc = true;
 }
 if (cpu->hyperv_crash && has_msr_hv_crash) {
 env->features[FEAT_HYPERV_EDX] |= HV_X64_GUEST_CRASH_MSR_AVAILABLE;
@@ -1683,7 +1685,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE,
   env->msr_hv_vapic);
 }
-if (has_msr_hv_tsc) {
+if (cpu->hyperv_time) {
 kvm_msr_entry_add(cpu, HV_X64_MSR_REFERENCE_TSC, env->msr_hv_tsc);
 }
 if (has_msr_hv_crash) {
@@ -2087,7 +2089,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (cpu->hyperv_vapic) {
 kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE, 0);
 }
-if (has_msr_hv_tsc) {
+if (cpu->hyperv_time) {
 kvm_msr_entry_add(cpu, HV_X64_MSR_REFERENCE_TSC, 0);
 }
 if (has_msr_hv_crash) {
-- 
2.7.4

[Qemu-devel] [PATCH 0/5] target-i386: Eliminate unnecessary has_msr_* global variables

2016-09-26 Thread Eduardo Habkost

Some of the has_msr_* global variables are set depending on
X86PCU fields and are unnecessary because we can look at the
X86CPU object directly. This series eliminates some of them.

Eduardo Habkost (5):
  target-i386: Remove has_msr_mtrr global variable
  target-i386: Remove has_msr_hv_apic global variable
  target-i386: Remove has_msr_hv_tsc global variable
  target-i386: Clear KVM CPUID features if KVM is disabled
  target-i386: Remove has_msr_* global vars for KVM features

 target-i386/cpu.c |  4 
 target-i386/kvm.c | 51 +++
 2 files changed, 23 insertions(+), 32 deletions(-)

-- 
2.7.4

[Qemu-devel] [PATCH 4/5] target-i386: Clear KVM CPUID features if KVM is disabled

2016-09-26 Thread Eduardo Habkost

This will ensure all checks for features[FEAT_KVM] in the code
will be correct in case the KVM CPUID leaf is completely
disabled.

Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index db12728..8aa2b0f 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2998,6 +2998,10 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 cpu->env.features[w] &= ~minus_features[w];
 }
 
+if (!kvm_enabled() || !cpu->expose_kvm) {
+env->features[FEAT_KVM] = 0;
+}
+
 if (env->features[FEAT_7_0_EBX] && env->cpuid_level < 7) {
 env->cpuid_level = 7;
 }
-- 
2.7.4

[Qemu-devel] [PATCH 5/5] target-i386: Remove has_msr_* global vars for KVM features

2016-09-26 Thread Eduardo Habkost

The global variables are not necessary because we can check KVM
feature flags in X86CPU directly.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4046030..30b63b7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -83,12 +83,9 @@ static bool has_msr_tsc_aux;
 static bool has_msr_tsc_adjust;
 static bool has_msr_tsc_deadline;
 static bool has_msr_feature_control;
-static bool has_msr_async_pf_en;
-static bool has_msr_pv_eoi_en;
 static bool has_msr_misc_enable;
 static bool has_msr_smbase;
 static bool has_msr_bndcfgs;
-static bool has_msr_kvm_steal_time;
 static int lm_capable_kernel;
 static bool has_msr_hv_hypercall;
 static bool has_msr_hv_crash;
@@ -754,12 +751,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c = _data.entries[cpuid_i++];
 c->function = KVM_CPUID_FEATURES | kvm_base;
 c->eax = env->features[FEAT_KVM];
-
-has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);
-
-has_msr_pv_eoi_en = c->eax & (1 << KVM_FEATURE_PV_EOI);
-
-has_msr_kvm_steal_time = c->eax & (1 << KVM_FEATURE_STEAL_TIME);
 }
 
 cpu_x86_cpuid(env, 0, 0, , , , );
@@ -1639,13 +1630,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, MSR_IA32_TSC, env->tsc);
 kvm_msr_entry_add(cpu, MSR_KVM_SYSTEM_TIME, env->system_time_msr);
 kvm_msr_entry_add(cpu, MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
-if (has_msr_async_pf_en) {
+if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_ASYNC_PF)) {
 kvm_msr_entry_add(cpu, MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr);
 }
-if (has_msr_pv_eoi_en) {
+if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_PV_EOI)) {
 kvm_msr_entry_add(cpu, MSR_KVM_PV_EOI_EN, env->pv_eoi_en_msr);
 }
-if (has_msr_kvm_steal_time) {
+if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_STEAL_TIME)) {
 kvm_msr_entry_add(cpu, MSR_KVM_STEAL_TIME, env->steal_time_msr);
 }
 if (has_msr_architectural_pmu) {
@@ -2048,13 +2039,13 @@ static int kvm_get_msrs(X86CPU *cpu)
 #endif
 kvm_msr_entry_add(cpu, MSR_KVM_SYSTEM_TIME, 0);
 kvm_msr_entry_add(cpu, MSR_KVM_WALL_CLOCK, 0);
-if (has_msr_async_pf_en) {
+if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_ASYNC_PF)) {
 kvm_msr_entry_add(cpu, MSR_KVM_ASYNC_PF_EN, 0);
 }
-if (has_msr_pv_eoi_en) {
+if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_PV_EOI)) {
 kvm_msr_entry_add(cpu, MSR_KVM_PV_EOI_EN, 0);
 }
-if (has_msr_kvm_steal_time) {
+if (env->features[FEAT_KVM] & (1 << KVM_FEATURE_STEAL_TIME)) {
 kvm_msr_entry_add(cpu, MSR_KVM_STEAL_TIME, 0);
 }
 if (has_msr_architectural_pmu) {
-- 
2.7.4

[Qemu-devel] [PATCH 1/5] target-i386: Remove has_msr_mtrr global variable

2016-09-26 Thread Eduardo Habkost

The global variable is not necessary because we can check the CPU
feature flags directly.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index a0e42b2..5118562 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -99,7 +99,6 @@ static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
 static bool has_msr_hv_synic;
 static bool has_msr_hv_stimer;
-static bool has_msr_mtrr;
 static bool has_msr_xss;
 
 static bool has_msr_architectural_pmu;
@@ -975,9 +974,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 cpu->kvm_msr_buf = g_malloc0(MSR_BUF_SIZE);
 
-if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
-has_msr_mtrr = true;
-}
 if (!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP)) {
 has_msr_tsc_aux = false;
 }
@@ -1735,7 +1731,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 env->msr_hv_stimer_count[j]);
 }
 }
-if (has_msr_mtrr) {
+if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
 uint64_t phys_mask = MAKE_64BIT_MASK(0, cpu->phys_bits);
 
 kvm_msr_entry_add(cpu, MSR_MTRRdefType, env->mtrr_deftype);
@@ -2125,7 +2121,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, msr, 0);
 }
 }
-if (has_msr_mtrr) {
+if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
 kvm_msr_entry_add(cpu, MSR_MTRRdefType, 0);
 kvm_msr_entry_add(cpu, MSR_MTRRfix64K_0, 0);
 kvm_msr_entry_add(cpu, MSR_MTRRfix16K_8, 0);
-- 
2.7.4

[Qemu-devel] [PATCH 2/5] target-i386: Remove has_msr_hv_apic global variable

2016-09-26 Thread Eduardo Habkost

The global variable is not necessary because we can check
cpu->hyperv_vapic directly.

Signed-off-by: Eduardo Habkost 
---
 target-i386/kvm.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5118562..031ae90 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -91,7 +91,6 @@ static bool has_msr_bndcfgs;
 static bool has_msr_kvm_steal_time;
 static int lm_capable_kernel;
 static bool has_msr_hv_hypercall;
-static bool has_msr_hv_vapic;
 static bool has_msr_hv_tsc;
 static bool has_msr_hv_crash;
 static bool has_msr_hv_reset;
@@ -609,7 +608,6 @@ static int hyperv_handle_properties(CPUState *cs)
 if (cpu->hyperv_vapic) {
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_HYPERCALL_AVAILABLE;
 env->features[FEAT_HYPERV_EAX] |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
-has_msr_hv_vapic = true;
 }
 if (cpu->hyperv_time &&
 kvm_check_extension(cs->kvm_state, KVM_CAP_HYPERV_TIME) > 0) {
@@ -728,7 +726,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (cpu->hyperv_relaxed_timing) {
 c->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
 }
-if (has_msr_hv_vapic) {
+if (cpu->hyperv_vapic) {
 c->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
 }
 c->ebx = cpu->hyperv_spinlock_attempts;
@@ -1681,7 +1679,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_add(cpu, HV_X64_MSR_HYPERCALL,
   env->msr_hv_hypercall);
 }
-if (has_msr_hv_vapic) {
+if (cpu->hyperv_vapic) {
 kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE,
   env->msr_hv_vapic);
 }
@@ -2086,7 +2084,7 @@ static int kvm_get_msrs(X86CPU *cpu)
 kvm_msr_entry_add(cpu, HV_X64_MSR_HYPERCALL, 0);
 kvm_msr_entry_add(cpu, HV_X64_MSR_GUEST_OS_ID, 0);
 }
-if (has_msr_hv_vapic) {
+if (cpu->hyperv_vapic) {
 kvm_msr_entry_add(cpu, HV_X64_MSR_APIC_ASSIST_PAGE, 0);
 }
 if (has_msr_hv_tsc) {
-- 
2.7.4

Re: [Qemu-devel] [PATCH 15/18] target-riscv: Interrupt Handling

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+if (interruptno + 1) {


This is a very odd way to write interruptno != -1.
And did you really mean interruptno >= 0?


r~

Re: [Qemu-devel] [PATCH 14/18] target-riscv: softmmu/address translation support

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+printf("unassigned address not implemented for riscv\n");
+printf("are you trying to fetch instructions from an MMIO page?\n");
+printf("unassigned Address: %016lX\n", addr);
+exit(1);


All these printf's (and fprintfs above) are going to have to go before this 
gets accepted.  Most of them should simply become qemu_log's.



r~

Re: [Qemu-devel] [PATCH 2/3] target-arm: add env->tbflags

2016-09-26 Thread Peter Maydell

On 14 September 2016 at 02:56, Paolo Bonzini  wrote:
> Computing TranslationBlock flags is pretty expensive on ARM, especially
> 32-bit.  In order to limit the cost we want to cache as many of them
> as possible.  Therefore, the flags are split in two parts.  Static flags
> come directly from a new CPUARMState field env->tbflags, and are updated
> whenever EL changes (i.e. on exceptions and exception returns) or after
> MSR instructions.

Doing this for all MSR writes is a bit sad, because a lot of them
don't actually change the TB flags, and quite a few of them which
previously we were able to code to not have to do a helper call
at all (direct writes to fields) now get a pointless helper call.

You're also recalculating more often than stated here, in that
you also recalc on any gen_lookup_tb() call in the 32-bit
decoder. (This is just as well because for instance vec_len
and vec_stride aren't set via the cp15 system register write
path.)

You're treating the PSTATE_SS flag as static, but you don't
have anything which causes a recalculation of it on the code
path which changes it (gen_ss_advance()).

The 32-bit SETEND instruction changes CPSR_E, which has
an effect on the BE_DATA_MASK flag, but I don't think
that code path will cause us to recalculate flags.

I found this patch kind of difficult to review because
it isn't obvious why we recalculate the static flags at
the points where we do (ie whether those points are
necessary and sufficient for correct behaviour). Most
of the comments above are the result of my looking at
whether some particular flags that I suspected of being
tricky were handled correctly :-)

Is there a possibility of having an assertion somewhere that
the static flags we think we have actually match the
old dynamic flags?

> Dynamic flags are computed on the fly by cpu_get_tb_cpu_state (which
> calls cpu_dynamic_tb_cpu_flags to retrieve them), same as before.
> As of this patch, all flags are dynamic and env->tbflags is always 0,
> so this patch adds the infrastructure but does not do any caching yet.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  target-arm/cpu.c   |  2 ++
>  target-arm/cpu.h   | 10 +-
>  target-arm/helper.c|  2 ++
>  target-arm/helper.h|  1 +
>  target-arm/op_helper.c |  7 +++
>  target-arm/translate-a64.c |  4 
>  target-arm/translate.c | 12 ++--
>  target-arm/translate.h |  1 +
>  8 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/target-arm/cpu.c b/target-arm/cpu.c
> index ce8b8f4..189ceab 100644
> --- a/target-arm/cpu.c
> +++ b/target-arm/cpu.c
> @@ -225,6 +225,8 @@ static void arm_cpu_reset(CPUState *s)
>>vfp.fp_status);
>  set_float_detect_tininess(float_tininess_before_rounding,
>>vfp.standard_fp_status);
> +
> +env->tbflags = cpu_get_tb_cpu_flags(env);
>  tlb_flush(s, 1);
>
>  #ifndef CONFIG_USER_ONLY
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index ef195bd..5918df5 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -155,6 +155,7 @@ typedef struct CPUARMState {
>   */
>  uint32_t pstate;
>  uint32_t aarch64; /* 1 if CPU is in aarch64 state; inverse of PSTATE.nRW 
> */
> +uint32_t tbflags;
>
>  /* Frequently accessed CPSR bits are stored separately for efficiency.
> This contains all the other bits.  Use cpsr_{read,write} to access
> @@ -2370,10 +2371,17 @@ static inline uint32_t 
> cpu_dynamic_tb_cpu_flags(CPUARMState *env)
>  return flags;
>  }
>
> +static inline uint32_t cpu_get_tb_cpu_flags(CPUARMState *env)
> +{
> +uint32_t flags = 0;
> +
> +return flags;
> +}
> +
>  static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>  target_ulong *cs_base, uint32_t 
> *flags)
>  {
> -*flags = cpu_dynamic_tb_cpu_flags(env);
> +*flags = env->tbflags | cpu_dynamic_tb_cpu_flags(env);
>
>  if (is_a64(env)) {
>  *pc = env->pc;
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index bdb842c..0b4f6de 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -6068,6 +6068,7 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
>  addr = ldl_phys(cs->as, env->v7m.vecbase + env->v7m.exception * 4);
>  env->regs[15] = addr & 0xfffe;
>  env->thumb = addr & 1;
> +env->tbflags = cpu_get_tb_cpu_flags(env);
>  }
>
>  /* Function used to synchronize QEMU's AArch64 register set with AArch32
> @@ -6642,6 +6643,7 @@ void arm_cpu_do_interrupt(CPUState *cs)
>  arm_cpu_do_interrupt_aarch32(cs);
>  }
>
> +env->tbflags = cpu_get_tb_cpu_flags(env);
>  arm_call_el_change_hook(cpu);
>
>  if (!kvm_enabled()) {
> diff --git a/target-arm/helper.h b/target-arm/helper.h
> index 84aa637..21a0e35 100644
> --- a/target-arm/helper.h
> +++ b/target-arm/helper.h
> @@ -73,6 +73,7 @@ DEF_HELPER_2(get_cp_reg64,

Re: [Qemu-devel] [PATCH 02/18] target-riscv: Add RISC-V Target stubs inside target-riscv/

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+typedef struct CPURISCVState CPURISCVState;
+struct CPURISCVState {
+target_ulong gpr[32];
+uint64_t fpr[32]; /* assume both F and D extensions */
+target_ulong PC;
+target_ulong load_res;
+
+target_ulong csr[4096]; /* RISCV CSR registers */


This is 16k for 32-bit, and 32k for 64-bit, most of which is unused.

I think it would be better to add only the CSRs that you actually need for the 
implementation.



r~

Re: [Qemu-devel] [Qemu-block] [PATCH v3 1/2] block: sync bdrv_co_get_block_status_above() with bdrv_is_allocated_above()

2016-09-26 Thread Kashyap Chamarthy

On Mon, Sep 26, 2016 at 05:04:21PM +0200, Kevin Wolf wrote:
> Am 19.09.2016 um 22:39 hat Eric Blake geschrieben:

[...]

> > I typically write this as:
> > 
> > L2 <- L1 <- L0
> > 
> > (read "L2 backs L1, which in turn backs L0") with the active on the
> > right.  So I understand the confusion in Fam's question where you were
> > using the opposite direction.
> 
> And I tend to use this one:
> 
> base <- sn1 <- sn2 <- top
> 
> "sn*" isn't any better than "L*", but having at least one of "base" and
> "top" (or "active") in there disambiguates the roles of the nodes.

Not to quibble over terminology too much, but now that I'm writing a doc
that I want to submit upstream, I began with your (Kevin's) notation.
Then, I thought: Hmm, "sn1" could also be referred to as 'base', and
"sn2" as 'top' when using `block-commit`' (and `block-stream`, once it
starts supporting intermediate streaming?).

And, moreover, as Eric (correctly) warns elsewhere about file-names vs.
points-in-time: the guest state when 'sn1' was created is contained in
'base', so one could argue that 'sn1' ("snapshot 1") is a misnomer, and
is technically 'overlay1'.

So, I used the below notation until recently, including 'active'
(with the rationale Kevin mentioned):

base <- overlay1 <- overlay2  <- active

Then, someone asked: "In the above chain, are you pointing to 'overlay2'
as active, or is 'active' a separate image unto itself"?  "Sigh, so it
is still prone to misunderstanding", I thought.

Given that, for now, though slightly more verbose and space-occupying, I
settled on the below (occasionally doing s/base/orig/, to avoid the
"overlay1 could be referred to as 'base' in some cases" problem):

Live QEMU
   |
   v
base <- overlay1 <- overlay2 <- overlay3

FWIW, the above also avoids the problem of a file called 'active' being
described as: "previously-active, but not anymore, because its contents
are merged into its backing file" in the event of a 'block-commit'.

-- 
/kashyap

Re: [Qemu-devel] [PATCH 12/18] target-riscv: Add system instructions

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+void helper_fence_i(CPURISCVState *env)
+{
+RISCVCPU *cpu = riscv_env_get_cpu(env);
+CPUState *cs = CPU(cpu);
+/* Flush QEMU's TLB */
+tlb_flush(cs, 1);
+/* ARM port seems to not know if this is okay inside a TB
+   But we need to do it */
+tb_flush(cs);
+}


You should not need to tb_flush for fence_i.  QEMU's internals auto-detect when 
a memory write invalidates a TB.



r~

Re: [Qemu-devel] [PATCH 11/18] target-riscv: Add Double Precision Floating-Point Instructions

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+case OPC_RISC_FSGNJ_D:
+/* also OPC_RISC_FSGNJN_D, OPC_RISC_FSGNJX_D */
+if (rm == 0x0) {
+gen_helper_fsgnj_d(cpu_fpr[rd], cpu_env, cpu_fpr[rs1],
+   cpu_fpr[rs2]);
+} else if (rm == 0x1) {
+gen_helper_fsgnjn_d(cpu_fpr[rd], cpu_env, cpu_fpr[rs1],
+cpu_fpr[rs2]);
+} else if (rm == 0x2) {
+gen_helper_fsgnjx_d(cpu_fpr[rd], cpu_env, cpu_fpr[rs1],
+cpu_fpr[rs2]);
+} else {
+kill_unknown(ctx, RISCV_EXCP_ILLEGAL_INST);
+}


I forgot to mention this for single-precision -- given that FMOV is an alias of 
FSGNJ, you should special case rs1 == rs2.



r~

Re: [Qemu-devel] [PATCH 10/18] target-riscv: Add Single Precision Floating-Point Instructions

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+uint64_t helper_fsgnj_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+{
+frs1 = (frs1 & ~(uint32_t)INT32_MIN) | (frs2 & (uint32_t)INT32_MIN);
+return frs1;
+}
+
+uint64_t helper_fsgnjn_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+{
+frs1 = (frs1 & ~(uint32_t)INT32_MIN) | ((~frs2) & (uint32_t)INT32_MIN);
+return frs1;
+}
+
+uint64_t helper_fsgnjx_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+{
+frs1 = frs1 ^ (frs2 & (uint32_t)INT32_MIN);
+return frs1;
+}


These are simple enough to implement inline.


+
+uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+{
+frs1 = float32_is_any_nan(frs2) ||
+   float32_lt_quiet(frs1, frs2, >fp_status) ? frs1 : frs2;
+set_fp_exceptions();
+return frs1;
+}


float32_minnum.


+
+uint64_t helper_fmax_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
+{
+frs1 = float32_is_any_nan(frs2) ||
+   float32_le_quiet(frs2, frs1, >fp_status) ? frs1 : frs2;
+set_fp_exceptions();
+return frs1;
+}


float32_maxnum.


+target_ulong helper_fcvt_w_s(CPURISCVState *env, uint64_t frs1, uint64_t rm)
+{
+set_float_rounding_mode(RM, >fp_status);
+frs1 = (int64_t)((int32_t)float32_to_int32(frs1, >fp_status));


You do not need either of these casts.


+target_ulong helper_fcvt_wu_s(CPURISCVState *env, uint64_t frs1, uint64_t rm)
+{
+set_float_rounding_mode(RM, >fp_status);
+frs1 = (int64_t)((int32_t)float32_to_uint32(frs1, >fp_status));


You do not need the cast to int64_t.  Also, remove the extra parenthesis.


+union ui32_f32 { uint32_t ui; uint32_t f; };


What's the point of this?


+
+uint_fast16_t float32_classify(uint32_t a, float_status *status)


No need for uint_fast16_t.  Just use uint32_t.


+{
+union ui32_f32 uA;
+uint_fast32_t uiA;
+
+uA.f = a;
+uiA = uA.ui;
+
+uint_fast16_t infOrNaN = expF32UI(uiA) == 0xFF;


float32_is_infinity or float32_is_any_nan.


+uint_fast16_t subnormalOrZero = expF32UI(uiA) == 0;


float32_is_zero_or_denormal


+bool sign = signF32UI(uiA);


float32_is_neg.


r~

Re: [Qemu-devel] [Qemu-arm] [PATCH 4/7] m25p80: add a m25p80_set_rom_storage() routine

2016-09-26 Thread mar.krzeminski




W dniu 26.09.2016 o 10:56, Cédric Le Goater pisze:

On 09/26/2016 10:25 AM, KONRAD Frederic wrote:


Le 24/09/2016 à 10:55, Edgar E. Iglesias a écrit :

On Sat, Sep 24, 2016 at 10:25:39AM +0200, Cédric Le Goater wrote:

On 09/23/2016 08:26 PM, mar.krzeminski wrote:

Hi Cedric,

W dniu 23.09.2016 o 10:28, Cédric Le Goater pisze:

On 09/23/2016 10:17 AM, Peter Maydell wrote:

On 23 September 2016 at 08:19, Cédric Le Goater  wrote:

But the goal is to boot from the device, so I added a memory region alias
at 0 to trigger the flash module mmios at boot time, as this is where
u-boot expects to be.

and I fell in this trap :/

  aspeed_smc_flash_read: To 0x0 of size 1: 0xbe mode:0
  Bad ram pointer (nil)
  Aborted (core dumped)

There is a failure in get_page_addr_code(), possibly because qemu uses
byte per byte reads of the code (cpu_ldub_code). But this is beyond my
understanding of qemu's internal.

This is a bug in how we report the problem, but the underlying
issue here is attempting to execute from something that's not RAM
or ROM. You can't execute code out of something backed by MMIO.

OK. So I see two solutions. T

The "brutal" one which is to copy the flash contents in a rom blob
at 0, but there is still an issue in getting access to the storage
anyhow, as it is internal to m25p80. Or we should get the name of the
backing file of the drive but I am not sure we are expected to do
that as I don't see any API for it.

The other solution is something like this patch which lets the storage
of the flash device be assigned externally.

Since I do not like dirty hacks in the code, I want just to suggest a
workaround, that probably you will not like ;]

It's a feature ! :)

CC: Mark and Fred

I agree, I think these are fair attempts to try to solve a real problem.



I am just trying to find a solution to this issue. So, we could also
introduce a routine :

+uint8_t *m25p80_get_storage(DeviceState *dev, uint32_t *size)
+{
+Flash *s = M25P80(dev);
+
+*size = s->size;
+return s->storage;
+}

and use rom_add_blob_fixed() with the return values. Maybe this is less
intrusive in the m25p80 device model flow. Thoughts ?


As Qemu expects that first running code will be in ROM or RAM memory,
you can implement in your board -bios option that you will use to
pass u-boot binary to rom memory, or even use generic loader functionality
when it reach master.

but if we use -bios  to have a ROM, we will need to pass a second
time  as a drive to have a CS0 flash slave:

 -bios "flash-palmetto-test"  \
 -drive file="flash-palmetto-test",format=raw,if=mtd \
 -drive file="palmetto.pnor",format=raw,if=mtd

This feels awkward. The virt platform and vexpress forbid that for
instance.
Just to clarify, I mean a use of -bios option or generic loader just to 
allow this board to boot from u-boot

by directly potting u-boot in the memory where flash should be mapped.
I will not solve memory mapped flash problem at all..


Are there any other platform with similar need ?

Yes, ZynqMP has a linear memory mapping of SPI memory contents. It gets
slightly more complicated there as the mapping supports various modes
including parallel SPIs with striping done by the controller (i.e
bytes as seen via the linearly mapped area are interleaved from multiple
SPI memories).

So a plain RAM mapping of m25p80_get_storage won't work.

A ROM version of the linear data might work at controller level. The controller
would have to populate the ROM area at startup (slow) and keep it in sync for 
all
writes. We would also need to signal write events so that TCG can flush it's
caches. TCG only keeps track of modifications being made to cached code if 
changes
are done with writes to RAM area (not if they happen indirectly via other 
registers).
I was trying that in a very simple way, I stopped because I used 128MiB 
Flash (memory consumption

and speed).


Another solution is to implement support in TCG to execute from MMIO mapped 
areas.
We'd also need to solve the problem of signalling content changes to TCG.

I think these approaches have some overlap. Personally, I think TCG executing 
from
MMIO areas would be quite useful. It's also useful for QEMU & SystemC 
integration.

I can agree here, this could solve this problem, and allow to do more.

Hi,

The solution we are preparing for this is to allow a MMIO region to execute code
as Edgar just suggested by adding two callbacks:

Yes it seems the best option to boot.

The SPI controller I am working on maintains a set of mapping windows for
each flash slave. But it's up to software to make sure the window size and
the flash size are in sync. So, in a qemu world, we will need some flexibility
if we want to access directly the flash storage. I don't think this patch is
the right approach though. Let's boot first.

I believe we agree here. Exposing underlying file is to big shortcut,
from the other hand it will allow to boot from flash.



   -

Re: [Qemu-devel] [PULL 00/28] Misc patches for 2016-09-26

2016-09-26 Thread Peter Maydell

On 26 September 2016 at 06:40, Paolo Bonzini  wrote:
> The following changes since commit eaff9c4367ac3f7ac44f6c6f4cb7bcd4daa89af5:
>
>   Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160923' into 
> staging (2016-09-23 15:28:07 +0100)
>
> are available in the git repository at:
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to cb9cdb6a50c0e775ebeed2d45921a6e13c62ce5c:
>
>   replay: allow replay stopping and restarting (2016-09-26 10:35:34 +0200)
>
> 
> * thread-safe tb_flush (Fred, Alex, Sergey, me, Richard, Emilio,... :-)
> * license clarification for compiler.h (Felipe)
> * glib cflags improvement (Marc-André)
> * checkpatch silencing (Paolo)
> * SMRAM migration fix (Paolo)
> * Replay improvements (Pavel)
> * IOMMU notifier improvements (Peter)
> * IOAPIC now defaults to version 0x20 (Peter)
>
> 

This fails at the link stage on OSX:

  LINK  aarch64-softmmu/qemu-system-aarch64
Undefined symbols for architecture x86_64:
  "_fseeko64", referenced from:
  _replay_post_load in replay-snapshot.o
  "_ftello64", referenced from:
  _replay_pre_save in replay-snapshot.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

fseeko64() and ftello64() are Linuxisms.

I also see this compile failure:

  CCi386-softmmu/hw/i386/amd_iommu.o
/home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c: In function
‘amdvi_init’:
/home/petmay01/linaro/qemu-for-merges/hw/i386/amd_iommu.c:1083:17:
error: ‘MemoryRegionIOMMUOps {aka struct MemoryRegionIOMMUOps}’ has no
member named ‘notify_started’
 s->iommu_ops.notify_started = amdvi_iommu_notify_started;
 ^
/home/petmay01/linaro/qemu-for-merges/rules.mak:60: recipe for target
'hw/i386/amd_iommu.o' failed


which I guess is a semantic merge conflict between
something here and the just-landed amd_iommu changes.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 09/18] target-riscv: Add FMADD, FMSUB, FNMADD, FNMSUB Instructions,

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+/* convert RISC-V rounding mode to IEEE library numbers */
+unsigned int ieee_rm[] = {
+float_round_nearest_even,
+float_round_to_zero,
+float_round_down,
+float_round_up,
+float_round_ties_away
+};
+
+/* obtain rm value to use in computation
+ * as the last step, convert rm codes to what the softfloat library expects
+ * Adapted from Spike's decode.h:RM
+ */
+#define RM ({ \
+if (rm == 7) {\
+rm = env->csr[CSR_FRM];   \
+} \
+if (rm > 4) { \
+helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
+} \
+ieee_rm[rm]; })


You're not doing anything to get the proper guest PC value for the illegal 
instruction exception.  This is, of course, rare, and can be caught during 
translation, which would be greatly preferred since the fpu signals no other 
exceptions, and thus all of the fpu helpers can be TCG_CALL_NO_RWG.


This can be done by examining fcsr during cpu_get_tb_cpu_state.  Set a flag bit 
(TB_FLAG_RM_INVALID) when fcsr.rm is invalid.  Then during translation, raise 
an exception if


(1) rm == 7 && TB_FLAG_RM_INVALID
(2) rm >= 5 && rm <= 6

You can improve this rounding mode switching by remembering the mode used by 
the previous insn.  See target-alpha DisasContext.tb_rm.



+/* adapted from Spike's decode.h:set_fp_exceptions */
+#define set_fp_exceptions() do { \
+env->csr[CSR_FFLAGS] |= 
softfloat_flags_to_riscv(get_float_exception_flags(\
+>fp_status)); \
+set_float_exception_flags(0, >fp_status); \
+} while (0)


Any reason not to make this a function?


+uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
+uint64_t frs3, uint64_t rm)
+{
+set_float_rounding_mode(RM, >fp_status);
+frs1 = float32_muladd(frs1, frs2, frs3 ^ (uint32_t)INT32_MIN, 0,
+  >fp_status);


float_muladd_negate_c


+uint64_t helper_fnmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
+ uint64_t frs3, uint64_t rm)
+{
+set_float_rounding_mode(RM, >fp_status);
+frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2, frs3, 0,
+  >fp_status);


This one is actually computing fnmadd afaics...


+uint64_t helper_fnmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
+ uint64_t frs3, uint64_t rm)
+{
+set_float_rounding_mode(RM, >fp_status);
+frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2,
+  frs3 ^ (uint32_t)INT32_MIN, 0, >fp_status);


... and this one is actually computing fnmsub.  Given that you're passing 
tests, are you sure that FNMADD and FNMSUB opcodes are not swapped?


Use float_muladd_negate_result for FNMADD.
Use float_muladd_negate_c | float_muladd_negate_result for FNMSUB.


r~

Re: [Qemu-devel] [PATCH] checkpatch.pl: disable arch-specific test for linux-user

2016-09-26 Thread Peter Maydell

On 26 September 2016 at 12:58,   wrote:
> From: Riku Voipio 
>
> Linux-user and bsd-user code needs lots of arch-specific ifdefs,
> so disable the warning.
>
> Signed-off-by: Riku Voipio 
> ---
>  scripts/checkpatch.pl | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index dde3f5f..98a007f 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2405,8 +2405,9 @@ sub process {
> }
>  # check of hardware specific defines
>  # we have e.g. CONFIG_LINUX and CONFIG_WIN32 for common cases
> -# where they might be necessary.
> -   if ($line =~ m@^.\s*\#\s*if.*\b__@) {
> +# where they might be necessary. Skip test on linux-user and bsd-user
> +# where arch defines are needed
> +   if (!($realfile =~ /^(linux|bsd)-user/) &&  $line =~ 
> m@^.\s*\#\s*if.*\b__@) {
> ERROR("architecture specific defines should be 
> avoided\n" .  $herecurr);
> }

Do you have some examples of the false positives you want
to suppress here? For new code I would hope that we can
handle host-arch-specifics by having new files (or just
new #defines etc) in linux-user/host/$ARCH/ rather than
inline #ifdeffery in the main files.

thanks
-- PMM

Re: [Qemu-devel] [PULL 00/27] Net patches

2016-09-26 Thread Peter Maydell

On 26 September 2016 at 01:59, Jason Wang  wrote:
> The following changes since commit 3b71ec8516bb50e9a743645bf139571de0b39f61:
>
>   Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
> (2016-09-23 16:15:33 +0100)
>
> are available in the git repository at:
>
>   https://github.com/jasowang/qemu.git tags/net-pull-request
>
> for you to fetch changes up to 172213e93226cf1fa0a230de5137020cd24ae715:
>
>   imx_fec: fix error in qemu_send_packet argument (2016-09-26 16:28:14 +0800)
>
> 
>
> - allow to specify the rx queue size for virtio-net
> - colo packet comparing thread
> - filter-writer to rewrite tcp seq for comparing and secondary VM
> - align some e1000e behaviour to spec
> - allow to specify bridge for a tap ifup script
>
> 


Fails to build (both gcc and clang):

/home/pm215/qemu/net/filter-rewriter.c: In function ‘colo_rewriter_receive_iov’:
/home/pm215/qemu/net/filter-rewriter.c:158:5: error: missing braces
around initializer [-Werror=missing-braces]
 ConnectionKey key = { 0 };
 ^
/home/pm215/qemu/net/filter-rewriter.c:158:5: error: (near
initialization for ‘key.src’) [-Werror=missing-braces]
cc1: all warnings being treated as errors
make: *** [net/filter-rewriter.o] Error 1
make: *** Waiting for unfinished jobs
/home/pm215/qemu/net/colo-compare.c: In function ‘packet_enqueue’:
/home/pm215/qemu/net/colo-compare.c:114:5: error: missing braces
around initializer [-Werror=missing-braces]
 ConnectionKey key = { 0 };
 ^
/home/pm215/qemu/net/colo-compare.c:114:5: error: (near initialization
for ‘key.src’) [-Werror=missing-braces]


thanks
-- PMM

[Qemu-devel] [PATCH v2 0/6] intc: change 'info irq' and 'info pic' to be target-agnostic

2016-09-26 Thread Hervé Poussineau

Hi,

This patchset aims at genericizing the 'info irq' and 'info pic' HMP commands, 
so
that it is available on all machines and can display details about more than one
interrupt controller per machine.

Patch 1 adds a new interface InterruptStatsProvider, which is used to:
- gather statistics for the 'info irq' command
- print some text when 'info pic' is called

Patches 2 to 4 implement InterruptStatsProvider interface on interrupt 
controllers
which have ad-hock code to handle 'info irq'/'info pic' commands.

Patch 5 removes ad-hock code, and replaces it by a generic version. You can get
details about multiple interrupt controllers per machine starting here.

Patch 6 makes 'info irq'/'info pic' commands available on all architectures.
For example, Alpha clipper machine is now able to display details about the
i8259 interrupt controller.

Changes since v1:
- renamed interface from IntCtrl to InterruptStatsProvider

Hervé

Hervé Poussineau (6):
  intc: add an interface to gather statistics/informations on interrupt
controllers
  intc/i8259: implement InterruptStatsProvider interface
  intc/slavio_intctl: implement InterruptStatsProvider interface
  intc/lm32_pic: implement InterruptStatsProvider interface
  intc: make HMP 'info irq' and 'info pic' commands use
InterruptStatsProvider interface
  intc: make HMP 'info irq' and 'info pic' commands available on all
targets

 hmp-commands-info.hx   | 17 +--
 hmp.c  | 65 +
 hmp.h  |  2 ++
 hw/intc/Makefile.objs  |  1 +
 hw/intc/i8259.c| 73 +++---
 hw/intc/intc.c | 41 ++
 hw/intc/lm32_pic.c | 63 ++-
 hw/intc/slavio_intctl.c| 67 ++
 hw/sparc/sun4m.c   | 15 +-
 include/hw/i386/pc.h   |  2 --
 include/hw/intc/intc.h | 30 +++
 include/hw/lm32/lm32_pic.h |  3 --
 include/hw/sparc/sun4m.h   |  8 -
 monitor.c  |  6 
 14 files changed, 241 insertions(+), 152 deletions(-)
 create mode 100644 hw/intc/intc.c
 create mode 100644 include/hw/intc/intc.h

-- 
2.1.4

[Qemu-devel] [PATCH v2 1/6] intc: add an interface to gather statistics/informations on interrupt controllers

2016-09-26 Thread Hervé Poussineau

This interface will be used by HMP commands 'info irq' and 'info pic'.

Signed-off-by: Hervé Poussineau 
---
 hw/intc/Makefile.objs  |  1 +
 hw/intc/intc.c | 41 +
 include/hw/intc/intc.h | 30 ++
 3 files changed, 72 insertions(+)
 create mode 100644 hw/intc/intc.c
 create mode 100644 include/hw/intc/intc.h

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 05ec21b..f24c837 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -17,6 +17,7 @@ common-obj-$(CONFIG_ARM_GIC) += arm_gicv3.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_dist.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_redist.o
 common-obj-$(CONFIG_OPENPIC) += openpic.o
+common-obj-y += intc.o
 
 obj-$(CONFIG_APIC) += apic.o apic_common.o
 obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o
diff --git a/hw/intc/intc.c b/hw/intc/intc.c
new file mode 100644
index 000..2e1e29e
--- /dev/null
+++ b/hw/intc/intc.c
@@ -0,0 +1,41 @@
+/*
+ * QEMU Generic Interrupt Controller
+ *
+ * Copyright (c) 2016 Hervé Poussineau
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/intc/intc.h"
+#include "qemu/module.h"
+
+static const TypeInfo intctrl_info = {
+.name = TYPE_INTERRUPT_STATS_PROVIDER,
+.parent = TYPE_INTERFACE,
+.class_size = sizeof(InterruptStatsProviderClass),
+};
+
+static void intc_register_types(void)
+{
+type_register_static(_info);
+}
+
+type_init(intc_register_types)
+
diff --git a/include/hw/intc/intc.h b/include/hw/intc/intc.h
new file mode 100644
index 000..2a97ea1
--- /dev/null
+++ b/include/hw/intc/intc.h
@@ -0,0 +1,30 @@
+#ifndef INTC_H
+#define INTC_H
+
+#include "qom/object.h"
+
+#define TYPE_INTERRUPT_STATS_PROVIDER "intctrl"
+
+#define INTERRUPT_STATS_PROVIDER_CLASS(klass) \
+OBJECT_CLASS_CHECK(InterruptStatsProviderClass, (klass), \
+   TYPE_INTERRUPT_STATS_PROVIDER)
+#define INTERRUPT_STATS_PROVIDER_GET_CLASS(obj) \
+OBJECT_GET_CLASS(InterruptStatsProviderClass, (obj), \
+ TYPE_INTERRUPT_STATS_PROVIDER)
+#define INTERRUPT_STATS_PROVIDER(obj) \
+INTERFACE_CHECK(InterruptStatsProvider, (obj), \
+TYPE_INTERRUPT_STATS_PROVIDER)
+
+typedef struct InterruptStatsProvider {
+Object parent;
+} InterruptStatsProvider;
+
+typedef struct InterruptStatsProviderClass {
+InterfaceClass parent;
+
+bool (*get_statistics)(InterruptStatsProvider *obj, uint64_t **irq_counts,
+   unsigned int *nb_irqs);
+void (*print_info)(InterruptStatsProvider *obj, Monitor *mon);
+} InterruptStatsProviderClass;
+
+#endif
-- 
2.1.4

[Qemu-devel] [PATCH v2 6/6] intc: make HMP 'info irq' and 'info pic' commands available on all targets

2016-09-26 Thread Hervé Poussineau

Signed-off-by: Hervé Poussineau 
---
 hmp-commands-info.hx | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 6a7c476..55d50c4 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -172,8 +172,6 @@ STEXI
 Show the command line history.
 ETEXI
 
-#if defined(TARGET_I386) || defined(TARGET_PPC) || defined(TARGET_MIPS) || \
-defined(TARGET_LM32) || (defined(TARGET_SPARC) && !defined(TARGET_SPARC64))
 {
 .name   = "irq",
 .args_type  = "",
@@ -192,10 +190,9 @@ ETEXI
 .name   = "pic",
 .args_type  = "",
 .params = "",
-.help   = "show i8259 (PIC) state",
+.help   = "show PIC state",
 .cmd= hmp_info_pic,
 },
-#endif
 
 STEXI
 @item info pic
-- 
2.1.4

Re: [Qemu-devel] [PATCH] proto: add 'shift' extension.

2016-09-26 Thread Eric Blake

On 09/26/2016 07:51 AM, Paolo Bonzini wrote:

>> +- bit 3, `NBD_CMD_FLAG_SHIFT`; This flag is valid for `NBD_CMD_TRIM` and
>> +  `NBD_CMD_WRITE_ZEROES`. If this flag is set the server shifts request
>> +  *length* and *offset* left by N bits, where N is defined by 
>> `NBD_OPT_SHIFT`
>> +  option or is assumed to be 16 bits by default if `NBD_OPT_SHIFT` option is
>> +  not specified. If after shift `(offset + length)` exceeds disk size, 
>> length
>> +  should be reduced to `( - offset)`. However, `(offset + 
>> length)`
>> +  must not exceed disk size by more than `(1 << N) - 1`.
>>  
>>   Request types
>>  
>>
> 
> This is very ad hoc.  Can we instead have a block size common to all
> commands?  Block devices in practice have one, in fact that's why
> they're called block devices...

The INFO extensions are supposed to be a way for the server to
communicate block sizes to the guest (note, it is one-way communication;
the guest does NOT get to pick an arbitrary size, but relies on the
server reporting it) - but I don't know if that sizing information is
useful enough for the task at hand.  As it was, the INFO extension had a
proposal idea a while back about advertising a different size for TRIM
and WRITE_ZEROES than what is preferred for WRITE and READ, so having a
single size that works for shifted commands that operate on blocks
instead of bytes may not even be feasible if there is no single block
size to settle on.

But having a universal command flag that says "this command requests
offset and length in units of blocks instead of bytes", where blocks is
an up-front sizing settled on during handshake via INFO extension, and
NOT something that the client can change at-will during a live session,
may be useful.  Or it may be the source of arithmetic overflow exploits
in poor implementations, or of denial-of-service with used with a READ
or WRITE to send more than 2G of data in a single command.  In other
words, I don't yet see a compelling use case for being able to request
shifted sizes.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 07/18] target-riscv: Add Loads/Stores, FP Loads/Stores

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

Signed-off-by: Sagar Karandikar 
---
 target-riscv/translate.c | 117 +++
 1 file changed, 117 insertions(+)

diff --git a/target-riscv/translate.c b/target-riscv/translate.c
index d8044cf..767cdbe 100644
--- a/target-riscv/translate.c
+++ b/target-riscv/translate.c
@@ -72,6 +72,10 @@ static const char * const fpr_regnames[] = {
   "fs8", "fs9", "fs10", "fs11", "ft8", "ft9", "ft10", "ft11"
 };

+/* convert riscv funct3 to qemu memop for load/store */
+static int tcg_memop_lookup[] = { MO_SB, MO_TESW, MO_TESL, MO_TEQ, MO_UB,
+MO_TEUW, MO_TEUL };


Const, and formatting

static const TCGMemOp tcg_memop_lookup[] = {

};

although...


+static inline void gen_load(DisasContext *ctx, uint32_t opc, int rd, int rs1,
+int16_t imm)
+{
+target_long uimm = (target_long)imm; /* sign ext 16->64 bits */
+TCGv t0 = tcg_temp_new();
+TCGv t1 = tcg_temp_new();
+gen_get_gpr(t0, rs1);
+tcg_gen_addi_tl(t0, t0, uimm); /* */
+int memop = (opc >> 12) & 0x7;
+
+#if defined(TARGET_RISCV64)
+if (memop == 0x7) {
+#else
+if (memop == 0x7 || memop == 0x3 || memop == 0x6) {
+#endif
+kill_unknown(ctx, RISCV_EXCP_ILLEGAL_INST);
+} else {
+tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx, tcg_memop_lookup[memop]);


... considering the set of if's, maybe better as

static const int tcg_memop_lookup[8] = {
[0 ... 7] = -1,
[0] = MO_SB,
[1] = MO_TESW,
[2] = MO_TESL,
[4] = MO_UB,
[5] = MO_TEUW,
#ifdef TARGET_RISCV64
[3] = MO_TEQ,
[6] = MO_TEUL,
#endif
};

and then have

memop = tcg_memop_lookup[memop];
if (memop < 0) {
kill_unknown(ctx, RISCV_EXCP_ILLEGAL_INST);
} else {
tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx, memop);
}


r~

[Qemu-devel] [PATCH v2 3/6] intc/slavio_intctl: implement InterruptStatsProvider interface

2016-09-26 Thread Hervé Poussineau

Signed-off-by: Hervé Poussineau 
---
 hw/intc/slavio_intctl.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/hw/intc/slavio_intctl.c b/hw/intc/slavio_intctl.c
index e82e893..a9acb64 100644
--- a/hw/intc/slavio_intctl.c
+++ b/hw/intc/slavio_intctl.c
@@ -26,6 +26,7 @@
 #include "hw/sparc/sun4m.h"
 #include "monitor/monitor.h"
 #include "hw/sysbus.h"
+#include "hw/intc/intc.h"
 #include "trace.h"
 
 //#define DEBUG_IRQ_COUNT
@@ -418,6 +419,31 @@ static void slavio_intctl_reset(DeviceState *d)
 slavio_check_interrupts(s, 0);
 }
 
+#ifdef DEBUG_IRQ_COUNT
+static bool slavio_intctl_get_statistics(InterruptStatsProvider *obj,
+ uint64_t **irq_counts,
+ unsigned int *nb_irqs)
+{
+SLAVIO_INTCTLState *s = SLAVIO_INTCTL(obj);
+*irq_counts = s->irq_count;
+*nb_irqs = ARRAY_SIZE(s->irq_count);
+return true;
+}
+#endif
+
+static void slavio_intctl_print_info(InterruptStatsProvider *obj, Monitor *mon)
+{
+SLAVIO_INTCTLState *s = SLAVIO_INTCTL(obj);
+int i;
+
+for (i = 0; i < MAX_CPUS; i++) {
+monitor_printf(mon, "per-cpu %d: pending 0x%08x\n", i,
+   s->slaves[i].intreg_pending);
+}
+monitor_printf(mon, "master: pending 0x%08x, disabled 0x%08x\n",
+   s->intregm_pending, s->intregm_disabled);
+}
+
 static void slavio_intctl_init(Object *obj)
 {
 DeviceState *dev = DEVICE(obj);
@@ -449,9 +475,14 @@ static void slavio_intctl_init(Object *obj)
 static void slavio_intctl_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
+InterruptStatsProviderClass *ic = INTERRUPT_STATS_PROVIDER_CLASS(klass);
 
 dc->reset = slavio_intctl_reset;
 dc->vmsd = _intctl;
+#ifdef DEBUG_IRQ_COUNT
+ic->get_statistics = slavio_intctl_get_statistics;
+#endif
+ic->print_info = slavio_intctl_print_info;
 }
 
 static const TypeInfo slavio_intctl_info = {
@@ -460,6 +491,10 @@ static const TypeInfo slavio_intctl_info = {
 .instance_size = sizeof(SLAVIO_INTCTLState),
 .instance_init = slavio_intctl_init,
 .class_init= slavio_intctl_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_INTERRUPT_STATS_PROVIDER },
+{ }
+},
 };
 
 static void slavio_intctl_register_types(void)
-- 
2.1.4

[Qemu-devel] [PATCH v2 5/6] intc: make HMP 'info irq' and 'info pic' commands use InterruptStatsProvider interface

2016-09-26 Thread Hervé Poussineau

Signed-off-by: Hervé Poussineau 
---
 hmp-commands-info.hx   | 12 -
 hmp.c  | 65 ++
 hmp.h  |  2 ++
 hw/intc/i8259.c| 36 -
 hw/intc/lm32_pic.c | 31 --
 hw/intc/slavio_intctl.c| 32 ---
 hw/sparc/sun4m.c   | 15 +--
 include/hw/i386/pc.h   |  2 --
 include/hw/lm32/lm32_pic.h |  3 ---
 include/hw/sparc/sun4m.h   |  8 --
 monitor.c  |  6 -
 11 files changed, 68 insertions(+), 144 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 19729e5..6a7c476 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -179,13 +179,7 @@ ETEXI
 .args_type  = "",
 .params = "",
 .help   = "show the interrupts statistics (if available)",
-#ifdef TARGET_SPARC
-.cmd= sun4m_hmp_info_irq,
-#elif defined(TARGET_LM32)
-.cmd= lm32_hmp_info_irq,
-#else
 .cmd= hmp_info_irq,
-#endif
 },
 
 STEXI
@@ -199,13 +193,7 @@ ETEXI
 .args_type  = "",
 .params = "",
 .help   = "show i8259 (PIC) state",
-#ifdef TARGET_SPARC
-.cmd= sun4m_hmp_info_pic,
-#elif defined(TARGET_LM32)
-.cmd= lm32_hmp_info_pic,
-#else
 .cmd= hmp_info_pic,
-#endif
 },
 #endif
 
diff --git a/hmp.c b/hmp.c
index 336e7bf..ec2e9ce 100644
--- a/hmp.c
+++ b/hmp.c
@@ -36,6 +36,7 @@
 #include "qemu-io.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
+#include "hw/intc/intc.h"
 
 #ifdef CONFIG_SPICE
 #include 
@@ -787,6 +788,70 @@ static void hmp_info_pci_device(Monitor *mon, const 
PciDeviceInfo *dev)
 }
 }
 
+static int hmp_info_irq_foreach(Object *obj, void *opaque)
+{
+InterruptStatsProvider *intc;
+InterruptStatsProviderClass *k;
+Monitor *mon = opaque;
+
+if (object_dynamic_cast(obj, TYPE_INTERRUPT_STATS_PROVIDER)) {
+intc = INTERRUPT_STATS_PROVIDER(obj);
+k = INTERRUPT_STATS_PROVIDER_GET_CLASS(obj);
+uint64_t *irq_counts;
+unsigned int nb_irqs, i;
+if (k->get_statistics &&
+k->get_statistics(intc, _counts, _irqs)) {
+if (nb_irqs > 0) {
+monitor_printf(mon, "IRQ statistics for %s:\n",
+   object_get_typename(obj));
+for (i = 0; i < nb_irqs; i++) {
+if (irq_counts[i] > 0) {
+monitor_printf(mon, "%2d: %" PRId64 "\n", i,
+   irq_counts[i]);
+}
+}
+}
+} else {
+monitor_printf(mon, "IRQ statistics not available for %s.\n",
+   object_get_typename(obj));
+}
+}
+
+return 0;
+}
+
+void hmp_info_irq(Monitor *mon, const QDict *qdict)
+{
+object_child_foreach_recursive(object_get_root(),
+   hmp_info_irq_foreach, mon);
+}
+
+static int hmp_info_pic_foreach(Object *obj, void *opaque)
+{
+InterruptStatsProvider *intc;
+InterruptStatsProviderClass *k;
+Monitor *mon = opaque;
+
+if (object_dynamic_cast(obj, TYPE_INTERRUPT_STATS_PROVIDER)) {
+intc = INTERRUPT_STATS_PROVIDER(obj);
+k = INTERRUPT_STATS_PROVIDER_GET_CLASS(obj);
+if (k->print_info) {
+k->print_info(intc, mon);
+} else {
+monitor_printf(mon, "PIC informations not available for %s.\n",
+   object_get_typename(obj));
+}
+}
+
+return 0;
+}
+
+void hmp_info_pic(Monitor *mon, const QDict *qdict)
+{
+object_child_foreach_recursive(object_get_root(),
+   hmp_info_pic_foreach, mon);
+}
+
 void hmp_info_pci(Monitor *mon, const QDict *qdict)
 {
 PciInfoList *info_list, *info;
diff --git a/hmp.h b/hmp.h
index 0876ec0..184769c 100644
--- a/hmp.h
+++ b/hmp.h
@@ -36,6 +36,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict);
 void hmp_info_vnc(Monitor *mon, const QDict *qdict);
 void hmp_info_spice(Monitor *mon, const QDict *qdict);
 void hmp_info_balloon(Monitor *mon, const QDict *qdict);
+void hmp_info_irq(Monitor *mon, const QDict *qdict);
+void hmp_info_pic(Monitor *mon, const QDict *qdict);
 void hmp_info_pci(Monitor *mon, const QDict *qdict);
 void hmp_info_block_jobs(Monitor *mon, const QDict *qdict);
 void hmp_info_tpm(Monitor *mon, const QDict *qdict);
diff --git a/hw/intc/i8259.c b/hw/intc/i8259.c
index 75c8d22..fe9ecd6 100644
--- a/hw/intc/i8259.c
+++ b/hw/intc/i8259.c
@@ -461,42 +461,6 @@ static void pic_realize(DeviceState *dev, Error **errp)
 pc->parent_realize(dev, errp);
 }
 
-void hmp_info_pic(Monitor *mon, const QDict *qdict)
-{
-int i;
-PICCommonState *s;
-
-if (!isa_pic) {
-return;
-}
-for (i = 0; i < 2; i++) {
-s

[Qemu-devel] [PATCH v2 4/6] intc/lm32_pic: implement InterruptStatsProvider interface

2016-09-26 Thread Hervé Poussineau

We have to change the vmstate version due to changes in statistics counters.

Signed-off-by: Hervé Poussineau 
---
 hw/intc/lm32_pic.c | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/intc/lm32_pic.c b/hw/intc/lm32_pic.c
index 3dad01c..c045b99 100644
--- a/hw/intc/lm32_pic.c
+++ b/hw/intc/lm32_pic.c
@@ -25,6 +25,7 @@
 #include "hw/sysbus.h"
 #include "trace.h"
 #include "hw/lm32/lm32_pic.h"
+#include "hw/intc/intc.h"
 
 #define TYPE_LM32_PIC "lm32-pic"
 #define LM32_PIC(obj) OBJECT_CHECK(LM32PicState, (obj), TYPE_LM32_PIC)
@@ -38,7 +39,7 @@ struct LM32PicState {
 uint32_t irq_state;
 
 /* statistics */
-uint32_t stats_irq_count[32];
+uint64_t stats_irq_count[32];
 };
 typedef struct LM32PicState LM32PicState;
 
@@ -152,6 +153,22 @@ static void pic_reset(DeviceState *d)
 }
 }
 
+static bool lm32_get_statistics(InterruptStatsProvider *obj,
+uint64_t **irq_counts, unsigned int *nb_irqs)
+{
+LM32PicState *s = LM32_PIC(obj);
+*irq_counts = s->stats_irq_count;
+*nb_irqs = ARRAY_SIZE(s->stats_irq_count);
+return true;
+}
+
+static void lm32_print_info(InterruptStatsProvider *obj, Monitor *mon)
+{
+LM32PicState *s = LM32_PIC(obj);
+monitor_printf(mon, "lm32-pic: im=%08x ip=%08x irq_state=%08x\n",
+s->im, s->ip, s->irq_state);
+}
+
 static void lm32_pic_init(Object *obj)
 {
 DeviceState *dev = DEVICE(obj);
@@ -166,13 +183,13 @@ static void lm32_pic_init(Object *obj)
 
 static const VMStateDescription vmstate_lm32_pic = {
 .name = "lm32-pic",
-.version_id = 1,
-.minimum_version_id = 1,
+.version_id = 2,
+.minimum_version_id = 2,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(im, LM32PicState),
 VMSTATE_UINT32(ip, LM32PicState),
 VMSTATE_UINT32(irq_state, LM32PicState),
-VMSTATE_UINT32_ARRAY(stats_irq_count, LM32PicState, 32),
+VMSTATE_UINT64_ARRAY(stats_irq_count, LM32PicState, 32),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -180,9 +197,12 @@ static const VMStateDescription vmstate_lm32_pic = {
 static void lm32_pic_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
+InterruptStatsProviderClass *ic = INTERRUPT_STATS_PROVIDER_CLASS(klass);
 
 dc->reset = pic_reset;
 dc->vmsd = _lm32_pic;
+ic->get_statistics = lm32_get_statistics;
+ic->print_info = lm32_print_info;
 }
 
 static const TypeInfo lm32_pic_info = {
@@ -191,6 +211,10 @@ static const TypeInfo lm32_pic_info = {
 .instance_size = sizeof(LM32PicState),
 .instance_init = lm32_pic_init,
 .class_init= lm32_pic_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_INTERRUPT_STATS_PROVIDER },
+{ }
+},
 };
 
 static void lm32_pic_register_types(void)
-- 
2.1.4

[Qemu-devel] [PATCH v2 2/6] intc/i8259: implement InterruptStatsProvider interface

2016-09-26 Thread Hervé Poussineau

Signed-off-by: Hervé Poussineau 
---
 hw/intc/i8259.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/hw/intc/i8259.c b/hw/intc/i8259.c
index c2607a5..75c8d22 100644
--- a/hw/intc/i8259.c
+++ b/hw/intc/i8259.c
@@ -29,6 +29,7 @@
 #include "qemu/timer.h"
 #include "qemu/log.h"
 #include "hw/isa/i8259_internal.h"
+#include "hw/intc/intc.h"
 
 /* debug PIC */
 //#define DEBUG_PIC
@@ -251,6 +252,35 @@ static void pic_reset(DeviceState *dev)
 pic_init_reset(s);
 }
 
+static bool pic_get_statistics(InterruptStatsProvider *obj,
+   uint64_t **irq_counts, unsigned int *nb_irqs)
+{
+PICCommonState *s = PIC_COMMON(obj);
+
+if (s->master) {
+#ifdef DEBUG_IRQ_COUNT
+*irq_counts = irq_count;
+*nb_irqs = ARRAY_SIZE(irq_count);
+#else
+return false;
+#endif
+} else {
+*irq_counts = NULL;
+*nb_irqs = 0;
+}
+return true;
+}
+
+static void pic_print_info(InterruptStatsProvider *obj, Monitor *mon)
+{
+PICCommonState *s = PIC_COMMON(obj);
+monitor_printf(mon, "pic%d: irr=%02x imr=%02x isr=%02x hprio=%d "
+   "irq_base=%02x rr_sel=%d elcr=%02x fnm=%d\n",
+   s->master ? 0 : 1, s->irr, s->imr, s->isr, s->priority_add,
+   s->irq_base, s->read_reg_select, s->elcr,
+   s->special_fully_nested_mode);
+}
+
 static void pic_ioport_write(void *opaque, hwaddr addr64,
  uint64_t val64, unsigned size)
 {
@@ -503,10 +533,13 @@ static void i8259_class_init(ObjectClass *klass, void 
*data)
 {
 PICClass *k = PIC_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
+InterruptStatsProviderClass *ic = INTERRUPT_STATS_PROVIDER_CLASS(klass);
 
 k->parent_realize = dc->realize;
 dc->realize = pic_realize;
 dc->reset = pic_reset;
+ic->get_statistics = pic_get_statistics;
+ic->print_info = pic_print_info;
 }
 
 static const TypeInfo i8259_info = {
@@ -515,6 +548,10 @@ static const TypeInfo i8259_info = {
 .parent = TYPE_PIC_COMMON,
 .class_init = i8259_class_init,
 .class_size = sizeof(PICClass),
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_INTERRUPT_STATS_PROVIDER },
+{ }
+},
 };
 
 static void pic_register_types(void)
-- 
2.1.4

Re: [Qemu-devel] [PATCH] proto: add 'shift' extension.

2016-09-26 Thread Eric Blake

On 09/26/2016 07:46 AM, Vladimir Sementsov-Ogievskiy wrote:
> This extension allows big requests for TRIM and WRITE_ZEROES through
> special 'shift' parameter, which means that offset and length should be
> shifted left by several bits.
> 
> This is needed for efficient clearing large regions of the disk (up to
> the whole disk).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Here mentioned WRITE_ZEROES command which is only an experemental
> extension for now.
> 
> To dicuss:

> +- `NBD_OPT_SHIFT` (10)
> +
> +Defines shift used to calculate request offset and length if
> +`NBD_CMD_FLAG_SHIFT` is set.
> +
> +Data:
> +
> +- 32 bits, shift (unsigned); Must not be larger than 32.

Uggh. You're making the protocol stateful (the server has to remember
what the client has previously requested via NBD_CMD_FLAG_SHIFT, rather
than having ALL information it needs immediately available in the
current NBD_CMD_WRITE_ZEROES).

I'd much rather support a single flag that says to zero the entire disk
than to introduce stateful variable-amount shifting.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH] tests: Test IPv6 and ppc64 in the PXE tester

2016-09-26 Thread Thomas Huth

The firmware of the pseries machine, SLOF, is able to load files via
IPv6 networking, too. So to test both, network bootloading on ppc64
and IPv6 (via Slirp) , let's add some PXE tests for this environment,
too. Since we can not use the normal x86 boot sector for network boot
loading, we use a simple Forth script on ppc64 instead.

Signed-off-by: Thomas Huth 
---
 tests/Makefile.include |  1 +
 tests/boot-sector.c|  9 +
 tests/pxe-test.c   | 22 +++---
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index d8101b3..18bc698 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -270,6 +270,7 @@ check-qtest-ppc64-y += tests/drive_del-test$(EXESUF)
 check-qtest-ppc64-y += tests/postcopy-test$(EXESUF)
 check-qtest-ppc64-y += tests/boot-serial-test$(EXESUF)
 check-qtest-ppc64-y += tests/rtas-test$(EXESUF)
+check-qtest-ppc64-y += tests/pxe-test$(EXESUF)
 
 check-qtest-sh4-y = tests/endianness-test$(EXESUF)
 
diff --git a/tests/boot-sector.c b/tests/boot-sector.c
index 3ffe298..e3193c0 100644
--- a/tests/boot-sector.c
+++ b/tests/boot-sector.c
@@ -77,6 +77,15 @@ int boot_sector_init(const char *fname)
 fprintf(stderr, "Couldn't open \"%s\": %s", fname, strerror(errno));
 return 1;
 }
+
+/* For Open Firmware based system, we can use a Forth script instead */
+if (strcmp(qtest_get_arch(), "ppc64") == 0) {
+memset(boot_sector, ' ', sizeof boot_sector);
+sprintf((char *)boot_sector, "\\ Bootscript\n%x %x c! %x %x c!\n",
+LOW(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET,
+HIGH(SIGNATURE), BOOT_SECTOR_ADDRESS + SIGNATURE_OFFSET + 1);
+}
+
 fwrite(boot_sector, 1, sizeof boot_sector, f);
 fclose(f);
 return 0;
diff --git a/tests/pxe-test.c b/tests/pxe-test.c
index b2cc355..0bdb7a1 100644
--- a/tests/pxe-test.c
+++ b/tests/pxe-test.c
@@ -21,14 +21,14 @@
 
 static const char *disk = "tests/pxe-test-disk.raw";
 
-static void test_pxe_one(const char *params)
+static void test_pxe_one(const char *params, bool ipv6)
 {
 char *args;
 
-args = g_strdup_printf("-machine accel=tcg "
-   "-netdev user,id=" NETNAME ",tftp=./,bootfile=%s "
-   "%s ",
-   disk, params);
+args = g_strdup_printf("-machine accel=tcg -boot order=n "
+   "-netdev user,id=" NETNAME ",tftp=./,bootfile=%s,"
+   "ipv4=%s,ipv6=%s %s", disk, ipv6 ? "off" : "on",
+   ipv6 ? "on" : "off", params);
 
 qtest_start(args);
 boot_sector_test();
@@ -38,12 +38,17 @@ static void test_pxe_one(const char *params)
 
 static void test_pxe_e1000(void)
 {
-test_pxe_one("-device e1000,netdev=" NETNAME);
+test_pxe_one("-device e1000,netdev=" NETNAME, false);
 }
 
 static void test_pxe_virtio_pci(void)
 {
-test_pxe_one("-device virtio-net-pci,netdev=" NETNAME);
+test_pxe_one("-device virtio-net-pci,netdev=" NETNAME, false);
+}
+
+static void test_pxe_spapr_vlan(void)
+{
+test_pxe_one("-vga none -device spapr-vlan,netdev=" NETNAME, true);
 }
 
 int main(int argc, char *argv[])
@@ -60,6 +65,9 @@ int main(int argc, char *argv[])
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 qtest_add_func("pxe/e1000", test_pxe_e1000);
 qtest_add_func("pxe/virtio", test_pxe_virtio_pci);
+} else if (strcmp(arch, "ppc64") == 0) {
+qtest_add_func("pxe/virtio", test_pxe_virtio_pci);
+qtest_add_func("pxe/spapr-vlan", test_pxe_spapr_vlan);
 }
 ret = g_test_run();
 boot_sector_cleanup(disk);
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v3 2/3] qapi: auto generate enum value strings

2016-09-26 Thread Eric Blake

On 09/26/2016 05:38 AM, Daniel P. Berrange wrote:
> On Mon, Sep 26, 2016 at 06:16:26PM +0800, Lin Ma wrote:
>> Automatically generate enum value strings that containing the acceptable 
>> values.
>> (Borrowwed Daniel's code.)

s/Borrowwed/Borrowed/

>>
>> Signed-off-by: Lin Ma 
>> ---
>>  scripts/qapi-types.py | 2 ++
>>  scripts/qapi.py   | 9 +
>>  2 files changed, 11 insertions(+)
> 
> This will need some test case coverage in tests/ somewhere, but I'm
> not sure exactly which place is best - Eric/Markus can probably advise

tests/test-qmp-commands.c is the first one that comes to mind, for
adding another test case to an existing program.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] checkpatch.pl: disable arch-specific test for linux-user

2016-09-26 Thread no-reply

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1474919934-21518-1-git-send-email-riku.voi...@linaro.org
Subject: [Qemu-devel] [PATCH] checkpatch.pl: disable arch-specific test for 
linux-user

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git show --no-patch --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/1474919934-21518-1-git-send-email-riku.voi...@linaro.org -> 
patchew/1474919934-21518-1-git-send-email-riku.voi...@linaro.org
Switched to a new branch 'test'
6d30677 checkpatch.pl: disable arch-specific test for linux-user

=== OUTPUT BEGIN ===
Checking PATCH 1/1: checkpatch.pl: disable arch-specific test for linux-user...
ERROR: line over 90 characters
#24: FILE: scripts/checkpatch.pl:2410:
+   if (!($realfile =~ /^(linux|bsd)-user/) &&  $line =~ 
m@^.\s*\#\s*if.*\b__@) {

total: 1 errors, 0 warnings, 11 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PATCH] checkpatch.pl: disable arch-specific test for linux-user

2016-09-26 Thread riku . voipio

From: Riku Voipio 

Linux-user and bsd-user code needs lots of arch-specific ifdefs,
so disable the warning.

Signed-off-by: Riku Voipio 
---
 scripts/checkpatch.pl | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dde3f5f..98a007f 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2405,8 +2405,9 @@ sub process {
}
 # check of hardware specific defines
 # we have e.g. CONFIG_LINUX and CONFIG_WIN32 for common cases
-# where they might be necessary.
-   if ($line =~ m@^.\s*\#\s*if.*\b__@) {
+# where they might be necessary. Skip test on linux-user and bsd-user
+# where arch defines are needed
+   if (!($realfile =~ /^(linux|bsd)-user/) &&  $line =~ 
m@^.\s*\#\s*if.*\b__@) {
ERROR("architecture specific defines should be 
avoided\n" .  $herecurr);
}
 
-- 
2.1.4

Re: [Qemu-devel] [PULL v2 00/19] virtio, pc: fixes and features

2016-09-26 Thread Peter Maydell

On 23 September 2016 at 15:04, Michael S. Tsirkin  wrote:
> The following changes since commit eaff9c4367ac3f7ac44f6c6f4cb7bcd4daa89af5:
>
>   Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160923' into 
> staging (2016-09-23 15:28:07 +0100)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to fb9f592623b0f9bb82a88d68d7921fb581918ef5:
>
>   hw/i386: AMD IOMMU IVRS table (2016-09-24 01:02:01 +0300)
>
> changes since v1:
> fixed a build failure due to a duplicate typedef
> which trips up older gccs
>
> 
> virtio, pc: fixes and features
>
> beginning of guest error handling for virtio devices
> amd iommu
> pc compat fixes
>
> Signed-off-by: Michael S. Tsirkin 


Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v3 8/9] virtio-scsi: convert virtio_scsi_bad_req() to use virtio_error()

2016-09-26 Thread Greg Kurz

On Mon, 26 Sep 2016 17:35:38 +0100
Stefan Hajnoczi  wrote:

> On Mon, Sep 26, 2016 at 10:34:56AM +0200, Greg Kurz wrote:
> > The virtio_scsi_bad_req() function is called when a guest sends a
> > request with missing or ill-sized headers. This generally happens
> > when the virtio_scsi_parse_req() function returns an error.
> > 
> > With this patch, virtio_scsi_bad_req() will mark the device as broken,
> > detach the request from the virtqueue and free it, instead of forcing
> > QEMU to exit.
> > 
> > In nearly all locations where virtio_scsi_bad_req() is called, the only
> > thing to do next is to return to the caller.
> > 
> > The virtio_scsi_handle_cmd_req_prepare() function is an exception though.
> > 
> > It is called in a loop by virtio_scsi_handle_cmd_vq() and passed requests
> > freshly popped from a cmd virtqueue; virtio_scsi_handle_cmd_req_prepare()
> > does some sanity checks on the request and returns a boolean flag to
> > indicate whether the request should be queued or not. In the latter case,
> > virtio_scsi_handle_cmd_req_prepare() has detected a non-fatal error and
> > sent a response back to the guest.
> > 
> > We have now a new condition to take into account: the device is broken
> > and should stop all processing.
> > 
> > The return value of virtio_scsi_handle_cmd_req_prepare() is hence changed
> > to an int. A return value of zero means that the request should be queued.
> > Other non-fatal error cases where the reqyest shoudn't be queued  return  
> 
> s/reqyest/request/
> 

oops...

> > @@ -574,11 +578,24 @@ static void 
> > virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
> >  void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
> >  {
> >  VirtIOSCSIReq *req, *next;
> > +int ret;
> > +
> >  QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
> >  
> >  while ((req = virtio_scsi_pop_req(s, vq))) {
> > -if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
> > +ret = virtio_scsi_handle_cmd_req_prepare(s, req);
> > +if (!ret) {
> >  QTAILQ_INSERT_TAIL(, req, next);
> > +} else if (ret == -EINVAL) {
> > +/* The device is broken and shouldn't process any request */
> > +while (!QTAILQ_EMPTY()) {
> > +req = QTAILQ_FIRST();
> > +QTAILQ_REMOVE(, req, next);
> > +blk_io_unplug(req->sreq->dev->conf.blk);  
> 
> Are you sure blk_io_plug() was called for this request?  If we returned
> early in  virtio_scsi_handle_cmd_req_prepare() then it wasn't called.
> 

Early return in virtio_scsi_handle_cmd_req_prepare() means an error was
detected, in which case the request didn't get queued; we are sure that
blk_io_plug() was called for all requests in this queue.

> > +scsi_req_unref(req->sreq);  
> 
> Which scsi_req_ref() is this paired with?  If it's the call in
> scsi_req_enqueue() then that function was never called and we shouldn't
> unref.

It is paired with the one in virtio_scsi_handle_cmd_req_prepare(), which
is called just before blk_io_plug().

But looking at the patch again, I realize I missed this:

@@ -562,7 +562,7 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s,
 }
 scsi_req_ref(req->sreq);
 blk_io_plug(d->conf.blk);
-return true;
+return 0;
 }
 
 static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req

I'll send a v4.

Cheers.

--
Greg


pgpMaSx7miLML.pgp
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

2016-09-26 Thread Michael S. Tsirkin

On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> I assume that if using Version 1 that the bit will be ignored
> 

Therein lies a problem. If dpdk tweaks flags, updating it
will break guest migration.

One way is to require that users specify all flags fully when
creating the virtio net device.  QEMU could verify that all required
flags are set, and fail init if not.

This has other advantages, e.g. it adds ability to
init device without waiting for dpdk to connect.

However, enabling each new feature would now require
management work. How about dpdk ships the list
of supported features instead?
Management tools could read them on source and destination
and select features supported on both sides.

-- 
MST

[Qemu-devel] [PATCH v1 1/1] cadence_gem: Fix priority queue out of bounds access

2016-09-26 Thread Alistair Francis

There was an error with some of the register implementation assuming
there are 16 priority queues supported when the IP only supports 8. This
patch corrects the registers to only support 8 queues.

Signed-off-by: Alistair Francis 
Reported-by: Paolo Bonzini 
---
Thanks to Paolo for pointing this out.

 hw/net/cadence_gem.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
index 8618e7a..7915732 100644
--- a/hw/net/cadence_gem.c
+++ b/hw/net/cadence_gem.c
@@ -147,25 +147,19 @@
 #define GEM_INT_Q1_MASK (0x0640 / 4)
 
 #define GEM_TRANSMIT_Q1_PTR (0x0440 / 4)
-#define GEM_TRANSMIT_Q15_PTR(GEM_TRANSMIT_Q1_PTR + 14)
+#define GEM_TRANSMIT_Q7_PTR (GEM_TRANSMIT_Q1_PTR + 6)
 
 #define GEM_RECEIVE_Q1_PTR  (0x0480 / 4)
-#define GEM_RECEIVE_Q15_PTR (GEM_RECEIVE_Q1_PTR + 14)
+#define GEM_RECEIVE_Q7_PTR  (GEM_RECEIVE_Q1_PTR + 6)
 
 #define GEM_INT_Q1_ENABLE   (0x0600 / 4)
 #define GEM_INT_Q7_ENABLE   (GEM_INT_Q1_ENABLE + 6)
-#define GEM_INT_Q8_ENABLE   (0x0660 / 4)
-#define GEM_INT_Q15_ENABLE  (GEM_INT_Q8_ENABLE + 7)
 
 #define GEM_INT_Q1_DISABLE  (0x0620 / 4)
 #define GEM_INT_Q7_DISABLE  (GEM_INT_Q1_DISABLE + 6)
-#define GEM_INT_Q8_DISABLE  (0x0680 / 4)
-#define GEM_INT_Q15_DISABLE (GEM_INT_Q8_DISABLE + 7)
 
 #define GEM_INT_Q1_MASK (0x0640 / 4)
 #define GEM_INT_Q7_MASK (GEM_INT_Q1_MASK + 6)
-#define GEM_INT_Q8_MASK (0x06A0 / 4)
-#define GEM_INT_Q15_MASK(GEM_INT_Q8_MASK + 7)
 
 #define GEM_SCREENING_TYPE1_REGISTER_0  (0x0500 / 4)
 
@@ -1372,13 +1366,13 @@ static void gem_write(void *opaque, hwaddr offset, 
uint64_t val,
 case GEM_RXQBASE:
 s->rx_desc_addr[0] = val;
 break;
-case GEM_RECEIVE_Q1_PTR ... GEM_RECEIVE_Q15_PTR:
+case GEM_RECEIVE_Q1_PTR ... GEM_RECEIVE_Q7_PTR:
 s->rx_desc_addr[offset - GEM_RECEIVE_Q1_PTR + 1] = val;
 break;
 case GEM_TXQBASE:
 s->tx_desc_addr[0] = val;
 break;
-case GEM_TRANSMIT_Q1_PTR ... GEM_TRANSMIT_Q15_PTR:
+case GEM_TRANSMIT_Q1_PTR ... GEM_TRANSMIT_Q7_PTR:
 s->tx_desc_addr[offset - GEM_TRANSMIT_Q1_PTR + 1] = val;
 break;
 case GEM_RXSTATUS:
@@ -1392,10 +1386,6 @@ static void gem_write(void *opaque, hwaddr offset, 
uint64_t val,
 s->regs[GEM_INT_Q1_MASK + offset - GEM_INT_Q1_ENABLE] &= ~val;
 gem_update_int_status(s);
 break;
-case GEM_INT_Q8_ENABLE ... GEM_INT_Q15_ENABLE:
-s->regs[GEM_INT_Q8_MASK + offset - GEM_INT_Q8_ENABLE] &= ~val;
-gem_update_int_status(s);
-break;
 case GEM_IDR:
 s->regs[GEM_IMR] |= val;
 gem_update_int_status(s);
@@ -1404,10 +1394,6 @@ static void gem_write(void *opaque, hwaddr offset, 
uint64_t val,
 s->regs[GEM_INT_Q1_MASK + offset - GEM_INT_Q1_DISABLE] |= val;
 gem_update_int_status(s);
 break;
-case GEM_INT_Q8_DISABLE ... GEM_INT_Q15_DISABLE:
-s->regs[GEM_INT_Q8_MASK + offset - GEM_INT_Q8_DISABLE] |= val;
-gem_update_int_status(s);
-break;
 case GEM_SPADDR1LO:
 case GEM_SPADDR2LO:
 case GEM_SPADDR3LO:
-- 
2.7.4

Re: [Qemu-devel] [PATCH 01/18] target-riscv: Add RISC-V target stubs and Maintainer

2016-09-26 Thread Eric Blake

On 09/26/2016 05:56 AM, Sagar Karandikar wrote:
> Only files that live outside of target-riscv and hw/riscv, excluding
> configure and default-configs changes.
> 
> Signed-off-by: Sagar Karandikar 
> ---

> +++ b/qapi-schema.json
> @@ -832,7 +832,7 @@
>  # Since: 2.6
>  ##
>  { 'enum': 'CpuInfoArch',
> -  'data': ['x86', 'sparc', 'ppc', 'mips', 'tricore', 'other' ] }
> +  'data': ['x86', 'sparc', 'ppc', 'mips', 'tricore', 'riscv', 'other' ] }

Missing documentation that 'riscv' was added in 2.8.  Enums aren't
always the best example, but BlockdevDriver in qapi/block-core.json
shows a good way of how to do it.

> @@ -927,6 +928,17 @@
>  { 'struct': 'CpuInfoTricore', 'data': { 'PC': 'int' } }
>  
>  ##
> +# @CpuInfoRISCV:
> +#
> +# Additional information about a virtual RISCV CPU
> +#
> +# @PC: the instruction pointer
> +#
> +# Since 2.6

s/2.6/2.8/

And can we please NOT abuse the fact that CpuInfo is already whitelisted
for allowing non-lowercase names?...

> +##
> +{ 'struct': 'CpuInfoRISCV', 'data': { 'pc': 'int' } }

Oh, you already did.  Your documentation is wrong, then, as it documents
'PC' while the code uses 'pc'.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Qemu-block] [PATCH] MAINTAINERS: Add some more headers to the IDE section

2016-09-26 Thread John Snow




On 09/26/2016 05:39 AM, Thomas Huth wrote:

On 26.09.2016 10:22, Kevin Wolf wrote:

Am 23.09.2016 um 18:42 hat John Snow geschrieben:

On 09/23/2016 12:09 PM, Thomas Huth wrote:

The folder include/hw/ide/ belongs to the IDE section.

Signed-off-by: Thomas Huth 
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d8a0cfc..acf6d6c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -791,6 +791,7 @@ M: John Snow 
L: qemu-bl...@nongnu.org
S: Supported
F: include/hw/ide.h
+F: include/hw/ide/
F: hw/ide/
F: hw/block/block.c
F: hw/block/cdrom.c



Ah, yeah. These got missed when they were moved over. Thanks.

Reviewed-by: John Snow 


Who is supposed to merge this if you only give an R-b?


I've CC:ed this patch to qemu-trivial, so I hope it will get picked up
there if John does not want to apply this directly.

 Thomas




Thanks, applied to my IDE tree:

https://github.com/jnsnow/qemu/commits/ide
https://github.com/jnsnow/qemu.git

--js

Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent

2016-09-26 Thread Dr. David Alan Gilbert

* Chunguang Li (lichungu...@hust.edu.cn) wrote:
> 
> 
> 
> > -原始邮件-
> > 发件人: "Dr. David Alan Gilbert" 
> > 发送时间: 2016年9月26日 星期一
> > 收件人: "Chunguang Li" 
> > 抄送: qemu-devel@nongnu.org, amit.s...@redhat.com, pbonz...@redhat.com, 
> > stefa...@redhat.com, quint...@redhat.com
> > 主题: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as 
> > dirty after they have been sent
> > 
> > * Chunguang Li (lichungu...@hust.edu.cn) wrote:
> > > Hi all!
> > > I have some confusion about the dirty bitmap during migration. I have 
> > > digged into the code. I figure out that every now and then during 
> > > migration, the dirty bitmap will be grabbed from the kernel space through 
> > > ioctl(KVM_GET_DIRTY_LOG), and then be used to update qemu's dirty bitmap. 
> > > However I think this mechanism leads to resendness of some NON-dirty 
> > > pages.
> > > 
> > > Take the first iteration of precopy for instance, during which all the 
> > > pages will be sent. Before that during the migration setup, the 
> > > ioctl(KVM_GET_DIRTY_LOG) is called once, so the kernel begins to produce 
> > > the dirty bitmap from this moment. When the pages "that haven't been 
> > > sent" are written, the kernel space marks them as dirty. However I don't 
> > > think this is correct, because these pages will be sent during this and 
> > > the next iterations with the same content (if they are not written again 
> > > after they are sent). It only makes sense to mark the pages which have 
> > > already been sent during one iteration as dirty when they are written.
> > > 
> > > 
> > > Am I right about this consideration? If I am right, is there some advice 
> > > to improve this?
> > 
> > I think you're right that this can happen; to clarify I think the
> > case you're talking about is:
> > 
> >   Iteration 1
> > sync bitmap
> > start sending pages
> > page 'n' is modified - but hasn't been sent yet
> > page 'n' gets sent
> >   Iteration 2
> > sync bitmap
> >'page n is shown as modified'
> > send page 'n' again
> >
> 
> Yes，this is right the case I am talking about.
>  
> > So you're right that is wasteful; I guess it's more wasteful
> > on big VMs with slow networks where the length of each iteration
> > is large.
> 
> I think this is "very" wasteful. Assume the workload writes the pages dirty 
> randomly within the guest address space, and the transfer speed is constant. 
> Intuitively, I think nearly half of the dirty pages produced in Iteration 1 
> is not really dirty. This means the time of Iteration 2 is double of that to 
> send only really dirty pages.

Yes, it's probably pretty bad; and we really need to do something like
split the sync into smaller chunks; there are other suggestions
for how to improve it (e.g. there's the page-modification-logging
changes).

However, I don't think you usually get really random writes, if you
do precopy rarely converges at all, because even without your
observation it changes lots and lots of pages.

Dave

> Thanks,
> 
> Chunguang
> 
> > 
> > Fixing it is not easy, because you have to be really careful
> > never to miss a page modification, even if the page is sent
> > about the same time it's dirtied.
> > 
> > One way would be to sync the dirty log from the kernel
> > in smaller chunks.
> > 
> > Dave
> > 
> > 
> > > 
> > > Thanks,
> > > Chunguang Li
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
> 
> --
> Chunguang Li, Ph.D. Candidate
> Wuhan National Laboratory for Optoelectronics (WNLO)
> Huazhong University of Science & Technology (HUST)
> Wuhan, Hubei Prov., China
> 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 1/9] log: Add new IOMMU type

2016-09-26 Thread Edgar E. Iglesias

On Mon, Sep 26, 2016 at 08:54:52AM +0200, Auger Eric wrote:
> Hi Edgar, Prem,
> 
> On 25/09/2016 16:58, Edgar E. Iglesias wrote:
> > On Tue, Sep 13, 2016 at 01:53:39AM +0530, Prem Mallappa wrote:
> >> On Fri, Sep 9, 2016 at 9:06 PM, Auger Eric  wrote:
> >>
> >>> Hi Prem,
> >>>
> >>> Missing commit message
> >>>
>  Signed-off-by: Prem Mallappa 
>  ---
>   include/qemu/log.h | 1 +
>   util/log.c | 2 ++
>   2 files changed, 3 insertions(+)
> 
>  diff --git a/include/qemu/log.h b/include/qemu/log.h
>  index 234fa81..3dd2131 100644
>  --- a/include/qemu/log.h
>  +++ b/include/qemu/log.h
>  @@ -42,6 +42,7 @@ static inline bool qemu_log_separate(void)
>   #define CPU_LOG_TB_NOCHAIN (1 << 13)
>   #define CPU_LOG_PAGE   (1 << 14)
>   #define LOG_TRACE  (1 << 15)
>  +#define CPU_LOG_IOMMU  (1 << 16)
> >>> why is it prefixed with CPU_ ?
> >>> besides all arm gic devices seem to use LOG_GUEST_ERROR. what is the
> >>> rationale behind introducing a new enum?
> >>>
> >>
> >> Will change this to LOG_GUEST_ERROR, if others on the list are okay.
> > 
> > Hi,
> > 
> > LOG_GUEST_ERROR is used for cases when the guest programs things in bad
> > way. E.g sets up a register in an invalid manner or writes to regs that
> > don't exist.
> > 
> > In this case we're logging information for valid translation steps, I
> > would prefer if we could use something else than LOG_GUEST_ERROR.
> > An IOMMU logging class sounds good to me.
> 
> Thank you for the clarification; so indeed LOG_GUEST_ERROR which was my
> suggestion is not a good idea. With respect to that patch I was also
> wondering whether the CPU_ prefix was relevant.

Right, the CPU_ prefix should probably be dropped if we use a new IOMMU log 
mask.

Cheers,
Edgar

Re: [Qemu-devel] [Qemu-block] [PATCH 0/1] ahci: fix ncq aiocb-related segfault

2016-09-26 Thread John Snow




On 09/26/2016 12:10 PM, Stefan Hajnoczi wrote:

On Thu, Sep 22, 2016 at 04:10:39PM -0400, John Snow wrote:

Fix ncq_cb to prevent a segfault on sys_reset.

John Snow (1):
  ahci: clear aiocb in ncq_cb

 hw/ide/ahci.c | 1 +
 1 file changed, 1 insertion(+)

--
2.7.4


Maybe worth adding as a clarification:

The issue is when bdrv_aio_cancel() is called after ncq_cb() was already
invoked.  The aiocb will be a dangling pointer.



Done.


Reviewed-by: Stefan Hajnoczi 



Thanks, applied to my IDE tree:

https://github.com/jnsnow/qemu/commits/ide
https://github.com/jnsnow/qemu.git

--js

Re: [Qemu-devel] [PATCH] arm-smmu: Fix bug when merging two 32 bit words to form 64 bit word

2016-09-26 Thread Edgar E. Iglesias

On Mon, Sep 26, 2016 at 01:52:22PM +, Paul Kennedy wrote:
> From 7bf015d76a5b53cd061c91f91fea4427101b26fd Mon Sep 17 00:00:00 2001
> From: Paul Kennedy 
> Date: Mon, 26 Sep 2016 11:59:00 +0100
> Subject: [PATCH] arm-smmu: Fix bug when merging two 32 bit words to form 64
>  bit word
> 
> Fix bug where least significant 32 bits overwrite most significant
> 32 bits of TTBR1 register.

Hi Paul,

Thanks for the patch.

This code is not upstream yet, it's staged in the Xilinx tree.
It probably doesn't makes sense to CC qemu-devel@nongnu.org and
qemu-triv...@nongnu.org yet but there's a g...@xilinx.com for
future patches to staged Xilinx code.

Reviewed-by: Edgar E. Iglesias 

Alistair, can you merge this to our trees?

Cheers,
Edgar

> 
> Signed-off-by: Paul C Kennedy 
> ---
>  hw/misc/arm-smmu.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/misc/arm-smmu.c b/hw/misc/arm-smmu.c
> index 7e7acd8..8255f3e 100644
> --- a/hw/misc/arm-smmu.c
> +++ b/hw/misc/arm-smmu.c
> @@ -6566,7 +6566,7 @@ static bool smmu500_at64(SMMU *s, unsigned int cb, 
> hwaddr va,
> 
>  req.ttbr[1][1] = s->regs[R_SMMU_CB0_TTBR1_HIGH + cb_offset];
>  req.ttbr[1][1] <<= 32;
> -req.ttbr[1][1] = s->regs[R_SMMU_CB0_TTBR1_LOW + cb_offset];
> +req.ttbr[1][1] |= s->regs[R_SMMU_CB0_TTBR1_LOW + cb_offset];
> 
>  if (req.s2_enabled) {
>  req.tcr[2] = s->regs[R_SMMU_CB0_TCR_LPAE + cb2_offset];
> --
> 1.7.9.5
>

Re: [Qemu-devel] [PATCH 06/18] target-riscv: Add JALR, Branch Instructions

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

Signed-off-by: Sagar Karandikar 
---
 target-riscv/translate.c | 107 +++
 1 file changed, 107 insertions(+)

diff --git a/target-riscv/translate.c b/target-riscv/translate.c
index ccfb795..d8044cf 100644
--- a/target-riscv/translate.c
+++ b/target-riscv/translate.c
@@ -438,6 +438,106 @@ static inline void gen_arith_imm(DisasContext *ctx, 
uint32_t opc, int rd,
 tcg_temp_free(source1);
 }

+static inline void gen_jalr(DisasContext *ctx, uint32_t opc, int rd, int rs1,
+int16_t imm)
+{
+/* no chaining with JALR */
+TCGLabel *misaligned = gen_new_label();
+TCGLabel *done = gen_new_label();
+target_long uimm = (target_long)imm; /* sign ext 16->64 bits */
+TCGv t0, t1, t2, t3;
+t0 = tcg_temp_local_new();
+t1 = tcg_temp_local_new();
+t2 = tcg_temp_local_new(); /* old_pc */
+t3 = tcg_temp_local_new();
+
+switch (opc) {
+case OPC_RISC_JALR:
+gen_get_gpr(t0, rs1);
+tcg_gen_addi_tl(t0, t0, uimm);
+tcg_gen_andi_tl(t0, t0, (target_ulong)0xFFFEll);


-2.


+tcg_gen_andi_tl(t3, t0, 0x2);
+tcg_gen_movi_tl(t2, ctx->pc);
+tcg_gen_brcondi_tl(TCG_COND_NE, t3, 0x0, misaligned);
+tcg_gen_mov_tl(cpu_PC, t0);
+tcg_gen_addi_tl(t1, t2, 4);
+gen_set_gpr(rd, t1);
+tcg_gen_br(done);
+gen_set_label(misaligned);
+generate_exception_mbadaddr(ctx, RISCV_EXCP_INST_ADDR_MIS);
+gen_set_label(done);
+tcg_gen_exit_tb(0);
+ctx->bstate = BS_BRANCH;


You don't need any temp_local, which will significantly clean up the generated 
code.  Try


  gen_get_gpr(cpu_PC, rs1);
  tcg_gen_addi_tl(cpu_PC, cpu_PC, imm);
  tcg_gen_andi_tl(cpu_PC, cpu_PC, -2);
  tcg_gen_andi_tl(tmp, cpu_PC, 2);
  tcg_gen_brcondi_tl(TCG_COND_NE, tmp, 0, misaligned);

  if (rd != 0) {
tcg_gen_movi_tl(cpu_gpr[rd], ctx->pc + 4);
  }
  tcg_gen_exit_tb(0);

  tcg_set_label(misaligned);
  generate_exception_mbadaddr(ctx, ...);

Note that cpu_PC will be reset in generate_exception, so we can initialize it 
early, before the branch.



+static inline void gen_branch(DisasContext *ctx, uint32_t opc, int rs1, int 
rs2,
+int16_t bimm)


Drop the inline markers and let the compiler decide.


r~

Re: [Qemu-devel] [PULL 23/36] cadence_gem: Add queue support

2016-09-26 Thread Alistair Francis

On Mon, Sep 26, 2016 at 4:01 AM, Paolo Bonzini  wrote:
>
>
> On 22/09/2016 19:22, Peter Maydell wrote:
>> +case GEM_RECEIVE_Q1_PTR ... GEM_RECEIVE_Q15_PTR:
>> +s->rx_desc_addr[offset - GEM_RECEIVE_Q1_PTR + 1] = val;
>> +break;
>
> MAX_PRIORITY_QUEUES is still 8, so this can cause an out-of-bounds write
> in s->rx_desc_addr (and likewise for s->tx_addr).

The MAX_PRIORITY_QUEUES is actually right, there are only 8 supported.

I guess when this was modeled it was just assumed there would be 16. I
checked the spec and confirmed there are only 8, so I have fixed up
the logic around that.

Thanks,

Alistair

>
> Paolo
>

Re: [Qemu-devel] [PATCH 05/18] target-riscv: Add Arithmetic instructions

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+#if defined(TARGET_RISCV64)
+target_ulong helper_mulhsu(CPURISCVState *env, target_ulong arg1,
+  target_ulong arg2)
+{
+int64_t a = arg1;
+uint64_t b = arg2;
+return (int64_t)((__int128_t)a * b >> 64);
+}
+#endif


This won't compile on a 32-bit host, or indeed a 64-bit host without 
CONFIG_INT128.

But what you should actually be using is tcg_gen_mulu2_i64, with a fixup 
afterward for the one signed argument.  See tcg_gen_muls2_i64 in tcg/tcg-op.c 
for an example of fixing up an unsigned multiply for two signed inputs; you 
would need only half of that for a single signed input.



+/* Wrapper for setting reg values - need to check of reg is zero since
+ * cpu_gpr[0] is not actually allocated. this is more for safety purposes,
+ * since we usually avoid calling the OP_TYPE_gen function if we see a write to
+ * $zero
+ */
+static inline void gen_set_gpr(int reg_num_dst, TCGv t)
+{
+if (reg_num_dst != 0) {
+tcg_gen_mov_tl(cpu_gpr[reg_num_dst], t);
+}
+}


FWIW, target-alpha used to have lots and lots of checks for the zero register. 
In the end it was much cleaner to simply allocate a scratch temporary for the 
zero-register sink.  Aside from known patterns, such as canonical nop 
formations, you'll almost never see such instructions.  While it's true that 
you must do something in order to be architecturally correct, it's better to do 
something that minimizes the impact to the rest of the translator.




+tcg_gen_shri_i64(t0, t0, 32);
+tcg_gen_extrl_i64_i32(ret, t0);


This would be tcg_gen_extrh_i64_i32.


+static inline void gen_arith(DisasContext *ctx, uint32_t opc, int rd, int rs1,
+int rs2)
+{
+TCGv source1, source2, cond1, cond2, zeroreg, resultopt1;
+cond1 = tcg_temp_new();
+cond2 = tcg_temp_new();
+source1 = tcg_temp_new();
+source2 = tcg_temp_new();
+zeroreg = tcg_temp_new();
+resultopt1 = tcg_temp_new();
+gen_get_gpr(source1, rs1);
+gen_get_gpr(source2, rs2);
+tcg_gen_movi_tl(zeroreg, 0); /* hardcoded zero for compare in DIV, etc */


It would be far preferable to allocate this only when needed.


+
+switch (opc) {
+#if defined(TARGET_RISCV64)
+case OPC_RISC_ADDW:
+#endif
+case OPC_RISC_ADD:


Can we avoid sprinkling so many ifdefs?  Perhaps with something akin to

#ifdef TARGET_RISCV64
#define CASE_OP_32_64(X) case X: case glue(X, W)
#else
#define CASE_OP_32_64(X) case X
#endif



+#if defined(TARGET_RISCV64)
+case OPC_RISC_SLLW:
+tcg_gen_andi_tl(source2, source2, 0x1F);
+/* fall through to SLL */
+#endif
+case OPC_RISC_SLL:
+tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);
+tcg_gen_shl_tl(source1, source1, source2);


Better to not fall through at this point, to avoid the double and.


+#if defined(TARGET_RISCV64)
+case OPC_RISC_SRLW:
+/* clear upper 32 */
+tcg_gen_andi_tl(source1, source1, 0xLL);
+tcg_gen_andi_tl(source2, source2, 0x1F);
+/* fall through to SRL */
+#endif
+case OPC_RISC_SRL:
+tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);
+tcg_gen_shr_tl(source1, source1, source2);


Likewise.  Also, tcg_gen_ext32u_tl to clear upper 32.


+break;
+#if defined(TARGET_RISCV64)
+case OPC_RISC_SRAW:
+/* first, trick to get it to act like working on 32 bits (get rid of
+upper 32, sign extend to fill space) */
+tcg_gen_shli_tl(source1, source1, 32);
+tcg_gen_sari_tl(source1, source1, 32);
+tcg_gen_andi_tl(source2, source2, 0x1F);
+/* fall through to SRA */
+#endif
+case OPC_RISC_SRA:
+tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);


Likewise.  Also, tcg_gen_ext32s_tl to sign-extend.


+#if defined(TARGET_RISCV64)
+case OPC_RISC_MULW:
+#endif
+case OPC_RISC_MUL:
+tcg_gen_muls2_tl(source1, source2, source1, source2);


tcg_gen_mul_tl, since source2 is dead.


+#if defined(TARGET_RISCV64)
+case OPC_RISC_DIVW:
+tcg_gen_ext32s_tl(source1, source1);
+tcg_gen_ext32s_tl(source2, source2);
+/* fall through to DIV */
+#endif
+case OPC_RISC_DIV:
+/* Handle by altering args to tcg_gen_div to produce req'd results:
+ * For overflow: want source1 in source1 and 1 in source2
+ * For div by zero: want -1 in source1 and 1 in source2 -> -1 result */
+tcg_gen_movi_tl(resultopt1, (target_ulong)0x);


You'd need ULL for a constant with so many F's, but a plain -1 works just fine.


+tcg_gen_setcondi_tl(TCG_COND_EQ, cond2, source2, (target_ulong)(~0L));


Likewise -1.


+tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source1,
+1L << (TARGET_LONG_BITS - 1));


ULL, not L, or better as (target_ulong)1.



+case OPC_RISC_DIVU:
+tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source2, 0);
+

Re: [Qemu-devel] [Qemu-ppc] [PATCH] Enable H_CLEAR_MOD and H_CLEAR_REF hypercalls on KVM/PPC64.

2016-09-26 Thread Nathan Whitehorn




On 09/21/16 03:00, David Gibson wrote:

On Wed, Sep 21, 2016 at 01:20:57PM +1000, David Gibson wrote:

On Tue, Aug 30, 2016 at 01:02:47AM +, Nathan Whitehorn wrote:

These are mandatory per PAPR and available on Linux 4.3 and newer kernels. The 
calls in question are required to run FreeBSD guests with reasonable 
performance, so enable them if possible.

Signed-off-by: Nathan Whitehorn 

Applied to ppc-for-2.8, thanks.

Please CC me directly on patches if you want a faster response - I
only occasionally have time to look through the lists for things.

Actually, I discovered a bit later that this breaks the build on x86,
because there is no stub version of
kvmppc_enable_clear_ref_mod_hcalls() when (ppc) kvm is not available.

So, I've added that in my tree.


Thanks!
-Nathan




---
  hw/ppc/spapr.c   | 3 +++
  target-ppc/kvm.c | 6 ++
  target-ppc/kvm_ppc.h | 1 +
  3 files changed, 10 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 30d6800..d41f1a5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1857,6 +1857,9 @@ static void ppc_spapr_init(MachineState *machine)
  /* Enable H_LOGICAL_CI_* so SLOF can talk to in-kernel devices */
  kvmppc_enable_logical_ci_hcalls();
  kvmppc_enable_set_mode_hcall();
+
+/* H_CLEAR_MOD/_REF are mandatory in PAPR, but off by default */
+kvmppc_enable_clear_ref_mod_hcalls();
  }
  
  /* allocate RAM */

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index dcb68b9..0e64a46 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2055,6 +2055,12 @@ void kvmppc_enable_set_mode_hcall(void)
  kvmppc_enable_hcall(kvm_state, H_SET_MODE);
  }
  
+void kvmppc_enable_clear_ref_mod_hcalls(void)

+{
+kvmppc_enable_hcall(kvm_state, H_CLEAR_REF);
+kvmppc_enable_hcall(kvm_state, H_CLEAR_MOD);
+}
+
  void kvmppc_set_papr(PowerPCCPU *cpu)
  {
  CPUState *cs = CPU(cpu);
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 5461d10..33b7ed2 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -24,6 +24,7 @@ int kvmppc_get_hypercall(CPUPPCState *env, uint8_t *buf, int 
buf_len);
  int kvmppc_set_interrupt(PowerPCCPU *cpu, int irq, int level);
  void kvmppc_enable_logical_ci_hcalls(void);
  void kvmppc_enable_set_mode_hcall(void);
+void kvmppc_enable_clear_ref_mod_hcalls(void);
  void kvmppc_set_papr(PowerPCCPU *cpu);
  int kvmppc_set_compat(PowerPCCPU *cpu, uint32_t cpu_version);
  void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int mpic_proxy);

Re: [Qemu-devel] [PATCH v2 0/7] block: Make more blockdev-add options work

2016-09-26 Thread Max Reitz

On 23.09.2016 16:32, Kevin Wolf wrote:
> This series moves a few more options from flags to the appropriate place. This
> doesn't only result in cleaner code, but also allows using these options in
> nested node definitions.
> 
> After this series, bds_tree_init() is only a thin wrapper around bdrv_open()
> which sets the right defaults for cache modes and the BDRV_O_INACTIVE flag if
> necessary.
> 
> Depends on my block branch, specifically Berto's 'read-only' patches.
> 
> v2:
> - Patch 3 ('block/qapi: Move 'aio' option to file driver')
>   Use qapi_enum_parse() instead of parsing manually [Eric]
> 
> - Patch 4 ('block: Parse 'detect-zeroes' in bdrv_open_common()')
>   Fixed typo in commit message [Eric]
>   Reuse local detect_zeroes variable instead of querying QemuOpts [Eric]
> 
> Kevin Wolf (7):
>   block: Drop aio/cache consistency check from qmp_blockdev_add()
>   block/qapi: Use separate options type for curl driver
>   block/qapi: Move 'aio' option to file driver
>   block: Parse 'detect-zeroes' in bdrv_open_common()
>   block: Use 'detect-zeroes' option for 'blockdev-change-medium'
>   block: Move 'discard' option to bdrv_open_common()
>   block: Remove qemu_root_bds_opts
> 
>  block.c|  50 ++-
>  block/block-backend.c  |   9 ++--
>  block/raw-posix.c  |  44 ++---
>  block/raw-win32.c  |  56 +++--
>  blockdev.c | 110 
> +++--
>  include/block/block.h  |   1 +
>  include/sysemu/block-backend.h |   2 +-
>  qapi/block-core.json   |  31 
>  tests/qemu-iotests/087 |   4 +-
>  tests/qemu-iotests/087.out |   2 +-
>  10 files changed, 164 insertions(+), 145 deletions(-)

Reviewed-by: Max Reitz 

It's a bit funny now how blockdev_init() and
extract_common_blockdev_options() are actually used by neither
blockdev-add nor -blockdev, but only by drive_new() (which happily
advertises blockdev_init() to be shared with blockev-add, though).

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 3/7] block/qapi: Move 'aio' option to file driver

2016-09-26 Thread Max Reitz

On 23.09.2016 16:32, Kevin Wolf wrote:
> The option whether or not to use a native AIO interface really isn't a
> generic option for all drivers, but only applies to the native file
> protocols. This patch moves the option in blockdev-add to the
> appropriate places (raw-posix and raw-win32).
> 
> We still have to keep the flag BDRV_O_NATIVE_AIO for compatibility
> because so far the AIO option was usually specified on the wrong layer
> (the top-level format driver, which didn't even look at it) and then
> inherited by the protocol driver (where it was actually used). We can't
> forbid this use except in new interfaces.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block/raw-posix.c  | 44 ---
>  block/raw-win32.c  | 56 
> +-
>  qapi/block-core.json   |  6 +++---
>  tests/qemu-iotests/087 |  4 ++--
>  4 files changed, 83 insertions(+), 27 deletions(-)

[...]

> diff --git a/block/raw-win32.c b/block/raw-win32.c
> index 56f45fe..734bb10 100644
> --- a/block/raw-win32.c
> +++ b/block/raw-win32.c

[...]

> +static bool get_aio_option(QemuOpts *opts, int flags, Error **errp)
> +{
> +BlockdevAioOptions aio, aio_default;
> +
> +aio_default = (flags & BDRV_O_NATIVE_AIO) ? BLOCKDEV_AIO_OPTIONS_NATIVE
> +  : BLOCKDEV_AIO_OPTIONS_THREADS;
> +aio = qapi_enum_parse(BlockdevAioOptions_lookup, qemu_opt_get(opts, 
> "aio"),
> +  BLOCKDEV_AIO_OPTIONS__MAX, aio_default, errp);
> +
> +switch (aio) {
> +case BLOCKDEV_AIO_OPTIONS_NATIVE:
> +return true;
> +case BLOCKDEV_AIO_OPTIONS_THREADS:
> +return false;
> +default:
> +error_setg(errp, "Invalid AIO option");

Any reason for catching this case here but not in raw-posix?

(Not that it really matters, though.)

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 04/18] target-riscv: Add framework for instruction decode

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+/* THIS BUILDS 13 bit imm (implicit zero is tacked on here), also note that bit
+   #12 is obtained in a special way to get sign extension */
+#define GET_B_IMM(inst)  ((int16_t)inst >> 25) & 0x3F) << 5)\
+ | int32_t)inst) >> 31) << 12)  \
+ | (((inst >> 8) & 0xF) << 1)   \
+ | (((inst >> 7) & 0x1) << 11)))
+
+#define GET_STORE_IMM(inst)  ((int16_t)(int32_t)inst) >> 25) << 5)\
+ | ((inst >> 7) & 0x1F)))
+#define GET_JAL_IMM(inst)((int32_t)((inst & 0xFF000) \
+ | (((inst >> 20) & 0x1) << 11)\
+ | (((inst >> 21) & 0x3FF) << 1)\
+ | int32_t)inst) >> 31) << 20)))
+#define GET_RM(inst) ((inst >> 12) & 0x7)
+#define GET_RS3(inst)((inst >> 27) & 0x1f)
+#define GET_RS1(inst)((inst >> 15) & 0x1f)
+#define GET_RS2(inst)((inst >> 20) & 0x1f)
+#define GET_RD(inst) ((inst >> 7) & 0x1f)
+#define GET_IMM(inst)((int16_t)(((int32_t)inst) >> 20))


I would prefer that these use extract32 and sextract32.  Then you don't need 
the comment there at the top because it's self-evident.



+switch (op) {
+case OPC_RISC_LUI:
+if (rd == 0) {
+break; /* NOP */
+}
+tcg_gen_movi_tl(cpu_gpr[rd], (ctx->opcode & 0xF000));
+tcg_gen_ext32s_tl(cpu_gpr[rd], cpu_gpr[rd]);


This should be done in one movi with the sign-extension in C.


+break;
+case OPC_RISC_AUIPC:
+if (rd == 0) {
+break; /* NOP */
+}
+tcg_gen_movi_tl(cpu_gpr[rd], (ctx->opcode & 0xF000));
+tcg_gen_ext32s_tl(cpu_gpr[rd], cpu_gpr[rd]);
+tcg_gen_addi_tl(cpu_gpr[rd], cpu_gpr[rd], ctx->pc);


Likewise.


+break;
+case OPC_RISC_JAL: {
+TCGv nextpc = tcg_temp_local_new();
+TCGv testpc = tcg_temp_local_new();
+TCGLabel *misaligned = gen_new_label();
+TCGLabel *done = gen_new_label();
+ubimm = (target_long) (GET_JAL_IMM(ctx->opcode));
+tcg_gen_movi_tl(nextpc, ctx->pc + ubimm);
+/* check misaligned: */
+tcg_gen_andi_tl(testpc, nextpc, 0x3);
+tcg_gen_brcondi_tl(TCG_COND_NE, testpc, 0x0, misaligned);
+if (rd != 0) {
+tcg_gen_movi_tl(cpu_gpr[rd], ctx->pc + 4);
+}
+
+#ifdef DISABLE_CHAINING_JAL
+tcg_gen_mov_tl(cpu_PC, nextpc);
+tcg_gen_exit_tb(0);
+#else
+gen_goto_tb(ctx, 0, ctx->pc + ubimm); /* must use this for safety 
*/
+#endif
+tcg_gen_br(done);
+gen_set_label(misaligned);
+/* throw exception for misaligned case */
+generate_exception_mbadaddr(ctx, RISCV_EXCP_INST_ADDR_MIS);
+gen_set_label(done);
+ctx->bstate = BS_BRANCH;
+tcg_temp_free(nextpc);
+tcg_temp_free(testpc);


Remember that nextpc is a translate-time constant.  Therefore this entire 
misalignment check could be done during translation, and thus there should be 
no tcg labels nor tcg temporaries allocated.


Why would you ever want to disable chaining JAL (and therefore J)?  Do note 
that there is a -d nochain command-line switch that will disable all chaining 
for the purposes of debugging...



r~

Re: [Qemu-devel] [PATCH 03/18] target-riscv: Add initialization for translation

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

 RISCVCPU *cpu_riscv_init(const char *cpu_model)
 {
-return NULL;
+RISCVCPU *cpu;
+CPURISCVState *env;
+const riscv_def_t *def;
+
+def = cpu_riscv_find_by_name(cpu_model);
+if (!def) {
+return NULL;
+}
+cpu = RISCV_CPU(object_new(TYPE_RISCV_CPU));
+env = >env;
+env->cpu_model = def;
+
+memset(env->csr, 0, 4096 * sizeof(target_ulong));


sizeof(env->csr)?

And besides that, doesn't this more properly belong in a reset function, where 
that memset will already have been done?



+env->priv = PRV_M;
+
+/* set mcpuid from def */
+env->csr[CSR_MISA] = def->init_misa_reg;
+object_property_set_bool(OBJECT(cpu), true, "realized", NULL);
+
+/* fpu flags: */
+set_default_nan_mode(1, >fp_status);

Re: [Qemu-devel] [PATCH 00/18] target-riscv: Add full-system emulation support for the RISC-V Instruction Set Architecture (RV64G, RV32G)

2016-09-26 Thread Paolo Bonzini



On 26/09/2016 18:35, Andreas Färber wrote:
> Am 26.09.2016 um 18:24 schrieb Paolo Bonzini:
>> On 26/09/2016 18:20, Andreas Färber wrote:
>>> Am 26.09.2016 um 18:17 schrieb Richard Henderson:
 On 09/26/2016 05:20 AM, Paolo Bonzini wrote:
> On 26/09/2016 12:56, Sagar Karandikar wrote:
>> -cpu-qom.h merged into cpu.h
>
> Please follow the model of other targets.  RISCVCPUClass and the
> RISCVCPU typedef should be in cpu-qom.h.
> 
> After discussion with Paolo, the key word here is typedef. It doesn't
> make sense to have the struct RISCVCPU in cpu-qom.h (the old approach)
> but it makes perfect sense to have struct RISCVCPUClass and typedef
> RISCVCPU in a separate cpu-qom.h.

Thanks Andreas, and sorry Sagar for the back-and-forth!

Paolo

Re: [Qemu-devel] [PATCH v3 8/9] virtio-scsi: convert virtio_scsi_bad_req() to use virtio_error()

2016-09-26 Thread Stefan Hajnoczi

On Mon, Sep 26, 2016 at 10:34:56AM +0200, Greg Kurz wrote:
> The virtio_scsi_bad_req() function is called when a guest sends a
> request with missing or ill-sized headers. This generally happens
> when the virtio_scsi_parse_req() function returns an error.
> 
> With this patch, virtio_scsi_bad_req() will mark the device as broken,
> detach the request from the virtqueue and free it, instead of forcing
> QEMU to exit.
> 
> In nearly all locations where virtio_scsi_bad_req() is called, the only
> thing to do next is to return to the caller.
> 
> The virtio_scsi_handle_cmd_req_prepare() function is an exception though.
> 
> It is called in a loop by virtio_scsi_handle_cmd_vq() and passed requests
> freshly popped from a cmd virtqueue; virtio_scsi_handle_cmd_req_prepare()
> does some sanity checks on the request and returns a boolean flag to
> indicate whether the request should be queued or not. In the latter case,
> virtio_scsi_handle_cmd_req_prepare() has detected a non-fatal error and
> sent a response back to the guest.
> 
> We have now a new condition to take into account: the device is broken
> and should stop all processing.
> 
> The return value of virtio_scsi_handle_cmd_req_prepare() is hence changed
> to an int. A return value of zero means that the request should be queued.
> Other non-fatal error cases where the reqyest shoudn't be queued  return

s/reqyest/request/

> @@ -574,11 +578,24 @@ static void 
> virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
>  void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
>  {
>  VirtIOSCSIReq *req, *next;
> +int ret;
> +
>  QTAILQ_HEAD(, VirtIOSCSIReq) reqs = QTAILQ_HEAD_INITIALIZER(reqs);
>  
>  while ((req = virtio_scsi_pop_req(s, vq))) {
> -if (virtio_scsi_handle_cmd_req_prepare(s, req)) {
> +ret = virtio_scsi_handle_cmd_req_prepare(s, req);
> +if (!ret) {
>  QTAILQ_INSERT_TAIL(, req, next);
> +} else if (ret == -EINVAL) {
> +/* The device is broken and shouldn't process any request */
> +while (!QTAILQ_EMPTY()) {
> +req = QTAILQ_FIRST();
> +QTAILQ_REMOVE(, req, next);
> +blk_io_unplug(req->sreq->dev->conf.blk);

Are you sure blk_io_plug() was called for this request?  If we returned
early in  virtio_scsi_handle_cmd_req_prepare() then it wasn't called.

> +scsi_req_unref(req->sreq);

Which scsi_req_ref() is this paired with?  If it's the call in
scsi_req_enqueue() then that function was never called and we shouldn't
unref.


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 9/9] virtio-scsi: handle virtio_scsi_set_config() error

2016-09-26 Thread Stefan Hajnoczi

On Mon, Sep 26, 2016 at 10:35:04AM +0200, Greg Kurz wrote:
> This error is caused by a buggy guest: let's switch the device to the
> broken state instead of terminating QEMU.
> 
> Signed-off-by: Greg Kurz 
> ---
>  hw/scsi/virtio-scsi.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 3/9] virtio-9p: handle handle_9p_output() error

2016-09-26 Thread Stefan Hajnoczi

On Mon, Sep 26, 2016 at 10:34:15AM +0200, Greg Kurz wrote:
> A broken guest may send a request without providing buffers for the reply
> or for the request itself, and virtqueue_pop() will return an element with
> either in_num == 0 or out_num == 0.
> 
> All 9P requests are expected to start with the following 7-byte header:
> 
> uint32_t size_le;
> uint8_t id;
> uint16_t tag_le;
> 
> If iov_to_buf() fails to return these 7 bytes, then something is wrong in
> the guest.
> 
> In both cases, it is wrong to crash QEMU, since the root cause lies in the
> guest.
> 
> This patch hence does the following:
> - keep the check of in_num since pdu_complete() assumes it has enough
>   space to store the reply and we will send something broken to the guest
> - let iov_to_buf() handle out_num == 0, since it will return 0 just like
>   if the guest had provided an zero-sized buffer.
> - call virtio_error() to inform the guest that the device is now broken,
>   instead of aborting
> - detach the request from the virtqueue and free it
> 
> Signed-off-by: Greg Kurz 
> ---
> v3: - dropped the out_num check (already covered by iov_to_buf())
> - reworded the in_num error message
> - added an error path to detach and free the virtqueue element
> 
> I haven't added the R-b tags received during v2 because of the above
> changes.
> ---
>  hw/9pfs/virtio-9p-device.c |   26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 02/18] target-riscv: Add RISC-V Target stubs inside target-riscv/

2016-09-26 Thread Richard Henderson


On 09/26/2016 03:56 AM, Sagar Karandikar wrote:

+/* QEMU addressing/paging config */
+#define TARGET_PAGE_BITS 12 /* 4 KiB Pages */
+#if defined(TARGET_RISCV64)
+#define TARGET_LONG_BITS 64 /* this defs TCGv as TCGv_i64 in tcg/tcg-op.h */
+#define TARGET_PHYS_ADDR_SPACE_BITS 50
+#define TARGET_VIRT_ADDR_SPACE_BITS 39
+#elif defined(TARGET_RISCV32)
+#define TARGET_LONG_BITS 32 /* this defs TCGv as TCGv_i64 in tcg/tcg-op.h */


Typo, and there's not really a need for that comment anyway.


+#define PGSHIFT 12


Just use TARGET_PAGE_BITS, surely.


+#define MSTATUS_UIE 0x0001
+#define MSTATUS_SIE 0x0002
+#define MSTATUS_HIE 0x0004
+#define MSTATUS_MIE 0x0008
+#define MSTATUS_UPIE0x0010
+#define MSTATUS_SPIE0x0020
+#define MSTATUS_HPIE0x0040
+#define MSTATUS_MPIE0x0080
+#define MSTATUS_SPP 0x0100
+#define MSTATUS_HPP 0x0600
+#define MSTATUS_MPP 0x1800
+#define MSTATUS_FS  0x6000
+#define MSTATUS_XS  0x00018000
+#define MSTATUS_MPRV0x0002
+#define MSTATUS_PUM 0x0004
+#define MSTATUS_MXR 0x0008
+#define MSTATUS_VM  0x1F00
+
+#define MSTATUS32_SD0x8000
+#define MSTATUS64_SD0x8000
+
+#define SSTATUS_UIE 0x0001
+#define SSTATUS_SIE 0x0002
+#define SSTATUS_UPIE0x0010
+#define SSTATUS_SPIE0x0020
+#define SSTATUS_SPP 0x0100
+#define SSTATUS_FS  0x6000
+#define SSTATUS_XS  0x00018000
+#define SSTATUS_PUM 0x0004
+#define SSTATUS32_SD0x8000
+#define SSTATUS64_SD0x8000


Given that the status bits for lesser privs are the same as MSTATUS, with some 
omissions, reproducing these is not ideal.  Perhaps better to define a 
SSTATUS_MASK that lists the MSTATUS bits that are controlable.



+/*
+ * ctz in Spike returns 0 if val == 0, wrap helper
+ */
+static int ctz(target_ulong val)
+{
+return val ? ctz64(val) : 0;
+}


Does this really belong here?  I don't see it used by op_helper, only ...


+
+/*
+ * Return RISC-V IRQ number if an interrupt should be taken, else -1.
+ * Used in cpu-exec.c
+ *
+ * Adapted from Spike's processor_t::take_interrupt()
+ */
+static inline int cpu_riscv_hw_interrupts_pending(CPURISCVState *env)
+{
+target_ulong pending_interrupts = env->csr[CSR_MIP] & env->csr[CSR_MIE];
+
+target_ulong mie = get_field(env->csr[CSR_MSTATUS], MSTATUS_MIE);
+target_ulong m_enabled = env->priv < PRV_M || (env->priv == PRV_M && mie);
+target_ulong enabled_interrupts = pending_interrupts &
+  ~env->csr[CSR_MIDELEG] & -m_enabled;
+
+target_ulong sie = get_field(env->csr[CSR_MSTATUS], MSTATUS_SIE);
+target_ulong s_enabled = env->priv < PRV_S || (env->priv == PRV_S && sie);
+enabled_interrupts |= pending_interrupts & env->csr[CSR_MIDELEG] &
+  -s_enabled;
+
+if (enabled_interrupts) {
+target_ulong counted = ctz(enabled_interrupts);


... here, where we've already eliminated 0.  So, surely you could just use 
ctz64 directly.




+if (counted == IRQ_HOST) {
+/* we're handing it to the cpu now, so get rid of the qemu irq */
+qemu_irq_lower(HTIF_IRQ);
+} else if (counted == IRQ_M_TIMER) {
+/* we're handing it to the cpu now, so get rid of the qemu irq */
+qemu_irq_lower(TIMER_IRQ);
+} else if (counted == IRQ_S_TIMER || counted == IRQ_H_TIMER) {
+/* don't lower irq here */
+}
+return counted;
+} else {
+return EXCP_NONE; /* indicates no pending interrupt */
+}
+}
+
+#include "exec/cpu-all.h"
+
+void riscv_tcg_init(void);
+RISCVCPU *cpu_riscv_init(const char *cpu_model);
+int cpu_riscv_signal_handler(int host_signum, void *pinfo, void *puc);
+
+#define cpu_init(cpu_model) CPU(cpu_riscv_init(cpu_model))
+
+/* hw/riscv/riscv_rtc.c  - supplies instret by approximating */
+uint64_t cpu_riscv_read_instret(CPURISCVState *env);
+
+int riscv_cpu_handle_mmu_fault(CPUState *cpu, vaddr address, MMUAccessType rw,
+  int mmu_idx);
+#if !defined(CONFIG_USER_ONLY)
+hwaddr cpu_riscv_translate_address(CPURISCVState *env, target_ulong address,
+   int rw);
+#endif
+
+static inline void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong *pc,
+target_ulong *cs_base, uint32_t *flags)
+{
+*pc = env->PC;
+*cs_base = 0;
+*flags = 0; /* necessary to avoid compiler warning */


At minimum, you'll need to include the priv level in the flags, so that one can 
only run TBs that were built for the cpu's current priv level.



+++ b/target-riscv/op_helper.c
@@ -0,0 +1,44 @@
+/*
+ * RISC-V Emulation Helpers for QEMU.
+ *
+ * Author: Sagar Karandikar, sag...@eecs.berkeley.edu
+ *

1 2 3 4 >

1 - 100 of 355 matches

Mail list logo